August 2012

 

Description of Supplementary Files for “An Alternative Theory of the Plant Size Distribution, with Geography and Intra- and International Trade”

by

Thomas J. Holmes (University of Minnesota, Federal Reserve Bank of Minneapolis, and NBER, holmes@umn.edu)

and

John J. Stevens (Board of Governors of the Federal Reserve System)

 

Note: The statistics reported in this paper that were derived from Census Bureau micro data were screened to ensure that they do not disclose confidential information.  The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis, the Federal Reserve Board, the Federal Reserve System, or the U.S. Bureau of the Census.

 

 

FILE GROUP 1: Raw Data Files and Results of Stage 1 Estimation

File

Description

Link to Files

Link to Contents/documentation

 

stage1_estimates_NAICS1997

 

stage1_estimates_SIC1992

Estimates of the distance-adjustment parameters from the first-stage estimation

SAS Files

NAICS1997

SIC1992

CSV Files

NAICS1997

SIC1992

ascii file for input into Gauss program

stage1_estimatesNAICS97_for_gauss.asc

stage1_estimates_documentation.html

mandat_naics2

NAICS-level data set for the 473 NAICS manufacturing industries for 1997. Contains various industry statistics, including imports and plant counts.

SAS File

mandat_naics2.sas7bdat

CSV File

mandat_naics2.csv

mandat_naics2.html

diffuse_naics

NAICS-level data set for the 473 NAICS manufacturing industries for 1997. The key variable is "diffuse", which equals 1 if the industry is classified as a diffuse demand industry (as explained in the text) and 0 otherwise.

SAS File

diffuse_naics.sas7bdat

CSV File

diffuse_naics.csv
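As a hedged illustration of how the diffuse flag might be used, the Python sketch below reads a CSV laid out like diffuse_naics.csv and keeps the industries whose diffuse variable equals 1. Only the variable name diffuse comes from this description; the NAICS column name (naics here) and the sample rows are assumptions made up for illustration.

```python
import csv
import io


def diffuse_industries(f, naics_col="naics"):
    """Return the NAICS codes whose 'diffuse' flag equals 1.

    naics_col is a hypothetical column name; check the actual CSV header.
    """
    reader = csv.DictReader(f)
    # float() accepts both "1" and "1.0", in case the flag was exported as a float
    return [row[naics_col] for row in reader if float(row["diffuse"]) == 1]


# Made-up rows for illustration only (not taken from the actual file).
sample = io.StringIO("naics,diffuse\n311111,1\n331111,0\n327320,1.0\n")
print(diffuse_industries(sample))  # ['311111', '327320']
```

To apply it to the real file, open it with open("diffuse_naics.csv", newline="") and pass the file object, adjusting naics_col to match the header.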

 

ea_pop_long_lat97

Economic Area (EA) level information for 177 EAs, including population and geographic coordinates

SAS File

ea_pop_long_lat97.sas7bdat

ascii file for input into Gauss

ea_pop_long_lat97.asc

ea_pop_long_lat97.html

ea_pop07

EA population for 2007

SAS File

ea_pop07.sas7bdat

ascii file for input into Gauss

ea_pop07.asc

ea_pop07.html

ea_CM97

EA×NAICS-level data (177 EAs times 172 diffuse demand NAICS industries). For each industry and location, the file contains the data needed to estimate the model for 1997. “CM” stands for Census of Manufactures. The source of these data is the publicly available Location of Manufactures data.

SAS File

ea_cm97.sas7bdat (31.2 MB file)

ascii file for input into Gauss

ea_CM97.asc (5.4 MB file)

ea_CM97.html

ea_CBP07

EA×NAICS-level data (177 EAs times 172 diffuse demand NAICS industries). For each industry and location, the file contains the data needed to estimate the model for 2007. “CBP” stands for County Business Patterns. The source of these data is the publicly available CBP data.

SAS File

ea_cbp07.sas7bdat (25.8 MB file)

ascii file for input into Gauss

ea_cbp07.asc (5.9 MB file)

ea_CBP07.html

sal97_scalefactor

473×2 ascii file with the NAICS code in the first column and sal97_scalefactor in the second, where sal97_scalefactor = salUS_geo/snormUS_LM

(see the documentation for ea_CM97 for definitions)

ascii file

sal97_scalefactor.asc
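Because sal97_scalefactor.asc is plain numeric text, it can also be read outside Gauss. The Python sketch below is a minimal illustration that assumes whitespace-separated columns and no header row (neither is stated above); the sample rows in the demo are hypothetical values, not taken from the actual file.

```python
import tempfile
from pathlib import Path


def load_scalefactor(path):
    """Map each NAICS code to its sal97_scalefactor.

    Assumes a headerless, whitespace-separated two-column ascii file.
    """
    factors = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Gauss may write codes in float/exponent form, e.g. 3.1111900e+05
        naics = int(float(fields[0]))
        factors[naics] = float(fields[1])
    return factors


# Demo with two made-up rows (hypothetical values).
with tempfile.NamedTemporaryFile("w", suffix=".asc", delete=False) as tmp:
    tmp.write("311111 1.25\n3.1111900e+05 0.80\n")
factors = load_scalefactor(tmp.name)
Path(tmp.name).unlink()
print(factors)  # {311111: 1.25, 311119: 0.8}
```

Converting the first column through float() before int() is a defensive choice, since ascii output from Gauss often stores integer codes in floating-point notation.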

 

ea_dist_within_tract

For each EA, contains an estimate of the internal distance within the EA, based on tract-level population data

ascii file

ea_dist_within_tract.asc

 

tradedat_forgauss

Estimates of the new China share for each industry

tradedat_forgauss.asc

 

china_ea_con_share

Estimates of the share of China manufacturing imports by EA for 2007

china_ea_con_share.asc

 

 

 

FILE GROUP 2: Programs to Run Stage 2 and to Calculate Tables

File

Description

Link to Files

Link to Contents/documentation

base97.prg

 

Gauss program that runs stage 2 for 1997. It runs 10 iterations and saves the results of each. For each iteration, it produces the count coefficients λP and λS and an estimate of Γ = (γ1, γ2, ..., γ177). It also simulates the impact of a China Surge for 2007, using the 1997 estimates. Note a change in notation: the count coefficients referred to in the text as λP and λS are called nuT and nuN in the programs and output.

program

base97.prg

 

ascii output files

 

model_base97.html

base07.prg

 

Same as base97.prg, but estimates the model for 2007 using the 2007 data

base07.prg

 

ascii output files

 

 

 

Table_6_7_process_model.sas

SAS program that processes the results of the 1997 estimates and creates the tables used in the paper. Note again the change in notation: the count coefficients referred to in the text as λP and λS are called nuT and nuN in the programs and output.

Table_6_7_process_model.sas

Results

Table_6_7_process_model.html

 

Table_8_New_China_Share_Descriptive_statistics

Produces Table 8

SAS input file:

tradedat_forgauss.sas7bdat

 

program

Table_8_New_China_Share_Descriptive_statistics

 

Output

 

 

Table10_data_statistics

Constructs Table 10, which reports summary statistics from the data

Table10_data_statistic_high_concentration.sas

Output

 

setup_mean_reversion_iter1

setup_mean_reversion_iter10

These programs take the gam estimates for 1997 and 2007 and construct an 11-point grid (one point for gam = 0 at the bottom, then 10 categories for ln(gam)). Each program then estimates a transition matrix for gam for the case where the new China share = 0 (88 industries).

 

Note first that there are two sets of gam estimates. The first imposes the constraint that all sales are primary (the iter=1 case). The second removes specialty-segment sales (the iter=10 case; we assume convergence by this point). For the primary-only segment model we use iter=1; for the full model we use iter=10.

 

 

Takes as input the files mentioned above

 

Programs:

setup_mean_reversion_iter1.sas

setup_mean_reversion_iter10.sas

 

Outputs:

html output: iter1, iter10

 

Inputs for gauss program

 

gamorig_and_fit_iter1.asc

(original gam, rescaled gam, and fitted values, as well as values with the China share and no regression to the mean)

pmat6_iter1.asc

The transition matrix over the 11 states, averaged over industries with New China Share = 0

mean_lngam97_iter1.asc

The grid points for gam>0 (10 values)

naicslist_iter1.asc

Basic information about each industry

 

Values for iter=10

gamorig_and_fit_iter10.asc

pmat6_iter10.asc

mean_lngam97_iter10.asc

naicslist_iter10.asc
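The grid-and-transition step performed by the setup_mean_reversion programs can be sketched in Python as below. This is a hedged illustration, not a translation of the SAS code: the description does not say how the 10 ln(gam) categories are cut, so the sketch assumes roughly equal-count (decile) bins of the positive 1997 values, and it assumes the input is a list of (gam97, gam07) pairs for industry-EA cells in the industries with new China share = 0. The function names are hypothetical.

```python
import math
from bisect import bisect_right


def decile_cutoffs(values, n_bins=10):
    """Interior cutoffs splitting the values into n_bins roughly equal-count groups."""
    s = sorted(values)
    return [s[(k * len(s)) // n_bins] for k in range(1, n_bins)]


def state_of(gam, cutoffs):
    """State 0 for gam == 0; states 1..n_bins for the ln(gam) categories."""
    if gam == 0:
        return 0
    return 1 + bisect_right(cutoffs, math.log(gam))


def transition_matrix(pairs, n_bins=10):
    """Row-normalized counts of moves from the 1997 state to the 2007 state.

    Cutoffs come from the positive 1997 values (an assumption of this sketch).
    """
    pairs = list(pairs)
    cutoffs = decile_cutoffs([math.log(g97) for g97, _ in pairs if g97 > 0], n_bins)
    n_states = n_bins + 1  # 11 states when n_bins = 10
    counts = [[0] * n_states for _ in range(n_states)]
    for g97, g07 in pairs:
        counts[state_of(g97, cutoffs)][state_of(g07, cutoffs)] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Made-up (gam97, gam07) pairs; n_bins=2 keeps the printout small.
demo = [(0, 0), (0, 1), (1, 1), (10, 0)]
for row in transition_matrix(demo, n_bins=2):
    print(row)
# [0.5, 0.5, 0.0]
# [0.0, 1.0, 0.0]
# [1.0, 0.0, 0.0]
```

Rows with no observations are left as zeros rather than dropped, so the matrix always has one row and one column per grid state.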

 

 

 

Tab11_prediction_primary_only.prg

Tab11_prediction_specialty.prg

 

Set of programs for the main prediction exercise

Programs

Tab11_prediction_primary_only.prg

Tab11_prediction_specialty.prg

 

Summary Output

Tab11_prediction_primary_only.out

Tab11_prediction_specialty.out

 

Results of individual simulations

z_output_all_1.asc

z_output_sim_1.asc

z_output_simChina_1.asc

z_output_all_2.asc

z_output_sim_2.asc

z_output_simChina_2.asc

output_sim_China_newpop_2.asc

output_dataLQ_2.asc

 

Final wrap-up SAS program

Tab11_prediction_last_step.sas

Tab11_prediction_last_step.html

 

 

Table12_1997_2007_sim.sas

SAS program to compare 1997 and 2007 estimates of the primary count share.

Table12_1997_2007_sim.sas

Table12_1997_2007_sim.html