Supplementary Files for “An Alternative Theory of the Plant Size Distribution with an Application to Trade”

August, 2012

Description of Supplementary Files for “An Alternative Theory of the Plant Size Distribution, with Geography and Intra- and International Trade”

Thomas J. Holmes (University of Minnesota, Federal Reserve Bank of Minneapolis, and NBER, holmes@umn.edu)

and

John J. Stevens (Board of Governors of the Federal Reserve System)

Note: The statistics reported in this paper that were derived from Census Bureau micro data were screened to ensure that they do not disclose confidential information. The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis, the Federal Reserve Board, the Federal Reserve System, or the U.S. Bureau of the Census.

FILE GROUP 1: Raw Data Files and Results of Stage 1 Estimation

File	Description	Link to Files	Link to Contents/documentation
stage1_estimates_NAICS1997 stage1_estimates_SIC1992	Estimates of parameters of distance adjustment from first stage estimation	SAS Files ●NAICS1997 ●SIC1992 CSV Files ●NAICS1997 ●SIC1992 ascii file for input into Gauss program ●stage1_estimatesNAICS97_for_gauss.asc	stage1_estimates_documentation.html
mandat_naics2	NAICS level data set for the 473 NAICS manufacturing industries for 1997. Contains various industry statistics, including imports and plant counts.	SAS File ● mandat_naics2.sas7bdat CSV File ● mandat_naics2.csv	mandat_naics2.html
diffuse_naics	NAICS level data set for the 473 NAICS manufacturing industries for 1997. The key variable is "diffuse", which equals 1 if the industry is classified as a diffuse demand industry, as explained in the text, and 0 otherwise.	SAS File ● diffuse_naics.sas7bdat CSV File ● diffuse_naics.csv
ea_pop_long_lat97	Economic Area (EA) Level information for 177 EAs including population and geographic coordinates	SAS File ● ea_pop_long_lat97.sas7bdat ascii file for input into Gauss ● ea_pop_long_lat97.asc	ea_pop_long_lat97.html
ea_pop07	EA population for 2007	SAS File ● ea_pop07.sas7bdat ascii file for input into Gauss ● ea_pop07.asc	ea_pop07.html
ea_CM97	ea×NAICS level data (177 ea times 172 diffuse demand NAICS industries). For each industry and location the file contains the data needed to estimate the model for 1997. The “CM” stands for Census of Manufactures. The source of this data is the publicly available Location of Manufactures data	SAS File ● ea_cm97.sas7bdat (31.2 MB file) ascii file for input into Gauss ● ea_CM97.asc (5.4 MB file)	ea_CM97.html
ea_CBP07	ea×NAICS level data (177 ea times 172 diffuse demand NAICS industries). For each industry and location the file contains the data needed to estimate the model for 2007. The “CBP” stands for County Business Patterns. The source of this data is the publicly available CBP data	SAS File ● ea_cbp07.sas7bdat (25.8 MB file) ascii file for input into Gauss ● ea_cbp07.asc (5.9 MB file)	ea_CPB07.html
sal97_scalefactor	473×2 ascii file, NAICS in first column, sal97_scalefactor in second, where sal97_scalefactor=salUS_geo/snormUS_LM (see documentation for ea_CM97 for definitions)	ascii file ● sal97_scalefactor.asc
ea_dist_within_tract	For each EA, contains an estimate of the internal distance within the EA, based on tract-level population data	ascii file ea_dist_within_tract.asc
tradedat_forgauss	Estimates of new China share for each industry	tradedat_forgauss.asc
china_ea_con_share	Estimate of the share of China manufacturing imports by EA for 2007	china_ea_con_share.asc
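
Several of the files above are described as ascii input for Gauss. As a rough sketch only, assuming sal97_scalefactor.asc is a plain list of whitespace-separated numbers (this layout is an assumption, not something the documentation here confirms), the 473×2 table could be read outside Gauss along these lines. The function name and the sample values are hypothetical.

```python
def parse_two_column_ascii(text):
    """Parse whitespace-separated numbers into (naics, scalefactor) pairs.

    Assumes two columns per record: NAICS code first, scale factor second.
    """
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of values (two columns)")
    rows = []
    for i in range(0, len(tokens), 2):
        # int(float(...)) also accepts codes written in scientific notation
        naics = int(float(tokens[i]))
        factor = float(tokens[i + 1])
        rows.append((naics, factor))
    return rows


# Made-up sample values, for illustration only (not taken from the real file)
sample = "311111 1.25\n311119 0.80\n"
rows = parse_two_column_ascii(sample)
scale_by_naics = dict(rows)  # lookup: NAICS code -> sal97_scalefactor
```

For the real file, the text would come from reading sal97_scalefactor.asc, and a quick check that the result has 473 rows would confirm that the assumed layout matches.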

FILE GROUP 2: Programs to Run Stage 2 and to Calculate Tables

File	Description	Link to Files	Link to Contents/documentation
base97.prg	Gauss program that runs stage 2 for 1997. It runs 10 iterations and saves the results for each iteration. For each iteration, it produces the count coefficients λ^P and λ^S and an estimate of Γ=(γ₁, γ₂, ..., γ₁₇₇). It simulates the impact of a China Surge for 2007, using the 1997 estimates. Note a change in notation: the count coefficients referred to in the text as λ^P and λ^S are referred to as nuT and nuN in the programs and output.	program base97.prg ascii output files base97_nu_bsize.asc base97_naics_level.asc base97_naics_levelC.asc base97_loc_level.asc	model_base97.html
base07.prg	Same as above, but estimates the model for 2007 using the 2007 data	base07.prg ascii output files base07_nu_bsize.asc base07_naics_level.asc base07_loc_level.asc
Table_6_7_process_model.sas	SAS program to process the results of the 1997 estimates. Creates the tables used in the paper. Note again the change in notation: the count coefficients referred to in the text as λ^P and λ^S are referred to as nuT and nuN in the programs and output.	Table_6_7_process_model.sas Results Table_6_7_process_model.html
Table_8_New_China_Share_ Descriptive_statistics	Produces Table 8	SAS input file: tradedat_forgauss.sas7bdat program Table_8_New_Chins_Share_ Descriptive_statistics Output
Table10_data_statistics	Constructs Table 10, some statistics from the data	Table10_data_statistic_high_concentration.sas Output
setup_mean_reversion_iter1 setup_mean_reversion_iter10	These programs take the gam estimates for 1997 and 2007 and make an 11-point grid (gam=0 at the bottom, then 10 categories for ln(gam)). They then estimate a transition matrix for gam for the case where new China share=0 (88 industries). There are two sets of gam estimates. In the first, we impose the constraint that all sales are primary (the iter=1 case). In the second, we take out specialty segment sales (the iter=10 case; we assume convergence by this point). For the primary-only segment model we use iter=1. For the full model we use iter=10.	Takes as input the files mentioned above. Programs: setup_mean_reversion_iter1.sas setup_mean_reversion_iter10.sas Outputs: html output: iter1, iter10. Inputs for the Gauss program: gamorig_and_fit_iter1.asc (original gam, rescaled, and fitted values, as well as values with China share with no regression to the mean); pmat6_iter1.asc (the transition matrix of the 11 states, averaged over industries with New China Share=0); mean_lngam97_iter1.asc (the grid points for gam>0, 10 values); naicslist_iter1.asc (basic information about the industry). Values for iter=10: gamorig_and_fit_iter10.asc pmatt6_iter10.asc mean_lngam97_iter10.asc naicslist_iter10.asc
Tab11_prediction_primary_only.prg Tab11_prediction_specialty.prg	Set of programs for main prediction exercise	Programs Tab11_prediction_primary_only.prg Tab11_prediction_specialty.prg Summary Output Tab11_prediction_primary_only.out Tab11_prediction_specialty.out Results of individual simulations z_output_all_1.asc z_output_sim_1.asc z_output_simChina_1.asc z_output_all_2.asc z_output_sim_2.asc z_output_simChina_2.asc output_sim_China_newpop_2.asc output_dataLQ_2.asc Final wrap-up SAS program Tab11_prediction_last_step.sas Tab11_prediction_last_step.html
Table12_1997_2007_sim.sas	SAS program to compare 1997 and 2007 estimates of the primary count share.	Table12_1997_2007_sim.sas Table12_1997_2007_sim.html
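
The setup_mean_reversion entry above describes an 11-state grid (one state for gam=0, then 10 categories of ln(gam)) and a transition matrix between 1997 and 2007. The following is a minimal sketch of that idea, not the authors' SAS code. How the 10 categories are formed is not stated here, so the sketch assumes equal-count bins on the starting-year ln(gam), applies the same cut points to both years, and pools all pairs passed to it. All function and variable names are hypothetical.

```python
import bisect
import math


def make_cutpoints(log_values, n_bins=10):
    """Cut points for roughly equal-count bins (binning rule is an assumption)."""
    s = sorted(log_values)
    n = len(s)
    if n == 0:
        return []
    return [s[(k * n) // n_bins] for k in range(1, n_bins)]


def to_state(gam, cuts):
    """State 0 holds gam=0; states 1..n_bins hold the ln(gam) categories."""
    if gam <= 0:
        return 0
    return 1 + bisect.bisect_right(cuts, math.log(gam))


def transition_matrix(gam_start, gam_end, n_bins=10):
    """Row-normalized counts of moves from starting-year to ending-year states."""
    cuts = make_cutpoints([math.log(g) for g in gam_start if g > 0], n_bins)
    n_states = n_bins + 1
    counts = [[0] * n_states for _ in range(n_states)]
    for g0, g1 in zip(gam_start, gam_end):
        counts[to_state(g0, cuts)][to_state(g1, cuts)] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observations are left as zeros rather than guessed
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Made-up values for illustration only: two bins instead of ten
example = transition_matrix([0.0, 0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 2.0], n_bins=2)
```

With the default n_bins=10 the result is an 11×11 matrix whose first row and column correspond to gam=0. Restricting the input pairs to industries with new China share equal to zero would be done before calling the function.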