Layout of ea_CM97 file

(Note: there is a SAS version of the file and an ASCII version, with different layouts.)

 

Aug 1, 2012

 

Description of Supplementary Files for “An Alternative Theory of the Plant Size Distribution, with Geography and Intra- and International Trade”

 

by

Thomas J. Holmes (University of Minnesota, Federal Reserve Bank of Minneapolis, and NBER)

and

John J. Stevens (Board of Governors of Federal Reserve System)

 

Note: The statistics reported in this file are all derived from public Census data

 

Description of SAS File (ea_CM97.sas7bdat)

EA×NAICS level data (177 economic areas times 172 diffuse demand NAICS industries).  For each industry and location, the file contains the data needed to estimate the model for 1997. The “CM” stands for Census of Manufactures.  The source of these data is the publicly available Location of Manufactures (LM) data (file E9731e2 from the 1997 Economic Census CD (U.S. Bureau of Census (2001))).

 

The LM data report plant counts for each county×NAICS cell in the following employment size categories.  The table also reports the variables avgemp and s_norm for each size class.  The construction of avgemp is described in the table. The construction of s_norm is described in the appendix to the paper.

 

Employment Size Category Number    Employment Range    AVGEMP     S_NORM
1                                  1-4                 2.1        1.0
2                                  5-9                 6.7        2.3
3                                  10-19               13.7       4.8
4                                  20-49               31.3       12.1
5                                  50-99               70.1       30.3
6                                  100-249             154.1      76.8
7                                  250-499             345.9      194.1
8                                  500-999             679        391.6
9                                  1000-2499           1448.1     928.9
10                                 2500 and above      4795.3     3050.3

AVGEMP: Average employment in the size class (across all manufacturing plants in the 1997 CM).

S_NORM: Normalized sales in the size category, with s_norm=1 for the smallest category.
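As an illustration of how the size-class table can be used, the sketch below stores the table as lookup lists and maps a plant's employment to its size category number. The names LOWER_BOUNDS and size_category are hypothetical and do not appear in the data files; the AVGEMP and S_NORM values are copied from the table above.

```python
from bisect import bisect_right

# Lower employment bound of each size category (1-4, 5-9, ..., 2500 and above).
LOWER_BOUNDS = [1, 5, 10, 20, 50, 100, 250, 500, 1000, 2500]

# AVGEMP and S_NORM by size category (index 0 is category 1), from the table.
AVGEMP = [2.1, 6.7, 13.7, 31.3, 70.1, 154.1, 345.9, 679.0, 1448.1, 4795.3]
S_NORM = [1.0, 2.3, 4.8, 12.1, 30.3, 76.8, 194.1, 391.6, 928.9, 3050.3]


def size_category(employment):
    """Return the size category number (1 to 10) for a plant's employment."""
    if employment < 1:
        raise ValueError("employment must be at least 1")
    # Count how many lower bounds are <= employment; that count is the category.
    return bisect_right(LOWER_BOUNDS, employment)


# Example: a hypothetical plant with 30 employees falls in category 4 (20-49).
category = size_category(30)
avgemp_for_plant = AVGEMP[category - 1]
s_norm_for_plant = S_NORM[category - 1]
```

The LM data already come as counts by size category, so this lookup is only needed when starting from plant-level employment figures.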

 

 

Variable        Description
NAICS           6 digit NAICS
ea              Economic Area code used by BEA (1 to 179 basic)
ea_index        ea code used by us, numbered 1 to 177 (as we drop Alaska and Hawaii)
eatext          text description of economic area
emphatUS_LM     Sum of emphat_LM across all 177 contiguous EAs
emphat_LM       Estimated employment in the EA for the 1997 CM, using the LM plant counts and weighting them by the variable AVGEMP defined above
est12_LM        Plant counts in size categories 1 and 2 combined
est13_LM        Plant counts in size categories 1, 2, and 3 combined
est1_LM         Plant counts in size category 1
estUS_LM        Plant counts summed over all the 177 contiguous EAs in the United States in the 1997 LM
est_LM          Count of plants in the EA, in the 1997 Location of Manufacturing plants
naicsindex      index from 1 to 473 of the 473 manufacturing NAICS industries (sorted by NAICS)
naicstext       descriptive text of industry
pop97           1997 population of EA
pop97_US        sum of population across 177 EAs (this is U.S. population less Alaska and Hawaii)
salUS_geo       total US sales revenue of industry in 1997 CM published tabulations
salhat          =(snorm_LM/snormUS_LM)*salUS_geo;
salhat1         =(snorm1_LM/snormUS_LM)*salUS_geo;
salhat12        =(snorm12_LM/snormUS_LM)*salUS_geo;
salhat13        =(snorm13_LM/snormUS_LM)*salUS_geo;
snorm12US_LM    Sum of snorm12_LM across all EAs
snorm12_LM      Sum of S_NORM across all plants with size_category in {1,2}
snorm13US_LM    Sum of snorm13_LM across all EAs
snorm13_LM      Sum of S_NORM across all plants with size_category in {1,2,3}
snorm1_LM       Sum of S_NORM across all plants with size_category in {1}
snorm1US_LM     Sum of snorm1_LM across all EAs
snormUS_LM      Sum of snorm_LM across all EAs
snorm_LM        Sum of S_NORM across all plants in the EA across all size classes
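The variable definitions above can be sketched in code. The following is a minimal illustration, not the code used to build the file: it assumes the input for one industry is a list of 10 plant counts (one per size category) for each EA, and the function names weighted_sum, ea_measures, and add_salhat are hypothetical. The counts and the national sales figure in the example are made up.

```python
# AVGEMP and S_NORM by size category (index 0 is category 1), from the table.
AVGEMP = [2.1, 6.7, 13.7, 31.3, 70.1, 154.1, 345.9, 679.0, 1448.1, 4795.3]
S_NORM = [1.0, 2.3, 4.8, 12.1, 30.3, 76.8, 194.1, 391.6, 928.9, 3050.3]


def weighted_sum(counts, weights, categories=range(1, 11)):
    """Sum count * weight over the given size categories (numbered from 1)."""
    return sum(counts[c - 1] * weights[c - 1] for c in categories)


def ea_measures(counts):
    """Per-EA measures for one industry from its 10 size-category counts."""
    return {
        "est_LM": sum(counts),
        "est1_LM": counts[0],
        "est12_LM": sum(counts[:2]),
        "est13_LM": sum(counts[:3]),
        "emphat_LM": weighted_sum(counts, AVGEMP),
        "snorm_LM": weighted_sum(counts, S_NORM),
        "snorm1_LM": weighted_sum(counts, S_NORM, (1,)),
        "snorm12_LM": weighted_sum(counts, S_NORM, (1, 2)),
        "snorm13_LM": weighted_sum(counts, S_NORM, (1, 2, 3)),
    }


def add_salhat(rows, salUS_geo):
    """Allocate national industry sales to EAs in proportion to snorm shares."""
    # snormUS_LM must be positive, i.e. the industry has at least one plant.
    snormUS_LM = sum(r["snorm_LM"] for r in rows)
    for r in rows:
        for suffix in ("", "1", "12", "13"):
            share = r["snorm" + suffix + "_LM"] / snormUS_LM
            r["salhat" + suffix] = share * salUS_geo
    return rows


# Made-up example: one industry located in two EAs.
rows = [
    ea_measures([2, 1, 0, 0, 0, 0, 0, 0, 0, 0]),
    ea_measures([0, 0, 1, 0, 0, 0, 0, 0, 0, 0]),
]
add_salhat(rows, salUS_geo=910.0)
```

Note that salhat1, salhat12, and salhat13 all use snormUS_LM (the all-size-class national total) as the denominator, following the formulas in the table.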

 

Description of ASCII File (ea_CM97.asc)

This file contains the same data as above, but in ASCII format for reading into Gauss.  There is no header row, and there are 13 columns of data, as follows:

 

Column    Variable from Above
1         ea_index
2         ea
3         naicsindex
4         NAICS
5         emphat_LM
6         salhat
7         salhat1
8         salhat12
9         salhat13
10        est_LM
11        est1_LM
12        est12_LM
13        est13_LM
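For readers not using Gauss, the column layout above can be applied when reading the ASCII file in another language. The sketch below is one way to do this in Python using only the standard library. It assumes the file is whitespace-delimited and that every field is numeric, neither of which is stated in this description; the function names parse_ea_cm97 and read_ea_cm97 are hypothetical.

```python
# Column names in file order, following the layout table above.
COLUMNS = [
    "ea_index", "ea", "naicsindex", "naics", "emphat_LM",
    "salhat", "salhat1", "salhat12", "salhat13",
    "est_LM", "est1_LM", "est12_LM", "est13_LM",
]


def parse_ea_cm97(lines):
    """Parse header-less rows of 13 numeric fields into a list of dicts."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: whitespace-delimited fields
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(
                "line %d: expected %d fields, found %d"
                % (line_no, len(COLUMNS), len(fields))
            )
        rows.append(dict(zip(COLUMNS, (float(f) for f in fields))))
    return rows


def read_ea_cm97(path):
    """Read the ASCII file at the given path, e.g. 'ea_CM97.asc'."""
    with open(path) as handle:
        return parse_ea_cm97(handle)
```

All values are read as floats for simplicity; the index and count columns can be converted to integers afterwards if needed.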