Layout of ea_CBP07 file
(Note: there is a SAS version of the file and an ASCII version, with different layouts.)
Aug 1, 2012
Description of Supplementary Files for “An Alternative Theory of the Plant Size Distribution, with Geography and Intra- and International Trade”
by
Thomas J. Holmes (University of Minnesota, Federal Reserve Bank of Minneapolis, and NBER)
and
John J. Stevens (Board of Governors of the Federal Reserve System)
Note: The statistics reported in this file are all derived from public Census data.
Description of SAS File (ea_CBP07.sas7bdat)
ea×NAICS level data (177 economic areas times 172 diffuse demand NAICS industries). For each industry and location, the file contains the data needed to estimate the model for 2007. “CBP” stands for County Business Patterns; the source is the publicly available County Business Patterns (CBP) data.
The CBP data report cell counts for each county×NAICS pair by size category. The CBP size categories are more disaggregated than the LM categories used for ea_CM97, so we aggregate the CBP categories to the level of the LM categories (this combines 1000-1499 and 1500-2499 into 1000-2499, and 2500-4999 and 5000 plus into 2500 plus).
Thus we use the same size categories as described in ea_CM97.html. We also use the same values for AVGEMP and S_NORM.
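The aggregation of size categories described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the size-class labels (such as "1000-1499") and the function name aggregate_to_lm are assumptions chosen for readability, and the CBP files use their own field names.

```python
from collections import defaultdict

# Hypothetical labels for the CBP size classes that get merged. Any class
# not listed here is assumed to pass through unchanged.
CBP_TO_LM = {
    "1000-1499": "1000-2499",
    "1500-2499": "1000-2499",
    "2500-4999": "2500+",
    "5000+": "2500+",
}


def aggregate_to_lm(cbp_counts):
    """Collapse establishment counts by CBP size class to the coarser LM classes."""
    lm_counts = defaultdict(int)
    for size_class, count in cbp_counts.items():
        lm_counts[CBP_TO_LM.get(size_class, size_class)] += count
    return dict(lm_counts)


# Made-up counts for a single county×NAICS cell.
example = {"500-999": 4, "1000-1499": 3, "1500-2499": 2, "2500-4999": 1, "5000+": 1}
print(aggregate_to_lm(example))
```

In this sketch the two top CBP classes end up in a single "2500+" class, and the two classes below them in "1000-2499", matching the combination described in the text.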
Variable | Description |
NAICS | 6 digit NAICS |
ea | Economic Area code used by BEA (1 to 179 basic) |
ea_index | ea code used by us, numbered 1 to 177 (as we drop Alaska and Hawaii) |
eatext | text description of economic area |
emphatUS_cbp | sum of emphat_cbp over all 177 EAs in contiguous U.S. |
emphat_cbp | take AVGEMP of each plant in EA for this industry and sum |
est12_cbp | establishment counts in sizecat in {1,2} in EA |
est13_cbp | establishment counts in sizecat in {1,2,3} in EA |
est1_cbp | establishment counts in sizecat in {1} in EA |
estUS_cbp | sum of est_cbp over all 177 EAs in contiguous U.S. |
est_cbp | establishment counts across all plants in this NAICS in EA |
naicsindex | index from 1 to 473 of the 473 manufacturing NAICS industries (sorted by NAICS) |
naicstext | descriptive text of industry |
pop2000 | 2000 population of EA |
pop2007 | 2007 population of EA |
pop2000_US | sum of pop2000 over 177 contiguous U.S. EAs |
pop2007_US | sum of pop2007 over 177 contiguous U.S. EAs |
sal97_scalefactor | salUS_geo/snormUS_LM (see documentation for ea_CM97 for definitions) |
salhat | =snorm_cbp*sal97_scalefactor (i.e., use the 2007 plant counts, but set them to the 1997 s_norm levels and rescale to put it in 1997 terms) |
salhat1 | =snorm1_cbp*sal97_scalefactor |
salhat12 | =snorm12_cbp*sal97_scalefactor |
salhat13 | =snorm13_cbp*sal97_scalefactor |
snorm12US_cbp | sum of snorm12_cbp across all EAs |
snorm12_cbp | take the sum of S_NORM across all plants for size_category in {1,2} |
snorm13US_cbp | sum of snorm13_cbp across all EAs |
snorm13_cbp | take the sum of S_NORM across all plants for size_category in {1,2,3} |
snorm1US_cbp | sum of snorm1_cbp across all EAs |
snorm1_cbp | take the sum of S_NORM across all plants for size_category in {1} |
snormUS_cbp | sum of snorm_cbp across all EAs |
snorm_cbp | take the sum of S_NORM across all plants for all size categories |
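To make the variable definitions concrete, the sketch below builds the per-cell variables for one EA×NAICS cell from establishment counts by size category. It assumes that AVGEMP and S_NORM are single values per size category, so that "summing over plants" amounts to count times value. The function name cell_variables and all numbers in the example are hypothetical placeholders, not values from ea_CM97.

```python
def cell_variables(counts, avgemp, s_norm, sal97_scalefactor):
    """Compute est*, snorm*, salhat* and emphat for one EA×NAICS cell.

    counts, avgemp, s_norm are dicts keyed by size category (1, 2, 3, ...).
    """
    # Size-category subsets behind the suffixes "", "1", "12" and "13".
    subsets = {"": set(counts), "1": {1}, "12": {1, 2}, "13": {1, 2, 3}}

    # emphat_cbp: AVGEMP of each plant, summed over all plants in the cell.
    out = {"emphat_cbp": sum(n * avgemp[c] for c, n in counts.items())}

    for tag, cats in subsets.items():
        est = sum(n for c, n in counts.items() if c in cats)
        snorm = sum(n * s_norm[c] for c, n in counts.items() if c in cats)
        out[f"est{tag}_cbp"] = est
        out[f"snorm{tag}_cbp"] = snorm
        # salhat rescales the normalized sales into 1997 terms.
        out[f"salhat{tag}"] = snorm * sal97_scalefactor
    return out


# Placeholder inputs (assumptions for illustration only).
counts = {1: 10, 2: 4, 3: 2, 4: 1}
avgemp = {1: 2.0, 2: 10.0, 3: 50.0, 4: 200.0}
s_norm = {1: 1.0, 2: 5.0, 3: 20.0, 4: 100.0}
cell = cell_variables(counts, avgemp, s_norm, sal97_scalefactor=2.0)
print(cell["est13_cbp"], cell["snorm12_cbp"], cell["salhat"])
```

The national variables (for example estUS_cbp or snormUS_cbp) would then be sums of the corresponding cell values over the 177 EAs for a given industry.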
Description of ASCII File (ea_CBP07.asc)
This is just like above, except in ASCII format for reading into Gauss. In particular, there is no header row, and there are 13 columns of data, where the columns are the following:
Column | Variable from Above |
1 | ea_index |
2 | ea |
3 | naicsindex |
4 | naics |
5 | emphat_CBP |
6 | salhat |
7 | salhat1 |
8 | salhat12 |
9 | salhat13 |
10 | est_CBP |
11 | est1_CBP |
12 | est12_CBP |
13 | est13_CBP |
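Because the ASCII file has no header row, column names have to be attached when it is read. The following is a minimal sketch in Python, assuming the 13 fields are separated by whitespace (the delimiter is not stated in this description). The function name parse_rows and the sample line are made up for illustration.

```python
# Column order as listed in the table above.
COLUMNS = [
    "ea_index", "ea", "naicsindex", "naics", "emphat_CBP",
    "salhat", "salhat1", "salhat12", "salhat13",
    "est_CBP", "est1_CBP", "est12_CBP", "est13_CBP",
]


def parse_rows(lines):
    """Turn whitespace-delimited lines into dicts keyed by column name."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {lineno}: expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(COLUMNS, (float(x) for x in fields))))
    return rows


# Made-up sample line, for illustration only.
sample = ["1 1 1 311111 360.0 340.0 20.0 60.0 140.0 17 10 14 16"]
rows = parse_rows(sample)
print(rows[0]["naics"], rows[0]["est13_CBP"])

# With a real file, one would pass the open file object instead, e.g.:
#   with open(path_to_ascii_file) as f:
#       rows = parse_rows(f)
```

A simple sanity check after loading is that est1_CBP ≤ est12_CBP ≤ est13_CBP ≤ est_CBP in every row, since each count covers a superset of the size categories of the previous one.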