March 3, 2010
Documents programs used for “The Diffusion of Wal-Mart and Economies of Density,” Econometrica, Vol. 79, No. 1 (January, 2011), 253-302
by Thomas Holmes
This file explains the programs and data files used to calculate, for any year and any configuration of Wal-Mart stores, the resulting sales and operating profit for Wal-Mart as well as distribution distance miles.
See documentions_deviations for information about the deviations.
There are two programs:
setup.prg: puts in memory the main procedures and reads in the data.
var_annual_calc_actual_policy.prg: an example use of the program. It calculates, for each year, the simulated profits, sales, etc. for Wal-Mart under the policy it actually chose. When the perturbation analysis was run, alternative policies were entered as input here.
Note 1: There are 3176 stores, but Wal-Mart's store numbers run up to 5498. It was convenient to structure the matrices so that row i of a matrix corresponds to Wal-Mart store number i, so all the store data has 5498 rows rather than 3176. A row whose number does not correspond to an actual Wal-Mart store (as of Jan 31, 2006) just holds missing values or zeros. This format of leaving a blank row for missing stores is called "fill format" in the programs (if you see the word fill, that's what it means). The exception is the distance matrix of each store from each other store (store_dist_mat.fmt), which is not in fill format because otherwise it would be too big; it is 3176x3176.
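The fill format described in Note 1 can be sketched as follows. This is only an illustration in Python (the programs themselves are GAUSS), and the helper names to_fill and to_compact, the example store numbers, and the choice of 0.0 as the filler value are all assumptions made for the example.

```python
N_FILL = 5498  # highest Wal-Mart store number = number of fill-format rows


def to_fill(store_numbers, values, n_fill=N_FILL, missing=0.0):
    """Scatter one value per real store into a fill-format list.

    Row i (1-based) of the result belongs to store number i; rows whose
    number is not a real store keep the filler value.
    """
    filled = [missing] * n_fill
    for number, value in zip(store_numbers, values):
        filled[number - 1] = value  # store numbers are 1-based
    return filled


def to_compact(store_numbers, filled):
    """Gather only the rows of real stores, in the order of store_numbers."""
    return [filled[number - 1] for number in store_numbers]


# Hypothetical example: three real stores numbered 1, 4 and 5498.
store_numbers = [1, 4, 5498]
sales = [10.0, 20.0, 30.0]
filled = to_fill(store_numbers, sales)
compact = to_compact(store_numbers, filled)
```

The same index list is what would be needed to move between fill-format data and the compact 3176x3176 distance matrix.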
The main thing to explain here is the program setup.prg that does the main work.
There is a procedure at the bottom, proc var_annual_calc(,), that takes as input which stores are open in a given year and what year it is, and calculates all the variables of interest. I will explain the inputs and outputs of this procedure and then make a few observations about the data that is being read in.
** var_annual_calc - Calculate 12 variables for a given store configuration in a particular year
** Input store_exist: 5498*1 vector. ith row =1 implies general merchandise store #i exists.
** super_exist: 5498*1 vector. ith row =1 implies super center #i exists.
** year_evaluate: scalar. Year for which we evaluate each output.
** x: 5498*1 vector. ith row =1 implies regular store #i has been in operation for more than two years. (for demand model)
** Output (all vars are scalars)
** sal_gen: sales of general merchandise (total over all stores), in $million 2005 dollars
** sal_groc: sales of groceries in $million 2005 dollars
** wage_gen: multiply by labor_requirement=3.6146 to get annual wages, $million 2005, for general merchandise
** wage_groc: multiply by labor_requirement=3.6146 to get annual wages, $million 2005, for groceries
** rent_gen: multiply by .36118*(1/5) to get annual rent, $million 2005, for general merchandise
** rent_groc: multiply by .36118*(1/5) to get annual rent, $million 2005, for groceries
** lnneig5_gen: (corresponds to C1 in the paper for general merchandise)
** lnneig5_2_gen: (corresponds to C2 in the paper for general merchandise)
** lnneig5_groc: (corresponds to C1 in the paper for food, called grocery before)
** lnneig5_2_groc: (corresponds to C2 in the paper for food, called grocery before)
** dc_gen_dist: the sum of the distances from each regular store to the closest regular distribution center.
** dc_groc_dist: the sum of the distances from each super center to the closest grocery distribution center.
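As an illustration of what the two distance outputs measure, here is a minimal Python sketch (not the GAUSS code in setup.prg). The function name total_dc_distance and the table dist_to_dc are hypothetical. The sketch assumes that row i of dist_to_dc lists the distances from store i to each distribution center operating in the year being evaluated.

```python
def total_dc_distance(exist, dist_to_dc):
    """Sum, over existing stores, of the distance to the closest center.

    exist      -- list of 0/1 flags, one per store row (as in store_exist)
    dist_to_dc -- hypothetical table: dist_to_dc[i][k] is the distance from
                  store row i to distribution center k
    """
    total = 0.0
    for flag, distances in zip(exist, dist_to_dc):
        if flag == 1:
            total += min(distances)  # closest center for this store
    return total


# Hypothetical example with three store rows and two distribution centers.
store_exist = [1, 0, 1]
dist_to_dc = [
    [100.0, 250.0],  # store row 1: closest center is 100 miles away
    [40.0, 60.0],    # store row 2: does not exist, so it is skipped
    [300.0, 80.0],   # store row 3: closest center is 80 miles away
]
dc_gen_dist = total_dc_distance(store_exist, dist_to_dc)
```

Under these assumptions the same function would give dc_groc_dist when passed super_exist and the distances to the grocery distribution centers.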
The data that is read in is in the form of Gauss Format Files.
A note about units:
The number 3.6146 is the number of workers per million dollars in sales. So wage_gen would be what the wage bill would be if one worker were needed per million in sales.
.36118 is obtained from the 46 observations on actual store
property values in
It is the slope of a regression (with no constant) of land value ($1,000) on sales ($1,000,000 per year).
Now in the construction of rent_gen and rent_groc, we multiply by .001 to convert to millions. So we get a fitted value of the land price (in millions) per million dollars in sales. Multiplying this by the million dollars in sales, we get a fitted value of the land value. Then we multiply by 1/5 to get the land rent.
(Note: in the text the regression slope is reported as .036, but the units are different. There, the left-hand side is in percent, so multiplying by 100 after dividing by 1,000 to convert to millions adds the extra zero.)
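The unit conversions in this note can be checked with a few lines of arithmetic. The sketch below is in Python for illustration only; the constants come from the text above, while the function names and the example input values are hypothetical.

```python
LABOR_REQUIREMENT = 3.6146  # workers per $1 million of sales (from the text)
LAND_SLOPE = 0.36118        # regression slope: land value ($1,000) on sales ($ million)
RENT_SHARE = 1.0 / 5.0      # annual rent as a fraction of land value


def annual_wages(wage_output):
    """Convert wage_gen or wage_groc into annual wages ($ million, 2005)."""
    return LABOR_REQUIREMENT * wage_output


def annual_rent(rent_output):
    """Convert rent_gen or rent_groc into annual rent ($ million, 2005)."""
    return LAND_SLOPE * RENT_SHARE * rent_output


# Reconcile the slope with the .036 reported in the paper:
# divide by 1,000 ($1,000 -> $ million), then multiply by 100 (percent).
slope_in_percent = LAND_SLOPE * 0.001 * 100.0

# Hypothetical procedure outputs, used only to show the conversion.
example_wages = annual_wages(2.0)
example_rent = annual_rent(10.0)
```

Here slope_in_percent comes out at about .036, matching the figure quoted from the paper.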
Data on N=206,960 Census 2000 block groups
1 id_blkgrp2000 (index variable created by Holmes)
2 pop (population)
3 neig5 (population within 5 miles: draw a circle of 5-mile radius around the long and lat of the block group and see what other block groups are in it)
4 neig10 (same only 10 miles, not used)
5 neig15 (same only 15 miles, not used)
6 lat (lat of block group from census)
7 long (long of block group from census)
8 pci (per capita income)
9 black_share (share of population black)
10 young_share (share of population 21 and younger)
11 old_share (share of population 65 and above)
12 dist_walmart (distance to closest Wal-Mart as of Jan 31, 2006)
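To make the neig5 variable concrete, here is a rough Python sketch of a population-within-radius calculation. It is not the code used to build the data. The haversine formula, the Earth radius of 3958.8 miles, and the inclusion of the block group's own population are all assumptions, and the coordinates and populations are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def pop_within(groups, radius_miles=5.0):
    """For each (pop, lat, long) block group, total the population of all
    block groups whose centers lie within radius_miles (itself included)."""
    totals = []
    for _, lat_i, lon_i in groups:
        total = sum(
            pop
            for pop, lat_j, lon_j in groups
            if haversine_miles(lat_i, lon_i, lat_j, lon_j) <= radius_miles
        )
        totals.append(total)
    return totals


# Made-up block groups: the first two are about 2 miles apart,
# the third is about 69 miles north of the first.
groups = [(1000, 36.37, -94.21), (500, 36.40, -94.21), (2000, 37.37, -94.21)]
neig5 = pop_within(groups)
```

This brute-force loop compares every pair, so for the full set of 206,960 block groups some spatial pre-sorting would be needed in practice.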
file store_neig_2000, first_last_blkgrp_sort_store2000
These files contain information for each store about which block groups are within 30 miles.
Take store #i (from 1 to 5498). Go to first_last_blkgrp_sort_store2000[i,2] and [i,3]. These are the first and last rows in the file store_neig_2000 that contain the block groups within 30 miles of the store. In store_neig_2000, the first column is the index of the block group (id_blkgrp2000) and the second column is the distance to the block group.
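The lookup just described can be sketched as follows, in Python for illustration (the actual files are GAUSS matrices with 1-based indexing). The function name, the contents of the first column of the index file, and the convention that a filler row has a first-row value below 1 are assumptions, and the tiny tables are made up.

```python
def block_groups_near_store(store_number, first_last, store_neig):
    """Return the (block group id, distance) rows for one store.

    first_last -- one row per store number; columns 2 and 3 (1-based) hold
                  the first and last row numbers, both 1-based and inclusive
    store_neig -- rows of [id_blkgrp2000, distance in miles]
    """
    row = first_last[store_number - 1]
    first, last = int(row[1]), int(row[2])
    if first < 1 or last < first:
        return []  # assumed convention for rows that are not real stores
    return store_neig[first - 1:last]


# Made-up data: store 1 uses rows 1-2, store 2 is a filler row,
# store 3 uses row 3. Column 1 is assumed to hold the store number.
first_last = [[1, 1, 2], [2, 0, 0], [3, 3, 3]]
store_neig = [[17, 2.5], [42, 28.0], [42, 11.3]]

near_store_1 = block_groups_near_store(1, first_last, store_neig)
near_store_2 = block_groups_near_store(2, first_last, store_neig)
near_store_3 = block_groups_near_store(3, first_last, store_neig)
```

Storing a first and last row per store keeps the long list of store and block group pairs in one flat file while still allowing direct access to any store's neighbors.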
file store_neig_sort_store, first_last_store2000
Analogous to above, only sorted differently.
first_last_store has a row corresponding to each 2000 block group. For each, it specifies the rows in store_neig_sort_store that contain the stores that are within 30 miles.
These same files are constructed for the 1990 and the 1980 Censuses.
Contains 5498 rows, where row i is store #i.
4 year_open_state (first year a Wal-Mart is opened in the state)
5 year_super_state (first year a supercenter is opened in the state)
6 fips (fips code of state, note New England combined to one state and DC+MD+DE combined)
3 2000 population within 5 miles of store
6 1990 population within 5 miles of store
9 1980 population within 5 miles of store
5498 x 44 matrix, row i = store i, column j=year 1961+j.
Units annual salary of retail employment in 2005 dollars
5498 x 44 matrix, row i = store i, column j=year 1961+j.
units $1,000 (2005) dollars per acre
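For the two 5498 x 44 matrices above, the rule "column j = year 1961+j" means the columns cover 1962 through 2005. A small Python sketch of the index arithmetic follows; the function names and the toy matrix are hypothetical, and Python's 0-based indexing is adjusted for explicitly.

```python
N_YEARS = 44       # number of columns
BASE_YEAR = 1961   # column j (1-based) holds year BASE_YEAR + j


def year_to_column(year):
    """Return the 1-based column number holding the given year."""
    j = year - BASE_YEAR
    if not 1 <= j <= N_YEARS:
        raise ValueError("year outside 1962-2005")
    return j


def lookup(matrix, store_number, year):
    """Read the value for a store number and year from a fill-format matrix."""
    return matrix[store_number - 1][year_to_column(year) - 1]


# Toy matrix with 2 store rows, where each entry simply equals its year.
toy = [[BASE_YEAR + j for j in range(1, N_YEARS + 1)] for _ in range(2)]
first_col = year_to_column(1962)
last_col = year_to_column(2005)
value = lookup(toy, 2, 1990)
```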
Parameters of Demand Model
Let theta be the parameter vector