Institutional Affiliation: Duke University
|A Framework for Sharing Confidential Research Data, Applied to Investigating Differential Pay by Race in the U.S. Government|
with co-authors: w23534
Data stewards seeking to provide access to large-scale social science data face a difficult challenge. They have to share data in ways that protect privacy and confidentiality, are informative for many analyses and purposes, and are relatively straightforward for data analysts to use. We present a framework for addressing this challenge. The framework uses an integrated system that includes fully synthetic data intended for wide access, coupled with means for approved users to access the confidential data via secure remote access solutions, glued together by verification servers that allow users to assess the quality of their analyses with the synthetic data. We apply this framework to data on the careers of employees of the U.S. federal government, studying differentials in pay by race. ...
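The verification-server idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system described in the paper: the names `Interval`, `ci_overlap`, and `VerificationServer` are hypothetical, and the score is the confidence-interval overlap measure common in the synthetic-data literature, assumed here as a stand-in for whatever fidelity measures the actual servers report. A real server would also need privacy protection on the scores it releases, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A confidence interval for one estimate (hypothetical helper type)."""
    lower: float
    upper: float

    def width(self) -> float:
        return self.upper - self.lower


def ci_overlap(confidential: Interval, synthetic: Interval) -> float:
    """Average share of each interval covered by their intersection.

    Returns 1.0 for identical intervals and a value <= 0 when the
    intervals are disjoint (the intersection length goes negative).
    """
    inter = min(confidential.upper, synthetic.upper) - max(
        confidential.lower, synthetic.lower
    )
    return 0.5 * (inter / confidential.width() + inter / synthetic.width())


class VerificationServer:
    """Hypothetical server: holds confidential-data intervals and releases
    only an overlap score, never the confidential estimates themselves."""

    def __init__(self, confidential_results: dict[str, Interval]) -> None:
        self._results = confidential_results

    def verify(self, analysis_id: str, synthetic: Interval) -> float:
        # Assumption: a real deployment would add privacy protection
        # (for example, random noise) before releasing this score.
        return ci_overlap(self._results[analysis_id], synthetic)


# Usage with illustrative numbers: an analyst checks one coefficient
# estimated on the synthetic data against the confidential result.
server = VerificationServer({"coef_1": Interval(1.0, 3.0)})
print(server.verify("coef_1", Interval(2.0, 4.0)))
```

The analyst sees only a single number per query, so the design keeps the confidential estimates on the server while still telling the analyst whether a synthetic-data result can be trusted.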
Published: Barrientos, Andres F., Alexander Bolton, Tom Balmat, Jerome P. Reiter, John M. de Figueiredo, Ashwin Machanavajjhala, Yan Chen, Charley Kneifel, and Mark DeLong (2018). “A Framework for Sharing Confidential Research Data, Applied to Investigating Differential Pay by Race in the U.S. Government,” Annals of Applied Statistics 12(2): 1124-1156.
|Imputation in U.S. Manufacturing Data and Its Implications for Productivity Dispersion|
with co-authors: w22569
In the U.S. Census Bureau's 2002 and 2007 Censuses of Manufactures, 79% and 73% of observations, respectively, have imputed data for at least one variable used to compute total factor productivity (TFP). The Bureau primarily imputes missing values using mean-imputation methods, which can reduce the true underlying variance of the imputed variables. For every variable entering TFP in 2002 and 2007, we show that the dispersion is significantly smaller in the Census mean-imputed data than in the Census non-imputed data. As an alternative to mean imputation, we show how to use classification and regression trees (CART) to allow for a distribution of multiple possible imputed values, drawn from other plants that the CART algorithm determines to be similar based on other observed variables. For 90% of the 473 in...
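The variance-shrinking effect of mean imputation shows up clearly in a small simulation. This sketch is illustrative only: the data are simulated standard normal draws with values missing completely at random (an assumption, not a property of the Census data), and the hypothetical `donor_impute` draws from all observed values rather than from CART-selected similar plants.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def donor_impute(values, rng):
    """Replace each missing value with a random draw from the observed values.

    A simplified stand-in for drawing from the donors in a CART leaf.
    """
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


rng = random.Random(2002)
true_values = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# Assumption: roughly 75% of values are missing completely at random.
with_gaps = [None if rng.random() < 0.75 else v for v in true_values]

print("true variance:         ", statistics.pvariance(true_values))
print("mean-imputed variance: ", statistics.pvariance(mean_impute(with_gaps)))
print("donor-imputed variance:", statistics.pvariance(donor_impute(with_gaps, rng)))
```

Because about three quarters of the entries are set to one constant, the mean-imputed variance comes out near one quarter of the true variance, while random donor draws roughly preserve it.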
Published: T. Kirk White & Jerome P. Reiter & Amil Petrin, 2018. "Imputation in U.S. Manufacturing Data and Its Implications for Productivity Dispersion," The Review of Economics and Statistics, vol. 100(3), pages 502-509.
|Plant-level Productivity and Imputation of Missing Data in U.S. Census Manufacturing Data|
with co-authors: w17816
Within-industry differences in measured plant-level productivity are large. A large literature has been devoted to explaining the causes and consequences of these differences. In the U.S. Census Bureau's manufacturing data, the Bureau imputes for missing values using methods known to result in underestimation of variability and potential bias in multivariate inferences. We present an alternative strategy for handling the missing data based on multiple imputation via sequences of classification and regression trees. We use our imputations and the Bureau's imputations to estimate within-industry productivity dispersions. The results suggest that there is more within-industry productivity dispersion than previous research has indicated. We also estimate relationships between productivity and ...
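One step of the CART approach can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' code: it imputes one variable `y` from one predictor `x`, grows a regression tree by greedy SSE-reducing splits with a hypothetical minimum leaf size, and fills each missing `y` with a random draw from the observed values in the matching leaf. Repeating the draws `m` times gives multiple completed datasets. The sequential version described in the abstract would cycle such a step through each variable with missing values, and proper multiple imputation would typically also reflect uncertainty in the donor pool (for example, with a Bayesian bootstrap within leaves); both refinements are omitted here.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A regression-tree node; a leaf stores its observed donor values."""
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    donors: Optional[List[float]] = None


def sse(ys: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not ys:
        return 0.0
    m = statistics.fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def grow(pairs: List[Tuple[float, float]], min_leaf: int = 20) -> Node:
    """Grow a regression tree for y on a single x by greedy SSE-reducing splits."""
    pairs = sorted(pairs)
    ys = [y for _, y in pairs]
    best_i, best_cost = None, sse(ys)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate tied x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    if best_i is None:
        return Node(donors=ys)
    return Node(
        threshold=(pairs[best_i - 1][0] + pairs[best_i][0]) / 2,
        left=grow(pairs[:best_i], min_leaf),
        right=grow(pairs[best_i:], min_leaf),
    )


def leaf_donors(node: Node, x: float) -> List[float]:
    """Follow the splits down to the leaf containing x."""
    while node.threshold is not None:
        node = node.left if x <= node.threshold else node.right
    return node.donors


def cart_multiple_impute(xs, ys, m=5, rng=None, min_leaf=20):
    """Return m completed copies of ys. Each missing y (None) is drawn from
    the observed y values in the leaf that its x falls into."""
    rng = rng or random.Random()
    tree = grow([(x, y) for x, y in zip(xs, ys) if y is not None], min_leaf)
    return [
        [rng.choice(leaf_donors(tree, x)) if y is None else y for x, y in zip(xs, ys)]
        for _ in range(m)
    ]


# Usage on simulated data (illustrative only).
rng = random.Random(7)
xs = [rng.uniform(0.0, 10.0) for _ in range(400)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
ys_missing = [None if rng.random() < 0.3 else y for y in ys]
imputations = cart_multiple_impute(xs, ys_missing, m=5, rng=rng)
```

The minimum leaf size trades off how similar the donors are against how many donors each leaf offers; the value 20 here is arbitrary.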
|The Impact of Plant-level Resource Reallocations and Technical Progress on U.S. Macroeconomic Growth|
with co-authors: w16700
We build up from the plant level an "aggregate(d)" Solow residual by estimating every U.S. manufacturing plant's contribution to the change in aggregate final demand between 1976 and 1996. Our framework uses the Petrin and Levinsohn (2010) definition of aggregate productivity growth, which aggregates plant-level changes to changes in aggregate final demand in the presence of imperfect competition and other distortions and frictions. We decompose these contributions into plant-level resource reallocations and plant-level technical efficiency changes, while allowing in the estimation for 459 different production technologies, one for each 4-digit SIC code. On average, we find positive aggregate productivity growth of 2.2% in this sector during a period when its share of U.S. GDP was declining. We find th...
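The split into technical-efficiency and reallocation components can be written, in the Petrin and Levinsohn setup as we understand it and ignoring entry, exit, and fixed costs, as APG = Σ_i D_i dln ω_i + Σ_i D_i Σ_k (ε_ik − s_ik) dln X_ik. Here D_i is plant i's gross output value divided by aggregate value added, ω_i is plant technical efficiency, ε_ik is the output elasticity of input k, and s_ik is input k's cost as a share of the plant's output value. The reallocation term is positive when inputs grow at plants where their elasticity exceeds their cost share, that is, where an input's marginal value exceeds its price. This sketch takes all plant-level quantities as given and leaves out the production-function estimation; the `Plant` type and `apg_decomposition` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Plant:
    output_value: float        # P_i * Q_i, gross output at market prices
    dln_omega: float           # growth in plant technical efficiency
    elasticities: List[float]  # epsilon_ik, output elasticity of input k
    cost_shares: List[float]   # s_ik = W_ik * X_ik / (P_i * Q_i)
    dln_inputs: List[float]    # growth in input k


def apg_decomposition(
    plants: Sequence[Plant], aggregate_value_added: float
) -> Tuple[float, float]:
    """Return the (technical_efficiency, reallocation) terms of APG.

    Assumes no entry, exit, or fixed costs.
    """
    technical, reallocation = 0.0, 0.0
    for p in plants:
        domar = p.output_value / aggregate_value_added  # D_i
        technical += domar * p.dln_omega
        reallocation += domar * sum(
            (e - s) * dx
            for e, s, dx in zip(p.elasticities, p.cost_shares, p.dln_inputs)
        )
    return technical, reallocation


# Illustrative numbers only: two plants, aggregate value added of 60.
plants = [
    Plant(100.0, 0.02, [0.3, 0.6], [0.25, 0.6], [0.10, 0.05]),
    Plant(50.0, -0.01, [0.4], [0.5], [0.20]),
]
tech, realloc = apg_decomposition(plants, 60.0)
print(tech, realloc, tech + realloc)
```

In this toy example the second plant expands an input whose cost share exceeds its elasticity, so the reallocation term is negative even though technical efficiency contributes positively.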
Published: Amil Petrin & Jerome Reiter & Kirk White, 2011. "The Impact of Plant-level Resource Reallocations and Technical Progress on U.S. Macroeconomic Growth," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 14(1), pages 3-26, January.