
Files containing the unbalanced panel of annual state-level mortality in registration states by cause and by age and sex for years 1900-1936:

Here is a list of anomalies of which I am aware and that appear to be present in the historical mortality volumes (rather than being due to data entry errors):

  • 1902: total deaths summed by cause and total deaths summed by age and sex for all reporting states do not agree
  • 1912 and 1913: deaths by state and age are not disaggregated by gender in the historical volumes
  • 1916: total deaths and the sum of age-specific deaths for females by age in Ohio do not agree
  • 1919: total deaths and the sum of age-specific deaths for males by age in Kentucky do not agree
  • 1920: total deaths and the sum of age-specific deaths for males in California and for females in Indiana do not agree; total deaths and the sums by age and sex for Kentucky do not agree
  • 1924 through 1936: deaths by age are aggregated for ages 1-4 (for earlier years they are provided by single year of age in this interval)
  • 1935: total deaths by cause and total deaths by age for Massachusetts do not agree

One nice feature of this data is that totals provided in the historical volumes can be compared with sums by cause and by age to detect data entry errors (the data was also double-entered by Digital Divide Data). I've double-checked all inconsistencies caught by these comparisons - the ones listed here appear to be present in the printed historical mortality statistics volumes.
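As an illustration of the kind of comparison described above, here is a minimal sketch in Python. It is not the procedure actually used to check the data; the function name, the column names other than female_1_4, and the numbers are all hypothetical.

```python
def find_total_mismatches(rows, component_cols, total_col):
    """Return (state, year, reported, computed) for each row whose
    component columns do not sum to the reported total."""
    mismatches = []
    for row in rows:
        computed = sum(row[col] for col in component_cols)
        if computed != row[total_col]:
            mismatches.append((row["state"], row["year"], row[total_col], computed))
    return mismatches


# Made-up numbers for illustration only; not taken from the dataset.
rows = [
    {"state": "A", "year": 1916, "female_total": 100,
     "female_under_1": 30, "female_1_4": 20, "female_5_over": 45},
    {"state": "B", "year": 1916, "female_total": 60,
     "female_under_1": 10, "female_1_4": 20, "female_5_over": 30},
]
age_cols = ["female_under_1", "female_1_4", "female_5_over"]
mismatches = find_total_mismatches(rows, age_cols, "female_total")
```

A flagged row only says that the printed total and the sum disagree; whether that is a data entry error or an anomaly in the printed volume still has to be checked against the source.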

I'll need to do a bit more work in documenting everything - the main thing that is not transparent is how I created a few variables. My objective was to make the Stata dataset consistent across years, so under conservative assumptions, I combined a few categories of deaths that are reported differently in different years. For example, in some years, "cancer" and "tumor" deaths are reported separately, while in other years they are reported together as "cancer and tumors." So I created a single variable throughout for "cancer and tumors" (cancer_tumor). The variables that required a little manipulation and a few reasonable assumptions are:

  • tb_lungs
  • other_tb
  • all_tb
  • cancer_tumor
  • accidents_violence
  • unknown_other
  • childbirth_puerperal
  • male_95_over
  • female_95_over
  • male_1_4
  • female_1_4
  • all_1_4
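To make the cancer and tumor example concrete, here is a minimal sketch in Python of how such a combined variable could be built for one state-year. It is an assumption about the general approach, not the code used to construct the dataset; the function and argument names are hypothetical and the numbers are made up.

```python
def cancer_tumor(combined=None, cancer=None, tumor=None):
    """Return combined cancer and tumor deaths for one state-year.

    Use the combined figure when the volume reports one; otherwise add
    the separately reported categories. Return None when the inputs
    are incomplete, rather than guessing.
    """
    if combined is not None:
        return combined
    if cancer is None or tumor is None:
        return None
    return cancer + tumor


# Made-up numbers for illustration only.
separate_year = cancer_tumor(cancer=120, tumor=15)  # categories reported separately
combined_year = cancer_tumor(combined=140)          # categories reported together
incomplete = cancer_tumor(cancer=120)               # tumor count missing
```

The same pattern would apply to all_1_4, where single-year-of-age counts in earlier years would be summed to match the aggregated ages 1-4 figure reported from 1924 onward.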

As I mentioned, I'd like to encourage people to use this data, so please feel free to share it with whoever might be interested. In particular, I'd very much like to know if additional errors are found. Once it's up on the Berkeley demography web site, I'll let you know.

Grant Miller
ngmiller at stanford edu
23 March 2006


July 2015 Correction and Addition

Prior to 1910, pneumonia deaths were reported separately for bronchopneumonia and for lobar and undefined pneumonia. In 1910 and subsequent years, pneumonia deaths were reported as the sum of all types of pneumonia deaths. The original digitized dataset contained only lobar and undefined pneumonia for years prior to 1910, and all-cause pneumonia in 1910 and subsequent years. This updated version now includes a consistent measure of all-cause pneumonia deaths (lobar, broncho, and undefined pneumonia) for all years. Additionally, small data entry errors were corrected, the most notable of which was for 1902 (1901 total mortality and cause-specific mortality were previously recorded for both 1901 and 1902).
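The rule behind the consistent pneumonia measure can be sketched in Python as follows. This is an illustration of the description above, not the code used for the correction; the function and argument names are hypothetical and the numbers are made up.

```python
def all_pneumonia(year, bronchopneumonia=None, lobar_undefined=None, all_types=None):
    """Return all-cause pneumonia deaths for one state-year.

    Before 1910 the two separately reported categories are added;
    from 1910 onward the single reported figure is used as-is.
    """
    if year < 1910:
        return bronchopneumonia + lobar_undefined
    return all_types


# Made-up numbers for illustration only.
before_1910 = all_pneumonia(1905, bronchopneumonia=40, lobar_undefined=110)
from_1910 = all_pneumonia(1910, all_types=160)
```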

The original files are available at https://www.nber.org/data/vital-statistics-deaths-historical/archive
