Roy Mill

At-Bay Inc.
Mountain View

E-Mail: hidden (any NBER-related person can be emailed as first underscore last at nber dot org)

NBER Working Papers and Publications

February 2018
Linking Individuals Across Historical Sources: a Fully Automated Approach
with Ran Abramitzky, Santiago Pérez: w24324
Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. In the first part of the paper, we suggest a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I (false positives) and type II (false negatives) errors. The first step guides researchers in the choice of which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each two records correspond to the same individual. The third step suggests how to use these es...

Published: Ran Abramitzky & Roy Mill & Santiago Pérez, 2020. "Linking individuals across historical sources: A fully automated approach," Historical Methods: A Journal of Quantitative and Interdisciplinary History, vol. 53(2), pages 94-111.
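
The second step described in the abstract, using the EM algorithm to estimate the probability that two records refer to the same individual, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple two-class mixture in which each candidate pair is summarized by binary agreement indicators (for example, name agrees, birth year agrees, birthplace agrees) that are conditionally independent given match status. The function name, starting values, and toy data are all hypothetical.

```python
def em_match_probabilities(patterns, n_iter=200, p_match=0.1,
                           m_init=0.9, u_init=0.1, eps=1e-6):
    """Estimate P(match | agreement pattern) for each candidate pair.

    patterns: list of equal-length tuples of 0/1 agreement indicators.
    Assumed model (not taken from the paper): two latent classes,
    match and non-match, with independent Bernoulli agreement per field.
    """
    k = len(patterns[0])
    n = len(patterns)
    m = [m_init] * k  # P(field agrees | match)
    u = [u_init] * k  # P(field agrees | non-match)

    def clamp(x):
        # Keep probabilities away from 0 and 1 so no likelihood is exactly 0.
        return min(max(x, eps), 1.0 - eps)

    post = []
    for _ in range(n_iter):
        # E-step: posterior match probability for every pair.
        post = []
        for g in patterns:
            like_m, like_u = p_match, 1.0 - p_match
            for j, agrees in enumerate(g):
                like_m *= m[j] if agrees else 1.0 - m[j]
                like_u *= u[j] if agrees else 1.0 - u[j]
            post.append(like_m / (like_m + like_u))

        # M-step: re-estimate the mixing weight and per-field agreement rates.
        total = sum(post)
        p_match = clamp(total / n)
        for j in range(k):
            m[j] = clamp(sum(w * g[j] for w, g in zip(post, patterns)) / total)
            u[j] = clamp(sum((1.0 - w) * g[j] for w, g in zip(post, patterns))
                         / (n - total))
    return post, p_match, m, u


# Toy data (made up): a few high-agreement pairs among many low-agreement ones.
pairs = ([(1, 1, 1)] * 90 + [(1, 1, 0)] * 10 + [(1, 0, 1)] * 5 +
         [(0, 0, 0)] * 300 + [(1, 0, 0)] * 40 + [(0, 1, 0)] * 40 +
         [(0, 0, 1)] * 40)
post, p_hat, m_hat, u_hat = em_match_probabilities(pairs)
prob_by_pattern = dict(zip(pairs, post))
```

Here prob_by_pattern maps each agreement pattern to its estimated match probability. A researcher could then keep only pairs above a chosen threshold, trading false positives against false negatives.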

National Bureau of Economic Research
1050 Massachusetts Ave.
Cambridge, MA 02138
