NPI Data -- CMS' National Plan and Provider Enumeration System (NPPES) Files in SAS, Stata, and CSV format

The Centers for Medicare & Medicaid Services (CMS) developed the National Plan and Provider Enumeration System (NPPES) to assign unique identifiers to health care providers. The National Provider Identifier (NPI) has been the standard identifier for all HIPAA-covered entities (health care providers) since May 23, 2007. Small health plans were required to obtain and use an NPI by May 23, 2008.

These files contain all of the FOIA-disclosable data for active and deactivated providers in NPPES.

The source NPPES data dissemination is available from NPPES.

Older copies of the NPPES data dissemination, such as pre-2011 releases, are gratefully accepted. Please contact .

Jean Roth set up the processed versions of the source NPPES data dissemination to make the files easier to use. The SAS and Stata files are about 32 Gb each. CMS releases the source NPPES data in a limited number of formats in order to keep costs low. The first line of the source file contains variable descriptions. These can be difficult for statistical packages to read because many packages limit variable names to 32 characters. The source file has long descriptions with numbers at the end, so "my_long_variable_description_number_1" through "my_long_variable_description_number_50" can all be cut off to the same name, "my_long_variable_description_num".

In the SAS and Stata datasets, the header information is preserved as the variable label, and each variable is assigned a name of fewer than 32 characters that preserves the sequence number, if applicable.
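
The truncation problem and one possible remedy can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the SAS and Stata files: the 32-character limit, the helper names `naive_truncate` and `short_name`, and the rule of trimming the stem to make room for the trailing number are all assumptions made for the example.

```python
import re

MAX_LEN = 32  # assumed name-length limit, for illustration only


def naive_truncate(header: str) -> str:
    """Cut a header to MAX_LEN characters, as a naive import might."""
    return header[:MAX_LEN]


def short_name(header: str, max_len: int = MAX_LEN) -> str:
    """Shorten a header while keeping a trailing sequence number, if any."""
    match = re.search(r"_(\d+)$", header)
    if match is None:
        return header[:max_len]
    suffix = "_" + match.group(1)
    stem = header[: match.start()]
    # Trim the stem, not the suffix, so the sequence number survives.
    return stem[: max_len - len(suffix)].rstrip("_") + suffix


headers = [f"my_long_variable_description_number_{i}" for i in range(1, 51)]

truncated = {naive_truncate(h) for h in headers}  # all 50 collapse to one name
shortened = [short_name(h) for h in headers]      # 50 distinct names
```

Here the naive approach leaves a single name for all fifty columns, while the suffix-preserving approach keeps fifty distinct names within the limit.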

An NPI to UPIN crosswalk and an NPI to state license crosswalk made from these files are also available.

The source data file has nearly 5 million records. Excel 2010 supports a maximum of 1,048,576 rows, so it cannot be used to read in the whole source file at once. Some instructions on reading the file into Access and selecting variables and rows are available.
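
For readers who prefer a script to a spreadsheet or Access, a file of this size can be read one row at a time, keeping only the needed columns and rows. The sketch below uses Python's standard `csv` module; the function name `select_rows`, the column names, and the sample values are hypothetical stand-ins and are not taken from the NPPES file.

```python
import csv
import io


def select_rows(lines, keep_columns, predicate):
    """Yield only keep_columns from rows for which predicate(row) is true.

    Rows are read one at a time, so the whole file never sits in memory.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        if predicate(row):
            yield {col: row[col] for col in keep_columns}


# Tiny in-memory stand-in for the source file; names and values are made up.
sample = io.StringIO(
    "NPI,State,Entity Type\n"
    "1000000001,MA,1\n"
    "1000000002,NY,2\n"
    "1000000003,MA,2\n"
)

ma_rows = list(select_rows(sample, ["NPI", "State"], lambda r: r["State"] == "MA"))
```

With a real file, an object returned by `open(path, newline="")` would be passed in place of `sample`.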

The main NPI data file and the core data file both include ZIP Codes. A ZIP Code distance database is also available.

The overall last update date is at the bottom of the page, and the desc files each have dates as well.

Updates and changes.

  SAS   Stata   CSV   Source   Documentation   Code Values
  ( ~32 Gb )   ( ~32 Gb )   ( ~3 Gb )   ( ~5 Gb )   Application Form & Instructions   Code Values

The NPI data above has been reshaped into a database style below, so that non-repeated fields are in the core file and repeated fields are in their own long, skinny files. The files can be linked by the NPI field. The core file is less than one-fifth the size of the NPI database above, so it can be easier to work with.
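
The idea behind this reshaping can be sketched as follows. This is a minimal illustration and not the code used to produce the files: the function `split_record`, the field names `npi`, `state`, and `ptaxcode1` to `ptaxcode3`, and the choice to drop empty repeated values are assumptions made for the example.

```python
def split_record(record, repeated_prefix):
    """Split one wide record into a core dict and a list of long rows.

    Fields whose names start with repeated_prefix and end in a sequence
    number become one long row each; everything else stays in the core.
    """
    core = {}
    long_rows = []
    for name, value in record.items():
        if name.startswith(repeated_prefix):
            seq = int(name[len(repeated_prefix):])
            if value != "":  # assumption: empty repeats are not stored
                long_rows.append({"npi": record["npi"], "seq": seq, "value": value})
        else:
            core[name] = value
    long_rows.sort(key=lambda r: r["seq"])
    return core, long_rows


# Hypothetical wide record with one repeated field group.
wide = {
    "npi": "1000000001",
    "state": "MA",
    "ptaxcode1": "AAA",
    "ptaxcode2": "BBB",
    "ptaxcode3": "",
}

core, long_rows = split_record(wide, "ptaxcode")
```

Each long row carries the NPI, so it can be linked back to the core record.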

  Core, Non-Repeated Variables   core (5Gb)   core (5Gb)   core (<2Gb)   desc
  Healthcare Provider Taxonomy Code Variables   ptax (.4Gb)   ptax (.4Gb)   ptax (.2Gb)   desc
  Provider License Variables   plic (.2Gb)   plic (.2Gb)   plic (.2Gb)   desc
  Other Provider Identifier Variables** (NPI to Medicare Crosswalk, NPI to Medicaid Crosswalk)   othp (.6Gb)   othp (.6Gb)   othp (.2Gb)   desc
** One limitation of these crosswalks is that the provider had to include the other provider identifier in their NPI application in order for that provider identifier to appear in the NPPES database.
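
Linking the core file to one of the long files by NPI, and the effect of the limitation noted above, can be illustrated with a small sketch. The function `link_by_npi`, the field names `other_id` and `id_type`, and all values are hypothetical; the real files use their own variable names.

```python
from collections import defaultdict


def link_by_npi(core_rows, long_rows):
    """Pair each core row with the long rows that share its NPI."""
    by_npi = defaultdict(list)
    for row in long_rows:
        by_npi[row["npi"]].append(row)
    # Providers that reported no other identifier get an empty list.
    return [(core, by_npi.get(core["npi"], [])) for core in core_rows]


# Made-up rows standing in for the core and othp files.
core_rows = [
    {"npi": "1000000001", "state": "MA"},
    {"npi": "1000000002", "state": "NY"},
]
othp_rows = [
    {"npi": "1000000001", "other_id": "X123", "id_type": "MEDICARE"},
]

linked = link_by_npi(core_rows, othp_rows)
```

The second provider in this example has no match, which is what a crosswalk shows when the identifier was not included in the NPI application.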

The data is also available as two-variable files, each pairing the NPI with one other database variable, for greater ease of use.

Contact data@nber.org with questions, comments, or suggestions.

Last Update Created by Jean Roth February 2, 2012

National Bureau of Economic Research
1050 Massachusetts Ave.
Cambridge, MA 02138
