|Related Dataset||Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1|
Provides links to Author-ity 2009, including records from principal investigators (on NIH and NSF grants), inventors on USPTO patents, and students/advisors on ProQuest dissertations.
Note that NIH and NSF differ in the types of fields they record and in the standards they use (e.g., for institution names). Typically, an NSF grant spanning multiple years is associated with one record, while an NIH grant appears in multiple records: one for each fiscal year and each sub-project or supplement, possibly with different principal investigators.
The prior probability of a match (i.e., that the author exists in Author-ity 2009) varies dramatically across NIH grants, NSF grants, and USPTO patents. The great majority of NIH principal investigators have one or more papers in PubMed, but only a minority of NSF principal investigators (except in biology) do, and even fewer USPTO inventors do. This prior probability has been built into the calculation of match probabilities.
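To see why the prior matters, the following minimal sketch applies Bayes' rule in odds form to the same evidence under two different priors. It is an illustration only: the function name `posterior_match_probability`, the likelihood ratio, and the prior values are hypothetical and are not taken from the method used to build this dataset.

```python
def posterior_match_probability(prior: float, likelihood_ratio: float) -> float:
    """Combine a prior match probability with a likelihood ratio via Bayes' rule.

    Hypothetical helper for illustration; not the dataset authors' actual model.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    # Posterior odds = prior odds x likelihood ratio of the observed evidence.
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Same evidence (made-up likelihood ratio of 4), two made-up priors.
for label, prior in [("high prior", 0.9), ("low prior", 0.1)]:
    print(label, round(posterior_match_probability(prior, 4.0), 3))
```

With identical evidence, the low-prior case ends up below a 0.5 cutoff while the high-prior case stays well above it, which is why a source-specific prior can change which links are retained.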
The NIH data were downloaded from NIH ExPORTER and the older NIH CRISP files. The dataset has 2,353,387 records, includes only those with match probability > 0.5, and has the following 12 fields:
The NSF dataset has 262,452 records, includes only those with match probability > 0.5, and has the following 10 fields:
There are two files for USPTO because we linked disambiguated authors in PubMed (from Author-ity 2009) with disambiguated inventors.
The USPTO linking dataset has 309,720 records, includes only those with match probability > 0.5, and has the following 3 fields:
The disambiguated inventors file (uiuc_uspto.tsv) has 2,736,306 records and the following 7 fields:
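The disambiguated inventors file described above is tab-separated (per its `.tsv` extension) and can be read with Python's standard library. The sketch below assumes the file is UTF-8 encoded and that its first line is a header row; neither is stated in this description, so inspect the file before relying on it. The function name `read_tsv` is hypothetical.

```python
import csv
import os
from typing import Dict, Iterator


def read_tsv(path: str) -> Iterator[Dict[str, str]]:
    """Yield each record of a tab-separated file as a dict keyed by column name.

    Assumes UTF-8 and a header row (assumptions, not documented facts).
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps stray quote characters in names as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


if __name__ == "__main__":
    # Count records in the inventors file, if it is present locally.
    if os.path.exists("uiuc_uspto.tsv"):
        print(sum(1 for _ in read_tsv("uiuc_uspto.tsv")))
```

Streaming rows with a generator avoids loading all 2,736,306 records into memory at once.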
|Keywords||PubMed; USPTO; Principal investigator; Name disambiguation|
|Funder||U.S. National Science Foundation (NSF) - Grant: 0965341|
|Funder||U.S. National Science Foundation (NSF) - Grant: 1348742|
|Funder||U.S. National Institutes of Health (NIH) - Grant: P01AG039347|
|Corresponding Creator||Vetle I. Torvik|