Illinois Data Bank

Author-Linked data for Author-ity 2009

This dataset links author records in Author-ity 2009 to records of principal investigators (on NIH and NSF grants), inventors on USPTO patents, and students/advisors on ProQuest dissertations.

Note that NIH and NSF differ in the types of fields they record and the standards they use (e.g., for institution names). Typically an NSF grant spanning multiple years is associated with one record, while an NIH grant appears in multiple records: one for each fiscal year and for each sub-project or supplement, possibly with different principal investigators.

The prior probability of a match (i.e., that the author exists in Author-ity 2009) varies dramatically across NIH grants, NSF grants, and USPTO patents. The great majority of NIH principal investigators have one or more papers in PubMed, but only a minority of NSF principal investigators (except in biology) have papers in PubMed, and even fewer USPTO inventors do. This prior probability has been built into the calculation of match probabilities.
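
To illustrate why the prior matters, here is a generic Bayes-rule sketch in Python. It is not the calculation used to build this dataset: the function name, the prior values, and the likelihood ratio are all made up for illustration.

```python
def posterior_match_probability(prior: float, likelihood_ratio: float) -> float:
    """Combine a prior match probability with evidence using Bayes' rule in odds form.

    Hypothetical helper for illustration only, not the dataset's actual method.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Made-up numbers: identical evidence (likelihood ratio 4) under a high and a low prior.
high_prior = posterior_match_probability(0.8, 4.0)
low_prior = posterior_match_probability(0.2, 4.0)
```

With the same evidence, the high-prior case ends up near 0.94 while the low-prior case lands at 0.5, so only the former would pass a "> 0.5" cutoff.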

The NIH data were downloaded from NIH ExPORTER and the older NIH CRISP files. The dataset has 2,353,387 records, includes only those with match probability > 0.5, and has the following 12 fields:
1 app_id
2 nih_full_proj_nbr
3 nih_subproj_nbr
4 fiscal_year
5 pi_position
6 nih_pi_names
7 org_name
8 org_city_name
9 org_bodypolitic_code
10 age: number of years since their first paper
11 prob: the match probability to au_id
12 au_id: Author-ity 2009 author ID
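
Because one author can appear in many NIH records (one per fiscal year, sub-project, or supplement), a common first step is to collapse the records by au_id. The following Python sketch does that. It assumes the file is tab-separated with no header row and with columns in the order listed above; neither assumption is confirmed by this description. The function name and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Column order as listed above. Assumption: tab-separated rows, no header line.
NIH_FIELDS = [
    "app_id", "nih_full_proj_nbr", "nih_subproj_nbr", "fiscal_year",
    "pi_position", "nih_pi_names", "org_name", "org_city_name",
    "org_bodypolitic_code", "age", "prob", "au_id",
]


def fiscal_years_by_author(lines):
    """Map each au_id to the sorted list of distinct fiscal years it appears in."""
    reader = csv.DictReader(
        lines, fieldnames=NIH_FIELDS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    years = defaultdict(set)
    for row in reader:
        years[row["au_id"]].add(int(row["fiscal_year"]))
    return {au_id: sorted(ys) for au_id, ys in years.items()}


# Made-up rows for illustration; none of these values come from the dataset.
SAMPLE_NIH = (
    "101\tPROJ-A-01\t\t2001\t1\tDOE, JANE\tEXAMPLE UNIV\tURBANA\tIL\t5\t0.99\tau1\n"
    "102\tPROJ-A-02\t\t2002\t1\tDOE, JANE\tEXAMPLE UNIV\tURBANA\tIL\t6\t0.99\tau1\n"
    "103\tPROJ-A-02\t7\t2002\t1\tROE, RICHARD\tEXAMPLE UNIV\tURBANA\tIL\t2\t0.71\tau2\n"
)

years = fiscal_years_by_author(io.StringIO(SAMPLE_NIH))
```

For the real file, pass an open file handle instead of the in-memory sample. If your copy does have a header row, drop the fieldnames argument so that csv.DictReader reads the names from the first line.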

The NSF dataset has 262,452 records, includes only those with match probability > 0.5, and has the following 10 fields:
1 AwardId
2 fiscal_year
3 pi_position
4 PrincipalInvestigators
5 Institution
6 InstitutionCity
7 InstitutionState
8 age: number of years since their first paper
9 prob: the match probability to au_id
10 au_id: Author-ity 2009 author ID

There are two files for USPTO because here we linked disambiguated authors in PubMed (from Author-ity 2009) with disambiguated inventors: one file holds the links, and the other holds the disambiguated inventors themselves.

The USPTO linking dataset has 309,720 records, includes only those with match probability > 0.5, and has the following 3 fields:
1 au_id: Author-ity 2009 author ID
2 inv_id: USPTO inventor ID
3 prob: the match probability of au_id vs inv_id

The disambiguated inventors file (uiuc_uspto.tsv) has 2,736,306 records and the following 7 fields:
1 inv_id: USPTO inventor ID
2 is_lower
3 is_upper
4 fullnames
5 patents: patent IDs separated by '|'
6 first_app_yr
7 last_app_yr
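
The two USPTO files are meant to be used together: the linking file maps au_id to inv_id, and the inventors file maps inv_id to patents. The Python sketch below joins them and splits the patents field on '|'. It assumes both files are tab-separated with no header row and with columns in the order listed above; the function name and all sample values are made up.

```python
import csv
import io

# Column orders as listed above. Assumption: tab-separated rows, no header line.
LINK_FIELDS = ["au_id", "inv_id", "prob"]
INVENTOR_FIELDS = [
    "inv_id", "is_lower", "is_upper", "fullnames",
    "patents", "first_app_yr", "last_app_yr",
]


def patents_by_author(link_lines, inventor_lines, min_prob=0.5):
    """Map each au_id to the patent IDs of its linked inventors (prob > min_prob)."""
    inventors = {
        row["inv_id"]: row["patents"].split("|")
        for row in csv.DictReader(
            inventor_lines, fieldnames=INVENTOR_FIELDS,
            delimiter="\t", quoting=csv.QUOTE_NONE,
        )
    }
    result = {}
    for row in csv.DictReader(
        link_lines, fieldnames=LINK_FIELDS, delimiter="\t", quoting=csv.QUOTE_NONE
    ):
        if float(row["prob"]) > min_prob and row["inv_id"] in inventors:
            result.setdefault(row["au_id"], []).extend(inventors[row["inv_id"]])
    return result


# Made-up rows for illustration; "-" stands in for fields whose contents are not described here.
SAMPLE_LINKS = "au1\tinv9\t0.93\nau2\tinv7\t0.62\n"
SAMPLE_INVENTORS = (
    "inv9\t-\t-\tJane Doe\tP100|P200\t1998\t2004\n"
    "inv7\t-\t-\tRichard Roe\tP300\t2001\t2001\n"
)

linked = patents_by_author(io.StringIO(SAMPLE_LINKS), io.StringIO(SAMPLE_INVENTORS))
```

Raising min_prob (for example to 0.9) keeps only the more confident links. The inventors file is loaded into a dictionary first so that each link can be resolved with a single lookup.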

Subject: Social Sciences
Keywords: PubMed; USPTO; Principal investigator; Name disambiguation
License: CC BY
Funder: U.S. National Science Foundation (NSF)-Grant:0965341
Funder: U.S. National Science Foundation (NSF)-Grant:1348742
Funder: U.S. National Institutes of Health (NIH)-Grant:P01AG039347
Creator: Vetle I. Torvik
Version: 1
DOI: 10.13012/B2IDB-4370459_V1
Publication date: 2018-04-23

Related materials:

Schaper, Thomas; Arts, Sam; Veugelers, Reinhilde (2022): Not Like the Others: Frontier Scientists and High-Impact Inventions. Paper to be presented at DRUID22, Copenhagen Business School, Copenhagen, Denmark, June 13-15, 2022. https://conference.druid.dk/acc_papers/jqtc1bira0rs6zecgsor0gfvpzwtyi.pdf
Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1 (relation: IsSupplementTo)