Author-ity 2018 - PubMed author name disambiguated dataset
Dataset Description
Author-ity 2018 dataset
Prepared by Vetle Torvik, Apr. 22, 2021

The dataset is based on a snapshot of PubMed taken in December 2018 (NLM's 2018 baseline plus updates released throughout 2018). It covers a total of 29.1 million article records and 114.2 million author name instances. Each author name instance is uniquely identified by its PMID and its position in the paper's author list (e.g., 10786286_3 is the third author name on PMID 10786286). A cluster is thus represented as a collection of author name instances. The instances were first grouped into "blocks" by last name and first-name initial (including some close variants), and each block was then clustered separately. The resulting clusters are provided in two formats: a file with only IDs and PMIDs, and a file with cluster summaries.
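The instance identifier scheme and the blocking step described above can be illustrated with a short Python sketch. It assumes instance IDs are strings of the form `<PMID>_<position>` with a 1-based position, as in the 10786286_3 example. The names `AuthorInstance`, `parse_instance_id`, `format_instance_id`, `block_key`, `build_blocks`, and `cluster_pmids` are hypothetical helpers, not part of the dataset's distribution, and no particular file layout is assumed. The blocking key is a simplification (lowercased last name plus first-name initial) and does not reproduce the close-variant handling used to build the actual dataset; the names in the demo are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class AuthorInstance(NamedTuple):
    """One author name on one paper, e.g. '10786286_3' (hypothetical helper)."""
    pmid: int
    position: int  # assumed 1-based position in the paper's author list


def parse_instance_id(instance_id: str) -> AuthorInstance:
    """Split an instance ID of the assumed form '<PMID>_<position>'."""
    pmid_str, pos_str = instance_id.rsplit("_", 1)
    return AuthorInstance(int(pmid_str), int(pos_str))


def format_instance_id(inst: AuthorInstance) -> str:
    """Inverse of parse_instance_id."""
    return f"{inst.pmid}_{inst.position}"


def block_key(last_name: str, first_name: str) -> tuple[str, str]:
    """Simplified blocking key: lowercased last name plus first initial.

    The real dataset also merges some close name variants into a block;
    that step is NOT modelled here.
    """
    return (last_name.strip().lower(), first_name.strip()[:1].lower())


def build_blocks(names: dict[str, tuple[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Group instance IDs by blocking key. `names` maps ID -> (last, first)."""
    blocks: defaultdict[tuple[str, str], list[str]] = defaultdict(list)
    for instance_id, (last, first) in names.items():
        blocks[block_key(last, first)].append(instance_id)
    return dict(blocks)


def cluster_pmids(cluster: list[str]) -> list[int]:
    """Distinct PMIDs covered by one cluster (a list of instance IDs), sorted."""
    return sorted({parse_instance_id(i).pmid for i in cluster})


if __name__ == "__main__":
    print(parse_instance_id("10786286_3"))  # AuthorInstance(pmid=10786286, position=3)
    # Hypothetical instance IDs and names, invented for illustration only.
    demo_names = {
        "111_2": ("Smith", "John"),
        "222_1": ("Smith", "Jane"),
        "333_4": ("Smyth", "John"),
    }
    print(build_blocks(demo_names))
    print(cluster_pmids(["222_1", "111_2", "111_5"]))  # [111, 222]
```

In this sketch "Smith, John" and "Smith, Jane" fall into the same block; telling such names apart is the job of the per-block clustering step, which the sketch does not attempt.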
Subject: Social Sciences
Keywords: author name disambiguation; PubMed
License: CC BY
Funder: U.S. National Institutes of Health (NIH), Grant P01AG039347
Corresponding Creator: Vetle Torvik
Downloaded: 768 times
| Version | DOI | Comment | Publication Date |
|---|---|---|---|
| 1 | 10.13012/B2IDB-2273402_V1 | | 2021-04-22 |
Contact the Research Data Service for help interpreting this log.

| Category | Action | Details | Timestamp |
|---|---|---|---|
| RelatedMaterial | create | Article (IsSupplementTo): Mishra A, Lee H, Jeoung S, Torvik VI, Diesner J (2025) Patterns of diversity in biomedical coauthorships: An analysis across authors’ ethnicity, gender, age, and expertise. PLOS ONE 20(1): e0316890. https://doi.org/10.1371/journal.pone.0316890 | 2025-02-04T16:38:06Z |
| RelatedMaterial | update | note: nil → "" | 2025-02-04T16:38:06Z |
| RelatedMaterial | create | Article (IsCitedBy): Tian, Shubo, Qingyu Chen, Donald C Comeau, W John Wilbur, and Zhiyong Lu. 2024. PubMed Computed Authors in 2024: an open resource of disambiguated author names in biomedical literature. Bioinformatics, 2024; btae672. https://doi.org/10.1093/bioinformatics/btae672 | 2024-11-12T16:39:20Z |
| Dataset | update | publisher: "University of Illinois at Urbana-Champaign" → "University of Illinois Urbana-Champaign" | 2024-11-12T16:39:20Z |
| RelatedMaterial | create | Article (IsSupplementTo): Xu, J., Yu, C., Xu, J., Ding, Y., Torvik, V.I., Kang, J., Sung, M., & Song, M. (2024). PubMed knowledge graph 2.0: Connecting papers, patents, and clinical trials in biomedical science. https://doi.org/10.48550/arXiv.2410.07969 | 2024-10-23T16:10:48Z |