A newer version of this dataset is available.
Version | DOI | Comment | Publication Date |
---|---|---|---|
3 | 10.13012/B2IDB-5259667_V3 | updated data formulation | 2024-10-10 |
2 | 10.13012/B2IDB-5259667_V2 | expanded dataset | 2024-08-19 |
1 | 10.13012/B2IDB-5259667_V1 | | 2024-03-25 |
Contact the Research Data Service for help interpreting this log.
RelatedMaterial | create: {"material_type"=>"Dataset", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-5259667_V3", "uri"=>"10.13012/B2IDB-5259667_V3", "uri_type"=>"DOI", "citation"=>" (2024): Diversity - PubMed Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5259667_V3", "dataset_id"=>2768, "selected_type"=>"Dataset", "datacite_list"=>"IsPreviousVersionOf", "note"=>nil, "feature"=>nil} | 2024-10-04T03:58:40Z |
RelatedMaterial | update: {"note"=>[nil, ""]} | 2024-08-19T19:12:28Z |
Dataset | update: {"version_comment"=>["A new updated dataset needs to be uploaded.", "expanded dataset"]} | 2024-08-19T19:12:28Z |
Dataset | update: {"publication_state"=>["version candidate under curator review", "released"], "release_date"=>[nil, Mon, 19 Aug 2024]} | 2024-08-19T18:28:35Z |
Dataset | update: {"description"=>["Diversity - PubMed dataset\r\nContact: Apratim Mishra (Aug, 2024)\r\n\r\nThis dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection chosen includes articles retrieved from Authority 2018 [1], a total of 907 024 papers, and 1612 118 authors. The sample of articles is based on the top 40 journals in the dataset, limited to 2-12 authors published between 1991 – 2014 inclusive. Files are 'gzip' compressed and separated by tab space.\r\n################################################\r\nFile1: auids_plos_2.gz.csv (Important columns defined, 7 in total)\r\n•\tAUID: a unique ID for each author\r\n•\tEthnea: ethnicity prediction\r\n•\tGenni: gender prediction\r\n#################################################\r\nFile2: pmids_plos_2.gz.csv (Important columns defined)\r\n•\tpmid: unique paper \r\n•\t auid: all unique auids\r\n•\tyear: Year of paper publication\r\n•\tno_authors: Author count\r\n•\tjournal: Journal name\r\n•\tyears: first year of publication for every author\r\n•\tage_bin: Binned age for every author\r\n•\tCountry-temporal: Country of affiliation for every author\r\n•\th_index: Journal h-index\r\n•\tTimeNovelty: Paper Time novelty [2]\r\n•\tnih_funded: Binary variable indicating funding for any author\r\n•\tprior_cit_mean: Mean of all authors’ prior citation rate\r\n•\tInsti_impact: All unique institutions’ citation rate\r\n•\tmesh_vals: Top MeSH values for every author of that paper\r\n•\trelative_citation_ratio: RCR\r\n\r\nThe ‘Readme’ includes a description for all columns.\r\n[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1\r\n[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. 
https://doi.org/10.13012/B2IDB-5060298_V1\r\n", "Diversity - PubMed dataset\r\nContact: Apratim Mishra (Aug, 2024)\r\n\r\nThis dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection chosen includes articles retrieved from Authority 2018 [1], a total of 907 024 papers, and 1612 118 authors. The sample of articles is based on the top 40 journals in the dataset, limited to 2-12 authors published between 1991 – 2014 inclusive. Files are 'gzip' compressed and separated by tab space.\r\n################################################\r\nFile1: auids_plos_2.csv.gz (Important columns defined, 7 in total)\r\n•\tAUID: a unique ID for each author\r\n•\tEthnea: ethnicity prediction\r\n•\tGenni: gender prediction\r\n#################################################\r\nFile2: pmids_plos_2.csv.gz (Important columns defined)\r\n•\tpmid: unique paper \r\n•\tauid: all unique auids\r\n•\tyear: Year of paper publication\r\n•\tno_authors: Author count\r\n•\tjournal: Journal name\r\n•\tyears: first year of publication for every author\r\n•\tage_bin: Binned age for every author\r\n•\tCountry-temporal: Country of affiliation for every author\r\n•\th_index: Journal h-index\r\n•\tTimeNovelty: Paper Time novelty [2]\r\n•\tnih_funded: Binary variable indicating funding for any author\r\n•\tprior_cit_mean: Mean of all authors’ prior citation rate\r\n•\tInsti_impact: All unique institutions’ citation rate\r\n•\tmesh_vals: Top MeSH values for every author of that paper\r\n•\trelative_citation_ratio: RCR\r\n\r\nThe ‘Readme’ includes a description for all columns.\r\n[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1\r\n[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. 
https://doi.org/10.13012/B2IDB-5060298_V1\r\n"]} | 2024-08-16T17:18:04Z |
Dataset | update: {"description"=>["Diversity - PubMed dataset\r\nContact: Apratim Mishra (March 22, 2024)\r\n\r\nThis dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection chosen includes articles retrieved from Authority 2018 [1], a total of 228 040 papers and 440 310 authors. The sample of papers is based on the top 40 journals in the dataset, limited to 2-10 authors published between 1990 – 2010, and stratified on paper count per year. Additionally, this dataset is limited to papers where the lead author is affiliated with one of the four countries: the US, the UK, Canada, and Australia. Files are encoded with ‘utf-8’.\r\n################################################\r\nFile1: auids_plos.csv (Important columns defined, 7 in total)\r\n•\tAUID: a unique ID for each author\r\n•\tEthnea: ethnicity prediction\r\n•\tGenni: gender prediction\r\n#################################################\r\nFile2: pmids_plos.csv (Important columns defined, 33 in total)\r\n•\tpmid: unique paper ID\r\n•\tyear: Year of paper publication\r\n•\tno_authors: Author count\r\n•\tjournal: Journal name\r\n•\tyears: first year of publication for every author\r\n•\tage_bin: Binned age for every author\r\n•\tCountry-temporal: Country of affiliation for every author\r\n•\th_index: Journal h-index\r\n•\tTimeNovelty: Paper Time novelty [2]\r\n•\tnih_funded: Binary variable indicating NIH funding for any author\r\n•\tprior_cit_mean: Mean of all authors’ prior citation rate\r\n•\tInsti_impact_all: All authors’ respective institutions’ citation count\r\n•\tInsti_impact: Maximum of all institutions’ citation count\r\n•\tmesh_vals: Top MeSH values for every author for that paper\r\n•\touter_mesh_vals: MeSH qualifiers for every author for that paper\r\n•\trelative_citation_ratio: RCR\r\n\r\nThe ‘Readme’ includes a description for all columns.\r\n[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated 
dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1\r\n[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1\r\n", "Diversity - PubMed dataset\r\nContact: Apratim Mishra (Aug, 2024)\r\n\r\nThis dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection chosen includes articles retrieved from Authority 2018 [1], a total of 907 024 papers, and 1612 118 authors. The sample of articles is based on the top 40 journals in the dataset, limited to 2-12 authors published between 1991 – 2014 inclusive. Files are 'gzip' compressed and separated by tab space.\r\n################################################\r\nFile1: auids_plos_2.gz.csv (Important columns defined, 7 in total)\r\n•\tAUID: a unique ID for each author\r\n•\tEthnea: ethnicity prediction\r\n•\tGenni: gender prediction\r\n#################################################\r\nFile2: pmids_plos_2.gz.csv (Important columns defined)\r\n•\tpmid: unique paper \r\n•\t auid: all unique auids\r\n•\tyear: Year of paper publication\r\n•\tno_authors: Author count\r\n•\tjournal: Journal name\r\n•\tyears: first year of publication for every author\r\n•\tage_bin: Binned age for every author\r\n•\tCountry-temporal: Country of affiliation for every author\r\n•\th_index: Journal h-index\r\n•\tTimeNovelty: Paper Time novelty [2]\r\n•\tnih_funded: Binary variable indicating funding for any author\r\n•\tprior_cit_mean: Mean of all authors’ prior citation rate\r\n•\tInsti_impact: All unique institutions’ citation rate\r\n•\tmesh_vals: Top MeSH values for every author of that paper\r\n•\trelative_citation_ratio: RCR\r\n\r\nThe ‘Readme’ includes a description for all columns.\r\n[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. 
University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1\r\n[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1\r\n"]} | 2024-08-16T17:01:27Z |
Dataset | update: {"hold_state"=>["version candidate under curator review", "none"]} | 2024-08-16T16:08:40Z |
Dataset | update: {"version_comment"=>[nil, "A new updated dataset needs to be uploaded."]} | 2024-08-16T14:43:37Z |
RelatedMaterial | create: {"material_type"=>"Dataset", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-5259667_V1", "uri"=>"10.13012/B2IDB-5259667_V1", "uri_type"=>"DOI", "citation"=>"Mishra, Apratim; Lee, Haejin; Jeoung, Sullam; Torvik, Vetle; Diesner, Jana (2024): Diversity - PubMed Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5259667_V1", "dataset_id"=>2768, "selected_type"=>"Dataset", "datacite_list"=>"IsNewVersionOf", "note"=>nil, "feature"=>nil} | 2024-08-16T14:43:01Z |
Creator | create: {"family_name"=>"Diesner", "given_name"=>"Jana", "identifier"=>"0000-0001-8183-7109", "email"=>"jdiesner@illinois.edu", "is_contact"=>false, "row_position"=>5} | 2024-08-16T14:43:01Z |
Creator | create: {"family_name"=>"Torvik", "given_name"=>"Vetle", "identifier"=>"0000-0002-0035-1850", "email"=>"vtorvik@illinois.edu", "is_contact"=>false, "row_position"=>4} | 2024-08-16T14:43:00Z |
Creator | create: {"family_name"=>"Jeoung", "given_name"=>"Sullam", "identifier"=>"0009-0008-8403-5441", "email"=>"sjeoung2@illinois.edu", "is_contact"=>false, "row_position"=>3} | 2024-08-16T14:43:00Z |
Creator | create: {"family_name"=>"Lee", "given_name"=>"Haejin", "identifier"=>"0009-0000-0260-0462", "email"=>"haejin2@illinois.edu", "is_contact"=>false, "row_position"=>2} | 2024-08-16T14:43:00Z |
Creator | create: {"family_name"=>"Mishra", "given_name"=>"Apratim", "identifier"=>"0000-0002-2946-308X", "email"=>"apratim3@illinois.edu", "is_contact"=>true, "row_position"=>1} | 2024-08-16T14:43:00Z |
Dataset | update: {"corresponding_creator_name"=>[nil, "Apratim Mishra"], "corresponding_creator_email"=>[nil, "apratim3@illinois.edu"]} | 2024-08-16T14:43:00Z |
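The V2 description recorded in the 2024-08-16 entries says the files are gzip-compressed and tab-separated, and the later edit corrects the file names from `.gz.csv` to `.csv.gz` (`auids_plos_2.csv.gz`, `pmids_plos_2.csv.gz`). The following is a minimal standard-library Python sketch for reading such a file. It assumes the first row holds the column names listed in the description (e.g. `pmid`, `year`, `journal`) and that the text is UTF-8, which the log states only for the V1 files. The helpers `read_tsv_gz` and `count_by` are hypothetical and are not shipped with the dataset.

```python
import csv
import gzip
from collections import Counter
from typing import Dict, Iterator


def read_tsv_gz(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of a gzip-compressed, tab-separated file.

    Assumption: the first row is a header with the column names. Values are
    returned as strings; multi-valued columns (e.g. auid, years, age_bin) are
    left unparsed because their internal delimiter is not given in the log.
    """
    # "rt" decompresses and decodes in one step; newline="" is what csv expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        for row in reader:
            yield row


def count_by(path: str, column: str) -> Counter:
    """Tally rows by the value of a single column, such as "year" or "journal"."""
    return Counter(row[column] for row in read_tsv_gz(path))
```

For example, `count_by("pmids_plos_2.csv.gz", "year")` would give the number of articles per publication year, provided the header assumption holds. Streaming rows through a generator keeps memory use flat, which matters for a file of roughly 907 thousand articles.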