
Datasets

published: 2024-08-19
 
Diversity - PubMed dataset
Contact: Apratim Mishra (Aug, 2024)

This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection includes articles retrieved from Author-ity 2018 [1]: a total of 907 024 papers and 1 612 118 authors. The sample of articles is based on the top 40 journals in the dataset, limited to papers with 2-12 authors published between 1991 and 2014 inclusive. Files are gzip-compressed and tab-separated.

File1: auids_plos_2.csv.gz (Important columns defined, 7 in total)
• AUID: a unique ID for each author
• Ethnea: ethnicity prediction
• Genni: gender prediction

File2: pmids_plos_2.csv.gz (Important columns defined)
• pmid: unique paper
• auid: all unique auids
• year: Year of paper publication
• no_authors: Author count
• journal: Journal name
• years: first year of publication for every author
• age_bin: Binned age for every author
• Country-temporal: Country of affiliation for every author
• h_index: Journal h-index
• TimeNovelty: Paper Time novelty [2]
• nih_funded: Binary variable indicating funding for any author
• prior_cit_mean: Mean of all authors’ prior citation rate
• Insti_impact: All unique institutions’ citation rate
• mesh_vals: Top MeSH values for every author of that paper
• relative_citation_ratio: RCR

The ‘Readme’ includes a description for all columns.

[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
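Because the Diversity - PubMed files are described as gzip-compressed and tab-separated despite the .csv.gz extension, a reader may need to set the delimiter explicitly. The following is a minimal sketch using only the Python standard library; the function name `iter_rows` is hypothetical, and UTF-8 encoding and the presence of a header row are assumptions not stated in the description.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield each row of a gzip-compressed, tab-separated file as a dict.

    Assumes the first line is a header row and the text is UTF-8;
    neither is confirmed by the dataset description.
    """
    # "rt" opens the gzip stream in text mode so csv can read it directly.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # The files use tabs, not commas, as the field separator.
        yield from csv.DictReader(handle, delimiter="\t")
```

For example, `for row in iter_rows("pmids_plos_2.csv.gz"): print(row["pmid"])` would print one identifier per paper, assuming the column is named exactly as listed in the description.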
keywords: Diversity; PubMed; Citation
published: 2022-07-25
 
A set of chemical entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; chemical mentions
published: 2022-07-25
 
Related to the raw entity mentions (https://doi.org/10.13012/B2IDB-4163883_V1), this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the ChEBI ontology.
keywords: synthetic biology; NERC data; chemical mentions; ambiguous entities
published: 2020-02-23
 
Citation context annotation for papers citing retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), retracted in 2008 (Retraction in: Chest (2008) 134:4 (893) <a href="https://doi.org/10.1016/S0012-3692(08)60339-6">https://doi.org/10.1016/S0012-3692(08)60339-6</a>). This is part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn. "Continued Citation of a Fraudulent Clinical Trial Report, Eleven Years after it was retracted for Falsifying Data" [R&R under review with Scientometrics]. Overall, we found 148 citations to the retracted paper from 2006 to 2019. However, this dataset does not include the annotations described in 2015 in Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill. "Persistent citation of the only published randomised controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26. In this dataset, 70 new and newly found citations are listed: 66 annotated citations and 4 pending citations (non-annotated since we don't have full-text). "New citations" refer to articles published from March 25, 2014 to 2019, found in Google Scholar and Web of Science. "Newly found citations" refer to articles published 2006-2013, found in Google Scholar and Web of Science, but not previously covered in Ashley Fulton, Alison Coates, Marie Williams, Peter Howe, and Alison Hill. "Persistent citation of the only published randomised controlled trial of omega-3 supplementation in chronic obstructive pulmonary disease six years after its retraction." Publications 3, no. 1 (2015): 17-26. NOTES: This is Unicode data. Some publication titles & quotes are in non-Latin characters and they may contain commas, quotation marks, etc.
FILES/FILE FORMATS
Same data in two formats:
2006-2019-new-citation-contexts-to-Matsuyama.csv - Unicode CSV (preservation format only)
2006-2019-new-citation-contexts-to-Matsuyama.xlsx - Excel workbook (preferred format)

ROW EXPLANATIONS
70 rows of data - one citing publication per row

COLUMN HEADER EXPLANATIONS
Note - processing notes
Annotation pending - Y or blank
Year Published - publication year
ID - ID corresponding to the network analysis. See Ye, Di; Schneider, Jodi (2019): Network of First and Second-generation citations to Matsuyama 2005 from Google Scholar and Web of Science. University of Illinois at Urbana-Champaign. <a href="https://doi.org/10.13012/B2IDB-1403534_V2">https://doi.org/10.13012/B2IDB-1403534_V2</a>
Title - item title (some have non-Latin characters, commas, etc.)
Official Translated Title - item title in English, as listed in the publication
Machine Translated Title - item title in English, translated by Google Scholar
Language - publication language
Type - publication type (e.g., bachelor's thesis, blog post, book chapter, clinical guidelines, Cochrane Review, consumer-oriented evidence summary, continuing education journal article, journal article, letter to the editor, magazine article, Master's thesis, patent, Ph.D. thesis, textbook chapter, training module)
Book title for book chapters - Only for a book chapter - the book title
University for theses - for bachelor's thesis, Master's thesis, Ph.D. thesis - the associated university
Pre/Post Retraction - "Pre" for 2006-2008 (means published before the October 2008 retraction notice or in the 2 months afterwards); "Post" for 2009-2019 (considered post-retraction for our analysis)
Identifier where relevant - ISBN, Patent ID, PMID (only for items we considered hard to find/identify, e.g. those without a DOI-based URL)
URL where available - URL, ideally a DOI-based URL
Reference number/style - reference
Only in bibliography - Y or blank
Acknowledged - If annotated, Y, Not relevant as retraction not published yet, or N (blank otherwise)
Positive / "Poor Research" (Negative) - P for positive, N for negative if annotated; blank otherwise
Human translated quotations - Y or blank; blank means Google Scholar was used to translate quotations for Translated Quotation X
Specific/in passing (overall) - Specific if any of the 5 quotations are specific [aggregates Specific / In Passing (Quotation X)]
Quotation 1 - First quotation (or blank) (includes non-Latin characters in some cases)
Translated Quotation 1 - English translation of "Quotation 1" (or blank)
Specific / In Passing (Quotation 1) - Specific if "Quotation 1" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 1) - Methods; Results; or Methods and Results - blank if "Quotation 1" not specific, no associated quotation, or not yet annotated
Quotation 2 - Second quotation (includes non-Latin characters in some cases)
Translated Quotation 2 - English translation of "Quotation 2"
Specific / In Passing (Quotation 2) - Specific if "Quotation 2" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 2) - Methods; Results; or Methods and Results - blank if "Quotation 2" not specific, no associated quotation, or not yet annotated
Quotation 3 - Third quotation (includes non-Latin characters in some cases)
Translated Quotation 3 - English translation of "Quotation 3"
Specific / In Passing (Quotation 3) - Specific if "Quotation 3" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 3) - Methods; Results; or Methods and Results - blank if "Quotation 3" not specific, no associated quotation, or not yet annotated
Quotation 4 - Fourth quotation (includes non-Latin characters in some cases)
Translated Quotation 4 - English translation of "Quotation 4"
Specific / In Passing (Quotation 4) - Specific if "Quotation 4" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 4) - Methods; Results; or Methods and Results - blank if "Quotation 4" not specific, no associated quotation, or not yet annotated
Quotation 5 - Fifth quotation (includes non-Latin characters in some cases)
Translated Quotation 5 - English translation of "Quotation 5"
Specific / In Passing (Quotation 5) - Specific if "Quotation 5" refers to methods or results of the Matsuyama paper (or blank)
What is referenced from Matsuyama (Quotation 5) - Methods; Results; or Methods and Results - blank if "Quotation 5" not specific, no associated quotation, or not yet annotated
Further Notes - additional notes
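Two of the Matsuyama columns are derived by simple rules stated in the column explanations: "Pre/Post Retraction" depends only on the publication year, and "Specific/in passing (overall)" aggregates the five quotation-level labels. The sketch below illustrates those two rules as described; the function names are hypothetical, and it is not the authors' own processing code. Because the description does not say what the overall column contains when no quotation is specific, the second function returns a boolean rather than guessing a label.

```python
from typing import Iterable


def pre_post_retraction(year_published: int) -> str:
    """Apply the year-based rule: 2006-2008 is "Pre", 2009-2019 is "Post"."""
    if 2006 <= year_published <= 2008:
        return "Pre"
    if 2009 <= year_published <= 2019:
        return "Post"
    # The dataset only covers citing items published 2006-2019.
    raise ValueError(f"year outside the 2006-2019 range: {year_published}")


def any_quotation_specific(labels: Iterable[str]) -> bool:
    """Return True if at least one quotation-level label is "Specific".

    Blank labels (no quotation, or not yet annotated) never count as specific.
    """
    return any(label.strip().lower() == "specific" for label in labels)
```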
keywords: citation context annotation; retraction; diffusion of retraction
published: 2021-07-22
 
This dataset includes five files. Descriptions of the files are given as follows: <b>FILENAME: PubMed_retracted_publication_full_v3.tsv</b> - Bibliographic data of retracted papers indexed in PubMed (retrieved on August 20, 2020, searched with the query "retracted publication" [PT] ). - Except for the information in the "cited_by" column, all the data is from PubMed. - PMIDs in the "cited_by" column that meet either of the two conditions below have been excluded from analyses: [1] PMIDs of the citing papers are from retraction notices (i.e., those in the “retraction_notice_PMID.csv” file). [2] Citing paper and the cited retracted paper have the same PMID. ROW EXPLANATIONS - Each row is a retracted paper. There are 7,813 retracted papers. COLUMN HEADER EXPLANATIONS 1) PMID - PubMed ID 2) Title - Paper title 3) Authors - Author names 4) Citation - Bibliographic information of the paper 5) First Author - First author's name 6) Journal/Book - Publication name 7) Publication Year 8) Create Date - The date the record was added to the PubMed database 9) PMCID - PubMed Central ID (if applicable, otherwise blank) 10) NIHMS ID - NIH Manuscript Submission ID (if applicable, otherwise blank) 11) DOI - Digital object identifier (if applicable, otherwise blank) 12) retracted_in - Information of retraction notice (given by PubMed) 13) retracted_yr - Retraction year identified from "retracted_in" (if applicable, otherwise blank) 14) cited_by - PMIDs of the citing papers. (if applicable, otherwise blank) Data collected from iCite. 15) retraction_notice_pmid - PMID of the retraction notice (if applicable, otherwise blank) <b>FILENAME: PubMed_retracted_publication_CitCntxt_withYR_v3.tsv</b> - This file contains citation contexts (i.e., citing sentences) where the retracted papers were cited. The citation contexts were identified from the XML version of PubMed Central open access (PMCOA) articles. - This is part of the data from: Hsiao, T.-K., & Torvik, V. I. 
(manuscript in preparation). Citation contexts identified from PubMed Central open access articles: A resource for text mining and citation analysis. - Citation contexts that meet either of the two conditions below have been excluded from analyses: [1] PMIDs of the citing papers are from retraction notices (i.e., those in the “retraction_notice_PMID.csv” file). [2] Citing paper and the cited retracted paper have the same PMID. ROW EXPLANATIONS - Each row is a citation context associated with one retracted paper that's cited. - In the manuscript, we count each citation context once, even if it cites multiple retracted papers. COLUMN HEADER EXPLANATIONS 1) pmcid - PubMed Central ID of the citing paper 2) pmid - PubMed ID of the citing paper 3) year - Publication year of the citing paper 4) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = tables and table/figure captions) 5) IMRaD - IMRaD section of the citation context (I = Introduction, M = Methods, R = Results, D = Discussions/Conclusion, NoIMRaD = not identified) 6) sentence_id - The ID of the citation context in a given location. For location information, please see column 4. The first sentence in the location gets the ID 1, and subsequent sentences are numbered consecutively. 7) total_sentences - Total number of sentences in a given location 8) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper. 9) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper. 10) citation - The citation context 11) progression - Position of a citation context by centile within the citing paper. 12) retracted_yr - Retraction year of the retracted paper 13) post_retraction - 0 = not post-retraction citation; 1 = post-retraction citation. A post-retraction citation is a citation made after the calendar year of retraction. 
<b>FILENAME: 724_knowingly_post_retraction_cit.csv</b> (updated) - The 724 post-retraction citation contexts that we determined knowingly cited the 7,813 retracted papers in "PubMed_retracted_publication_full_v3.tsv". - Two citation contexts from retraction notices have been excluded from analyses. ROW EXPLANATIONS - Each row is a citation context. COLUMN HEADER EXPLANATIONS 1) pmcid - PubMed Central ID of the citing paper 2) pmid - PubMed ID of the citing paper 3) pub_type - Publication type collected from the metadata in the PMCOA XML files. 4) pub_type2 - Specific article types. Please see the manuscript for explanations. 5) year - Publication year of the citing paper 6) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, table_or_figure_caption = tables and table/figure captions) 7) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper. 8) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper. 9) citation - The citation context 10) retracted_yr - Retraction year of the retracted paper 11) cit_purpose - Purpose of citing the retracted paper. This is from human annotations. Please see the manuscript for further information about annotation. 12) longer_context - An extended version of the citation context (if applicable, otherwise blank). Manually pulled from the full texts in the process of annotation. <b>FILENAME: Annotation manual.pdf</b> - The manual for annotating the citation purposes in column 11) of the 724_knowingly_post_retraction_cit.tsv. <b>FILENAME: retraction_notice_PMID.csv</b> (new file added for this version) - A list of 8,346 PMIDs of retraction notices indexed in PubMed (retrieved on August 20, 2020, searched with the query "retraction of publication" [PT] ).
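The retracted-publication files rely on two rules stated in their descriptions: a citation is post-retraction when it is made after the calendar year of retraction, and citing PMIDs are excluded when they belong to a retraction notice or equal the PMID of the cited retracted paper. The sketch below expresses both rules; the function names and argument shapes are hypothetical, and this is an illustration of the stated rules rather than the authors' pipeline.

```python
from typing import Iterable, List, Set


def is_post_retraction(citing_year: int, retracted_year: int) -> int:
    """Return 1 if the citation was made after the calendar year of retraction, else 0."""
    # A citation in the same calendar year as the retraction is not post-retraction.
    return 1 if citing_year > retracted_year else 0


def filter_citing_pmids(
    cited_pmid: str,
    citing_pmids: Iterable[str],
    retraction_notice_pmids: Set[str],
) -> List[str]:
    """Drop citing PMIDs that are retraction notices or identical to the cited PMID."""
    return [
        pmid
        for pmid in citing_pmids
        if pmid not in retraction_notice_pmids and pmid != cited_pmid
    ]
```

In practice the set of notice PMIDs would be loaded from retraction_notice_PMID.csv, and the citing PMIDs would come from the "cited_by" column, whose in-cell delimiter is not specified in the description.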
keywords: citation context; in-text citation; citation to retracted papers; retraction
published: 2024-05-24
 
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as citations for (Willoughby et al., 2014): Willoughby, Patrick H., Jansma, Matthew J., & Hoye, Thomas R. (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042 We added the DOIs of the citing publications into a Zotero collection, which we exported into a .csv file and an .rtf file. Willoughby2014_286citing_publications.csv is a Zotero data export of the citing publications. Willoughby2014_286citing_publications.rtf is a bibliography of the citing publications, using a variation of American Psychological Association style (7th edition) with full names instead of initials. We developed an automation system to analyze unreliability propagation through the publications citing an unreliable publication: Willoughby et al., 2014 (one of the Python scripts that supported the protocol presented in this publication has a code glitch). We call a publication "unreliable by propagation" when its main findings have become unreliable by citing an unreliable source. The system triaged the citing publications that are in English (284) according to whether they are at risk because of citing Willoughby et al., 2014. We excluded 2 publications that are not in English; their DOIs are 10.13220/j.cnki.jipr.2015.06.004 and 10.19540/j.cnki.cjcmm.20200604.201. We compared the accuracy of the system's triage with a separate manual analysis that a chemistry domain expert (YF) conducted on the 284 citing publications. 284_merged_decision_and_annotation.csv (new in this V2) shows the system triage results and the results of the chemistry domain expert's (YF) manual analysis of the 284 citing publications.
keywords: scientific publications; arguments; citation contexts; defeasible reasoning; Zotero; Web of Science; Scopus; unreliable cited sources; automation systems; knowledge maintenance
published: 2024-03-21
 
Impact assessment is an evolving area of research that aims at measuring and predicting the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain, closely intertwined with impact assessment. A recurring obstacle is the absence of an efficient framework that can facilitate the analysis of lengthy reports and text labeling. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying pertinent sections in project reports that indicate the potential impacts. We leverage a mixed-method approach, combining manual annotations with supervised machine learning, to extract these passages from project reports. This repository stores the datasets and code related to this project. Please read and cite the following paper if you would like to use the data: Becker M., Han K., Werthmann A., Rezapour R., Lee H., Diesner J., and Witt A. (2024). Detecting Impact Relevant Sections in Scientific Research. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
This folder contains the following files:
evaluation_20220927.ods: Annotated German passages (Artificial Intelligence, Linguistics, and Music) - training data
annotated_data.big_set.corrected.txt: Annotated German passages (Mobility) - training data
incl_translation_all.csv: Annotated English passages (Artificial Intelligence, Linguistics, and Music) - training data
incl_translation_mobility.csv: Annotated German passages (Mobility) - training data
ttparagraph_addmob.txt: German corpus (unannotated passages)
model_result_extraction.csv: Extracted impact-relevant passages from the German corpus based on the model we trained
rf_model.joblib: The random forest model we trained to extract impact-relevant passages
Data processing code can be found at: https://github.com/khan1792/texttransfer
keywords: impact detection; project reports; annotation; mixed-methods; machine learning
published: 2019-06-13
 
This lexicon is the expanded/enhanced version of the Moral Foundation Dictionary created by Graham and colleagues (Graham et al., 2013). Our Enhanced Morality Lexicon (EML) contains a list of 4,636 morality related words. This lexicon was used in the following paper - please cite this paper if you use this resource in your work. Rezapour, R., Shah, S., & Diesner, J. (2019). Enhancing the measurement of social effects by capturing morality. Proceedings of the 10th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN. In addition, please consider citing the original MFD paper: <a href="https://doi.org/10.1016/B978-0-12-407236-7.00002-4">Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology (Vol. 47, pp. 55-130)</a>.
keywords: lexicon; morality
published: 2024-05-07
 
This dataset builds on an existing dataset that captures the demographics of artists represented by top-tier galleries in the 2016–2017 New York art season (Case-Leal, 2017, https://web.archive.org/web/20170617002654/http://www.havenforthedispossessed.org/), adding a census of reviews and catalogs about those exhibitions to assess the proportionality of media coverage across race and gender. The readme file explains the variables, the collection process, the relationship between the datasets, and an example of how the Case-Leal dataset was transformed. ArticleDataset.csv provides all articles with citation information as well as artist, artistic identity characteristic, and gallery. ExhibitionCatalog.csv provides exhibition catalog citation information for each identified artist.
keywords: diversity and inclusion; diversity audit; contemporary art; art exhibitions; art exhibition reviews; exhibition catalogs; magazines; newspapers; demographics
published: 2019-02-19
 
The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991 and 2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article (published prior to 2017), and organizations operating the databases were determined through review of database websites.
keywords: databases; research infrastructure; sustainability; data sharing; molecular biology; bioinformatics; bibliometrics
published: 2019-05-31
 
The data are provided to illustrate methods for evaluating systematic transactional data reuse in machine learning. A library account-based recommender system was developed using machine learning processing over transactional data of 383,828 transactions (or check-outs) sourced from a large multi-unit research library. The machine learning process applied the FP-growth algorithm to the subject metadata associated with physical items that were checked out together in the library. The purpose of this research is to evaluate the results of systematic transactional data reuse in machine learning. The analysis herein contains a large-scale network visualization of 180,441 subject association rules and corresponding node metrics.
keywords: evaluating machine learning; network science; FP-growth; WEKA; Gephi; personalization; recommender systems
published: 2018-09-06
 
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 to the present, with awards continuing through the end of the second XSEDE award in 2021. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
keywords: allocations; cyberinfrastructure; XSEDE
published: 2024-02-27
 
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include: • Reconciling missing event data • Removing events with irreconcilable event dates • Removing events with insufficient sourcing (each event needs at least two sources) • Removing events that were inaccurately coded as coup events • Removing variables that fell below the threshold of inter-coder reliability required by the project • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries • Extending the period covered from 1945-2005 to 1945-2019 • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011) <br> <b>Items in this Dataset</b> 1. <i>Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf</i> - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. <i>Revised February 2024</i> 2. <i>Coup Data v2.1.3.csv</i> - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. <i>Revised February 2024</i> 3. <i>Source Document v2.1.3.pdf</i> - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. <i>Revised February 2024</i> 4. <i>README.md</i> - This file contains useful information for the user about the dataset. It is a text file written in markdown language. 
<i>Revised February 2024</i> <br> <b> Citation Guidelines</b> 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7 2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
published: 2016-06-06
 
These datasets represent first-time collaborations between first and last authors (with mutually exclusive publication histories) on papers with 2 to 5 authors in years [1988,2009] in PubMed. Each record of each dataset captures aspects of the similarity, nearness, and complementarity between two authors about the paper marking the formation of their collaboration.
published: 2023-02-23
 
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d'État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e. realized or successful coups, unrealized coup attempts, or thwarted conspiracies), the type of actor(s) who initiated the coup (i.e. military, rebels, etc.), as well as the fate of the deposed leader. This current version, Version 2.1.2, adds 6 additional coup events that occurred in 2022 and updates the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Changes from the previously released data (v2.0.0) also include: 1. Adding additional events and expanding the period covered to 1945-2022 2. Filling in missing actor information 3. Filling in missing information on the outcomes for the incumbent executive 4. 
Dropping events that were incorrectly coded as coup events <br> <b>Items in this Dataset</b> 1. <i>Cline Center Coup d'État Codebook v.2.1.2 Codebook.pdf</i> - This 16-page document provides a description of the Cline Center Coup d’État Project Dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d’État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. <i>Revised February 2023</i> 2. <i>Coup Data v2.1.2.csv</i> - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 981 observations. <i>Revised February 2023</i> 3. <i>Source Document v2.1.2.pdf</i> - This 315-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. <i>Revised February 2023</i> 4. <i>README.md</i> - This file contains useful information for the user about the dataset. It is a text file written in markdown language. <i>Revised February 2023</i> <br> <b> Citation Guidelines</b> 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2023. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6 2. 
To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2023. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6
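When working with Coup Data v2.1.2.csv, note that coup_id values such as 00201062021 begin with zeros, so it is safer to read them as text than to let a tool infer a numeric type. The following is a minimal sketch using Python's standard csv module, which keeps every value as a string. The `country` column and the sample row are hypothetical and used only for illustration; `coup_id` is the only column name taken from the description above.

```python
import csv
import io
from typing import Dict, TextIO


def load_coups(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Index rows by coup_id, keeping the ID as a string so leading zeros survive."""
    # Assumes the first row of the file is a header that contains "coup_id".
    return {row["coup_id"]: row for row in csv.DictReader(handle)}


# Hypothetical one-row sample; for the real file, use
# open("Coup Data v2.1.2.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO("coup_id,country\n00201062021,Exampleland\n")
coups = load_coups(sample)
```

The string key can then be matched against the coup_id entries in the source document.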
published: 2020-02-12
 
This dataset contains the results of a three-month audit of housing advertisements. It accompanies the 2020 ICWSM paper "Auditing Race and Gender Discrimination in Online Housing Markets". It covers data collected between Dec 7, 2018 and March 19, 2019. There are two JSON files in the dataset: The first contains a list of JSON objects, separated by newlines, each representing an advertisement. Each object includes the date and time it was collected, the image and title (if collected) of the ad, the page on which it was displayed, and the training treatment it received. The second file is a list of JSON objects, separated by newlines, each representing a visit to a housing lister. Each object contains the URL, the training treatment applied, the location searched, and the metadata of the top sites scraped. This metadata includes location, price, and number of rooms. The dataset also includes the raw images of ads, collected in order to code them by interest and targeting. These were captured by Selenium and named using a perceptual hash to de-duplicate images.
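Because both files hold one JSON object per line, they can be read line by line rather than parsed as a single JSON document. The sketch below shows one way to do this with Python's standard library. The keys in the sample (`title`, `treatment`) are hypothetical stand-ins, since the exact field names are not given in this description.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_lines(handle: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a newline-delimited JSON stream."""
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing newline at end of file
            yield json.loads(line)


# Hypothetical two-record sample; the real field names may differ.
sample = io.StringIO(
    '{"title": "Ad A", "treatment": "control"}\n'
    '{"title": "Ad B", "treatment": "treated"}\n'
)
records = list(iter_json_lines(sample))
```

Reading lazily like this avoids loading an entire file into memory, which may matter if the files are large.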
keywords: algorithmic audit; advertisement audit
published: 2018-12-20
 
File Name: Inclusion_Criteria_Annotation.csv Data Preparation: Xiaoru Dong Date of Preparation: 2018-12-14 Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks. Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider. Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews. Description: The file contains lists of inclusion criteria of Cochrane Systematic Reviews and the manual annotation results. 5420 inclusion criteria were annotated, out of 7158 inclusion criteria available. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file: - "Inclusion Criteria": Content of inclusion criteria of Cochrane Systematic Reviews. - "Only RCTs": Manual annotation results. An "x" means the inclusion criterion is classified as "Only RCTs"; a blank cell means it is classified as "Others". Notes: 1. "RCT" stands for Randomized Controlled Trial, which, by definition, is "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html]. 2. To reproduce the data, obtain the project code published on GitHub at: https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
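The two-column coding described above can be turned into explicit labels with a few lines of Python. This is a minimal sketch, assuming the file's header row uses exactly the column names "Inclusion Criteria" and "Only RCTs"; the sample rows are invented for illustration.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_labels(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (criterion text, label) pairs using the 'x' / blank coding."""
    for row in csv.DictReader(handle):
        mark = (row.get("Only RCTs") or "").strip().lower()
        # "x" marks "Only RCTs"; anything else (normally blank) is "Others".
        label = "Only RCTs" if mark == "x" else "Others"
        yield row["Inclusion Criteria"], label


# Invented sample rows; for the real file, use
# open("Inclusion_Criteria_Annotation.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "Inclusion Criteria,Only RCTs\n"
    "Randomised controlled trials only,x\n"
    "Randomised and quasi-randomised trials,\n"
)
labels = list(iter_labels(sample))
```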
keywords: Inclusion criteria, Randomized controlled trials, Machine learning, Systematic reviews
published: 2020-07-16
 
Dataset for the SocialMediaIE tutorial
keywords: social media; deep learning; natural language processing
published: 2021-11-05
 
This data set contains survey results from a 2021 survey of University of Illinois University Library employees conducted as part of the Becoming A Trans Inclusive Library Project to evaluate the awareness of University of Illinois faculty, staff, and student employees regarding transgender identities, and to assess the professional development needs of library employees to better serve trans and gender non-conforming patrons. The survey instrument is available in the IDEALS repository: http://hdl.handle.net/2142/110080.
keywords: transgender awareness, academic library, gender identity awareness, professional development opportunities
published: 2016-12-19
 
Files in this dataset represent an investigation into use of the Library mobile app Minrva during the months of May 2015 through December 2015. During this time interval, 45,975 API hits were recorded by the Minrva web server. The dataset included herein is an analysis of the following: 1) a delineation, by month, of API hits by the Minrva app module used, 2) a general analysis of Minrva app downloads relative to module use, and 3) the annotated data file providing associations from API hits to specific modules used, organized by month (May 2015 – December 2015).
keywords: API analysis; log analysis; Minrva Mobile App
published: 2023-03-28
 
Sentences and citation contexts identified from the PubMed Central open access articles ---------------------------------------------------------------------- The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao TK., & Torvik V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles. <b>Files</b>: • A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A. • B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B. • C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C. • D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D. • E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E. • F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F. • G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G. • H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H. • I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I. 
• J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J. • K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K. • L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L. • M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M. • N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N. • O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O. • P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1). • P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2). • Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q. • R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R. • S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S. • T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T. • UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V. 
• W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W. • XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z. Each row in the file is a sentence/citation context and contains the following columns: • pmcid: PMCID of the article • pmid: PMID of the article. If an article does not have a PMID, the value is NONE.  • location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.  • IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.  • sentence_id: The ID of the citation context/sentence in the article component • total_sentences: The number of sentences in the article component.  • intxt_id: The ID of the citation. • intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-". • intxt_pmid_source: The sources where the intxt_pmid can be identified. Xml represents that the PMID is only identified from the XML file; xml,pmc represents that the PMID is not only from the XML file, but also in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".  • intxt_mark: The citation marker associated with the inline citation. • best_id: The best source link ID (e.g., PMID) of the citation. • best_source: The sources that confirm the best ID. • best_id_diff: The comparison result between the best_id column and the intxt_pmid column. • citation: A citation context. If no citation is found in a sentence, the value is the sentence.  • progression: Text progression of the citation context/sentence. 
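The column list above uses two different placeholders for missing identifiers: NONE in pmid and "-" in intxt_pmid. The sketch below reads a tab-delimited file and converts both to Python's None. It assumes that the columns appear in the order listed and that the file starts with a header row (set `has_header=False` otherwise); neither is confirmed by this description. The sample rows are made up.

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO

# Column names as listed in the description; the order is an assumption.
COLUMNS = [
    "pmcid", "pmid", "location", "IMRaD", "sentence_id", "total_sentences",
    "intxt_id", "intxt_pmid", "intxt_pmid_source", "intxt_mark",
    "best_id", "best_source", "best_id_diff", "citation", "progression",
]


def iter_rows(handle: TextIO, has_header: bool = True) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per row, mapping the documented placeholders to None."""
    # QUOTE_NONE keeps quotation marks inside sentences from being read as CSV quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)
    for fields in reader:
        row: Dict[str, Optional[str]] = dict(zip(COLUMNS, fields))
        if row.get("pmid") == "NONE":
            row["pmid"] = None
        if row.get("intxt_pmid") == "-":
            row["intxt_pmid"] = None
        yield row


def make_line(**overrides: str) -> str:
    """Build a made-up sample line with a filler value in every other column."""
    return "\t".join(overrides.get(col, "x") for col in COLUMNS)


sample = io.StringIO("\n".join([
    "\t".join(COLUMNS),
    make_line(pmcid="id-1", pmid="NONE", intxt_pmid="-"),
    make_line(pmcid="id-2", pmid="123", intxt_pmid="456"),
]) + "\n")
rows = list(iter_rows(sample))
```

Streaming row by row is a deliberate choice here, since the 24 files together hold several hundred million sentences.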
 <b>Supplementary Files</b> • PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column. Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns: • pmcid: PMCID of the citing article. • pos: The citation's position in the reference list. • fromPMID: PMID of the citing article. • toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci. • SRC: The sources that confirm the toPMID. • MatchDB: The origin bibliographic database of the toPMID. • Probability: The match probability of the toPMID. • toPMID2: PMID of the citation (as tagged in the XML file). • SRC2: The sources that confirm the toPMID2. • intxt_id: The ID of the citation. • journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files. • same_ref_string: Whether the citation string appears in the reference list more than once. • DIFF: The comparison result between the toPMID column and the toPMID2 column. • bestID: The best source link ID (e.g., PMID) of the citation. • bestSRC: The sources that confirm the best ID. • Match: Matching result produced by Patci. [1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885 • intxt_cit_license_fromPMC.tsv – This file contains the CC licensing information for each article. The licensing information is from PMC's file lists [2], retrieved on June 19, 2020, and March 9, 2023. It should be noted that the license information for 189,855 PMCIDs is <b>NO-CC CODE</b> in the file lists, and 521 PMCIDs are absent from the file lists. 
The absence of CC licensing information does not indicate that the article lacks a CC license. For example, PMCID: 6156294 (<b>NO-CC CODE</b>) and PMCID: 6118074 (absent in the PMC's file lists) are under CC-BY licenses according to their PDF versions of articles. The intxt_cit_license_fromPMC.tsv file has two columns: • pmcid: PMCID of the article. • license: The article’s CC license information provided in PMC’s file lists. The value is nan when an article is not present in the PMC’s file lists. [2] https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/ • Supplementary_File_1.zip – This file contains the code for generating the dataset.
keywords: citation context; in-text citation; inline citation; bibliometrics; science of science
published: 2023-04-12
 
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 through the XSEDE operational period, which ended August 31, 2022. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation. Because the XSEDE program has ended, the allocation_award_history file includes all allocation activity initiated via XSEDE processes through August 31, 2022. The Resource Providers and the successor program to XSEDE agreed to honor all project allocations made during XSEDE. Thus, allocation awards that extend beyond the end of XSEDE may not reflect all activity that may ultimately be part of the project award. Similarly, allocation usage data only reflects usage reported through August 31, 2022, and may not reflect all activity that may ultimately be conducted by projects that were active beyond XSEDE.
keywords: allocations; cyberinfrastructure; XSEDE
published: 2023-08-02
 
This dataset was developed as part of an online survey study that investigates how phatic expressions—comments that are social rather than informative in nature—influence the perceived helpfulness of online peer help-giving replies in an asynchronous college course discussion forum. During the study, undergraduate students (N = 320) rated and described the helpfulness of examples of replies to online requests for help, both with and without four types of phatic expressions: greeting/parting tokens, other-oriented comments, self-oriented comments, and neutral comments.
keywords: help-giving; phatic expression; discussion forum; online learning; engagement
published: 2023-09-21
 
The relationship between physical activity and mental health, especially depression, is one of the most studied topics in the field of exercise science and kinesiology. Although there is strong consensus that regular physical activity improves mental health and reduces depressive symptoms, some debate the mechanisms involved in this relationship as well as the limitations and definitions used in such studies. Meta-analyses and systematic reviews continue to examine the strength of the association between physical activity and depressive symptoms for the purpose of improving exercise prescription as treatment or combined treatment for depression. This dataset covers 27 review articles (either systematic review, meta-analysis, or both) and 365 primary study articles addressing the relationship between physical activity and depressive symptoms. Primary study articles are manually extracted from the review articles. We used a custom-made workflow (Fu, Yuanxi. (2022). Scopus author info tool (1.0.1) [Python]. <a href="https://github.com/infoqualitylab/Scopus_author_info_collection">https://github.com/infoqualitylab/Scopus_author_info_collection</a>) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 392 reports. The author information file (author_list.csv) is the product of this workflow and can be used to compute the co-author network of the 392 articles. This dataset can be used to construct the inclusion network and the co-author network of the 27 review articles and 365 primary study articles. A primary study article is "included" in a review article if it is considered in the review article's evidence synthesis. Each included primary study article is cited in the review article, but not all references cited in a review article are primary study articles or are included in its evidence synthesis. 
The inclusion network is a bipartite network with two types of nodes: one type represents review articles, and the other represents primary study articles. In an inclusion network, if a review article includes a primary study article, there is a directed edge from the review article node to the primary study article node. The attribute file (article_list.csv) includes attributes of the 392 articles, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Collectively, this dataset reflects the evidence production and use patterns within the exercise science and kinesiology scientific community, investigating the relationship between physical activity and depressive symptoms. FILE FORMATS 1. article_list.csv - Unicode CSV 2. author_list.csv - Unicode CSV 3. Chinese_author_name_reference.csv - Unicode CSV 4. inclusion_net_edges.csv - Unicode CSV 5. review_article_details.csv - Unicode CSV 6. supplementary_reference_list.pdf - PDF 7. README.txt - text file 8. systematic_review_inclusion_criteria.csv - Unicode CSV <b>UPDATES IN THIS VERSION COMPARED TO V3</b> (Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi (2023): The Inclusion Network of 27 Review Articles Published between 2013-2018 Investigating the Relationship Between Physical Activity and Depressive Symptoms. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4614455_V3) - We added a new file systematic_review_inclusion_criteria.csv.
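The inclusion network described above can be rebuilt from an edge list with plain Python. The following is a minimal sketch: it stores each directed edge from a review article to an included primary study, then counts how many reviews include each study. The column names `review_id` and `study_id` and the sample IDs are hypothetical; consult README.txt for the actual headers of inclusion_net_edges.csv.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def build_inclusion_network(handle: TextIO, review_col: str, study_col: str) -> Dict[str, Set[str]]:
    """Map each review article to the set of primary studies it includes."""
    included: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(handle):
        # One row is one directed edge: review article -> primary study article.
        included[row[review_col]].add(row[study_col])
    return dict(included)


def inclusion_counts(network: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each primary study, the number of reviews that include it."""
    counts: Dict[str, int] = defaultdict(int)
    for studies in network.values():
        for study in studies:
            counts[study] += 1
    return dict(counts)


# Hypothetical edge list with made-up IDs and column names.
sample = io.StringIO("review_id,study_id\nR1,S1\nR1,S2\nR2,S1\n")
network = build_inclusion_network(sample, "review_id", "study_id")
counts = inclusion_counts(network)
```

Keeping reviews as keys and studies as values reflects the bipartite structure: edges only ever run from one node type to the other.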
keywords: systematic reviews; meta-analyses; evidence synthesis; network visualization; tertiary studies; physical activity; depressive symptoms; exercise; review articles
published: 2023-07-14
 
Data for Post-retraction citation: A review of scholarly research on the spread of retracted science Schneider, Jodi; Das, Susmita; Léveillé, Jacqueline; Proescholdt, Randi Contact: Jodi Schneider jodi@illinois.edu & jschneider@pobox.com ********** OVERVIEW ********** This dataset provides further analysis for an ongoing literature review about post-retraction citation. This ongoing work extends a poster presented as: Jodi Schneider, Jacqueline Léveillé, Randi Proescholdt, Susmita Das, and The RISRS Team. Characterization of Publications on Post-Retraction Citation of Retracted Articles. Presented at the Ninth International Congress on Peer Review and Scientific Publication, September 8-10, 2022 hybrid in Chicago. https://hdl.handle.net/2142/114477 (now also in https://peerreviewcongress.org/abstract/characterization-of-publications-on-post-retraction-citation-of-retracted-articles/ ) Items as of the poster version are listed in the bibliography 92-PRC-items.pdf. Note that following the poster, we made several changes to the dataset (see changes-since-PRC-poster.txt). For both the poster dataset and the current dataset, 5 items have 2 categories (see 5-items-have-2-categories.txt). Articles were selected from the Empirical Retraction Lit bibliography (https://infoqualitylab.org/projects/risrs2020/bibliography/ and https://doi.org/10.5281/zenodo.5498474 ). The current dataset includes 92 items; 91 items were selected from the 386 total items in Empirical Retraction Lit bibliography version v.2.15.0 (July 2021); 1 item was added because it is the final form publication of a grouping of 2 items from the bibliography: Yang (2022) Do retraction practices work effectively? Evidence from citations of psychological retracted articles http://doi.org/10.1177/01655515221097623 Items were classified into 7 topics; 2 of the 7 topics have been analyzed to date. 
********************** OVERVIEW OF ANALYSIS ********************** DATA ANALYZED: 2 of the 7 topics have been analyzed to date: field-based case studies (n = 20) author-focused case studies of 1 or several authors with many retracted publications (n = 15) FUTURE DATA TO BE ANALYZED, NOT YET COVERED: 5 of the 7 topics have not yet been analyzed as of this release: database-focused analyses (n = 33) paper-focused case studies of 1 to 125 selected papers (n = 15) studies of retracted publications cited in review literature (n = 8) geographic case studies (n = 4) studies selecting retracted publications by method (n = 2) ************** FILE LISTING ************** ------------------ BIBLIOGRAPHY ------------------ 92-PRC-items.pdf ------------------ TEXT FILES ------------------ README.txt 5-items-have-2-categories.txt changes-since-PRC-poster.txt ------------------ CODEBOOKS ------------------ Codebook for authors.docx Codebook for authors.pdf Codebook for field.docx Codebook for field.pdf Codebook for KEY.docx Codebook for KEY.pdf ------------------ SPREADSHEETS ------------------ field.csv field.xlsx multipleauthors.csv multipleauthors.xlsx multipleauthors-not-named.csv multipleauthors-not-named.xlsx singleauthors.csv singleauthors.xlsx *************************** DESCRIPTION OF FILE TYPES *************************** BIBLIOGRAPHY (92-PRC-items.pdf) presents the items, as of the poster version. This has minor differences from the current data set. Consult changes-since-PRC-poster.txt for details on the differences. TEXT FILES provide notes for additional context. These files end in .txt. CODEBOOKS describe the data we collected. The same data is provided in both Word (.docx) and PDF format. There is one general codebook that is referred to in the other codebooks: Codebook for KEY lists fields assigned (e.g., for a journal or conference). 
Note that this is distinct from the overall analysis in the Empirical Retraction Lit bibliography of fields analyzed; for that analysis see Proescholdt, Randi (2021): RISRS Retraction Review - Field Variation Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2070560_V1 Other codebooks document specific information we entered on each column of a spreadsheet. SPREADSHEETS present the data collected. The same data is provided in both Excel (.xlsx) and CSV format. Each data row describes a publication or item (e.g., thesis, poster, preprint). For column header explanations, see the associated codebook. ***************************** DETAILS ON THE SPREADSHEETS ***************************** field-based case studies CODEBOOK: Codebook for field --REFERS TO: Codebook for KEY DATA SHEET: field REFERS TO: Codebook for KEY --NUMBER OF DATA ROWS: 20 NOTE: Each data row describes a publication/item. --NUMBER OF PUBLICATION GROUPINGS: 17 --GROUPED PUBLICATIONS: Rubbo (2019) - 2 items, Yang (2022) - 3 items author-focused case studies of 1 or several authors with many retracted publications CODEBOOK: Codebook for authors --REFERS TO: Codebook for KEY DATA SHEET 1: singleauthors (n = 9) --NUMBER OF DATA ROWS: 9 --NUMBER OF PUBLICATION GROUPINGS: 9 DATA SHEET 2: multipleauthors (n = 5) --NUMBER OF DATA ROWS: 5 --NUMBER OF PUBLICATION GROUPINGS: 5 DATA SHEET 3: multipleauthors-not-named (n = 1) --NUMBER OF DATA ROWS: 1 --NUMBER OF PUBLICATION GROUPINGS: 1 ********************************* CRediT <http://credit.niso.org> ********************************* Susmita Das: Conceptualization, Data curation, Investigation, Methodology Jacqueline Léveillé: Data curation, Investigation Randi Proescholdt: Conceptualization, Data curation, Investigation, Methodology Jodi Schneider: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Supervision
keywords: retraction; citation of retracted publications; post-retraction citation; data extraction for scoping reviews; data extraction for literature reviews