Datasets

published: 2024-05-24
 
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as citations of Willoughby et al. (2014): Willoughby, Patrick H., Jansma, Matthew J., & Hoye, Thomas R. (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042
We added the DOIs of the citing publications to a Zotero collection, which we exported into a .csv file and an .rtf file.
• Willoughby2014_286citing_publications.csv is a Zotero data export of the citing publications.
• Willoughby2014_286citing_publications.rtf is a bibliography of the citing publications, using a variation of American Psychological Association style (7th edition) with full names instead of initials.
We developed an automation system to analyze unreliability propagation through the publications citing an unreliable publication, here Willoughby et al. (2014) (one of the Python scripts supporting the protocol presented in that publication has a code glitch). We call a publication "unreliable by propagation" when its main findings have become unreliable because it cites an unreliable source. The system triaged the 284 citing publications written in English according to whether they are at risk because they cite Willoughby et al. (2014). We excluded 2 publications that are not in English; their DOIs are 10.13220/j.cnki.jipr.2015.06.004 and 10.19540/j.cnki.cjcmm.20200604.201. We compared the accuracy of the system's triage with a separate manual analysis that the chemistry domain expert (YF) conducted on the 284 citing publications.
• 284_merged_decision_and_annotation.csv (new in this V2) shows the system's triage results alongside the expert's manual analysis of the 284 citing publications.
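A minimal sketch of scoring system/expert agreement from the merged V2 file; the column names below are assumptions, not documented here, so check the file header first:

import pandas as pd

# Hypothetical column names ("system_decision", "expert_annotation");
# replace with the actual headers of 284_merged_decision_and_annotation.csv.
df = pd.read_csv("284_merged_decision_and_annotation.csv")
agreement = (df["system_decision"] == df["expert_annotation"]).mean()
print(f"System/expert agreement over {len(df)} citing publications: {agreement:.1%}")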
keywords: scientific publications; arguments; citation contexts; defeasible reasoning; Zotero; Web of Science; Scopus; unreliable cited sources; automation systems; knowledge maintenance
published: 2024-03-21
 
Impact assessment is an evolving area of research that aims to measure and predict the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain, closely intertwined with impact assessment. A recurring obstacle is the absence of an efficient framework for analyzing lengthy reports and labeling text. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying pertinent sections in project reports that indicate the potential impacts. We leverage a mixed-method approach, combining manual annotations with supervised machine learning, to extract these passages from project reports. This repository holds the datasets and code related to this project. Please read and cite the following paper if you would like to use the data: Becker M., Han K., Werthmann A., Rezapour R., Lee H., Diesner J., and Witt A. (2024). Detecting Impact Relevant Sections in Scientific Research. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
This folder contains the following files:
• evaluation_20220927.ods: Annotated German passages (Artificial Intelligence, Linguistics, and Music) - training data
• annotated_data.big_set.corrected.txt: Annotated German passages (Mobility) - training data
• incl_translation_all.csv: Annotated English passages (Artificial Intelligence, Linguistics, and Music) - training data
• incl_translation_mobility.csv: Annotated German passages (Mobility) - training data
• ttparagraph_addmob.txt: German corpus (unannotated passages)
• model_result_extraction.csv: Extracted impact-relevant passages from the German corpus, based on the model we trained
• rf_model.joblib: The random forest model we trained to extract impact-relevant passages
Data processing code can be found at: https://github.com/khan1792/texttransfer
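For illustration, a model saved with joblib, such as rf_model.joblib, can be reloaded and applied to new passages; the feature-extraction step below is an assumption and must match the pipeline in the linked texttransfer repository:

from joblib import load

# Load the trained random forest (serialized with joblib).
model = load("rf_model.joblib")

# Hypothetical scoring step: passages must first be transformed into the
# same feature space used during training (see the texttransfer repository).
# X = vectorizer.transform(passages)
# predictions = model.predict(X)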
keywords: impact detection; project reports; annotation; mixed-methods; machine learning
published: 2019-06-13
 
This lexicon is the expanded/enhanced version of the Moral Foundations Dictionary created by Graham and colleagues (Graham et al., 2013). Our Enhanced Morality Lexicon (EML) contains 4,636 morality-related words. This lexicon was used in the following paper - please cite it if you use this resource in your work: Rezapour, R., Shah, S., & Diesner, J. (2019). Enhancing the measurement of social effects by capturing morality. Proceedings of the 10th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN. In addition, please consider citing the original MFD paper: Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology (Vol. 47, pp. 55-130). https://doi.org/10.1016/B978-0-12-407236-7.00002-4
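A minimal sketch of applying the lexicon, assuming a plain one-word-per-line word list (the distributed file layout may differ):

import re

# Hypothetical file name and layout; adjust to the actual EML format.
with open("eml_words.txt") as f:
    eml = {line.strip().lower() for line in f if line.strip()}

text = "They praised her honesty and compassion."
tokens = re.findall(r"[a-z']+", text.lower())
print([t for t in tokens if t in eml])  # morality-related tokens found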
keywords: lexicon; morality
published: 2024-05-07
 
This dataset augments an existing dataset capturing the demographics of artists represented by top-tier galleries in the 2016–2017 New York art season (Case-Leal, 2017, https://web.archive.org/web/20170617002654/http://www.havenforthedispossessed.org/) with a census of reviews and catalogs about those exhibitions, in order to assess the proportionality of media coverage across race and gender. The readme file explains the variables, the collection process, and the relationship between the datasets, and gives an example of how the Case-Leal dataset was transformed. ArticleDataset.csv provides all articles with citation information as well as artist, artistic identity characteristic, and gallery. ExhibitionCatalog.csv provides exhibition catalog citation information for each identified artist.
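As a sketch of the kind of proportionality analysis the data support (the column names below are assumptions; see the readme for the actual variables):

import pandas as pd

articles = pd.read_csv("ArticleDataset.csv")
# Hypothetical columns: one row per article, with the covered artist's
# identity characteristic recorded alongside the citation.
coverage = articles.groupby("identity_characteristic").size()
print(coverage / coverage.sum())  # share of media coverage per group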
keywords: diversity and inclusion; diversity audit; contemporary art; art exhibitions; art exhibition reviews; exhibition catalogs; magazines; newspapers; demographics
published: 2019-02-19
 
The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991 and 2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain them? Funders were determined by examining the funding acknowledgements in each database's most recent NAR Database Issue update article published prior to 2017, and the organizations operating the databases were determined through review of database websites.
keywords: databases; research infrastructure; sustainability; data sharing; molecular biology; bioinformatics; bibliometrics
published: 2019-05-31
 
The data are provided to illustrate methods for evaluating systematic transactional data reuse in machine learning. A library account-based recommender system was developed by applying machine learning to 383,828 transactions (check-outs) sourced from a large multi-unit research library. The machine learning process used the FP-growth algorithm over the subject metadata associated with physical items that were checked out together in the library. The purpose of this research is to evaluate the results of systematic transactional data reuse in machine learning. The analysis herein contains a large-scale network visualization of 180,441 subject association rules and corresponding node metrics.
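The study ran FP-growth in WEKA; purely for illustration, the same algorithm is available in Python via mlxtend, treating each checkout's subject headings as one transaction (the subject values below are invented):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# Each transaction = subject headings of items checked out together.
transactions = [
    ["Machine learning", "Data mining"],
    ["Machine learning", "Statistics"],
    ["Data mining", "Statistics", "Machine learning"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent = fpgrowth(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])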
keywords: evaluating machine learning; network science; FP-growth; WEKA; Gephi; personalization; recommender systems
published: 2018-09-06
 
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 to the present, with awards continuing through the end of the second XSEDE award in 2021. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
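A sketch of one question the data can answer; the file and column names here are assumptions, so consult the data set's documentation for the actual layout:

import pandas as pd

# Hypothetical file and columns ("start_date", "field_of_science").
projects = pd.read_csv("projects.csv", parse_dates=["start_date"])
by_year = projects.groupby([projects["start_date"].dt.year, "field_of_science"]).size()
print(by_year.head(20))  # evolution of fields of science over time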
keywords: allocations; cyberinfrastructure; XSEDE
published: 2024-02-27
 
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d'État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (e.g., military, rebels), and the fate of the deposed leader.
Version 2.1.3 adds 19 coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy. Version 2.1.2 added 6 coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021; version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data; it also added actor coding for 46 coup events and executive outcomes for 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet 'CoupInventory.xls' because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne's Coup Data (Powell and Thyne, 2011)
Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d'État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d'état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024.
2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d'État Project. It contains 29 variables and 1000 observations. Revised February 2024.
3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024.
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in Markdown. Revised February 2024.
Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d'État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
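A quick structural check of the event file in Python (coup_id is documented in the codebook; the shape should match the 29 variables and 1000 observations described above):

import pandas as pd

coups = pd.read_csv("Coup Data v2.1.3.csv")
print(coups.shape)              # expected: (1000, 29)
print(coups["coup_id"].head())  # use coup_id to look up sources in the Source Document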
published: 2024-03-25
 
Diversity - PubMed dataset. Contact: Apratim Mishra (March 22, 2024).
This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The selection includes articles retrieved from Author-ity 2018 [1]: a total of 228,040 papers and 440,310 authors. The sample of papers is based on the top 40 journals in the dataset, limited to papers with 2-10 authors published between 1990 and 2010, and stratified on paper count per year. Additionally, this dataset is limited to papers whose lead author is affiliated with one of four countries: the US, the UK, Canada, or Australia. Files are encoded in utf-8.
File 1: auids_plos.csv (important columns defined below; 7 in total)
• AUID: a unique ID for each author
• Ethnea: ethnicity prediction
• Genni: gender prediction
File 2: pmids_plos.csv (important columns defined below; 33 in total)
• pmid: unique paper ID
• year: year of paper publication
• no_authors: author count
• journal: journal name
• years: first year of publication for every author
• age_bin: binned age for every author
• Country-temporal: country of affiliation for every author
• h_index: journal h-index
• TimeNovelty: paper time novelty [2]
• nih_funded: binary variable indicating NIH funding for any author
• prior_cit_mean: mean of all authors' prior citation rate
• Insti_impact_all: all authors' respective institutions' citation count
• Insti_impact: maximum of all institutions' citation count
• mesh_vals: top MeSH values for every author for that paper
• outer_mesh_vals: MeSH qualifiers for every author for that paper
• relative_citation_ratio: RCR
The README includes a description of all columns.
[1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
[2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
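For example, the documented AUID, Ethnea, and Genni columns in File 1 support simple distributional summaries (a sketch, assuming the file loads directly with pandas):

import pandas as pd

authors = pd.read_csv("auids_plos.csv", encoding="utf-8")
print(authors["Genni"].value_counts(normalize=True))  # gender predictions
print(authors["Ethnea"].value_counts().head(10))      # most common ethnicity predictions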
keywords: Diversity; PubMed; Citation
published: 2016-06-06
 
These datasets represent first-time collaborations between first and last authors (with mutually exclusive publication histories) on papers with 2 to 5 authors published between 1988 and 2009 in PubMed. Each record captures aspects of the similarity, nearness, and complementarity between the two authors on the paper that marks the formation of their collaboration.
published: 2023-02-23
 
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d'État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized or successful coups, unrealized coup attempts, or thwarted conspiracies), the type of actor(s) who initiated the coup (e.g., military, rebels), and the fate of the deposed leader.
This current version, Version 2.1.2, adds 6 coup events that occurred in 2022 and updates the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021; version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data; it also added actor coding for 46 coup events and executive outcomes for 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Changes from the previously released data (v2.0.0) also include:
1. Adding additional events and expanding the period covered to 1945-2022
2. Filling in missing actor information
3. Filling in missing information on the outcomes for the incumbent executive
4. Dropping events that were incorrectly coded as coup events
Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.1.2 Codebook.pdf - This 16-page document provides a description of the Cline Center Coup d'État Project Dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d'état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2023.
2. Coup Data v2.1.2.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d'État Project. It contains 29 variables and 981 observations. Revised February 2023.
3. Source Document v2.1.2.pdf - This 315-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2023.
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in Markdown. Revised February 2023.
Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d'État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2023. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2023. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.2. February 23. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V6
published: 2020-02-12
 
This dataset contains the results of a three-month audit of housing advertisements. It accompanies the 2020 ICWSM paper "Auditing Race and Gender Discrimination in Online Housing Markets". It covers data collected between Dec 7, 2018 and March 19, 2019. There are two JSON files in the dataset. The first contains a list of JSON objects, separated by newlines, representing advertisements. Each object includes the date and time it was collected, the image and title (if collected) of the ad, the page on which it was displayed, and the training treatment it received. The second file is a list of JSON objects, separated by newlines, each representing a visit to a housing listing site. Each object contains the url, the training treatment applied, the location searched, and the metadata of the top sites scraped. This metadata includes location, price, and number of rooms. The dataset also includes the raw images of ads collected, in order to code them by interest and targeting. These were captured by Selenium and named using a perceptual hash to de-duplicate images.
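Because both files are newline-delimited JSON, each line parses independently (the file name below is an assumption):

import json

ads = []
with open("advertisements.json") as f:  # hypothetical file name
    for line in f:
        if line.strip():
            ads.append(json.loads(line))

print(len(ads))
print(ads[0].keys())  # e.g., date/time collected, image, title, page, treatment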
keywords: algorithmic audit; advertisement audit;
published: 2018-12-20
 
File Name: Inclusion_Criteria_Annotation.csv
Data Preparation: Xiaoru Dong
Date of Preparation: 2018-12-14
Data Contributors: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains the inclusion criteria of Cochrane systematic reviews and the manual annotation results. 5,420 of the 7,158 available inclusion criteria were annotated. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file:
- "Inclusion Criteria": content of the inclusion criteria of Cochrane systematic reviews.
- "Only RCTs": manual annotation results, in which "x" means the inclusion criterion is classified as "Only RCTs" and blank means it is classified as "Others".
Notes:
1. "RCT" stands for Randomized Controlled Trial, which, by definition, is "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html]
2. To reproduce the data relevant to this work, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
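Because annotations are encoded as "x" versus blank, a blank cell loads as missing in pandas; a minimal sketch of recovering the two-class labels:

import pandas as pd

df = pd.read_csv("Inclusion_Criteria_Annotation.csv")
# Per the description: "x" marks "Only RCTs"; blank means "Others".
df["label"] = df["Only RCTs"].notna().map({True: "Only RCTs", False: "Others"})
print(df["label"].value_counts())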
keywords: Inclusion criteria, Randomized controlled trials, Machine learning, Systematic reviews
published: 2020-07-16
 
Dataset for the SocialMediaIE tutorial.
keywords: social media; deep learning; natural language processing
published: 2021-11-05
 
This data set contains the results of a 2021 survey of University of Illinois University Library employees, conducted as part of the Becoming A Trans Inclusive Library Project, to evaluate the awareness of University of Illinois faculty, staff, and student employees regarding transgender identities and to assess the professional development needs of library employees to better serve trans and gender non-conforming patrons. The survey instrument is available in the IDEALS repository: http://hdl.handle.net/2142/110080.
keywords: transgender awareness, academic library, gender identity awareness, professional development opportunities
published: 2016-12-19
 
Files in this dataset represent an investigation into use of the Minrva library mobile app from May 2015 through December 2015. During this interval 45,975 API hits were recorded by the Minrva web server. The dataset herein comprises: 1) a delineation of API hits to mobile app module use in the Minrva app by month, 2) a general analysis of Minrva app downloads relative to module use, and 3) an annotated data file providing associations from API hits to the specific modules used, organized by month (May 2015 - December 2015).
keywords: API analysis; log analysis; Minrva Mobile App
published: 2023-03-28
 
Sentences and citation contexts identified from the PubMed Central open access articles
----------------------------------------------------------------------
The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao, T. K., & Torvik, V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles.
Files:
• A_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with A.
• B_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with B.
• C_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with C.
• D_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with D.
• E_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with E.
• F_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with F.
• G_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with G.
• H_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with H.
• I_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with I.
• J_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with J.
• K_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with K.
• L_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with L.
• M_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with M.
• N_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with N.
• O_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with O.
• P_p1_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with P (part 1).
• P_p2_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with P (part 2).
• Q_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with Q.
• R_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with R.
• S_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with S.
• T_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with T.
• UV_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with U or V.
• W_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with W.
• XYZ_journal_IntxtCit.tsv – Sentences and citation contexts from articles in journals with titles starting with X, Y or Z.
Each row in a file is a sentence/citation context and contains the following columns:
• pmcid: PMCID of the article.
• pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
• location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
• IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
• sentence_id: The ID of the citation context/sentence in the article component.
• total_sentences: The number of sentences in the article component.
• intxt_id: The ID of the citation.
• intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
• intxt_pmid_source: The sources in which the intxt_pmid can be identified. "xml" means the PMID is identified only from the XML file; "xml,pmc" means the PMID is found both in the XML file and in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
• intxt_mark: The citation marker associated with the inline citation.
• best_id: The best source link ID (e.g., PMID) of the citation.
• best_source: The sources that confirm the best ID.
• best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
• citation: A citation context. If no citation is found in a sentence, the value is the sentence.
• progression: Text progression of the citation context/sentence.
Supplementary Files:
• PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs, which are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column. Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
  • pmcid: PMCID of the citing article.
  • pos: The citation's position in the reference list.
  • fromPMID: PMID of the citing article.
  • toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
  • SRC: The sources that confirm the toPMID.
  • MatchDB: The origin bibliographic database of the toPMID.
  • Probability: The match probability of the toPMID.
  • toPMID2: PMID of the citation (as tagged in the XML file).
  • SRC2: The sources that confirm the toPMID2.
  • intxt_id: The ID of the citation.
  • journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
  • same_ref_string: Whether the citation string appears in the reference list more than once.
  • DIFF: The comparison result between the toPMID column and the toPMID2 column.
  • bestID: The best source link ID (e.g., PMID) of the citation.
  • bestSRC: The sources that confirm the best ID.
  • Match: Matching result produced by Patci.
[1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885
• intxt_cit_license_fromPMC.tsv – This file contains the CC licensing information for each article. The licensing information is from PMC's file lists [2], retrieved on June 19, 2020, and March 9, 2023. Note that the license information for 189,855 PMCIDs is NO-CC CODE in the file lists, and 521 PMCIDs are absent from the file lists. The absence of CC licensing information does not indicate that the article lacks a CC license. For example, PMCID: 6156294 (NO-CC CODE) and PMCID: 6118074 (absent from the PMC file lists) are under CC-BY licenses according to the PDF versions of the articles. The intxt_cit_license_fromPMC.tsv file has two columns:
  • pmcid: PMCID of the article.
  • license: The article's CC license information provided in PMC's file lists. The value is nan when an article is not present in PMC's file lists.
[2] https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/
• Supplementary_File_1.zip – This file contains the code for generating the dataset.
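The per-journal files are large, so chunked reading is advisable. A sketch that counts sentences and, as a rough proxy, rows carrying an inline-citation ID; it assumes the files ship without a header row and follow the column order listed above (verify both against an actual file):

import pandas as pd

cols = ["pmcid", "pmid", "location", "IMRaD", "sentence_id", "total_sentences",
        "intxt_id", "intxt_pmid", "intxt_pmid_source", "intxt_mark",
        "best_id", "best_source", "best_id_diff", "citation", "progression"]

n_sentences = n_with_citation = 0
for chunk in pd.read_csv("A_journal_IntxtCit.tsv", sep="\t", names=cols,
                         header=None, dtype=str, chunksize=1_000_000):
    n_sentences += len(chunk)
    # Assumption: rows with a populated intxt_id correspond to citation contexts.
    n_with_citation += chunk["intxt_id"].notna().sum()

print(n_sentences, n_with_citation)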
keywords: citation context; in-text citation; inline citation; bibliometrics; science of science
published: 2023-04-12
 
The XSEDE program managed the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating from the start of the TeraGrid program in 2004 through the end of the XSEDE operational period on August 31, 2022. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over an eighteen-year span along with the evolution of researchers, fields of science, and institutional representation. Because the XSEDE program has ended, the allocation_award_history file includes all allocation activity initiated via XSEDE processes through August 31, 2022. The Resource Providers and the successor program to XSEDE agreed to honor all project allocations made during XSEDE. Thus, allocation awards that extend beyond the end of XSEDE may not reflect all activity that will ultimately be part of the project award. Similarly, allocation usage data reflect only usage reported through August 31, 2022, and may not reflect all activity ultimately conducted by projects that remained active beyond XSEDE.
keywords: allocations; cyberinfrastructure; XSEDE
published: 2023-08-02
 
This dataset was developed as part of an online survey study that investigates how phatic expressions—comments that are social rather than informative in nature—influence the perceived helpfulness of online peer help-giving replies in an asynchronous college course discussion forum. During the study, undergraduate students (N = 320) rated and described the helpfulness of examples of replies to online requests for help, both with and without four types of phatic expressions: greeting/parting tokens, other-oriented comments, self-oriented comments, and neutral comments.
keywords: help-giving; phatic expression; discussion forum; online learning; engagement
published: 2023-09-21
 
The relationship between physical activity and mental health, especially depression, is one of the most studied topics in the field of exercise science and kinesiology. Although there is strong consensus that regular physical activity improves mental health and reduces depressive symptoms, some debate the mechanisms involved in this relationship as well as the limitations and definitions used in such studies. Meta-analyses and systematic reviews continue to examine the strength of the association between physical activity and depressive symptoms for the purpose of improving exercise prescription as treatment or combined treatment for depression. This dataset covers 27 review articles (systematic reviews, meta-analyses, or both) and 365 primary study articles addressing the relationship between physical activity and depressive symptoms. Primary study articles were manually extracted from the review articles. We used a custom-made workflow (Fu, Yuanxi. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection), which uses the Scopus API and manual work, to extract and disambiguate authorship information for the 392 reports. The author information file (author_list.csv) is the product of this workflow and can be used to compute the co-author network of the 392 articles. This dataset can be used to construct the inclusion network and the co-author network of the 27 review articles and 365 primary study articles. A primary study article is "included" in a review article if it is considered in the review article's evidence synthesis. Each included primary study article is cited in the review article, but not all references cited in a review article are included in the evidence synthesis or are primary study articles. The inclusion network is a bipartite network with two types of nodes: one type represents review articles, and the other represents primary study articles. In an inclusion network, if a review article includes a primary study article, there is a directed edge from the review article node to the primary study article node. The attribute file (article_list.csv) includes attributes of the 392 articles, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Collectively, this dataset reflects the evidence production and use patterns within the exercise science and kinesiology community investigating the relationship between physical activity and depressive symptoms.
FILE FORMATS
1. article_list.csv - Unicode CSV
2. author_list.csv - Unicode CSV
3. Chinese_author_name_reference.csv - Unicode CSV
4. inclusion_net_edges.csv - Unicode CSV
5. review_article_details.csv - Unicode CSV
6. supplementary_reference_list.pdf - PDF
7. README.txt - text file
8. systematic_review_inclusion_criteria.csv - Unicode CSV
UPDATES IN THIS VERSION COMPARED TO V3 (Clarke, Caitlin; Lischwe Mueller, Natalie; Joshi, Manasi Ballal; Fu, Yuanxi; Schneider, Jodi (2023): The Inclusion Network of 27 Review Articles Published between 2013-2018 Investigating the Relationship Between Physical Activity and Depressive Symptoms. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4614455_V3)
- We added a new file, systematic_review_inclusion_criteria.csv.
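A sketch of constructing the bipartite inclusion network with networkx from the edge list (the two column names are assumptions; README.txt documents the actual headers):

import networkx as nx
import pandas as pd

edges = pd.read_csv("inclusion_net_edges.csv")  # assumed columns: citing_id, cited_id
G = nx.DiGraph()
G.add_edges_from(edges.itertuples(index=False, name=None))

# Review articles are the sources of directed edges; primary studies the targets.
reviews = {n for n, d in G.out_degree() if d > 0}
print(len(reviews), G.number_of_nodes(), G.number_of_edges())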
keywords: systematic reviews; meta-analyses; evidence synthesis; network visualization; tertiary studies; physical activity; depressive symptoms; exercise; review articles
published: 2023-07-14
 
Data for Post-retraction citation: A review of scholarly research on the spread of retracted science
Schneider, Jodi; Das, Susmita; Léveillé, Jacqueline; Proescholdt, Randi
Contact: Jodi Schneider jodi@illinois.edu & jschneider@pobox.com
********** OVERVIEW **********
This dataset provides further analysis for an ongoing literature review about post-retraction citation. This ongoing work extends a poster presented as: Jodi Schneider, Jacqueline Léveillé, Randi Proescholdt, Susmita Das, and The RISRS Team. Characterization of Publications on Post-Retraction Citation of Retracted Articles. Presented at the Ninth International Congress on Peer Review and Scientific Publication, September 8-10, 2022, hybrid in Chicago. https://hdl.handle.net/2142/114477 (now also at https://peerreviewcongress.org/abstract/characterization-of-publications-on-post-retraction-citation-of-retracted-articles/)
Items as of the poster version are listed in the bibliography 92-PRC-items.pdf. Note that following the poster, we made several changes to the dataset (see changes-since-PRC-poster.txt). For both the poster dataset and the current dataset, 5 items have 2 categories (see 5-items-have-2-categories.txt).
Articles were selected from the Empirical Retraction Lit bibliography (https://infoqualitylab.org/projects/risrs2020/bibliography/ and https://doi.org/10.5281/zenodo.5498474). The current dataset includes 92 items; 91 items were selected from the 386 total items in Empirical Retraction Lit bibliography version v.2.15.0 (July 2021); 1 item was added because it is the final-form publication of a grouping of 2 items from the bibliography: Yang (2022) Do retraction practices work effectively? Evidence from citations of psychological retracted articles http://doi.org/10.1177/01655515221097623
Items were classified into 7 topics; 2 of the 7 topics have been analyzed to date.
********************** OVERVIEW OF ANALYSIS **********************
DATA ANALYZED: 2 of the 7 topics have been analyzed to date:
- field-based case studies (n = 20)
- author-focused case studies of 1 or several authors with many retracted publications (n = 15)
FUTURE DATA TO BE ANALYZED, NOT YET COVERED: 5 of the 7 topics have not yet been analyzed as of this release:
- database-focused analyses (n = 33)
- paper-focused case studies of 1 to 125 selected papers (n = 15)
- studies of retracted publications cited in review literature (n = 8)
- geographic case studies (n = 4)
- studies selecting retracted publications by method (n = 2)
************** FILE LISTING **************
BIBLIOGRAPHY: 92-PRC-items.pdf
TEXT FILES: README.txt, 5-items-have-2-categories.txt, changes-since-PRC-poster.txt
CODEBOOKS: Codebook for authors.docx, Codebook for authors.pdf, Codebook for field.docx, Codebook for field.pdf, Codebook for KEY.docx, Codebook for KEY.pdf
SPREADSHEETS: field.csv, field.xlsx, multipleauthors.csv, multipleauthors.xlsx, multipleauthors-not-named.csv, multipleauthors-not-named.xlsx, singleauthors.csv, singleauthors.xlsx
*************************** DESCRIPTION OF FILE TYPES ***************************
BIBLIOGRAPHY (92-PRC-items.pdf) presents the items as of the poster version. This has minor differences from the current dataset; consult changes-since-PRC-poster.txt for details on the differences.
TEXT FILES provide notes for additional context. These files end in .txt.
CODEBOOKS describe the data we collected. The same data is provided in both Word (.docx) and PDF format. There is one general codebook that is referred to in the other codebooks: Codebook for KEY lists fields assigned (e.g., for a journal or conference). Note that this is distinct from the overall analysis of fields in the Empirical Retraction Lit bibliography; for that analysis see Proescholdt, Randi (2021): RISRS Retraction Review - Field Variation Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2070560_V1 Other codebooks document the specific information we entered in each column of a spreadsheet.
SPREADSHEETS present the data collected. The same data is provided in both Excel (.xlsx) and CSV format. Each data row describes a publication or item (e.g., thesis, poster, preprint). For column header explanations, see the associated codebook.
***************************** DETAILS ON THE SPREADSHEETS *****************************
field-based case studies
CODEBOOK: Codebook for field (refers to: Codebook for KEY)
DATA SHEET: field
--NUMBER OF DATA ROWS: 20 (each data row describes a publication/item)
--NUMBER OF PUBLICATION GROUPINGS: 17
--GROUPED PUBLICATIONS: Rubbo (2019) - 2 items, Yang (2022) - 3 items
author-focused case studies of 1 or several authors with many retracted publications
CODEBOOK: Codebook for authors (refers to: Codebook for KEY)
DATA SHEET 1: singleauthors (n = 9)
--NUMBER OF DATA ROWS: 9
--NUMBER OF PUBLICATION GROUPINGS: 9
DATA SHEET 2: multipleauthors (n = 5)
--NUMBER OF DATA ROWS: 5
--NUMBER OF PUBLICATION GROUPINGS: 5
DATA SHEET 3: multipleauthors-not-named (n = 1)
--NUMBER OF DATA ROWS: 1
--NUMBER OF PUBLICATION GROUPINGS: 1
********************************* CRediT <http://credit.niso.org> *********************************
Susmita Das: Conceptualization, Data curation, Investigation, Methodology
Jacqueline Léveillé: Data curation, Investigation
Randi Proescholdt: Conceptualization, Data curation, Investigation, Methodology
Jodi Schneider: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Supervision
keywords: retraction; citation of retracted publications; post-retraction citation; data extraction for scoping reviews; data extraction for literature reviews;
published: 2021-05-01
 
This is the first version of the dataset. It contains anonymized data collected during the experiments described in the publication "I can show what I really like.": Eliciting Preferences via Quadratic Voting, scheduled to appear in April 2021. Once the publication link is public, we will provide an update here. These data were collected through our open-source online systems, available at https://github.com/a2975667/QV-app (experiment 1) and https://github.com/a2975667/QV-buyback (experiment 2). There are two folders in this dataset. The first folder (exp1_data) contains data collected during experiment 1; the second folder (exp2_data) contains data collected during experiment 2.
keywords: Quadratic Voting; Likert scale; Empirical studies; Collective decision-making
published: 2023-07-11
 
The dissertation_demo.zip contains the base code and demonstration materials for the dissertation: A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning. Each chapter has a demo folder demonstrating provenance queries or tools. The Airbnb dataset used for demonstration and simulation is not included in this demo but can be accessed directly from the referenced website. Any updates to the demonstrations and examples can be found online at: https://github.com/nikolausn/dissertation_demo
published: 2019-09-17
 
Trained models for multi-task multi-dataset learning for text classification in tweets. Classification tasks include sentiment prediction, abusive content, sarcasm, and veridicality. Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification.py See https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details. If you use this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning; sentiment; sarcasm; abusive content;
published: 2019-09-17
 
Trained models for multi-task multi-dataset learning for sequence tagging in tweets. Sequence tagging tasks include POS, NER, chunking, and supersense tagging. Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_experiment.py See https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details. If you use this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning;