Illinois Data Bank
Welcome
Log in
Deposit Dataset
Find Data
Policies
Guides
Contact Us
Displaying 176 - 200 of 840 in total
<
1
2
3
4
5
6
7
8
9
10
11
12
…
33
34
>
25 per page
50 per page
Show All
Go
Clear Filters
Generate Report from Search Results
Subject Area
Life Sciences (475)
Social Sciences (148)
Physical Sciences (135)
Technology and Engineering (78)
Uncategorized
Arts and Humanities (1)
Funder
Other (256)
U.S. National Science Foundation (NSF) (230)
U.S. Department of Energy (DOE) (113)
U.S. National Institutes of Health (NIH) (79)
U.S. Department of Agriculture (USDA) (56)
Illinois Department of Natural Resources (IDNR) (25)
U.S. Geological Survey (USGS) (8)
U.S. National Aeronautics and Space Administration (NASA) (6)
Illinois Department of Transportation (IDOT) (4)
U.S. Army (3)
Publication Year
2025 (144)
2021 (108)
2024 (107)
2022 (106)
2020 (96)
2023 (75)
2019 (72)
2018 (61)
2017 (36)
2016 (30)
2009 (1)
2011 (1)
2012 (1)
2014 (1)
2015 (1)
License
CC0 (447)
CC BY (370)
custom (23)
Illinois Data Bank Dataset Search Results
Dataset Search Results
published: 2021-04-22
Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1
Author-ity 2018 dataset Prepared by Vetle Torvik Apr. 22, 2021 The dataset is based on a snapshot of PubMed taken in December 2018 (NLMs baseline 2018 plus updates throughout 2018). A total of 29.1 million Article records and 114.2 million author name instances. Each instance of an author name is uniquely represented by the PMID and the position on the paper (e.g., 10786286_3 is the third author name on PMID 10786286). Thus, each cluster is represented by a collection of author name instances. The instances were first grouped into "blocks" by last name and first name initial (including some close variants), and then each block was separately subjected to clustering. The resulting clusters are provided in two different formats, the first in a file with only IDs and PMIDs, and the second in a file with cluster summaries: #################### File 1: au2id2018.tsv #################### Each line corresponds to an author name instance (PMID and Author name position) with an Author ID. It has the following tab-delimited fields: 1. Author ID 2. PMID 3. Author name position ######################## File 2: authority2018.tsv ######################### Each line corresponds to a predicted author-individual represented by cluster of author name instances and a summary of all the corresponding papers and author name variants. Each cluster has a unique Author ID (the PMID of the earliest paper in the cluster and the author name position). The summary has the following tab-delimited fields: 1. Author ID (or cluster ID) e.g., 3797874_1 represents a cluster where 3797874_1 is the earliest author name instance. 2. cluster size (number of author name instances on papers) 3. name variants separated by '|' with counts in parenthesis. Each variant of the format lastname_firstname middleinitial, suffix 4. last name variants separated by '|' 5. first name variants separated by '|' 6. middle initial variants separated by '|' ('-' if none) 7. suffix variants separated by '|' ('-' if none) 8. email addresses separated by '|' ('-' if none) 9. ORCIDs separated by '|' ('-' if none). From 2019 ORCID Public Data File https://orcid.org/ and from PubMed XML 10. range of years (e.g., 1997-2009) 11. Top 20 most frequent affiliation words (after stoplisting and tokenizing; some phrases are also made) with counts in parenthesis; separated by '|'; ('-' if none) 12. Top 20 most frequent MeSH (after stoplisting) with counts in parenthesis; separated by '|'; ('-' if none) 13. Journal names with counts in parenthesis (separated by '|'), 14. Top 20 most frequent title words (after stoplisting and tokenizing) with counts in parenthesis; separated by '|'; ('-' if none) 15. Co-author names (lowercased lastname and first/middle initials) with counts in parenthesis; separated by '|'; ('-' if none) 16. Author name instances (PMID_auno separated by '|') 17. Grant IDs (after normalization; '-' if none given; separated by '|'), 18. Total number of times cited. (Citations are based on references harvested from open sources such as PMC). 19. h-index 20. Citation counts (e.g., for h-index): PMIDs by the author that have been cited (with total citation counts in parenthesis); separated by '|'
keywords:
author name disambiguation; PubMed
published: 2024-10-10
Mishra, Apratim; Lee, Haejin; Jeoung, Sullam; Torvik, Vetle; Diesner, Jana (2024): Diversity - PubMed Dataset. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-5259667_V3
Diversity - PubMed dataset Contact: Apratim Mishra (Oct, 2024) This dataset presents article-level (pmid) and author-level (auid) diversity data for PubMed articles. The chosen selection includes articles retrieved from Authority 2018 [1], 907 024 papers, and 1 316 838 authors, and is an expanded dataset of V1. The sample of articles consists of the top 40 journals in the dataset, limited to 2-12 authors published between 1991 – 2014, which are article type "journal type" written in English. Files are 'gzip' compressed and separated by tab space, and V3 includes the correct author count for the included papers (pmids) and updated results with no NaNs. ################################################ File1: auids_plos_3.csv.gz (Important columns defined, 5 in total) • AUID: a unique ID for each author • Genni: gender prediction • Ethnea: ethnicity prediction ################################################# File2: pmids_plos_3.csv.gz (Important columns defined) • pmid: unique paper • auid: all unique auids (author-name unique identification) • year: Year of paper publication • no_authors: Author count • journal: Journal name • years: first year of publication for every author • Country-temporal: Country of affiliation for every author • h_index: Journal h-index • TimeNovelty: Paper Time novelty [2] • nih_funded: Binary variable indicating funding for any author • prior_cit_mean: Mean of all authors’ prior citation rate • Insti_impact: All unique institutions’ citation rate • mesh_vals: Top MeSH values for every author of that paper • relative_citation_ratio: RCR The ‘Readme’ includes a description for all columns. [1] Torvik, Vetle; Smalheiser, Neil (2021): Author-ity 2018 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2273402_V1 [2] Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
keywords:
Diversity; PubMed; Citation
suppressed by curator
published: 2024-09-28
Huang, Yijing (2024): tellurium_magneto_chiral_instability. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9143327_V1
Per the authors' request, the data files for this dataset are now suppressed. Please visit this new dataset for the complete and updated data files: Huang, Yijing; Fahad , Mahmood (2025): Data for Observation of a Magneto-chiral Instability in Photoexcited Tellurium. University of Illinois Urbana-Champaign.<a href="https://doi.org/10.13012/B2IDB-1409842_V1">https://doi.org/10.13012/B2IDB-1409842_V1</a> ==================== The data and code provided in this dataset can be used to generate key plots in the manuscript. It is divided into four subfolders (B parallel/perpendicular to the tellurium c axis and field/ temperature dependence), each containing the raw data (saved in .mat format), the oscillator parameters obtained through linear prediction (saved in .mat format), and the plot-generating code (.m files). The code was written using MATLAB R2024a. To run the code, go to each folder, and run the .m file in that folder, which generates two plots.
published: 2025-01-31
Punyasena, Surangi W.; Romero, Ingrid; Urban, Michael A. (2025): Airyscan confocal superresolution images of extant Malvaceae pollen with a focus on Bombacoideae. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2968712_V1
Title: Airyscan confocal superresolution images of extant Malvaceae pollen with a focus on Bombacoideae Authors: Surangi W. Punyasena, Ingrid Romero, Michael A. Urban Subject: Biological sciences Keywords: Malvaceae; superresolution microscopy; Zeiss; Bombacacidites; Neotropics; CZI Funder: NSF-DBI Advances in Bioinformatics (NSF-DBI-1262561) Corresponding Creator: Surangi W. Punyasena This dataset includes a total of 430 images of extant specimens of the Malvaceae, with a focus on species that are or have been included within the subfamily Bombacoideae. There are 27 genera included within 26 folders. Each folder is named by genus and contains all the images that correspond to that genus. Note that the genus _Matisia_ is included with _Quararibea_ as detailed in the metadata READ ME file. The specimens imaged are from the palynological collections of the Swedish Museum of Natural History and Smithsonian Tropical Research Institute, and herbarium specimens from the Smithsonian Herbarium National Museum. The optical superresolution microscopy images were taken using a Zeiss LSM 880 with Airyscan at 630X magnification (63x/NA 1.4 oil DIC). The images are in the original CZI file format. They can be opened using Zeiss propriety software (Zen, Zen lite) or in ImageJ/FIJI. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/en/products/software/zeiss-zen/czi-image-file-format.html] Image metadata and file organization are described in the CSV file "METADATA_Malvaceae_Bombacoideae_modern-species.csv". The column headings are: Folder The folder in which the image file is found Subfamily The current subfamily determination based on the literature. Note that _Pentaplaris_ and _Septotheca_ have not been assigned a subfamily. Genus Genus name Species Species name Accepted name Accepted species name, updated from the literature Slide name Species name as denoted on the herbarium slide Collection Source of the herbarium slide: Sweden National Museum of Natural History or the Smithsonian Tropical Research Institute File name File name using the species name denoted on the herbarium slide Slide ID/Herbarium ID Specimen collection number Please cite this dataset as: Punyasena, Surangi W.; Romero, Ingrid; Urban, Michael A. (2025): Airyscan confocal superresolution images of extant Malvaceae pollen with a focus on Bombacoideae. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2968712_V1
keywords:
Malvaceae; superresolution microscopy; Zeiss; Bombacoideae; Neotropics; CZI
published: 2025-01-30
Peyton, Buddy; Bajjalieh, Joseph; Martin, Michael; Alahi, Sam; Fadell, Norah; Jeralds, Maddie (2025): Cline Center Coup d’État Project Dataset. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9651987_V8
Coups d'Ètat are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy) the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.2.0 adds 94 additional coup events. 66 of these came from examining Powell and Thyne’s “discarded” events and 28 of these events were added to the data set in the normal annual review of potential new coup events. This version also updates the coding to events in Brazil in 1945 and the Congo in 1968. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include: • Reconciling missing event data • Removing events with irreconcilable event dates • Removing events with insufficient sourcing (each event needs at least two sources) • Removing events that were inaccurately coded as coup events • Removing variables that fell below the threshold of inter-coder reliability required by the project • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries • Extending the period covered from 1945-2005 to 1945-2019 • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011) Version 1.0.0 was released in 2013. This version consolidated coup data taken from the following sources: • The Center for Systemic Peace (Marshall and Marshall, 2007) • The World Handbook of Political and Social Indicators (Taylor and Jodice, 1983) • Coup d’Ètat: A Practical Handbook (Luttwak, 1979) • The Cline Center’s Social, Political and Economic Event Database (SPEED) Project (Nardulli, Althaus and Hayes, 2015) • Government Change in Authoritarian Regimes – 2010 Update (Svolik and Akcinaroglu, 2006) <br> <b>Items in this Dataset</b> 1. <i>Cline Center Coup d'État Codebook v.2.2.0 Codebook.pdf</i> - This 17-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. <i>Revised January 2025</i> 2. <i>Coup Data v2.2.0.csv</i> - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1094 observations. <i>Revised January 2025</i> 3. <i>Source Document v2.2.0.pdf</i> - This 347-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. <i>Revised January 2025</i> 4. <i>README.md</i> - This file contains useful information for the user about the dataset. It is a text file written in markdown language. <i>Revised January 2025</i> <br> <b> Citation Guidelines</b> 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2025. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. Janurary 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8 2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Michael Martin, Sam Alahi, Norah Fadell, and Maddie Jeralds. 2025. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. Janurary 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8
published: 2025-01-30
Zhang, Yufan; Bhattarai, Rabin (2025): Data for Framework of Simulating Structural Sediment Perimeter Barriers using VFSMOD. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4475662_V2
This is a research data for a manuscript - A Framework of Simulating Structural Sediment Perimeter Barriers using VFSMOD.
keywords:
sediment control
published: 2020-05-13
Althaus, Scott; Bajjalieh, Joseph; Jungblut, Marc; Shalmon, Dan; Ghosh, Subhankar; Joshi, Pradnyesh (2020): Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset . University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4638196_V1
Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations. This particular dataset contains information extracted from terrorism-related stories in the New York Times published between 1945 and 2018. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned. This dataset contains 3 files: 1. <i>"ResTeCo Project NYT Dataset Variable Descriptions.pdf"</i> <ul> <li>A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset and descriptions of all variables. </li> </ul> 2. <i>"resteco-nyt.csv"</i> <ul><li>This file contains the data extracted from terrorism-related media coverage in the New York Times between 1945 and 2018. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 438,373 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. </li> <li> <b>Please note</b> that care should be taken when using "resteco-nyt.csv". The file may not be suitable to use in a spreadsheet program like Excel as some of the values get to be quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. It is encouraged that a user of this data use a statistical package such as Stata, R, or Python to ensure the structure and quality of the data remains preserved.</li> </ul> 3. <i>"README.md"</i> <ul><li>This file contains useful information for the user about the dataset. It is a text file written in mark down language</li> </ul> <b>Citation Guidelines</b> 1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1 2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1
keywords:
Terrorism, Text Analytics, News Coverage, Topic Modeling, Sentiment Analysis
published: 2024-04-10
Konar, Megan; Ruess, Paul J.; Wanders, Niko; Bierkens, Marc F.P. (2024): Data for Total irrigation by crop in the Continental United States from 2008 to 2020. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-2656127_V1
This dataset provides estimates of total Irrigation Water Use (IWU) by crop, county, water source, and year for the Continental United States. Total irrigation from Surface Water Withdrawals (SWW), total Groundwater Withdrawals (GWW), and nonrenewable Groundwater Depletion (GWD) is provided for 20 crops and crop groups from 2008 to 2020 at the county spatial resolution. In total, there are nearly 2.5 million data points in this dataset (3,142 counties; 13 years; 3 water sources; and 20 crops). This dataset supports the paper by Ruess et al (2024) "Total irrigation by crop in the Continental United States from 2008 to 2020", Scientific Data, doi: 10.1038/s41597-024-03244-w When using, please cite as: Ruess, P.J., Konar, M., Wanders, N., and Bierkens, M.F.P. (2024) Total irrigation by crop in the Continental United States from 2008 to 2020, Scientific Data, doi: 10.1038/s41597-024-03244-w
keywords:
water use; irrigation; surface water; groundwater; groundwater depletion; counties; crops; time series
published: 2024-10-31
Liu, Shanshan; Vlachokostas, Alex; Kontou, Eleftheria (2024): Data for Resilience and environmental benefits of electric school buses as backup power for educational functions continuation during outages. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-4925630_V1
School buses transport 20 million students annually and are currently undergoing electrification in the US. With Vehicle-to-Building (V2B) technology, electric school buses (ESBs) can supply energy to school buildings during power outages, ensuring continued operation and safety. This study proposes assessing the resilience of secondary schools during outages by leveraging ESB fleets as backup power across various US climate regions. The findings indicate that the current fleet of ESBs in representative cities across different climate regions in the US is insufficient to meet the power demands of an entire school or even its HVAC system. However, we estimated the number of ESBs required to support the school's power needs, and we showed that the use of V2B technology significantly reduces carbon emissions compared to backup diesel generators. While adjusting HVAC setpoints and installing solar panels have limited impacts on enhancing school resilience, gathering students in classrooms during outages significantly improved resilience in our case study in Houston, Texas. Given the ongoing electrification of school buses, it is essential for schools to complement ESBs with stationary batteries and other backup power sources, such as solar and/or diesel generators, to effectively address prolonged outages. Determining the deployment of direct current fast and Level 2 chargers can reduce infrastructure costs while maintaining the resilience benefits of ESBs. This dataset includes the simulation process and results of this study.
keywords:
Electric school bus; Power outages,;Vehicle-to-Building technology; Carbon emission reduction; Backup power source
published: 2025-01-23
Smith, Rebecca; Mateus-Pinilla, Nohra (2025): Assessing contact between humans and white-tailed deer in Illinois: a cross-sectional survey. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-1661924_V1
These are the responses to an open, convenience sample survey of residents of Illinois to understand their interactions with wild deer. The survey was available on REDCap between December 19, 2022 and December 19, 2023, and was publicized through listserves, Facebook groups, and media reporting. The file "COVID Deer Survey _ REDCap.pdf" contains the codebook for the survey, including the questions; all factor variables have ".factor" added to their name in the dataset. The file "DeerSurveyData.csv" contains the dataset. The file "Score_calculation_for_sharing.R" is the code to create the cleaned dataset used for analysis from the raw survey responses. Throughout, NA is used to represent null/not available/not applicable; this is most likely either a failure to answer the question or, in some cases, a question that was not presented as it is not relevant based on answers to previous questions.
keywords:
deer; survey
published: 2023-07-14
Punyasena, Surangi W.; Urban, Michael A.; Adaime, Marc-Elie; Romero, Ingrid; Jaramillo, Carlos (2023): Pollen of Podocarpus (Podocarpaceae): Airyscan confocal superresolution images. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-8817604_V1
This dataset includes a total of 300 images of 45 extant species of Podocarpus (Podocarpaceae) and nine images of fossil specimens of the morphogenus Podocarpidites. The goal of this dataset is to capture the diversity of morphology within the genus and create an image database for training machine learning models. The images were taken using Airyscan confocal superresolution microscopy at 630x magnification (63x/NA 1.4 oil DIC). The images are in the CZI file format. They can be opened using Zeiss propriety software (Zen, Zen lite) or open microscopy software, such as ImageJ. More information on how to open CZI files can be found here: [https://www.zeiss.com/microscopy/us/products/software/zeiss-zen/czi-image-file-format.html] Please cite this dataset and listed publications when using these images.
keywords:
optical superresolution microscopy; Zeiss Airyscan; CZI images; conifer; saccate pollen; Podocarpus; Podocarpidites; Smithsonian Tropical Research Institute
published: 2025-01-15
Suski, Cory; Hay, Allison (2025): Seasonal Variation in Responses of Largemouth Bass Caught During Live-Release Angling Tournaments. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0025771_V1
Data was generated from acoustic transmitters implanted in tournament caught and non-angled control largemouth bass across multiple seasons. This data was used to quantify post-release movement, behavior, and mortality in response to angling tournaments at different times of year and varying water temperatures.
published: 2023-05-02
Lee, Jou; Schneider, Jodi (2023): Crossref data for Assessing the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9099305_V1
Tab-separated value (TSV) file. 14745 data rows. Each data row represents publication metadata as retrieved from Crossref (http://crossref.org) 2023-04-05 when searching for retracted publications. Each row has the following columns: Index - Our index, starting with 0. DOI - Digital Object Identifier (DOI) for the publication Year - Publication year associated with the DOI. URL - Web location associated with the DOI. Title - Title associated with the DOI. May be blank. Author - Author(s) associated with the DOI. Journal - Publication venue (journal, conference, ...) associated with the DOI RetractionYear - Retraction Year associated with the DOI. May be blank. Category - One or more categories associated with the DOI. May be blank. Our search was via the Crossref REST API and searched for: Update_type=( 'retraction', 'Retraction', 'retracion', 'retration', 'partial_retraction', 'withdrawal','removal')
keywords:
retraction; metadata; Crossref; RISRS
published: 2020-04-22
Endres, A. Bryan; Endres, Renata; Krstinić Nižić, Marinela (2020): Croatian Restaurant Allergy Disclosures. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-9891298_V1
Data on Croatian restaurant allergen disclosures on restaurant websites, on-line menus and social media comments
keywords:
restaurant; allergen; disclosure; tourism
published: 2023-07-01
Tonks, Adam; Hwang, Jeongwoo (2023): Data for the paper "Assessment of spatiotemporal flood risk due to compound precipitation extremes across the contiguous United States". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6626437_V1
This is the data used in the paper "Assessment of spatiotemporal flood risk due to compound precipitation extremes across the contiguous United States". Code from the Github repository https://github.com/adtonks/precip_extremes can be used with the data here to reproduce the paper's results. v1.0.0 of the code is also archived at https://doi.org/10.5281/zenodo.8104252 This dataset is derived from NOAA-CIRES-DOE 20th Century Reanalysis V3. The NOAA-CIRES-DOE Twentieth Century Reanalysis Project version 3 used resources of the National Energy Research Scientific Computing Center managed by Lawrence Berkeley National Laboratory which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 and used resources of NOAA's Remotely Deployed High Performance Computing Systems.
keywords:
spatiotemporal; CONUS; United States; precipitation; extremes; flooding
published: 2023-07-05
Njuguna, Joyce; Clark, Lindsay; Lipka, Alexander; Anzoua, Kossonou; Bagmet, Larisa; Chebukin, Pavel; Dwiyanti, Maria; Dzyubenko, Elena; Dzyubenko, Nicolay; Ghimire, Bimal; Jin, Xiaoli; Johnson, Douglas; Kjeldsen, Jens; Nagano, Hironori; Oliveira, Ivone; Peng, Junhua; Petersen, Karen; Sabitov, Andrey; Seong, Eun; Yamada, Toshihiko; Yoo, Ji; Yu, Chang; Zhao, Hu; Munoz, Patricio; Long, Stephen; Sacks, Erik (2023): Impact of genotype-calling methodologies on genome-wide association and genomic prediction in polyploids. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4829913_V2
This dataset contains all data used in the paper "Impact of genotype-calling methodologies on genome-wide association and genomic prediction in polyploids". The dataset includes genotypes and phenotypic data from two autotetraploid species Miscanthus sacchariflorus and Vaccinium corymbosum that was used used for genome wide association studies and genomic prediction and the scripts used in the analysis. In this V2, 2 files have the raw data are added: "Miscanthus_sacchariflorus_RADSeq.vcf" is the VCF file with the raw SNP calls of the Miscanthus sacchariflorus data used for genotype calling using the 6 genotype calling methods. "Blueberry_data_read_depths.RData" is the a RData file with the read depth data that was used for genotype calling in the Blueberry dataset.
keywords:
Polyploid; allelic dosage; Bayesian genotype-calling; Genome-wide association; Genomic prediction
published: 2023-07-11
Parulian, Nikolaus (2023): Data for A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6827044_V1
The dissertation_demo.zip contains the base code and demonstration purpose for the dissertation: A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning. Each chapter has a demo folder for demonstrating provenance queries or tools. The Airbnb dataset for demonstration and simulation is not included in this demo but is available to access directly from the reference website. Any updates on demonstration and examples can be found online at: https://github.com/nikolausn/dissertation_demo
published: 2023-10-22
Davidson, Ruth; Vachaspati, Pranjal; Mirarab, Siavash; Warnow, Tandy (2023): Data from: Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6670066_V1
HGT+ILS datasets from Davidson, R., Vachaspati, P., Mirarab, S., & Warnow, T. (2015). Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC genomics, 16(10), 1-12. Contains model species trees, true and estimated gene trees, and simulated alignments.
keywords:
evolution; computational biology; bioinformatics; phylogenetics
published: 2024-11-19
Salami, Malik Oyewale; McCumber, Corinne (2024): Dataset for Reassessment of the agreement in retraction indexing across 4 multidisciplinary sources: Crossref, Retraction Watch, Scopus, and Web of Science. University of Illinois Urbana-Champaign. https://doi.org/10.13012/B2IDB-8457537_V1
This project investigates retraction indexing agreement among data sources: Crossref, Retraction Watch, Scopus, and Web of Science. As of July 2024, this reassesses the April 2023 union list of Schneider et al. (2023): https://doi.org/10.55835/6441e5cae04dbe5586d06a5f. As of April 2023, over 1 in 5 DOIs had discrepancies in retraction indexing among the 49,924 DOIs indexed as retracted in at least one of Crossref, Retraction Watch, Scopus, and Web of Science (Schneider et al., 2023). Here, we determine what changed in 15 months. Pipeline code to get the results files can be found in the GitHub repository https://github.com/infoqualitylab/retraction-indexing-agreement in the iPython notebook 'MET-STI2024_Reassessment_of_retraction_indexing_agreement.ipynb' Some files have been redacted to remove proprietary data, as noted in README.txt. Among our sources, data is openly available only for Crossref and Retraction Watch. FILE FORMATS: 1) unionlist_completed_2023-09-03-crws-ressess.csv - UTF-8 CSV file 2) unionlist_completed-ria_2024-07-09-crws-ressess.csv - UTF-8 CSV file 3) unionlist-15months-period_sankey.png - Portable Network Graphics (PNG) file 4) unionlist_ria_proportion_comparison.png - Portable Network Graphics (PNG) file 5) README.txt - text file FILE DESCRIPTION: Description of the files can be found in README.txt
keywords:
retraction status; data quality; indexing; retraction indexing; metadata; meta-science; RISRS
published: 2023-01-05
Tonks, Adam (2023): Data for the paper "Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3628170_V1
This is the data used in the paper "Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data". A preprint may be found at https://doi.org/10.48550/arXiv.2212.11367 Code from the Github repository https://github.com/adtonks/mosquito_GNN can be used with the data here to reproduce the paper's results. v1.0.0 of the code is also archived at https://doi.org/10.5281/zenodo.7897830
keywords:
west nile virus; machine learning; gnn; mosquito; trap; graph neural network; illinois; geospatial
published: 2023-04-12
Towns, John; Hart, David (2023): XSEDE: Allocations Awards and Usage for the NSF Cyberfrastructure Portfolio, 2004-2022. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3731847_V1
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating to the start of the TeraGrid program in 2004 through the XSEDE operational period, which ended August 31, 2022. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen year span along with the evolution of researchers, fields of science, and institutional representation. Because the XSEDE program has ended, the allocation_award_history file includes all allocations activity initiated via XSEDE processes through August 31, 2022. The Resource Providers and successor program to XSEDE agreed to honor all project allocations made during XSEDE. Thus, allocation awards that extend beyond the end of XSEDE may not reflect all activity that may ultimately be part of the project award. Similarly, allocation usage data only reflects usage reported through August 31, 2022, and may not reflect all activity that may ultimately be conducted by projects that were active beyond XSEDE.
keywords:
allocations; cyberinfrastructure; XSEDE
published: 2023-05-30
Clem, C. Scott; Hart, Lily V.; McElrath, Thomas C. (2023): Primary Occurrence Data for "Clem, Hart, & McElrath. 2023. A century of Illinois hover flies (Diptera: Syrphidae): Museum and citizen science data reveal recent range expansions, contractions, and species of potential conservation significance". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1613645_V1
Primary occurrence data for Clem, Hart, & McElrath. 2023. A century of Illinois hover flies (Diptera: Syrphidae): Museum and citizen science data reveal recent range expansions, contractions, and species of potential conservation significance. Included are a license.txt file, the cleaned occurrences from each of the six merged datasets, and a cleaned, merged dataset containing all occurrence records in one spreadsheet, formatted according to Darwin Core standards, with a few extra fields such as GBIF identifiers that were included in some of the original downloads.
keywords:
csv; occurrences; syrphidae; hover flies; flies; biodiversity; darwin core; darwin-core; GBIF; citizen science; iNaturalist
published: 2024-02-08
Martinez, Carlos; Pena, Gisselle; Wells, Kaylee K. (2024): "Prairie Directory of North America" (2013) Entries for the Tallgrass, Mixed Grass, and Shortgrass Prairie Regions of the United States. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0421892_V1
This dataset contains transcribed entries from the "Prairie Directory of North America" (Adelman and Schwartz 2013) for the Tallgrass, Mixed Grass, and Shortgrass prairie regions of the united states. We identified the historical spatial extent of the Tallgrass, Mixed Grass, and Shortgrass prairie regions using Ricketts et al. (1999), Olson et al. (2001), and Dixon et al. (2014) and selected the counties entirely or partially within these boundaries from the USDA Forest Service (2022) file. The resulting lists of counties are included as separate files. The dataset contains information on publicly accessible grasslands and prairies in these regions including acreage and amenities like hunting access, restrooms, parking, and trails.
keywords:
grasslands; prairies; prairie directory of north america; site amenities; site attributes
published: 2018-05-21
Karigerasi, Manohar H.; Wagner, Lucas K.; Shoemaker, Daniel P. (2018): Geometric analysis of magnetic dimensionality. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-3897093_V1
This dataset contains bonding networks and tolerance ranges for geometric magnetic dimensionality. The data can be searched in the html frontend above, code obtained at the GitHub repository, or the raw data can be downloaded as csv below. The csv data contains the results of 42520 compounds (unique icsd_code) from ICSD FindIt v3.5.0. The csv is semicolon-delimited since some fields contain multiple comma-separated values.
keywords:
materials science; physics; magnetism; crystallography
published: 2018-07-25
Scannapieco, Frank; Hoang, Linh; Schneider, Jodi (2018): Expert assessment of RobotReviewer data extraction performance on 10 articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8274875_V1
The PDF describes the process and data used for the heuristic user evaluation described in the related article “<i>Evaluating an automatic data extraction tool based on the theory of diffusion of innovation</i>” by Linh Hoang, Frank Scannapieco, Linh Cao, Yingjun Guan, Yi-Yun Cheng, and Jodi Schneider (under submission).<br /> Frank Scannapieco assessed RobotReviewer data extraction performance on ten articles in 2018-02. Articles are included papers from an update review: Sabharwal A., G.-F.I., Stellrecht E., Scannapeico F.A. <i>Periodontal therapy to prevent the initiation and/or progression of common complex systemic diseases and conditions</i>. An update. Periodontol 2000. In Press. <br/> The form was created in consultation with Linh Hoang and Jodi Schneider. To do the assessment, Frank Scannapieco entered PDFs for these ten articles into RobotReviewer and then filled in ten evaluation forms, based on the ten Robot Reviewer automatic data extraction reports. Linh Hoang analyzed these ten evaluation forms and synthesized Frank Scannapieco’s comments to arrive at the evaluation results for the heuristic user evaluation.
keywords:
RobotReviewer; systematic review automation; data extraction
Research Data Service
Illinois Data Bank
Access and Use Policies
Web Privacy Notice
Contact Us