Displaying Dataset 1 - 25 of 253 in total

Subject Area

Life Sciences (126)
Social Sciences (70)
Physical Sciences (31)
Technology and Engineering (26)


U.S. National Science Foundation (NSF) (70)
Other (66)
U.S. National Institutes of Health (NIH) (30)
U.S. Department of Energy (DOE) (19)
U.S. Department of Agriculture (USDA) (10)
Illinois Department of Natural Resources (IDNR) (6)
U.S. National Aeronautics and Space Administration (NASA) (3)
U.S. Geological Survey (USGS) (3)
U.S. Army (1)

Publication Year

2019 (80)
2018 (59)
2020 (48)
2017 (35)
2016 (30)
2021 (1)


CC0 (141)
CC BY (109)
custom (3)
published: 2020-06-30
This file contains 13 unique case studies that were created for the One health: Infectious diseases course offered at the University of Illinois at Urbana-Champaign campus. The case studies are being made available as educational resources for other One health courses. Each case study is focused on a theme/topic which is associated with One health. These case studies were created using publicly available information and references have been provided for each case study.
keywords: One health education; infectious diseases; case studies
published: 2020-06-26
This dataset contains the PartMC-MOSAIC simulations used in the article "Quantifying Errors in the Aerosol Mixing-State Index Based on Limited Particle Sample Size". The 1000 simulations of output data is organized into a series of archived folders, each containing 100 scenarios. Within each scenario directory are 25 NetCDF files, which are the hourly output of a PartMC-MOSAIC simulation containing all information regarding the environment, particle and gas state. This dataset was used to investigate the impact of sample size on determining aerosol mixing state. This data may be useful as a data set for applying different types of estimators.
keywords: Atmospheric aerosols; single-particle measurements; sampling uncertainty; NetCDF
published: 2020-02-12
This dataset contains the results of a three month audit of housing advertisements. It accompanies the 2020 ICWSM paper "Auditing Race and Gender Discrimination in Online Housing Markets". It covers data collected between Dec 7, 2018 and March 19, 2019. There are two json files in the dataset: The first contains a list of json objects representing advertisements separated by newlines. Each object includes the date and time it was collected, the image and title (if collected) of the ad, the page on which it was displayed, and the training treatment it received. The second file is a list of json objects representing a visit to a housing lister separated by newlines. Each object contains the url, training treatment applied, the location searched, and the metadata of the top sites scraped. This metadata includes location, price, and number of rooms. The dataset also includes the raw images of ads collected in order to code them by interest and targeting. These were captured by selenium and named using a perceptive hash to de-duplicate images.
keywords: algorithmic audit; advertisement audit;
published: 2020-02-12
This is the dataset used in the Landscape Ecology publication of the same name. This dataset consists of the following files: NWCA_Int_Veg.txt NWCA_Reg_Veg.txt NWCA_Site_Attributes.txt NWCA_Int_Veg.txt is a site and plot by species matrix. Column labeled SITES consists of site IDs. Column labeled Plots consist of Plot ID numbers. All other columns represent species abundances (estimates of percent cover, summed across five plots). NWCA_Reg_Veg.txt is a site by species matrix of species abundances. Column labeled SITES consist of site IDs. All other columns represent species abundances (estimates of percent cover within individual plots). NWCA_Site_Attributes.txt is a matrix of site attributes. Column labeled SITES consist of site IDs. Column labeled AA_CENTER_LAT consist of latitudinal coordinates for the Assessment Area center point in decimal degrees. Column labeled AA_CENTER_LONG consist of longitudinal coordinates for the Assessment Area center point in decimal degrees. Column REFPLUS_NWCA represents disturbance gradient classes including MIN (minimally disturbed), L (least disturbed), I (intermediate), M (most disturbed). Column REFPLUS_NWCA2 represents revised disturbance gradient classes based on protocols described in the article. These revised classes were used for analysis. Column labeled STRESS_HEAVYMETAL represents heavy metal stressor classes, used to ascertain which wetlands were missing soil data. Classes in the STRESS_HEAVYMETAL column include Low, Moderate, High, and Missing. Sites with Missing STRESS_HEAVYMETAL classes were removed from analysis. More information about this dataset: All of the data used in this analysis was gathered from the National Wetlands Condition Assessment. Wetland surveys were conducted from 4/4/2011 to 11/2/2011. The entire National Wetlands Condition Assessment Dataset, which includes 3640 unique taxonomic identities of plants, can be found at: https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys
keywords: Anthropogenic disturbance; β-Diversity; Biotic homogenization; Phalaris arundinacea; reed canary grass; Wetlands
planned publication date: 2020-08-01
The Empoascini_morph_data.nex text file contains the original data used in the phylogenetic analyses of Xu et al. (Systematic Entomology, in review). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first nine lines of the file indicate the file type (Nexus), that 110 taxa were analyzed, that a total of 99 characters were analyzed, the format of the data, and specification for symbols used in the dataset to indicate different character states. For species that have more than one state for a particular character, the states are enclosed in square brackets. Question marks represent missing data.The pdf file, Appendix1.pdf, is available here and describes the morphological characters and character states that were scored in the dataset. The data analyses are described in the cited original paper.
keywords: Hemiptera; Cicadellidae; morphology; biogeography; evolution
published: 2020-06-06
These data are from an observational study and small experiment investigating reproductive biology and hybridization between two plants, Celastrus scandens L. and Celastrus orbiculatus Thunb. (Celastraceae). These data were collected during the 2008 growing season from the Indiana Dunes National Park (formerly Indiana Dunes National Lakeshore), just east of the municipality of Ogden Dunes, Indiana, USA. The five data files provide information on floral output of the two species, fertilization rate, fruit set rate, hybridization rate at two scales (individual flowers in both species, individual maternal plants in C. scandens), and the results of a hand-pollination experiment that exchanged pollen between the two species. There are six data files associated with this submission, five data files in comma-separated values format and one text file (‘readme.txt’) that includes detailed explanations of the data files.
keywords: Celastrus; invasive species; hybridization; heterospecific pollen; hand pollination
published: 2020-06-19
This dataset include data pulled from the World Bank 2009, the World Values Survey wave 6, Transparency International from 2009. The data were used to measure perceptions of expertise from individuals in nations that are recipients of development aid as measured by the World Bank.
keywords: World Values Survey; World Bank; expertise; development
published: 2020-06-12
This is a network of 14 systematic reviews on the salt controversy and their included studies. Each edge in the network represents an inclusion from one systematic review to an article. Systematic reviews were collected from Trinquart (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ). <b>FILE FORMATS</b> 1) Article_list.csv - Unicode CSV 2) Article_attr.csv - Unicode CSV 3) inclusion_net_edges.csv - Unicode CSV 4) potential_inclusion_link.csv - Unicode CSV 5) systematic_review_inclusion_criteria.csv - Unicode CSV 6) Supplementary Reference List.pdf - PDF <b>ROW EXPLANATIONS</b> 1) Article_list.csv - Each row describes a systematic review or included article. 2) Article_attr.csv - Each row is the attributes of a systematic review/included article. 3) inclusion_net_edges.csv - Each row represents an inclusion from a systematic review to an article. 4) potential_inclusion_link.csv - Each row shows the available evidence base of a systematic review. 5) systematic_review_inclusion_criteria.csv - Each row is the inclusion criteria of a systematic review. 6) Supplementary Reference List.pdf - Each item is a bibliographic record of a systematic review/included paper. <b>COLUMN HEADER EXPLANATIONS</b> <b>1) Article_list.csv:</b> ID - Numeric ID of a paper paper assigned ID - ID of the paper from Trinquart et al. (2016) Type - Systematic review / primary study report Study Groupings - Groupings for related primary study reports from the same report, from Trinquart et al. (2016) (if applicable, otherwise blank) Title - Title of the paper year - Publication year of the paper Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016) Doi - DOIs of the paper. (if applicable, otherwise blank) Retracted (Y/N) - Whether the paper was retracted or withdrawn (Y). Blank if not retracted or withdrawn. <b>2) Article_attr.csv:</b> ID - Numeric ID of a paper year - Publication year Attitude - Scientific opinion about the salt controversy from Trinquart et al. (2016) Type - Systematic review/ primary study report <b>3) inclusion_net_edges.csv:</b> citing_ID - The numeric ID of a systematic review cited_ID - The numeric ID of the included articles <b>4) potential_inclusion_link.csv:</b> This data was translated from the Sankey diagram given in Trinquart et al. (2016) as Web Figure 4. Each row indicates a systematic review and each column indicates a primary study. In the matrix, "p" indicates that a given primary study had been published as of the search date of a given systematic review. <b>5)systematic_review_inclusion_criteria.csv:</b> ID - The numeric IDs of systematic reviews paper assigned ID - ID of the paper from Trinquart et al. (2016) attitude - Its scientific opinion about the salt controversy from Trinquart et al. (2016) No. of studies included - Number of articles included in the systematic review Study design - Study designs to include, per inclusion criteria population - Populations to include, per inclusion criteria Exposure/Intervention - Exposures/Interventions to include, per inclusion criteria outcome - Study outcomes required for inclusion, per inclusion criteria Language restriction - Report languages to include, per inclusion criteria follow-up period - Follow-up period required for inclusion, per inclusion criteria
keywords: systematic reviews; evidence synthesis; network visualization; tertiary studies
published: 2020-06-11
This dataset contains data from 10 simulations using a landscape evolution model. A landscape evolution model simulates how uplift and rock incision shape the Earth's (or other planets) surface. To date, most landscape evolution models exhibit "extreme memory" (paper: https://doi.org/10.1029/2019GL083305 and dataset: https://doi.org/10.13012/B2IDB-4484338_V1). Extreme memory in landscape evolution models causes initial conditions to be unrealistically preserved. This dataset contains simulations from a new landscape evolution model that incorporates a sub-model that allows bedrock channels to erode laterally. With this addition, the landscapes no longer exhibit extreme memory. Initial conditions are erased over time, and the landscapes tend towards a dynamic steady state instead of a static one. There are 2 animations for each simulation, which totals for 20 animations (10 simulations x 2). For each simulation, there is 1 animation that shows elevation and 1 animation that shows drainage area over time. Elevation depicts the height of the landscape, and drainage area represents a contributing area that is upslope. There are 20 folders in the dataset, and each folder contains: >1000 .asc raster files >1 .wmv formatted movie >1 .mp4 formatted movie The associated publication for this dataset has not yet been published, and we will update this description with a link when it is.
keywords: landscape evolution; drainage networks; lateral migration; geomorphology
published: 2018-11-18
This dataset contains experimental measurements used in the paper, "Ultra-sensitivity of Numerical Landscape Evolution Models to their Initial Conditions." (to be submitted). The data is taken from experimental runs in a miniature landscape model named the eXperimental Landscape Evolution (XLE) facility. In this facility, we complete five >24hr runs at 5 minute temporal resolution. Every five minutes, an planform image was capture, and a digital elevation model (DEM) was generated. For each run, images and a corresponding animation of images are documented. In addition,ASCII formatted DEMs along with color hillshade maps were generated. The hillshade map images were also made into an animation. This dataset is associated with the following publication: https://doi.org/10.1029/2019GL083305
keywords: landscape evolution model; digital elevation model; geomorphology
planned publication date: 2021-06-08
Dataset associated with Jones and Ward JAE-2020-0031.R1 submission: Pre-to post-fledging carryover effects and the adaptive significance of variation in wing development for juvenile songbirds. Excel CSV files with data used in analyses and file with descriptions of each column. The flight ability variable in this dataset was derived from fledgling drop tests, examples of which can be found in the related dataset: Jones, Todd M.; Benson, Thomas J.; Ward, Michael P. (2019): Flight Ability of Juvenile Songbirds at Fledgling: Examples of Fledgling Drop Tests. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2044905_V1.
keywords: fledgling; wing development; life history; adaptive significance; post-fledging; songbirds
published: 2020-06-02
The text file contains the original data used in the phylogenetic analyses of Xue et al. (2020: Systematic Entomology, in press). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 89 taxa (species) and 2676 characters, indicate that the first 2590 characters are DNA sequence and the last 86 are morphological, that gaps inserted into the DNA sequence alignment and inapplicable morphological characters are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 86 morphological characters. The positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file (Subset1 = 16S gene; Subset2 = 28S gene; Subset3 = COI gene; Subset 4 = Histone H3 and H2A genes). The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf, also available from the journal website. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; 16S rDNA; histone H3; histone H2A; cytochrome oxidase I; Bayesian analysis
published: 2020-06-03
This datasets provide basis of our analysis in the paper - Potential Impacts of Supersonic Aircraft on Stratospheric Ozone and Climate. All datasets here can be categorized into emission data and model output data (WACCM). All the model simulations (background and perturbation) were run to steady-state and only the datasets used in analysis are archived here.
keywords: NetCDF; Supersonic aircraft; Stratospheric ozone; Climate
published: 2020-06-03
This dataset provides files for use in analysis of human land preference across Australasia, and in a localized analysis of land preference in Laos and Vietnam. All files can be imported into ArcGIS for visualization, and re-analyzed using the open source Maxent species distribution modeling program. CSV files contain known human presence sites for model validation. ASC files contain geographically coded environmental data for mean annual temperature and mean annual precipitation during the Last Glacial Maximum, as well as downward slope data. All ASC files are in the WGS 1984 Mercator map projection for visualization in ArcGIS and can be opened as text files in text editors supporting large file sizes.
keywords: human dispersal; ecological niche modeling; Australasia; Late Pleistocene; land preference
published: 2020-05-31
This repository includes a simulated dataset and related scripts used for the paper "Moss: Accurate Single-Nucleotide Variant Calling from Multiple Bulk DNA Tumor Samples".
keywords: Somatic Mutations; Bulk DNA Sequencing; Cancer Genomics
published: 2020-05-30
Original leaf gas exchange and absorptance data used in the Collison et al. (2020) Light, Not Age, Underlies the Q9 Maladaptation of Maize and Miscanthus Photosynthesis to Self-Shading - Frontiers in Plant Science doi: 10.3389/fpls.2020.00783
keywords: C4 photosynthesis; canopy; bioenergy; food security; quantum yield; shade acclimation; photosynthetic light-use efficiency; leaf aging
published: 2019-11-11
This repository includes scripts and datasets for the paper, "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Note: The results from estimating species trees with ASTRID-multi (included in this repository) are *not* included in the FastMulRFS paper. We estimated species trees with ASTRID-multi in the fall of 2019, but ASTRID-multi had an important bug fix in January 2020. Therefore, the ASTRID-multi species trees in this repository should be ignored.
keywords: Species tree estimation; gene duplication and loss; statistical consistency; MulRF, FastRFS
published: 2020-05-17
Models and predictions for submission to TRAC - 2020 Second Workshop on Trolling, Aggression and Cyberbullying Our approach is described in our paper titled: Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. “Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020.” In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020). The source code for training this model and more details can be found on our code repository: https://github.com/socialmediaie/TRAC2020 NOTE: These models are retrained for uploading here after our submission so the evaluation measures may be slightly different from the ones reported in the paper.
keywords: Social Media; Trolling; Aggression; Cyberbullying; text classification; natural language processing; deep learning; open source;
published: 2020-05-20
This dataset is a snapshot of the presence and structure of entrepreneurship education in U.S. four-year colleges and universities in 2015, including co-curricular activities and related infrastructure. Public, private not-for-profit and for-profit institutions are included, as are specialized four-year institutions. The dataset provides insight into the presence of entrepreneurship education both within business units and in other units of college campuses. Entrepreneurship is defined broadly, to include small business management and related career-focused options.
keywords: Entrepreneurship education; Small business education; Ewing Marion Kauffman Foundation; csv
published: 2020-05-15
Trained models for multi-task multi-dataset learning for sequence prediction in tweets Tasks include POS, NER, Chunking, and SuperSenseTagging Models were trained using: https://github.com/napsternxg/SocialMediaIE/blob/master/experiments/multitask_multidataset_experiment.py See https://github.com/napsternxg/SocialMediaIE for details.
keywords: twitter; deep learning; machine learning; trained models; multi-task learning; multi-dataset learning;
published: 2020-05-15
This data has tweets collected in paper Shubhanshu Mishra, Sneha Agarwal, Jinlong Guo, Kirstin Phelps, Johna Picco, and Jana Diesner. 2014. Enthusiasm and support: alternative sentiment classification for social movements on social media. In Proceedings of the 2014 ACM conference on Web science (WebSci '14). ACM, New York, NY, USA, 261-262. DOI: https://doi.org/10.1145/2615569.2615667 The data only contains tweet IDs and the corresponding enthusiasm and support labels by two different annotators.
keywords: Twitter; text classification; enthusiasm; support; social causes; LGBT; Cyberbullying; NFL
published: 2020-05-13
Terrorism is among the most pressing challenges to democratic governance around the world. The Responsible Terrorism Coverage (or ResTeCo) project aims to address a fundamental dilemma facing 21st century societies: how to give citizens the information they need without giving terrorists the kind of attention they want. The ResTeCo hopes to inform best practices by using extreme-scale text analytic methods to extract information from more than 70 years of terrorism-related media coverage from around the world and across 5 languages. Our goal is to expand the available data on media responses to terrorism and enable the development of empirically-validated models for socially responsible, effective news organizations. This particular dataset contains information extracted from terrorism-related stories in the New York Times published between 1945 and 2018. It includes variables that measure the relative share of terrorism-related topics, the valence and intensity of emotional language, as well as the people, places, and organizations mentioned. This dataset contains 3 files: 1. <i>"ResTeCo Project NYT Dataset Variable Descriptions.pdf"</i> <ul> <li>A detailed codebook containing a summary of the Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset and descriptions of all variables. </li> </ul> 2. <i>"resteco-nyt.csv"</i> <ul><li>This file contains the data extracted from terrorism-related media coverage in the New York Times between 1945 and 2018. It includes variables that measure the relative share of topics, sentiment, and emotion present in this coverage. There are also variables that contain metadata and list the people, places, and organizations mentioned in these articles. There are 53 variables and 438,373 observations. The variable "id" uniquely identifies each observation. Each observation represents a single news article. </li> <li> <b>Please note</b> that care should be taken when using "respect-nyt.csv". The file may not be suitable to use in a spreadsheet program like Excel as some of the values get to be quite large. Excel cannot handle some of these large values, which may cause the data to appear corrupted within the software. It is encouraged that a user of this data use a statistical package such as Stata, R, or Python to ensure the structure and quality of the data remains preserved.</li> </ul> 3. <i>"README.md"</i> <ul><li>This file contains useful information for the user about the dataset. It is a text file written in mark down language</li> </ul> <b>Citation Guidelines</b> 1) To cite this codebook please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times (NYT) Dataset Variable Descriptions. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1 2) To cite the data please use the following citation: Althaus, Scott, Joseph Bajjalieh, Marc Jungblut, Dan Shalmon, Subhankar Ghosh, and Pradnyesh Joshi. 2020. Responsible Terrorism Coverage (ResTeCo) Project New York Times Dataset. Cline Center for Advanced Social Research. May 13. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-4638196_V1
keywords: Terrorism, Text Analytics, News Coverage, Topic Modeling, Sentiment Analysis
published: 2020-05-12
The data provided herein is accelerometer and strain data taken from free vibration response of pre-tensioned, partially submerged steel beam specimens (modulus of elasticity assumed = 29,000 ksi). The specimens were subjected to various levels of pre-tension, and various levels of submersion in water. The purpose of the testing was to quantify the effects of partial submersion on the vibrating frequencies of pretensioned beams. Three specimens were tested, each with different cross section (but identical cross-sectional area). The different cross sections allow investigation of the effects of specimen width as the specimen vibrates through water. The testing procedure was as follows: 1) Apply a specified level of tension in the beam. Measure tension via 3 strain gages. 2) Submerge the specimens to a specified depth of water 3) Excite the beams with either a hammer impact or a pull-and-release method (physically pull the middle of the bar and quickly release) 4) Measure the free vibration of the beam with 2 accelerometers. Schematic drawings of the test setup and the test specimens are provided, as is a picture of the test setup.
keywords: free vibration; beam; partially-submerged; prestressed;
published: 2020-05-11
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries. <b>Additional Resources:</b> - Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, you can fill out the <a href="https://forms.gle/oaUWRSSCkqKxyY5T7"><b>Archer User Information Form</b></a>. - Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the <a href="https://forms.gle/6eA2yJUGFMtj5swY7"><b>Archer User Feedback Form</b></a>. - The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to <a href="https://groups.webservices.illinois.edu/subscribe/123172"><b>subscribe to Archer Users Group</b></a>. <b>Citation Guidelines:</b> 1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [codebook]. Champaign, IL: University of Illinois. doi:10.13012/B2IDB-5649852_V1 2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [database]. Champaign, IL: University of Illinois. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V1
keywords: Cline Center; Cline Center for Advanced Social Research; political; social; political science; Global News Index; Archer; news; mass communication; journalism;