Displaying datasets 101 - 125 of 353 in total

Subject Area

Life Sciences (186)
Social Sciences (82)
Physical Sciences (45)
Technology and Engineering (30)
Uncategorized (9)
Arts and Humanities (1)

Funder

Other (93)
U.S. National Science Foundation (NSF) (92)
U.S. National Institutes of Health (NIH) (38)
U.S. Department of Energy (DOE) (31)
U.S. Department of Agriculture (USDA) (17)
Illinois Department of Natural Resources (IDNR) (9)
U.S. National Aeronautics and Space Administration (NASA) (4)
U.S. Geological Survey (USGS) (3)
U.S. Army (1)

Publication Year

2020 (102)
2019 (73)
2018 (59)
2021 (53)
2017 (35)
2016 (30)
2022 (1)

License

CC0 (201)
CC BY (145)
custom (7)
published: 2020-10-13
The data in this spreadsheet present basic information on the shell artifacts from Mound 72 at Cahokia, including taxonomic identifications, provenience, and bead measurements. There are five tabs: 1. Raw data; 2. Disk bead measurements; 3. Columella bead measurements; 4. Data on cups and pendants; and 5. Information on whole shell beads.
keywords: Cahokia; Mound 72; Lightning whelk; Bead crafting
published: 2020-10-01
Raw gas exchange data for photosynthetic induction in the flag leaves of six rice accessions. Photosynthetic induction and point measurements were made at ambient [CO2]. Two accessions (AUS 278 and IR64) were selected for more detailed screening, in which photosynthetic induction was measured at six [CO2] levels.
published: 2020-09-25
This repository contains the datasets and corresponding results for the paper "MAGUS: Multiple Sequence Alignment using Graph Clustering". The Datasets.zip archive contains the ROSE, BAliBASE, Gutell, and RNASim datasets used in our experiments. The Results.zip archive contains the outputs of running our methods on these datasets.
Datasets used:
ROSE: 10 simulated nucleotide model conditions from the SATe paper, each with 20 replicates of 1000 sequences. The ROSE datasets were originally taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i
RNASim: A collection of simulated nucleotide datasets generated under a model of evolution that reflects selection due to RNA structural constraints. We sampled 20 subsets of 1000 sequences each, as well as 10 subsets of 10000 sequences each, at random from the original million-sequence RNASim dataset.
Gutell: 16S.M, 16S.3, 16S.T, 16S.B.ALL: Four biological nucleotide datasets from the Comparative Ribosomal Website (CRW) with cleaned reference alignments from SATe. Because PASTA is restricted to datasets without sequence length heterogeneity, these datasets were modified to remove sequences whose length deviates by more than 20% from the median length. The scrubbed datasets range from 740 to 24,246 sequences. The pre-screened 16S datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/16s23s
BAliBASE: Eight BAliBASE amino acid datasets used in the PASTA paper. As above, we removed outlier sequences, which leaves datasets ranging from 195 to 732 sequences. The pre-screened BAliBASE datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp
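The length-outlier screening described for the Gutell and BAliBASE datasets can be sketched as below. This is a minimal illustration, not the authors' script: it assumes lengths are measured on ungapped sequences (gap character "-") and that a sequence exactly at the 20% boundary is kept; the function name `filter_by_median_length` is hypothetical.

```python
from statistics import median


def filter_by_median_length(seqs: dict[str, str], tolerance: float = 0.20) -> dict[str, str]:
    """Keep sequences whose ungapped length lies within `tolerance` of the median.

    Assumptions (not stated in the dataset description): gaps are "-",
    lengths are measured after removing gaps, and boundary values are kept.
    """
    lengths = {name: len(s.replace("-", "")) for name, s in seqs.items()}
    med = median(lengths.values())
    low, high = (1 - tolerance) * med, (1 + tolerance) * med
    return {name: s for name, s in seqs.items() if low <= lengths[name] <= high}


# Hypothetical toy input: lengths 100, 110, 85, 50, 200 give a median of 100,
# so only sequences between 80 and 120 residues survive.
toy = {"a": "A" * 100, "b": "A" * 110, "c": "A" * 85, "d": "A" * 50, "e": "A" * 200}
print(sorted(filter_by_median_length(toy)))
```

Using the median rather than the mean keeps a few very long or very short sequences from shifting the acceptance window.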
published: 2020-09-27
This dataset contains the R code used to produce the figures in the manuscript titled "Understanding the multifaceted geospatial software ecosystem: a survey approach". The raw survey data used to produce these charts cannot be shared under the terms of the survey consent agreement.
keywords: R; figures; geospatial software
published: 2020-09-18
Restriction site-associated DNA sequencing (RAD-seq) data from 643 Miscanthus accessions from a diversity panel, including 613 Miscanthus sacchariflorus, three M. sinensis, and 27 M. ×giganteus. DNA was digested with PstI and MspI, and single-end Illumina sequencing was performed adjacent to the PstI site. Variant and genotype calling was performed with TASSEL-GBSv2, using the Miscanthus sinensis v7.1 reference genome from Phytozome 12 (https://phytozome.jgi.doe.gov). Additional ploidy-aware genotype calling was performed with polyRAD v1.1.
keywords: variant call format (VCF); genotyping-by-sequencing (GBS); single nucleotide polymorphism (SNP); grass; genetic diversity; biomass
published: 2020-09-17
Data are from a long-term fire manipulation experiment in the Missouri Ozarks, USA. Data include raw annual ring-width increments (rwl), basal area increment (BAI), population-level annual growth resistance (Drs) and resilience (Drl) to drought, intrinsic water use efficiency (WUEi), and the oxygen isotopic composition of individual radial growth rings (δ18O) from southern red oak (Quercus falcata) and post oak (Q. stellata) trees.
----------------------
TITLE: Data for "Sixty-five years of fire manipulation reveals climate and fire interact to determine growth rates of Quercus spp."
----------------------
FILE OVERVIEW: This dataset contains four (4) CSV files, described below:
Refsland_et_al_ECS20-0465_BAI.csv: annual basal area increment from 1948 to 2015 for trees across the fire manipulation experiment
Refsland_et_al_ECS20-0465_DroughtIndices.csv: population-level drought resistance and resilience of trees during each target drought period
Refsland_et_al_ECS20-0465_WUEi.csv: carbon isotope indicators of drought stress for trees across the fire manipulation experiment
Refsland_et_al_ECS20-0465_d18Or.csv: oxygen isotope indicators of drought stress for trees across the fire manipulation experiment
----------------------
VARIABLE EXPLANATION: The variables in the four files are defined as follows:
treeID: unique character string that identifies the subject tree
block: integer (1, 2) that identifies the study block
plot: integer (1-12) that identifies the plot nested within each study block
trt: character string (Annual, Control, Periodic) that identifies the fire treatment of a given plot
species: character string (Quercus falcata, Quercus stellata) that identifies the species of the subject tree
year: integer (1948-2015) that identifies the dated year of each tree ring
rwl_mm: numerical value representing the annual tree ring width, in mm
bai_cm2: numerical value representing the annual basal area increment, in cm²
timeperiod: integer value (1953, 1964, 2007, 2012) representing the periods encompassing target dry and wet years
Drs_2yr: numerical value representing drought resistance, defined as the population-level annual growth of trees during drought years relative to pre-drought years for a given time period
Drl_2yr: numerical value representing drought resilience, defined as the population-level annual growth of trees following drought years relative to pre-drought years for a given time period
stand_ba_m2ha: numerical value representing the total basal area of a given plot, in m² per ha
stand_density_stems_ha: numerical value representing the total stem density of a given plot, in stems per ha
pool: numerical value (1-40) identifying the set of tree ring samples pooled for analysis; samples were pooled by block, plot, year, and species
period: integer value (1953, 1964, 1980, 2007, 2012) representing the periods encompassing target dry and wet years
type: character string (Dry, Wet) indicating the water availability of a given year
d13C: numerical value representing the carbon isotopic composition of radial growth rings within a given sample pool, in per mil
WUEi: numerical value representing the annual intrinsic water use efficiency of radial growth rings within a given sample pool
d18O: numerical value representing the oxygen isotopic composition of radial growth rings within a given sample pool, in per mil
keywords: climate change adaptation; drought; fire; nitrogen availability; oak-hickory; radial growth; resilience; resistance; stand density; temperate broadleaf forest; water stress
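The Drs and Drl variables are defined as growth during, and growth after, drought relative to pre-drought growth. The sketch below shows that ratio calculation on a population-mean growth series. It is only an illustration under stated assumptions: the two-year pre- and post-drought windows are inferred from the "_2yr" suffix, and the function name and toy series are hypothetical, not taken from the authors' analysis.

```python
from statistics import mean


def drought_indices(growth_by_year: dict[int, float], drought_years: list[int],
                    window: int = 2) -> tuple[float, float]:
    """Return (resistance, resilience) as ratios of mean population growth.

    Assumption: pre-drought = the `window` years before the first drought year,
    post-drought = the `window` years after the last drought year.
    """
    first, last = min(drought_years), max(drought_years)
    pre = mean(growth_by_year[y] for y in range(first - window, first))
    during = mean(growth_by_year[y] for y in drought_years)
    post = mean(growth_by_year[y] for y in range(last + 1, last + 1 + window))
    return during / pre, post / pre


# Hypothetical population-mean growth series (arbitrary units).
series = {1950: 10.0, 1951: 10.0, 1952: 6.0, 1953: 8.0, 1954: 9.0, 1955: 11.0}
resistance, resilience = drought_indices(series, [1952, 1953])
print(resistance, resilience)
```

A resistance below 1 means growth fell during the drought; a resilience near 1 means growth returned to its pre-drought level.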
published: 2020-09-07
This dataset contains the BEPAM model code and input data needed to replicate the results in "Assessing the Returns to Land and Greenhouse Gas Savings from Producing Energy Crops on Conservation Reserve Program Land." The dataset consists of: (1) the replication code and data for the BEPAM model (BEPAM-CRP model-Sep2020.zip), in which the code files are named output_0213-2020_Complete_daycent-agversion-[rental payment level]%_[biomass price].gms; and (2) simulation results from the BEPAM model (BEPAM_Simulation_Results.csv). Item (1) is in GAMS format; item (2) is in text format.
keywords: Miscanthus; Switchgrass; soil carbon sequestration; greenhouse gas savings; rental payments; biomass price
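Because each BEPAM scenario's rental payment level and biomass price are encoded in its GAMS file name, a small parser can label runs when comparing scenarios. This is a minimal sketch built only from the naming pattern given above; the function name and the example values ("75", "60") are hypothetical, and the captured fields are returned as strings because their exact number format is not documented.

```python
import re

# Built from the stated convention:
# output_0213-2020_Complete_daycent-agversion-[rental payment level]%_[biomass price].gms
PATTERN = re.compile(
    r"^output_0213-2020_Complete_daycent-agversion-(?P<rental>[^%]+)%_(?P<price>.+)\.gms$"
)


def parse_scenario(filename: str) -> tuple[str, str]:
    """Return (rental payment level, biomass price) extracted from a BEPAM code file name."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group("rental"), match.group("price")


# Hypothetical file name following the documented pattern.
print(parse_scenario("output_0213-2020_Complete_daycent-agversion-75%_60.gms"))
```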
published: 2020-09-02
Citation context annotation. This dataset is the second version (V2) and is part of the supplemental data for Jodi Schneider, Di Ye, Alison Hill, and Ashley Whitehorn (2020), "Continued post-retraction citation of a fraudulent clinical trial report, eleven years after it was retracted for falsifying data," Scientometrics, in press, DOI: 10.1007/s11192-020-03631-1. Publications were selected by examining all citations to the retracted paper Matsuyama 2005 and selecting the 35 citing papers, published 2010 to 2019, that do not mention the retraction but do mention the methods or results of the retracted paper (called "specific" in Ye, Di; Hill, Alison; Whitehorn (Fulton), Ashley; Schneider, Jodi (2020): Citation context annotation for new and newly found citations (2006-2019) to retracted paper Matsuyama 2005. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8150563_V1). The annotated citations are second-generation citations to the retracted paper Matsuyama 2005 (RETRACTED: Matsuyama W, Mitsuyama H, Watanabe M, Oonakahara KI, Higashimoto I, Osame M, Arimura K. Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD. Chest. 2005 Dec 1;128(6):3817-27.), which was retracted in 2008 (Retraction in: Chest (2008) 134:4 (893), https://doi.org/10.1016/S0012-3692(08)60339-6).
OVERALL DATA FOR VERSION 2 (V2)
FILES/FILE FORMATS
The same data are provided in two formats:
2010-2019 SG to specific not mentioned FG.csv - Unicode CSV (preservation format only) - same as in V1
2010-2019 SG to specific not mentioned FG.xlsx - Excel workbook (preferred format) - same as in V1
Additional files in V2:
2G-possible-misinformation-analyzed.csv - Unicode CSV (preservation format only)
2G-possible-misinformation-analyzed.xlsx - Excel workbook (preferred format)
ABBREVIATIONS:
2G - Refers to the second generation of citations to Matsuyama (articles that cite an article citing Matsuyama)
FG - Refers to the direct citation of Matsuyama (the one the second-generation item cites)
COLUMN HEADER EXPLANATIONS
File name: 2G-possible-misinformation-analyzed. Other column headers in this file have the same meaning as explained in V1. The additional headers are explained below:
Quote Number - The order of the quote (the citation context citing the first generation article given in "FG in bibliography") within the second generation article (given in "2G article")
Quote - The text of the quote (the citation context citing the first generation article given in "FG in bibliography") in the second generation article (given in "2G article")
Translated Quote - English translation of "Quote", an automatic translation from Google Scholar
Seriousness/Risk - Our assessment of the risk of misinformation and its seriousness
2G topic - Our assessment of the topic of the citing article (the second generation article given in "2G article")
2G section - The section of the citing article (the second generation article given in "2G article") in which the cited article (the first generation article given in "FG in bibliography") was found
FG in bib type - The article type (e.g., review article) of the cited article (the first generation article given in "FG in bibliography")
FG in bib topic - Our assessment of the topic of the cited article (the first generation article given in "FG in bibliography")
FG in bib section - The section of the cited article (the first generation article given in "FG in bibliography") in which the retracted Matsuyama paper was cited
keywords: citation context annotation; retraction; diffusion of retraction; second-generation citation context analysis
published: 2020-08-21
WikiCSSH
If you are using WikiCSSH, please cite the following:
Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. “WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia.” In Workshop on Scientific Knowledge Graphs (SKG 2020). https://skg.kmi.open.ac.uk/SKG2020/papers/HAN_et_al_SKG_2020.pdf
Han, Kanyao; Yang, Pingjing; Mishra, Shubhanshu; Diesner, Jana. 2020. "WikiCSSH - Computer Science Subject Headings from Wikipedia". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0424970_V1
Download the WikiCSSH files from: https://doi.org/10.13012/B2IDB-0424970_V1
More details about the WikiCSSH project can be found at: https://github.com/uiuc-ischool-scanr/WikiCSSH
This folder contains the following files:
WikiCSSH_categories.csv - Categories in WikiCSSH
WikiCSSH_category_links.csv - Links between categories in WikiCSSH
Wikicssh_core_categories.csv - Core categories as mentioned in the paper
WikiCSSH_category_links_all.csv - Links between categories in WikiCSSH (includes a dummy category called <ROOT>, which is the parent of isolates and top-level categories)
WikiCSSH_category2page.csv - Links between Wikipedia pages and Wikipedia categories in WikiCSSH
WikiCSSH_page2redirect.csv - Links between Wikipedia pages and Wikipedia page redirects in WikiCSSH
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
keywords: wikipedia; computer science
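The relationship between the two WikiCSSH link files (the "_all" version adds a dummy <ROOT> parent for isolates and top-level categories) can be sketched as follows. This is an illustration under assumptions: links are taken to be (parent, child) pairs, "top-level or isolate" is taken to mean "has no parent in the link list", and the function name and category names are hypothetical. It is not the project's own build code.

```python
def add_root(categories: set[str], links: list[tuple[str, str]],
             root: str = "<ROOT>") -> list[tuple[str, str]]:
    """Return the (parent, child) links plus (root, c) for every category without a parent.

    Assumption: a category with no incoming link is either top-level or an isolate.
    """
    has_parent = {child for _, child in links}
    extra = [(root, c) for c in sorted(categories) if c not in has_parent]
    return links + extra


# Hypothetical categories: one small hierarchy plus one isolate.
cats = {"Computer science", "Algorithms", "Data structures", "Isolated topic"}
edges = [("Computer science", "Algorithms"), ("Computer science", "Data structures")]
for parent, child in add_root(cats, edges):
    print(parent, "->", child)
```

With a single synthetic root, the category graph becomes one connected hierarchy, which is convenient for traversals that need a unique starting node.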
published: 2020-08-19
This data set is a matrix of influence coefficients. The element in row i and column j gives the influence of a hexagonal pyramidal distribution at node i on node j. The matrix is 16641 × 16641 and corresponds to a 129 × 129 grid (129² = 16641 nodes). The influence coefficient matrix for a smaller grid can be obtained by selecting the appropriate elements from this larger matrix.
keywords: Influence coefficients
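One way to read "appropriately choosing the elements" is to keep the rows and columns that belong to a smaller window of the 129 × 129 grid. The sketch below assumes that nodes are numbered in row-major order (node = row × 129 + column) and that the smaller grid is a contiguous window with the same node spacing; neither assumption is stated in the dataset description, and the function names are hypothetical.

```python
def subgrid_indices(n: int, m: int, r0: int = 0, c0: int = 0) -> list[int]:
    """Flat node indices of an m x m window of an n x n grid.

    Assumption: nodes are numbered row by row (row-major), starting at 0.
    """
    if r0 + m > n or c0 + m > n:
        raise ValueError("window does not fit in the grid")
    return [(r0 + r) * n + (c0 + c) for r in range(m) for c in range(m)]


def sub_influence(A, n: int, m: int, r0: int = 0, c0: int = 0):
    """Select the rows and columns of A (A[i][j] = influence of node i on node j) for the window."""
    idx = subgrid_indices(n, m, r0, c0)
    return [[A[i][j] for j in idx] for i in idx]


# Toy 3 x 3 grid (9 nodes) with a synthetic matrix A[i][j] = 10*i + j.
A = [[10 * i + j for j in range(9)] for i in range(9)]
print(sub_influence(A, n=3, m=2))
```

For the full 16641 × 16641 matrix a dense list of lists is impractical; with NumPy the same selection can be written as `A[np.ix_(idx, idx)]` on an array or memory-mapped file.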
published: 2020-08-18
These data and code enable replication of the findings and robustness checks in "No buzz for bees: Media coverage of pollinator decline," published in Proceedings of the National Academy of Sciences of the United States of America (2020). Although widespread declines in insect biomass and diversity are of increasing concern within the scientific community, it remains unclear whether attention to pollinator declines has also increased within information sources serving the general public. Examining patterns of journalistic attention to the pollinator population crisis can also inform efforts to raise awareness about the importance of declines of insect species that provide ecosystem services beyond pollination. We used the Global News Index developed by the Cline Center for Advanced Social Research at the University of Illinois at Urbana-Champaign to track news attention to pollinator topics in nearly 25 million news items published by two American national newspapers and four international wire services over the past four decades. We provide a link to documentation of the Global News Index in the "relationships with articles, code, o. We found vanishingly low levels of attention to pollinator population topics relative to coverage of climate change, which we use as a comparison topic. In the most recent subset of ~10 million stories published from 2007 to 2019, 1.39% (137,086 stories) refer to climate change/global warming, while only 0.02% (1,780) refer to pollinator populations in any context and just 0.007% (679) refer to pollinator declines. Substantial increases in news attention were detectable only in U.S. national newspapers. We also find that while climate change stories appear primarily in newspaper “front sections”, pollinator population stories remain largely marginalized in “science” and “back section” reports.
At the same time, news reports about pollinator populations increasingly link the issue to climate change, which might ultimately help raise public awareness to effect needed policy changes.
keywords: News Coverage; Text Analytics; Insects; Pollinator; Cline Center; Cline Center for Advanced Social Research; political; social; political science; Global News Index; Archer; news; mass communication; journalism
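The reported counts and percentages for the 2007-2019 subset can be cross-checked with a few lines of arithmetic. This sketch uses only the figures quoted above; the helper name `implied_total` is hypothetical, and because 1.39% is itself rounded, the implied corpus size is approximate.

```python
def implied_total(count: int, percent: float) -> float:
    """Corpus size implied by `count` stories making up `percent` % of it."""
    return count / (percent / 100)


total = implied_total(137_086, 1.39)  # climate change / global warming stories
for label, n in [("pollinator populations", 1_780), ("pollinator declines", 679)]:
    print(f"{label}: {100 * n / total:.4f}% of about {total / 1e6:.2f} million stories")
```

The implied total is about 9.86 million stories, consistent with the "~10 million" in the description, and the two pollinator counts come out near 0.018% and 0.0069%, which round to the reported 0.02% and 0.007%.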
published: 2020-08-10
These are text files downloaded from the Web of Science (WoS) for the bibliographic analyses in Zinnen et al. (2020), published in Applied Vegetation Science. They represent the papers and reference lists from six expert-based indicator systems: Floristic Quality Assessment, hemeroby, naturalness indicator values (& social behaviors), Ellenberg indicator values, grassland utilization values, and urbanity indicator values. To examine the data, download VOSviewer and follow the instructions in van Eck & Waltman (2019) for loading data. Although we used bibliographic coupling, these data support a number of other bibliographic analyses (e.g., visualizing citations between journals within this set of documents). Note: There are two caveats about these data and Supplements 1 & 2 of our paper. First, some papers appear in more than one of these text files (i.e., the raw data). When the files are counted individually, the papers sum to more than the numbers we report; when the files are combined, VOSviewer recognizes the repeats, and the totals match the numbers listed in S1 and the manuscript. Second, we labelled the downloaded papers in S2 with their respective systems. In some cases, the labels do not completely match the counts listed in S1 and the raw data, because some papers use another system but were not captured in our systematic literature search (e.g., a paper may have used hemeroby but was not picked up by the WoS search, so it is not listed as one of the 52 hemeroby papers).
keywords: Web of Science; bibliographic analyses; vegetation; VOSviewer
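The first caveat (per-file counts sum to more than the combined count) can be checked directly on the raw files. The sketch below assumes the files are standard WoS plain-text exports in which each record carries a "UT" (accession number) line, and it treats that field as the record identifier; this is an illustration of the overlap check, not a description of how VOSviewer itself detects duplicates. The function names are hypothetical.

```python
from pathlib import Path


def accession_numbers(text: str) -> list[str]:
    """Collect the UT (accession number) field of each record in a WoS plain-text export."""
    return [line[3:].strip() for line in text.splitlines() if line.startswith("UT ")]


def count_records(paths: list[Path]) -> tuple[int, int]:
    """Return (sum of per-file record counts, number of unique records across all files)."""
    # "utf-8-sig" tolerates a byte-order mark at the start of an export file.
    per_file = [accession_numbers(p.read_text(encoding="utf-8-sig")) for p in paths]
    total = sum(len(ids) for ids in per_file)
    unique = len(set().union(*per_file))
    return total, unique
```

Usage: `count_records(sorted(Path("wos_exports").glob("*.txt")))`, where the folder name is hypothetical; a first value larger than the second indicates papers shared between indicator-system files.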
published: 2020-08-01
This data set includes information used to determine patterns of mixing at three small confluences in East Central Illinois based on differences in the temperature or turbidity of the two confluent flows.
keywords: mixing; confluences; flow structure
published: 2020-07-10
These are the data sets associated with our publication "Semi-natural wildflower-strip field borders provide winter refuge for pest natural enemies: a case study on organic farms." For this project, we compared the communities of overwintering arthropod natural enemies in organic cultivated fields and wildflower-strip field borders at five different sites in central Illinois. Abstract: Strips of wildflowers along field borders are frequently used in midwestern U.S. sustainable agriculture. These plantings help diversify otherwise monocultural landscapes and provide them with ecosystem services, including biological control. Predatory and parasitic arthropods (i.e., natural enemies) often flourish in these habitats and will move into crops to help control pests. However, the capacity of wildflower strips to provide overwintering refuge for these arthropods is poorly understood. In this study, we used soil emergence tents to characterize natural enemy communities overwintering in cultivated organic crop fields and adjacent wildflower-strip field borders. We found a greater abundance and species richness, and a unique community composition, of predatory and parasitic arthropods in wildflower strips compared with arable crop fields. These results demonstrate that semi-natural habitats such as wildflower strips can be important for maintaining natural enemies in agricultural landscapes.
keywords: Natural enemy; wildflower strips; conservation biological control; semi-natural habitat; field border; organic farming
published: 2020-07-16
Dataset for the SocialMediaIE tutorial.
keywords: social media; deep learning; natural language processing
published: 2020-07-15
This repository includes scripts and datasets for the paper, "Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss."
keywords: Species tree estimation; gene duplication and loss; identifiability; statistical consistency; quartets; ASTRAL
published: 2020-06-30
This file contains 13 unique case studies created for the One Health: Infectious Diseases course offered at the University of Illinois at Urbana-Champaign. The case studies are being made available as educational resources for other One Health courses. Each case study focuses on a theme or topic associated with One Health. The case studies were created from publicly available information, and references are provided for each one.
keywords: One health education; infectious diseases; case studies
published: 2020-06-26
This dataset contains the PartMC-MOSAIC simulations used in the article "Quantifying Errors in the Aerosol Mixing-State Index Based on Limited Particle Sample Size". The output of the 1000 simulations is organized into a series of archived folders, each containing 100 scenarios. Each scenario directory contains 25 NetCDF files, the hourly output of a PartMC-MOSAIC simulation, which hold all information on the environment and on the particle and gas states. The dataset was used to investigate how sample size affects the determination of aerosol mixing state, and it may also be useful for testing different types of estimators.
keywords: Atmospheric aerosols; single-particle measurements; sampling uncertainty; NetCDF
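For readers unfamiliar with the mixing-state index, the sketch below computes it from a per-particle species-mass table using the standard diversity-based definition (Riemer and West, 2013): χ = (D_α − 1)/(D_γ − 1), where D_α is the exponential of the mass-weighted mean per-particle Shannon entropy and D_γ is the exponential of the entropy of the bulk composition. It is a plain-Python illustration on toy input, not the estimator studied in the article, and it does not read the NetCDF files; it assumes every particle has nonzero mass and that at least two species are present in bulk.

```python
import math


def mixing_state_index(masses: list[list[float]]) -> float:
    """chi = (D_alpha - 1) / (D_gamma - 1); masses[i][a] = mass of species a in particle i."""
    total = sum(sum(p) for p in masses)
    n_species = len(masses[0])
    # Mass-weighted average of per-particle Shannon entropies (alpha diversity).
    h_alpha = 0.0
    for p in masses:
        m_i = sum(p)
        h_i = -sum((x / m_i) * math.log(x / m_i) for x in p if x > 0)
        h_alpha += (m_i / total) * h_i
    # Shannon entropy of the bulk (population) composition (gamma diversity).
    bulk = [sum(p[a] for p in masses) / total for a in range(n_species)]
    h_gamma = -sum(f * math.log(f) for f in bulk if f > 0)
    d_alpha, d_gamma = math.exp(h_alpha), math.exp(h_gamma)
    return (d_alpha - 1) / (d_gamma - 1)


# Toy populations with two species: fully internal vs. fully external mixture.
print(mixing_state_index([[1.0, 1.0], [2.0, 2.0]]))  # every particle has the bulk composition
print(mixing_state_index([[1.0, 0.0], [0.0, 1.0]]))  # every particle is a single species
```

Applying the function to a random subset of particles (e.g., with `random.sample`) and comparing it with the full-population value is one simple way to see the sample-size effect the dataset was built to study.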
published: 2020-02-12
This dataset contains the results of a three-month audit of housing advertisements. It accompanies the 2020 ICWSM paper "Auditing Race and Gender Discrimination in Online Housing Markets" and covers data collected between Dec 7, 2018 and March 19, 2019. There are two JSON files in the dataset, each a list of newline-separated JSON objects. In the first file, each object represents an advertisement and includes the date and time it was collected, the image and title of the ad (if collected), the page on which it was displayed, and the training treatment it received. In the second file, each object represents a visit to a housing listing site and contains the URL, the training treatment applied, the location searched, and the metadata of the top sites scraped, including location, price, and number of rooms. The dataset also includes the raw images of the collected ads, which were used to code the ads by interest and targeting. The images were captured with Selenium and named by their perceptual hash in order to de-duplicate them.
keywords: algorithmic audit; advertisement audit
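Naming images by a perceptual hash makes visually identical ads collide on the same file name. The dataset description does not say which perceptual hash variant was used, so the sketch below shows average hash (aHash), one common member of that family, applied to an already-downscaled 8 × 8 grayscale thumbnail given as nested lists. The function names and toy images are hypothetical, and a real pipeline would first resize each screenshot with an imaging library.

```python
def average_hash(gray: list[list[int]]) -> int:
    """64-bit average hash of an 8x8 grayscale thumbnail: one bit per pixel, set if pixel >= mean."""
    pixels = [v for row in gray for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v >= mean else 0)
    return bits


def dedupe(images: dict[str, list[list[int]]]) -> dict[str, str]:
    """Map each hash (as a 16-digit hex name) to the first image that produced it."""
    seen: dict[str, str] = {}
    for name, gray in images.items():
        seen.setdefault(f"{average_hash(gray):016x}", name)
    return seen


gradient = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[v + 10 for v in row] for row in gradient]   # same image, uniformly brighter
inverted = [[255 - v for v in row] for row in gradient]  # visually different image
print(dedupe({"ad1": gradient, "ad2": brighter, "ad3": inverted}))
```

A uniform brightness shift leaves the hash unchanged, so "ad2" collapses onto "ad1"; exact-match hashing like this only catches near-identical images, and looser de-duplication would compare hashes by Hamming distance instead.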
published: 2020-02-12
This is the dataset used in the Landscape Ecology publication of the same name. It consists of the following files: NWCA_Int_Veg.txt, NWCA_Reg_Veg.txt, and NWCA_Site_Attributes.txt. NWCA_Int_Veg.txt is a site and plot by species matrix. The column labeled SITES contains site IDs, and the column labeled Plots contains plot ID numbers. All other columns represent species abundances (estimates of percent cover, summed across five plots). NWCA_Reg_Veg.txt is a site by species matrix of species abundances. The column labeled SITES contains site IDs. All other columns represent species abundances (estimates of percent cover within individual plots). NWCA_Site_Attributes.txt is a matrix of site attributes. The column labeled SITES contains site IDs. The columns labeled AA_CENTER_LAT and AA_CENTER_LONG contain the latitude and longitude, in decimal degrees, of the Assessment Area center point. The column REFPLUS_NWCA represents disturbance gradient classes: MIN (minimally disturbed), L (least disturbed), I (intermediate), and M (most disturbed). The column REFPLUS_NWCA2 represents revised disturbance gradient classes based on protocols described in the article; these revised classes were used for analysis. The column labeled STRESS_HEAVYMETAL represents heavy metal stressor classes (Low, Moderate, High, and Missing), used to determine which wetlands were missing soil data. Sites with a Missing STRESS_HEAVYMETAL class were removed from analysis. More information about this dataset: All of the data used in this analysis were gathered from the National Wetlands Condition Assessment. Wetland surveys were conducted from 4/4/2011 to 11/2/2011.
The entire National Wetlands Condition Assessment Dataset, which includes 3640 unique taxonomic identities of plants, can be found at: https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys
keywords: Anthropogenic disturbance; β-Diversity; Biotic homogenization; Phalaris arundinacea; reed canary grass; Wetlands
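The site-screening step described for NWCA_Site_Attributes.txt (drop sites whose STRESS_HEAVYMETAL class is Missing, then use the revised REFPLUS_NWCA2 classes) can be sketched with the standard csv module. The column names come from the description above; the tab delimiter is an assumption, since the delimiter of the .txt files is not stated, and the site IDs in the usage example are hypothetical.

```python
import csv
import io


def sites_for_analysis(attributes_text: str, delimiter: str = "\t") -> dict[str, str]:
    """Map site ID -> revised disturbance class (REFPLUS_NWCA2), excluding sites
    whose STRESS_HEAVYMETAL class is "Missing".

    Assumption: the attributes file is tab-delimited with a header row.
    """
    reader = csv.DictReader(io.StringIO(attributes_text), delimiter=delimiter)
    return {
        row["SITES"]: row["REFPLUS_NWCA2"]
        for row in reader
        if row["STRESS_HEAVYMETAL"] != "Missing"
    }


# Hypothetical rows illustrating the filter.
sample = (
    "SITES\tREFPLUS_NWCA2\tSTRESS_HEAVYMETAL\n"
    "S1\tL\tLow\n"
    "S2\tM\tMissing\n"
    "S3\tI\tHigh\n"
)
print(sites_for_analysis(sample))
```

To apply it to the real file, read it with `open("NWCA_Site_Attributes.txt", newline="").read()` and check the delimiter first; the returned site IDs can then be used to subset the rows of the two vegetation matrices.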
published: 2020-06-06
These data are from an observational study and a small experiment investigating reproductive biology and hybridization between two plant species, Celastrus scandens L. and Celastrus orbiculatus Thunb. (Celastraceae). The data were collected during the 2008 growing season at Indiana Dunes National Park (formerly Indiana Dunes National Lakeshore), just east of the municipality of Ogden Dunes, Indiana, USA. The five data files provide information on the floral output of the two species, fertilization rate, fruit set rate, hybridization rate at two scales (individual flowers in both species, and individual maternal plants in C. scandens), and the results of a hand-pollination experiment that exchanged pollen between the two species. In total, six files are associated with this submission: five data files in comma-separated values format and one text file (‘readme.txt’) with detailed explanations of the data files.
keywords: Celastrus; invasive species; hybridization; heterospecific pollen; hand pollination