Displaying datasets 1 - 25 of 491 in total

Subject Area

Life Sciences (263)
Social Sciences (114)
Physical Sciences (69)
Technology and Engineering (39)
Uncategorized (5)
Arts and Humanities (1)


Funder

U.S. National Science Foundation (NSF) (142)
Other (135)
U.S. National Institutes of Health (NIH) (51)
U.S. Department of Energy (DOE) (46)
U.S. Department of Agriculture (USDA) (24)
Illinois Department of Natural Resources (IDNR) (11)
U.S. National Aeronautics and Space Administration (NASA) (5)
U.S. Geological Survey (USGS) (5)
Illinois Department of Transportation (IDOT) (1)
U.S. Army (1)

Publication Year

2021 (109)
2020 (96)
2022 (88)
2019 (72)
2018 (59)
2017 (35)
2016 (30)
2023 (2)


License

CC0 (286)
CC BY (192)
custom (13)

published: 2022-09-19
Data characterize zooplankton in Shelbyville Reservoir, Illinois, United States of America. Zooplankton were sampled with a conical zooplankton net (0.5 m diameter mouth) when water was deeper than 2 m and by grab sample when water was shallower. Zooplankton samples were concentrated and subsampled with a Hensen-Stempel pipette following protocols described in Detmer et al. (2019). Zooplankton were identified to the lowest feasible taxonomic unit according to Pennak (1989) and Thorp and Covich (2001) and were enumerated in a 1 mL Sedgewick-Rafter cell. Subsamples were analyzed until at least 200 individuals from each site were counted across the three main taxonomic groups (cladocerans, copepods, and rotifers). Given the variation in zooplankton concentrations at each site, this process often led to far more than 200 individuals being counted (x̄ = 269, min = 200, max = 487). A summary of the sample size from each site can be found in Supplementary Table S2. Abundances were corrected for volume of water filtered. For rare taxa (< 20 individuals per sample), all individuals were measured for length. For abundant taxa, length measurements were collected on the first 20 organisms of each abundant taxon encountered in a subsample. Dry mass was calculated from equations for microcrustaceans, rotifers, and Chaoborus sp. (Rosen, 1981; Botrell et al., 1976; Dumont and Balvay, 1979).
keywords: Reservoir; Zooplankton
planned publication date: 2023-09-01
An online and paper knowledge, attitudes, and practices (KAP) survey on ticks and tick-borne diseases (TBD) was distributed to farmers in Illinois from summer 2020 to spring 2022 (paper version titled Final Draft Farmer KAP_v.SoftCopy_Revised.docx). These are the raw data associated with that survey and the survey questions used (FarmerTickKAPdata.csv, data dictionary in Data Description.docx). We have added calculated values (columns 286 to end, code for calculation in FarmerKAPvariableCalculation.R), including: the tick knowledge score, TBD knowledge score, and total knowledge score, each of which is the number of correct answers in its category, and score percent, which is the proportion of correct answers in each category.
keywords: ticks; survey; tick-borne disease; farmer
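The calculated knowledge scores described above can be illustrated with a minimal Python sketch. It is not the authors' FarmerKAPvariableCalculation.R; the function name, the question identifiers, the answer key, and the rule that an unanswered question counts as incorrect are all assumptions made for illustration.

```python
from collections import defaultdict


def knowledge_scores(responses, answer_key, question_category):
    """Return {category: (number correct, proportion correct)} plus a "total" entry.

    All three arguments are hypothetical structures, not the dataset's columns:
    responses maps question id -> given answer, answer_key maps question id ->
    correct answer, question_category maps question id -> category name.
    """
    correct = defaultdict(int)
    asked = defaultdict(int)
    for question, right_answer in answer_key.items():
        category = question_category[question]
        asked[category] += 1
        # Assumption: a missing answer is scored as incorrect.
        if responses.get(question) == right_answer:
            correct[category] += 1

    scores = {c: (correct[c], correct[c] / asked[c]) for c in asked}
    total_correct = sum(correct.values())
    scores["total"] = (total_correct, total_correct / sum(asked.values()))
    return scores


# Hypothetical four-question survey: two tick questions, two TBD questions.
answer_key = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}
question_category = {"q1": "tick", "q2": "tick", "q3": "tbd", "q4": "tbd"}
responses = {"q1": "yes", "q2": "yes", "q3": "yes", "q4": "no"}

example_scores = knowledge_scores(responses, answer_key, question_category)
```

Here the score is the count of correct answers and the score percent is that count divided by the number of questions in the category, matching the definitions in the description.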
published: 2022-09-16
This dataset contains model code (including input data) to replicate the outcomes for "Assessing the Efficiency Implications of Renewable Fuel Policy Design in the United States". It consists of the replication code and data for the model. To run the model, use GAMS to run the "Models.gms" file.
keywords: Renewable Fuel Standard; Nested structure; cellulosic waiver credit; RIN
published: 2022-09-14
Datasets that accompany Beilke and O'Keefe 2022 publication (Title: Bats reduce insect density and defoliation in temperate forests: an exclusion experiment; Journal: Ecology).
keywords: bats; defoliation; ecosystem services; forests; insectivory; insects; trophic cascades
published: 2022-09-08
Data associated with the manuscript "Overlooked invaders? Ecological impacts of non-game, native transplant fishes in the United States" by Jordan H. Hartman and Eric R. Larson.
keywords: freshwater; non-game; native transplant; impacts; invasive species
published: 2022-09-07
We developed a new application of isotopic gas exchange which couples a tunable diode laser absorption spectroscope (TDL) with a leaf gas exchange system, analyzing leakiness through induction of C4 photosynthesis on dark to high-light transitions. The youngest fully expanded leaf was measured on 40-45 day-old maize (B73) and sorghum (Tx430). Detailed definitions of each variable in the raw Li-6400XT and Li-6800 data (in "Original_data_AND_Data_processing_code.zip") are summarized at: https://www.licor.com/env/support/LI-6800/topics/symbols.html#const
keywords: leakiness; bundle sheath leakage; C4 photosynthesis; photosynthetic induction; non-steady-state photosynthesis; carbon isotope discrimination; photosynthetic efficiency; corn
published: 2022-09-07
The availability of economically marginal land for energy crops is identified using the Cropland Data Layer and other soil, wind, and climate data resources. All data are resolved at a 30 m spatial resolution across the continental United States.
keywords: marginal land; biofuel production; remote sensing; land use change; Cropland Data Layer
published: 2022-09-01
These data and code are associated with a study on differences in the rate of hatching failure of eggs across 14 free-living grassland and shrubland birds. We used a device to measure the embryonic heart rate of eggs and found there was variation across species related to factors such as nest type and nest safety. This work is to be published in Ornithology.
keywords: embryonic death; grassland birds; egg mortality; heart rate
published: 2022-08-31
These datasets are for the four-dimensional scanning transmission electron microscopy (4D-STEM) and electron energy loss spectroscopy (EELS) experiments for cathode nanoparticles at different cutoff voltages and in different electrolytes. The raw 4D-STEM experiment datasets were collected by TEM image & analysis software (FEI) and were saved as SER files. The SER files can be opened and viewed in MATLAB using our analysis software package imToolBox, available at https://github.com/flysteven/imToolBox. The raw EELS datasets were collected by DigitalMicrograph software and were saved as DM4 files. They can be opened and viewed in DigitalMicrograph software or using our analysis codes available at https://github.com/chenlabUIUC/OrientedPhaseDomain. All the datasets are from the work "Formation and impact of nanoscopic oriented phase domains in electrochemical crystalline electrodes" (2022).
The 4D-STEM experiment data include four example datasets for cathode nanoparticles collected at different cutoff voltages and in different electrolytes as described below. Each dataset contains a stack of diffraction patterns collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine cathode particle: "Pristine particle 4D-STEM.ser"
2. Cathode particle at the cutoff voltage of 0.09V during discharge at C/10 in the aqueous electrolyte: "Intermediate cutoff0_09V discharge (aqueous) 4D-STEM.ser"
3. Fully discharged cathode particle at C/10 in the aqueous electrolyte: "Fully discharged particle 4D-STEM.ser"
4. Fully discharged cathode particle at C/10 in the dry organic electrolyte: "Fully discharge particle (dry organic electrolyte).ser"
The EELS experiment data include three example datasets for cathode nanoparticles collected at different cutoff voltages during discharge in the aqueous electrolyte (in "EELS datasets.zip") as described below. Each EELS dataset contains the zero-loss and core-loss EELS spectra collected at different probe positions scanned across the cathode nanoparticle.
1. Pristine cathode particle: "Pristine particle EELS.zip"
2. Cathode particle at the cutoff voltage of 0.09V during discharge at C/10 in the aqueous electrolyte: "intermediate discharge (aqueous) EELS.zip"
3. Fully discharged cathode particle at C/10 in the aqueous electrolyte: "fully discharge (aqueous) EELS.zip"
The details of the software package and codes that can be used to analyze the 4D-STEM datasets and EELS datasets are available at: https://github.com/chenlabUIUC/OrientedPhaseDomain. Once our paper is formally published, we will update the relationship of these datasets with our paper.
keywords: 4D-STEM; microstructure; phase transformation; strain; cathode; nanoparticle; energy storage
published: 2022-08-31
This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript, "Refining the role of nitrogen mineralization in mycorrhizal nutrient syndromes". Please refer to that publication for details about methodologies used to generate these data and for the experimental design. For this version 2, we added specific gross nitrogen mineralization rates (ugN/gOM/d), microbial biomass carbon (ugC/gdw), microbial biomass nitrogen (ugN/gdw) and microbial biomass C:N ratios to the data set. Additionally, we updated values for gross nitrogen mineralization, microbial NO3 assimilation and microbial NH4 assimilation to reflect slight changes in data processing. Those changes are reflected in "220829_All data_repository.csv". "220829_nitrogen_mineralization_readme.txt" is the updated readme for the new file. The other two files, whose names begin with “220426_”, are older versions and are the same as in V1.
keywords: Nitrogen cycling; Ectomycorrhizal fungi; Arbuscular mycorrhizal fungi; Nitrogen fertilization; Gross mineralization
published: 2022-08-25
Data in this publication were used to analyze the factors that influence the abundance of eastern whip-poor-wills in the Midwest and to describe the diet of this species. These data were collected in Illinois in 2019 and 2020. Procedures were approved by the Illinois Institutional Animal Care and Use Committee (IACUC), protocol no. 19006.
keywords: eastern whip-poor-will; Antrostomus vociferus; abundance; moths; nightjars; Lepidoptera; metabarcoding
published: 2022-08-29
Example scripts and configuration files needed to perform select simulations described in the manuscript "Percolation transition prescribes protein size-specific barrier to passive transport through the nuclear pore complex."
keywords: Nuclear Pore Complex; simulation setup
published: 2022-08-22
This dataset contains Raman spectra, each acquired from an individual, living, primary murine cell belonging to one of the six most immature hematopoietic cell populations found in the body: hematopoietic stem cell (HSC), multipotent progenitor 1 (MPP1), multipotent progenitor 2 (MPP2), multipotent progenitor 3 (MPP3), common lymphoid progenitor (CLP), and common myeloid progenitor (CMP). These spectra are useful for identifying spectral signatures that are characteristic of each hematopoietic stem or early progenitor cell population. NOTE: The __MACOSX folder and the files whose names start with “._[file name]” in "Raman spectra of single cells text files.zip" were created by the computer operating system in an unreadable format. They are not part of the data and can be removed or ignored when using the data.
keywords: Raman spectroscopy; single-cell spectrum; hematopoietic cell; hematopoietic stem cell; multipotent progenitor cell; common myeloid progenitor; common lymphoid progenitor
published: 2022-08-23
This dataset contains soil chemical properties used to analyze variation in soil fungal communities beneath Oreomunnea mexicana trees in the manuscript "Watershed-scale variation in potential fungal community contributions to ectomycorrhizal biogeochemical syndromes".
keywords: Acid-base chemistry; Ectomycorrhizal fungi; Exploration type; Nitrogen cycling; Nitrogen isotopes; Plant-soil (below-ground) interactions; Saprotrophic fungi; Tropical forest
published: 2022-08-20
Dataset associated with Jones and Ward BEAS-D-21-00106R2 submission: Parasitic cowbird development up to fledging and subsequent post-fledging survival reflect life history variation found across host species. Includes Excel CSV files and an .inp file with the data used in the nest survival and Brown-headed Cowbird post-fledging analyses, plus a file describing each column. The CSV file is set up for logistic exposure models in SAS or R, and the .inp file is set up to be uploaded into program MARK for multi-state recaptures-only analysis. Species included in the analyses: American Robin, Blue Grosbeak, Brown Thrasher, Blue-winged Warbler, Carolina Chickadee, Chipping Sparrow, Common Yellowthroat, Dickcissel, Eastern Bluebird, Eastern Phoebe, Eastern Towhee, Field Sparrow, Gray Catbird, House Wren, Indigo Bunting, Northern Cardinal, Red-winged Blackbird, Tree Swallow, Yellow-breasted Chat, and Yellow Warbler.
keywords: brood parasitism; cowbird; carryover effects; phenotypic plasticity; post-fledging; songbirds
published: 2022-01-10
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries.
Additional Resources:
- Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form (https://docs.google.com/forms/d/e/1FAIpQLSf-J937V6I4sMSxQt7gR3SIbUASR26KXxqSurrkBvlF-CIQnQ/viewform?usp=pp_url) so we can determine whether you are eligible for access.
- Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form (https://forms.gle/6eA2yJUGFMtj5swY7).
- The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form (https://groups.webservices.illinois.edu/subscribe/123172) to subscribe.
Citation Guidelines:
1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2022. Global News Index and Extracted Features Repository [codebook], v1.1.0. Champaign, IL: University of Illinois. Dec. 16. doi:10.13012/B2IDB-5649852_V3
2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2022. Global News Index and Extracted Features Repository [database], v1.1.0. Champaign, IL: University of Illinois. Dec. 16. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V3
keywords: Cline Center; Cline Center for Advanced Social Research; political; social; political science; Global News Index; Archer; news; mass communication; journalism
published: 2022-08-08
This upload contains all datasets used in Experiments 2 and 3 of the SALMA paper (pending submission): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. "SALMA: Scalable ALignment using MAFFT-Add". The zip file has the following structure (presented as an example):
salma_paper_datasets/
|_README.md
|_10aa/
|_crw/
|_homfam/
   |_aat/
   |  |_...
   |_...
|_het/
   |_5000M2-het/
   |  |_...
   |_5000M3-het/
   ...
|_rec_res/
Generally, the structure can be viewed as: [category]/[dataset]/[replicate]/[alignment files]
# Categories:
1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.
2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned versions from Shen et al. 2022 (MAGUS+eHMM).
3. homfam: These are the 10 largest Homfam datasets, each with one replicate.
4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.
5. rec_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.
# Alignment files
There are at most 6 `.fasta` files in each sub-directory:
1. `all.unaln.fasta`: All unaligned sequences.
2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have them are included.
3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).
4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have them are included.
5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).
6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have them are included.
> If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.
> If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.
> If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.
# Additional file(s)
1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et al. 2010) to search for protein sequences.
keywords: SALMA; MAFFT; alignment; eHMM; sequence length heterogeneity
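The backbone/query distinction used in the file descriptions above (lengths within 25% of the median are full-length "backbone" sequences, the rest are "query" sequences) can be sketched in Python as follows. This is an illustrative sketch, not code from the SALMA study; the function name, the name-to-length dictionary, and the choice of inclusive bounds are assumptions.

```python
from statistics import median


def split_backbone_query(lengths, tolerance=0.25):
    """Split sequence names into (backbone, query) lists by length.

    lengths is a hypothetical dict mapping sequence name -> sequence length.
    A sequence counts as backbone when its length lies within `tolerance`
    (25% by default) of the median length. Assumption: bounds are inclusive.
    """
    med = median(lengths.values())
    low, high = (1 - tolerance) * med, (1 + tolerance) * med
    backbone = sorted(n for n, length in lengths.items() if low <= length <= high)
    query = sorted(n for n, length in lengths.items() if not low <= length <= high)
    return backbone, query


# Hypothetical lengths: the median is 100, so the backbone range is 75 to 125.
example_lengths = {"a": 100, "b": 110, "c": 95, "d": 40, "e": 160}
example_backbone, example_query = split_backbone_query(example_lengths)
```

With these example lengths, "a", "b", and "c" fall inside the range and "d" and "e" fall outside it.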
published: 2022-03-25
This upload includes the 16S.B.ALL dataset in the 100-HF condition (referred to as 16S.B.ALL-100-HF) used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). The 100-HF condition refers to making sequences fragmentary with an average length of 100 bp and a standard deviation of 60 bp. Additionally, we required all fragmentary sequences to have lengths > 50 bp. Thus, the final average length of the fragments is slightly higher than 100 bp (~120 bp). In this case (i.e., 16S.B.ALL-100-HF), 1,000 sequences with lengths within 25% of the median length are retained as "backbone sequences", while the remaining sequences are considered "query sequences" and made fragmentary using the "100-HF" procedure. Backbone sequences are aligned using MAGUS (or we extract their reference alignment). Then, the fragmentary versions of the query sequences are added back to the backbone alignment using either MAGUS+UPP or WITCH. More details of the tar.gz file are described in README.txt.
keywords: MAGUS; UPP; Multiple Sequence Alignment; eHMMs
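The shift in average fragment length from 100 bp to roughly 120 bp can be checked with a short calculation. The sketch below assumes that fragment lengths are drawn from a normal distribution (mean 100 bp, standard deviation 60 bp) and that draws at or below 50 bp are discarded; the description does not state the distribution, so this is an assumption, and the function name is hypothetical.

```python
import math


def truncated_normal_mean(mean, sd, lower):
    """Mean of a normal(mean, sd) variable conditioned on being above `lower`.

    Uses the standard lower-truncated normal formula:
    E[X | X > lower] = mean + sd * pdf(alpha) / P(Z > alpha),
    where alpha = (lower - mean) / sd and Z is standard normal.
    """
    alpha = (lower - mean) / sd
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(alpha / math.sqrt(2.0))  # P(Z > alpha)
    return mean + sd * pdf / upper_tail


# Assumed 100-HF parameters: mean 100 bp, sd 60 bp, minimum length 50 bp.
expected_fragment_length = truncated_normal_mean(100.0, 60.0, 50.0)
```

Under these assumptions the expected length comes out at roughly 121 bp, which is consistent with the "~120 bp" stated in the description.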
published: 2022-08-06
This dataset consists of all the files and codes that are part of the manuscript (main text and supplement) titled "Spin-selective tunneling from nanowires of the candidate topological Kondo insulator SmB6". For detailed information on the individual files refer to the specific readme files.
keywords: Topology; Kondo Insulator; Spin; Scanning tunneling microscopy; antiferromagnetism
published: 2022-08-06
An online knowledge, attitudes, and practices survey on ticks and tick-borne diseases (TBD) was distributed to medical professionals in Illinois from summer 2020 to fall 2021. These are the raw data associated with that survey and the survey questions used. Age, gender, and county of practice have been removed to prevent identification of respondents. We have added calculated values (columns 165 to end), including: the tick knowledge score, TBD knowledge score, and total knowledge score, each of which is the number of correct answers in its category, and score percent, which is the proportion of correct answers in each category; region, which is determined from the county of practice; TBD relevant practice, which separates the practice variable into TBD primary, secondary, and non-responders; and several variables which group categories.
keywords: ticks; medicine; tick-borne disease; survey
published: 2022-08-05
This data set documents bat activity (counts per detector-night per phonic group) and bat diversity (number of bat species per detector-night) in relation to distance to the nearest forested corridor in a landscape dominated by row crop agriculture and in relation to relative crop pest abundance. This data set was used to assess whether bats were homogeneously distributed over a near-uninterrupted agricultural landscape and to assess the importance of forested corridors and the presence of pest species for their distribution across the landscape. Data were collected with 50 AudioMoth bat detectors along 10 transects, with each transect having 5 detectors. The transects started at a forest corridor and extended out for 4 km into uninterrupted row crop agriculture. Pest abundance was extrapolated from data collected in the same county during the same period as the study. Potentially important weather covariates were extracted from the nearest operational weather station.
keywords: bats; bat activity; biodiversity; agricultural pest
published: 2022-08-05
Simulated sequences provide a way to evaluate multiple sequence alignment (MSA) methods where the ground truth is exactly known. However, the realism of such simulated conditions often comes under question compared to empirical datasets. In particular, simulated data often does not display heterogeneity in the sequence lengths, a common feature in biological datasets. In order to imitate sequence length heterogeneity, we here present a set of data that are evolved under a mixture model of indel lengths, where indels have an occasional chance of being promoted to long indels (emulating large insertion/deletion events, e.g., domain-level gain/loss). This dataset is otherwise (e.g., in GTR parameters) analogous to the 1000M condition as presented in the SATe paper (doi: 10.1126/science.1171243) but with 5000 sequences and simulated with INDELible (http://abacus.gene.ucl.ac.uk/software/indelible/). For more information, see README.txt. For the INDELible control files, see https://github.com/ThisBioLife/5000M-234-het.
keywords: simulated data; sequence length heterogeneity; multiple sequence alignment
published: 2022-08-01
Datasets that accompany Shearer and Beilke 2022 publication (Title: Playing it by ear: gregarious sparrows recognize and respond to isolated wingbeat sounds and predator-based cues; Journal: Animal Cognition).
keywords: Vigilance; auditory detection; predator detection; predator-prey interaction; antipredator behavior
published: 2022-07-25
Related to the raw entity mentions, this dataset represents the effects of the data cleaning process and collates all of the entity mentions which were too ambiguous to successfully link to the NCBI's taxonomy identifier system.
keywords: synthetic biology; NERC data; species mentions; ambiguous entities
published: 2022-07-25
A set of species entity mentions derived from an NERC dataset analyzing 900 synthetic biology articles published by the ACS. This data is associated with the Synthetic Biology Knowledge System repository (https://web.synbioks.org/). The data in this dataset are raw mentions from the NERC data.
keywords: synthetic biology; NERC data; species mentions