Displaying datasets 26 - 50 of 254 in total

Subject Area

Life Sciences (254)
Social Sciences (0)
Physical Sciences (0)
Technology and Engineering (0)
Uncategorized (0)
Arts and Humanities (0)

Funder

Other (83)
U.S. National Science Foundation (NSF) (69)
U.S. Department of Energy (DOE) (25)
U.S. Department of Agriculture (USDA) (22)
U.S. National Institutes of Health (NIH) (21)
Illinois Department of Natural Resources (IDNR) (9)
U.S. Geological Survey (USGS) (3)
U.S. National Aeronautics and Space Administration (NASA) (2)
U.S. Army (1)
Illinois Department of Transportation (IDOT) (0)

Publication Year

2021 (66)
2020 (60)
2019 (42)
2022 (31)
2018 (23)
2017 (19)
2016 (12)
2023 (1)

License

CC0 (168)
CC BY (77)
custom (9)
published: 2022-02-14
Dataset associated with Allen et al. (in review), "Food caching by a solitary large carnivore supports optimal foraging theory." If using this dataset, please cite this manuscript.
published: 2022-02-11
The Culex_Trivellone_etal.fas FASTA file contains the final sequence alignment originally used in the haplotype analyses of Trivellone et al. (Frontiers in Public Health, under review). The 492 sequences (from specimens of the Culex pipiens complex collected in different habitat types using BG-Sentinel traps) were aligned using PASTA v1.8.5 with default settings. The final alignment contains 686 positions of the mitochondrial cytochrome c oxidase subunit I (COI) gene. The data analyses are described further in the cited paper.
keywords: Culex; Culicidae; COI; mosquito surveillance; species assemblages
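Because the file is a plain FASTA alignment, its dimensions can be checked against the description above (492 sequences, 686 positions). The following is a minimal Python sketch that is not distributed with the dataset; the helper names `read_fasta` and `alignment_shape` are hypothetical, and it assumes each record starts with a `>` header line followed by one or more sequence lines.

```python
from pathlib import Path


# Hypothetical helpers for sanity-checking an alignment; not part of the dataset.
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) pairs, keeping file order."""
    records = []
    header, chunks = None, []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_shape(records):
    """Return (number of sequences, alignment length); an alignment needs equal-length rows."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"expected one common length, found {sorted(lengths)}")
    return len(records), lengths.pop()
```

If the file matches its description, `alignment_shape(read_fasta("Culex_Trivellone_etal.fas"))` should return `(492, 686)`.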
published: 2022-02-11
Upon treatment removal, spontaneous and random reactivation of latently infected T cells remains a major barrier to curing HIV. Because reactivation is stochastic, fluctuations in gene expression (or “noise”) can bias HIV reactivation from latency, yet conventional drug screens that measure mean gene expression miss compounds that modulate noise. Here we present a time-lapse fluorescence microscopy image set obtained from a Jurkat T-cell line infected with a minimal HIV gene circuit, treated with 1,806 small-molecule compounds, and imaged for 48 hours. The single-cell, time-dependent reporter dynamics (single-cell gene expression intensity and noise trajectories) extracted from the images are also included. Building on this dataset, further experiments in Lu et al., PNAS 2021 (doi: 10.1073/pnas.2012191118) identified a total of five latency-promoting agents of HIV. For a detailed description of the dataset, please refer to the readme file.
keywords: HIV; latency; drug screen; fluorescence microscopy; time-lapse; microscopy; single-cell data; noise; gene expression fluctuation
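The readme defines how the noise trajectories in this dataset were computed. As an illustration only, the sketch below assumes noise is measured as the squared coefficient of variation (CV² = variance / mean²) of reporter intensity within a sliding window of frames. Both the metric and the windowing are assumptions, and `cv_squared` and `sliding_noise` are hypothetical names.

```python
from statistics import mean, pvariance


def cv_squared(values):
    """Squared coefficient of variation (variance / mean**2), one common noise measure."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean intensity is zero; CV^2 is undefined")
    return pvariance(values, mu=m) / m ** 2


def sliding_noise(trajectory, window):
    """CV^2 for each run of `window` consecutive frames in one cell's intensity trajectory."""
    if window < 2 or window > len(trajectory):
        raise ValueError("window must be between 2 and the trajectory length")
    return [cv_squared(trajectory[i:i + window])
            for i in range(len(trajectory) - window + 1)]
```

A constant trajectory gives zero noise in every window, while a cell whose intensity alternates between 1 and 3 gives CV² = 0.25.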
published: 2022-02-10
The compiled datasets include plot-level observations of energy crops (miscanthus and switchgrass) from recent experimental field trials in the US, gathered from various published and unpublished sources. Variables include dry biomass yield, location, state, region, harvest year, growing season degree days (GDD), winter season heating degree days (HDD), growing season cumulative precipitation, annual nitrogen application rate, age of the plant when harvested, National Commodity Crop Productivity Index (NCCPI) values, and cultivar type (switchgrass only). The Stata code includes estimation procedures for four specifications: Model A includes deterministic effects without interaction terms; Model B includes deterministic effects with interaction terms (N², age², N × age, GDD², precip², N × NCCPI); Model C includes deterministic effects with interaction terms plus study and location random effects; Model D includes deterministic effects with interaction terms plus harvest-year-augmented study and location random effects.
keywords: Age; Miscanthus; Nitrogen; Switchgrass; Yield; Center for Advanced Bioenergy and Bioproducts Innovation
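The deterministic part of Model B can be illustrated by building its regressors and fitting them by ordinary least squares. This is a minimal Python/NumPy sketch, not the authors' Stata code. It assumes that dry biomass yield is the response, that linear terms for N, age, GDD, HDD, precipitation, and NCCPI enter alongside the listed squared and interaction terms, and that `model_b_design` and `fit_ols` are hypothetical names. Models C and D add random effects, which would require a mixed-model fit and are not shown.

```python
import numpy as np

# Column order of the design matrix; linear terms are an assumption.
MODEL_B_TERMS = ["const", "N", "age", "GDD", "HDD", "precip", "NCCPI",
                 "N^2", "age^2", "N*age", "GDD^2", "precip^2", "N*NCCPI"]


def model_b_design(N, age, GDD, HDD, precip, NCCPI):
    """Stack linear, squared, and interaction regressors (columns follow MODEL_B_TERMS)."""
    N, age, GDD, HDD, precip, NCCPI = (
        np.asarray(a, dtype=float) for a in (N, age, GDD, HDD, precip, NCCPI)
    )
    return np.column_stack([
        np.ones_like(N), N, age, GDD, HDD, precip, NCCPI,
        N ** 2, age ** 2, N * age, GDD ** 2, precip ** 2, N * NCCPI,
    ])


def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq, keyed by term name."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(MODEL_B_TERMS, coef))
```

Keeping the term list next to the design matrix makes it easy to report each coefficient by name; on real field data the variables would usually be centered or scaled first, since raw squared GDD values are large.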
published: 2021-11-19
This is a general description of the datasets included in this upload; details of each dataset can be found in the README.txt inside each compressed folder. The upload contains:
1. ROSE-HF.tar.gz: HF versions of the seven ROSE datasets.
2. ROSE-LF.tar.gz: LF versions of the seven ROSE datasets.
HF (high fragmentation): 50% of the sequences are made fragmentary, with an average length of 25% of the original length and a standard deviation of 60 bp.
LF (low fragmentation): 25% of the sequences are made fragmentary, with an average length of 50% of the original length and a standard deviation of 60 bp.
The seven ROSE datasets made fragmentary are 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2, and 1000S4.
keywords: ROSE; simulation; fragmentary
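The HF and LF rules can be sketched as follows: pick a fraction of the sequences, then shorten each one to a length drawn from a normal distribution whose mean is a fraction of its original length and whose standard deviation is 60 bp. This Python sketch is illustrative only. It assumes each fragment is a contiguous substring starting at a uniformly random position, that drawn lengths are clamped to between 1 and the original length, and that the input sequences are unaligned; the function name `make_fragmentary` is hypothetical. The README in each archive remains the authoritative description.

```python
import random


def make_fragmentary(seqs, frac_fragmentary, mean_frac, sd_bp=60.0, seed=None):
    """Return a copy of `seqs` (name -> unaligned sequence) with a random subset fragmented.

    HF: frac_fragmentary=0.50, mean_frac=0.25; LF: frac_fragmentary=0.25, mean_frac=0.50.
    """
    rng = random.Random(seed)
    names = list(seqs)
    chosen = set(rng.sample(names, round(frac_fragmentary * len(names))))
    out = {}
    for name in names:
        seq = seqs[name]
        if name in chosen and seq:
            # Draw a target length around mean_frac of the original (SD in bp).
            target = rng.gauss(mean_frac * len(seq), sd_bp)
            length = max(1, min(len(seq), round(target)))
            # Assumption: keep a contiguous window at a random start position.
            start = rng.randint(0, len(seq) - length)
            seq = seq[start:start + length]
        out[name] = seq
    return out
```

For example, `make_fragmentary(seqs, 0.5, 0.25, seed=1)` mimics the HF setting on a dictionary of sequences.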
published: 2022-01-30
This dataset contains temperature measurements from four different bat box designs deployed in central Indiana, USA, from May to September 2018. Hourly environmental data (temperature, solar radiation, and wind speed) are also included for the days and hours sampled. Bat box temperature data were used as inputs to the free program GNU Octave to assess design performance with respect to suitability indices for endothermic metabolism and pup development. The scripts are included in the dataset.
keywords: bats; thermal refuge; reproduction; conservation; bat box; microclimate
published: 2022-01-27
Twenty-two genotypes of C4 species grown under ambient and elevated O3 concentrations were studied at the SoyFACE facility (40°02’N, 88°14’W) in 2019. This dataset contains leaf morphology, photosynthesis, and nutrient content measurements taken at three time points. Results of CO2 response curves are also included.
keywords: C4, O3, photosynthesis
published: 2022-01-01
The file “Fla.fasta” (10,526 positions) is the concatenated amino acid alignment of 51 orthologues from 182 bacterial strains. It was used for the maximum likelihood and maximum parsimony analyses of Flavobacteriales. Sequences are named by bacterial species and strain; for insect endosymbionts, host names are shown in brackets. The file “16S.fasta” is the alignment of 233 bacterial 16S rRNA sequences; it contains 1,455 positions and was used for the maximum likelihood analysis of flavobacterial insect endosymbionts. In this file, endosymbiont strain names were replaced by the names of their hosts, and National Center for Biotechnology Information (NCBI) accession numbers are also included in the sequence names (e.g., sequence “Cicadellidae_Deltocephalinae_Macrostelini_Macrosteles_striifrons_AB795320” is the 16S rRNA of Macrosteles striifrons (Cicadellidae: Deltocephalinae: Macrostelini) with NCBI accession number AB795320). The file “Sulcia_pep.fasta” is the concatenated amino acid alignment of 131 orthologues of “Candidatus Sulcia muelleri” (Sulcia). It contains 41,970 positions, represents 101 Sulcia strains and 3 Blattabacterium strains, and was used for the maximum likelihood analysis of Sulcia. The file “Sulcia_nucleotide.fasta” is the concatenated nucleotide alignment corresponding to the sequences in “Sulcia_pep.fasta”, with the 16S rRNA alignment added. It has 127,339 positions and was used for the maximum likelihood and maximum parsimony analyses of Sulcia. Individual gene alignments (16S rRNA and the 131 orthologues of Sulcia and Blattabacterium), which were used to construct gene trees for the multispecies coalescent analysis, are deposited in the compressed file “individual_gene_alignments.zip”. In “Sulcia_pep.fasta”, “Sulcia_nucleotide.fasta”, and the files in “individual_gene_alignments.zip”, Sulcia strain names were replaced by the names of their hosts.
In all the alignment files, gaps are indicated by “-”.
keywords: endosymbiont, “Candidatus Sulcia muelleri”, Auchenorrhyncha, coevolution
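The 16S sequence names join the host taxonomy and the NCBI accession with underscores, as in the Macrosteles striifrons example above, and gaps are written as “-”. A small Python sketch for splitting such names and measuring gap content follows. It assumes the accession is always the last underscore-separated token and the two tokens before it are the genus and species epithet (accessions that themselves contain an underscore would need special handling); `parse_16s_name` and `gap_fraction` are hypothetical names.

```python
def parse_16s_name(name):
    """Split 'Family_Subfamily_Tribe_Genus_species_ACCESSION' into its parts.

    Assumption: the last token is the accession and the two before it are
    genus and species epithet; any earlier tokens are higher taxa.
    """
    *higher_taxa, genus, epithet, accession = name.split("_")
    return {
        "host_species": f"{genus} {epithet}",
        "higher_taxa": higher_taxa,
        "accession": accession,
    }


def gap_fraction(aligned_seq):
    """Fraction of alignment columns that are gaps ('-') in one aligned sequence."""
    return aligned_seq.count("-") / len(aligned_seq) if aligned_seq else 0.0
```

Applied to the example name, `parse_16s_name` yields host species “Macrosteles striifrons”, higher taxa Cicadellidae, Deltocephalinae, and Macrostelini, and accession AB795320.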
published: 2021-12-31
We developed and delivered in-person training at local health department offices in six of the seven Illinois Department of Public Health “health regions” between April and May 2019. Pre-training, post-training, and six-month follow-up questionnaires on knowledge, attitudes, and practices with regard to tick surveillance were administered to training participants.
keywords: ticks; survey; tick-borne disease; public health
published: 2021-12-28
*Updates for this V3: We added a few more records and rearranged the order of the tables to support our new paper, "Evaluation of Indirect and Direct Scoring Methods to Relate Biochemical Soil Quality Indicators to Ecosystem Services", accepted by the Soil Science Society of America Journal. We summarize peer-reviewed literature reporting associations between three soil quality indicators (SQIs), namely β-glucosidase (BG), fluorescein diacetate (FDA) hydrolysis, and permanganate oxidizable carbon (POXC), and crop yield and greenhouse gas emissions. Peer-reviewed articles published between January 1990 and May 2018 were searched using the Thomson Reuters Web of Science database (Thomson Reuters, Philadelphia, Pennsylvania) and Google Scholar to identify studies reporting results for “β-glucosidase”, “permanganate oxidizable carbon”, “active carbon”, “readily oxidizable carbon”, or “fluorescein diacetate hydrolysis”, together with one or more of the following: “crop yield”, “productivity”, “greenhouse gas”, “CO2”, “CH4”, or “N2O”. Metadata for records include the following descriptor variables and covariates useful for scoring function development: 1) identifying factors for the study site (location, duration of the experiment), 2) soil textural class, pH, and SOC, 3) depth of soil sampling, 4) units used in published works (e.g., equivalent mass, concentration), 5) SQI abundances and measured ecosystem functions, and 6) summary statistics for correlations between SQIs and functions (yield and greenhouse gas emissions). *Note: Blank values in tables are considered unreported data.
keywords: Soil health promoting practices; Soil quality indicators; β-glucosidase; fluorescein diacetate hydrolysis; Permanganate oxidizable carbon; Greenhouse gas emissions; Scoring curves; Soil Management Assessment Framework
published: 2021-12-09
These data were collected in 2018 and 2019 at the University of Illinois Energy Farm (N 40.063607, W 88.206926). During each growing season, bulk and rhizosphere soil were collected from replicate Sorghum bicolor nitrogen use efficiency trial plots at three time points (approximately July 1, August 1, and September 1). We measured soil moisture, pH, soil nitrate and ammonium, potential nitrification, and potential denitrification, and we extracted DNA and sequenced the V4 region of the 16S rRNA gene for microbial community analysis. All microbial sequence data are archived in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (accession number SRP326979, project number PRJNA741261).
keywords: soil nitrogen; nitrification; nitrogen cycle; sorghum; bioenergy; Center for Advanced Bioenergy and Bioproducts Innovation
published: 2021-12-01
An online knowledge, attitudes, and practices survey on ticks and tick-borne diseases was distributed to veterinary professionals in Southern and Central Illinois during summer and fall 2020. This dataset contains the raw survey data and the survey questions used. * NOTE: The "age" and "gender" variables were removed from the data to protect participants.
keywords: ticks; veterinary medicine; tick-borne disease; survey
published: 2021-08-27
The dataset lists all poison frogs (superfamily Dendrobatoidea) recorded in private U.S. collections during 1990–2020. For each species and color morph, it gives the date of arrival, the route by which it entered U.S. collections, and detailed notes on its presence in the pet trade.
keywords: pet trade; amphibians; Dendrobatidae
published: 2021-11-16
Data from a field experiment at El Velo, Chiriqui, Republic of Panama. The data describe functional traits of seedlings grown under different treatments, including forest type, nitrogen addition, and organic matter.
keywords: Mycorrhiza; nitrogen; oak forest; Panama; plant-soil feedbacks; seedling growth
published: 2021-10-27
The shared dataset consists of 16S rRNA gene sequencing data from microbial communities. Each community is composed of heterotrophic bacteria derived from one of two soil samples, plus the model alga Chlamydomonas reinhardtii. Each community was placed in a materially closed environment with an initial supply of carbon in the medium and subjected to light-dark cycles. These closed microbial ecosystems (CES) survived via carbon cycling. Each CES was subjected to rounds of dilution, after which the community was sequenced (data provided here). The shared dataset allowed us to conclude that CES consistently self-assembled to cycle carbon (data not provided) via conserved metabolic capabilities (data not provided), despite differences in taxonomic composition (data provided).
Naming convention: [soil sample = A or B][CES replicate = 1, 2, 3, or 4]_r[round number = 1, 2, 3, or 4]_[forward read = F or reverse read = R]_filt.fastq
Example: A1_r1_F_filt.fastq means soil sample A, CES replicate 1, end of round 1, forward read.
keywords: 16S seq; .fastq; closed microbial ecosystems; carbon cycling
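The naming convention can be parsed mechanically, which is handy for grouping forward and reverse reads by sample and round. This Python sketch assumes the round number is written with an "r" prefix, as in the example A1_r1_F_filt.fastq; the names `CESRead` and `parse_ces_filename` are hypothetical.

```python
import re
from typing import NamedTuple


class CESRead(NamedTuple):
    soil_sample: str     # "A" or "B"
    replicate: int       # CES replicate, 1-4
    dilution_round: int  # sequenced at the end of this round, 1-4
    direction: str       # "forward" or "reverse"


# Pattern follows the dataset's convention: A1_r1_F_filt.fastq
FILENAME = re.compile(r"^([AB])([1-4])_r([1-4])_([FR])_filt\.fastq$")


def parse_ces_filename(name):
    """Decode one read file name; raise ValueError if it does not follow the convention."""
    m = FILENAME.match(name)
    if m is None:
        raise ValueError(f"not a CES read file name: {name!r}")
    soil, rep, rnd, read = m.groups()
    return CESRead(soil, int(rep), int(rnd), "forward" if read == "F" else "reverse")
```

Rejecting non-matching names, rather than guessing, surfaces any file that departs from the stated convention.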
planned publication date: 2022-09-29
Dataset associated with Merrill et al., submission ECE-2021-05-00793.R1, "Early life patterns of growth are linked to levels of phenotypic trait covariance and post-fledging mortality across avian species." Includes Excel CSV files with all of the data used in the analyses and a file describing each column.
keywords: canalization; developmental flexibility; early-life stress; nest predation; phenotypic correlation; trait covariance
published: 2021-11-03
This dataset contains re-estimated gene trees from the ASTRAL-II [1] simulated datasets. The re-estimated variants are called MC6H and MC11H; they are derived from the MC6 and MC11 conditions of the original data (the names MC6 and MC11 were given by ASTRID [2]). The uploaded files contain the sequence alignments (half the length of the original alignments) and the gene trees re-estimated from them using FastTree2.
Notes:
- "mc6h.tar.gz" and "mc11h.tar.gz" contain the sequence alignments and the re-estimated gene trees for the two conditions.
- The sequence alignments are named "all-genes.phylip.splitted.[i].half", where i indicates that the file corresponds to the i-th alignment of the original dataset, truncated to half its original length.
- "g1000.trees" under each replicate contains the re-estimated gene trees, one per line. The gene trees were estimated from the alignments described above using FastTree2 (version 2.1.11) with the command "FastTree -nt -gtr".
[1] Mirarab, S., & Warnow, T. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44-i52.
[2] Vachaspati, P., & Warnow, T. (2015). ASTRID: accurate species trees from internode distances. BMC Genomics, 16(10), 1-13.
keywords: simulated data; ASTRAL; alignments; gene trees
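Halving an alignment amounts to keeping half of its columns for every taxon. This Python sketch assumes a sequential, relaxed PHYLIP file (a header line "ntaxa nchar", then one "name sequence" line per taxon) and assumes the first half of the columns is kept, rounding down for odd lengths. These are assumptions about how the ".half" files were produced, and `halve_phylip` is a hypothetical name. Interleaved PHYLIP is not handled.

```python
def halve_phylip(text):
    """Return a relaxed sequential PHYLIP string that keeps the first half of the columns."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = map(int, lines[0].split())
    rows = [ln.split(maxsplit=1) for ln in lines[1:1 + ntax]]
    half = nchar // 2  # assumption: odd lengths round down
    out = [f"{ntax} {half}"]
    for name, seq in rows:
        seq = seq.replace(" ", "")  # tolerate space-separated blocks of sites
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, found {len(seq)}")
        out.append(f"{name} {seq[:half]}")
    return "\n".join(out) + "\n"
```

Because "g1000.trees" is newline-separated, its gene trees can be read as one tree string per non-empty line.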
published: 2021-10-22
This dataset includes the source data for Figures 1-4 and supplementary figures 1-10 for the manuscript "Kinetic and structural mechanism for DNA unwinding by a non-hexameric helicase".
published: 2021-10-28
Bigheaded carp were collected from the Illinois and Des Plaines Rivers, parts of the Illinois Waterway, from May to November 2018. A total of 93 fish (40 females, 41 males, and 12 unsexed) were collected. GC/MS metabolite profiling detected 180 compounds. Relative to downstream fish, livers from carp at the leading edge showed differences in energy use and metabolism and suppression of protective mechanisms; these differences were consistent across time. This work provides evidence that water quality is linked to carp movement in the Illinois River. As water quality in the region continues to improve, its impact on carp spread must be considered to protect the Great Lakes.
keywords: water quality; metabolites; range expansion; energy; contaminants
published: 2021-10-24
This dataset contains daily and hourly temperature measurements from twenty different bat box designs deployed in central Indiana, USA, from May to September 2018. Daily and hourly environmental data (temperature, solar radiation, wind speed, and wind direction) are also included for the days and hours sampled. Bat box temperature data were reclassified into cool (≤ 30°C), permissive (30.1–39.9°C), and stressful (≥ 40°C) categories according to the known temperature tolerances of temperate-zone bats.
keywords: bat box; design; environmental variables; microclimate; temperature
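The reclassification above is a simple threshold rule. This Python sketch applies it to hourly readings. It assumes readings that fall strictly between 30.0 and 30.1°C, or between 39.9 and 40.0°C, are counted as permissive, which matters only for data recorded at finer than 0.1°C resolution. The names `classify_box_temperature` and `category_hours` are hypothetical.

```python
def classify_box_temperature(temp_c):
    """Map a bat box temperature (degrees C) to the dataset's tolerance categories.

    cool: <= 30; permissive: 30.1-39.9; stressful: >= 40.
    Values strictly between 30.0 and 30.1, or 39.9 and 40.0, go to 'permissive'
    here (an assumption for readings finer than 0.1 degrees C).
    """
    if temp_c <= 30.0:
        return "cool"
    if temp_c < 40.0:
        return "permissive"
    return "stressful"


def category_hours(hourly_temps):
    """Count hours in each category for one box's hourly temperature series."""
    counts = {"cool": 0, "permissive": 0, "stressful": 0}
    for t in hourly_temps:
        counts[classify_box_temperature(t)] += 1
    return counts
```

Summing hours per category for each box gives a quick way to compare designs on the same days.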
published: 2021-10-15
Data on location, dimensions, time of treefall or death, decay state, wood nutrients, wood pH, wood density, soil moisture, slope, distance from the forest edge, and soil nutrients, associated with the publication "Interspecific wood trait variation predicts decreased carbon residence time in changing forests" by Sierra Perez, Jennifer Fraterrigo, and James Dalling. Note: Blank cells indicate that no data were collected.
keywords: wood decay; carbon residence time; coarse woody debris; decomposition; temperate forests
published: 2021-10-15
This is the synthetic expression dataset (5 states, 5,000 cells) used to validate SimiC, a single-cell gene regulatory network (GRN) inference method with similarity constraints. Ground-truth GRNs are stored as NumPy arrays, and the combined expression profiles of all states are stored as a pandas DataFrame; both are saved as pickle files.
keywords: NumPy array; GRNs; pandas DataFrame
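Loading pickle files requires the same libraries that created them, here NumPy and pandas. This Python sketch loads a pickle and reports what it contains, which helps when the exact structure is not known in advance (for example, whether the ground-truth GRNs are one array or a list of per-state arrays). The file names are left to the user, and `load_pickle` and `describe` are hypothetical helpers. Only unpickle files from a source you trust, because unpickling can execute arbitrary code.

```python
import pickle

import numpy as np
import pandas as pd


def load_pickle(path):
    """Unpickle one file; only do this for files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Summarize a loaded object: shape for arrays and DataFrames, length for other containers."""
    if isinstance(obj, pd.DataFrame):
        return f"DataFrame {obj.shape[0]} x {obj.shape[1]}"
    if isinstance(obj, np.ndarray):
        return f"ndarray shape={obj.shape}"
    if hasattr(obj, "__len__"):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__
```

Checking `describe(load_pickle(path))` for each file before analysis confirms which file holds the expression DataFrame and how the GRN arrays are organized.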
published: 2021-10-10
This dataset contains temperature and dissolved oxygen profiles at 1-m intervals, together with Secchi depth, measured at the deepest point of 10 Illinois reservoirs between 1995 and 2016.
keywords: Water temperature; dissolved oxygen; secchi depth; climate change
published: 2021-09-17
We assessed the robustness of vegetation metrics to environmental (seasonal, interannual, and regional) and methodological (observer) variables, and determined adequate sample sizes for these metrics, across four regions of the United States.
keywords: coefficients of conservatism; floristic quality assessment; restoration; vegetation metric
published: 2021-09-03
All of the files in this dataset pertain to the evaluation of a novel statistic, Hind/He, for distinguishing Mendelian loci from paralogs. They are derived from a RAD-seq genotyping dataset of diploid and tetraploid Miscanthus sacchariflorus.