Displaying datasets 1 - 25 of 251 in total

Subject Area

Life Sciences (251)
Social Sciences (0)
Physical Sciences (0)
Technology and Engineering (0)
Uncategorized (0)
Arts and Humanities (0)


Other (83)
U.S. National Science Foundation (NSF) (70)
U.S. Department of Energy (DOE) (25)
U.S. Department of Agriculture (USDA) (22)
U.S. National Institutes of Health (NIH) (20)
Illinois Department of Natural Resources (IDNR) (9)
U.S. National Aeronautics and Space Administration (NASA) (2)
U.S. Geological Survey (USGS) (2)
U.S. Army (1)
Illinois Department of Transportation (IDOT) (0)

Publication Year

2021 (66)
2020 (60)
2019 (42)
2022 (27)
2018 (23)
2017 (19)
2016 (12)
2023 (2)


CC0 (167)
CC BY (75)
custom (9)
published: 2022-05-20
This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.
keywords: aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology
published: 2022-02-08
Matlab codes for the article "Phage-antibiotic synergy inhibited by temperate and chronic virus competition". Code can be used to reproduce the article figures, perform the parameter sensitivity analysis and simulate the model.
keywords: bacterium-phage-antibiotic model; ODEs; Matlab; sensitivity analysis
published: 2022-03-09
MATLAB files for the analysis of an ODE model for disease transmission. The codes may be used to find equilibrium points, study transient dynamics, evaluate the basic reproductive number (R0), and simulate the model when parameters depend on the independent variables. In addition, the codes may be used to perform local sensitivity analysis of R0 on the model parameters.
published: 2022-03-31
This dataset contains our bi-hourly temperature recordings from 40 rocket box style artificial roosts of 5 designs deployed in Indiana and Kentucky, USA from April through September 2019. This dataset also includes our endothermic and faculatively heterothermic daily energy expenditure datasets used in our bioenergetic analysis, which were calculated from the bi-hourly rocket box temperature data. Lastly, we include our overheating counts dataset which summarizes daily overheating events (i.e., temperatures > 40 Celsius) in each rocket box style bat box over the course of the study period, these daily summaries were also calculated from the bi-hourly rocket box temperature recordings.
keywords: artificial roost; bat box; microcllimate; temperature
published: 2022-05-13
The files are plain text and contain the original data used in phylogenetic analyses of of Typhlocybinae (Bin, Dietrich, Yu, Meng, Dai and Yang 2022: Ecology & Evolution, in press). The three files with extension .phy are text files with aligned DNA sequences in the standard PHYLIP format and correspond to Matrix 1 (amino acid alignment), Matrix 2 (nucleotide alignment of first two codon positions of protein-coding genes) and Matrix 3 (nucleotide alignment of protein-coding genes plus 2 ribosomal genes) described in the Methods section. An additional text file in NEXUS format (.nex extension) contains the morphological character data used in the ancestral state reconstruction (ASCR) analysis described in the Methods. NEXUS is a standard format used by various phylogenetic analysis software. For more information on data file content, see the included "readme" files.
keywords: Hemiptera; phylogeny; mitochondrial genome; morphology; leafhopper
published: 2022-06-10
This dataset contains nucleotide sequences of 16S rRNA gene from phytoplasmas and other bacteria detected in phloem-feeding insects (Hemiptera, Auchenorrhyncha). The datasets were used to compare traditional Sanger sequencing with a next-generation sequencing method, Anchored Hybrid Enrichment (AHE) for detecting and characterizing phytoplasmas in insect DNA samples. The file “Trivellone_etal_SangerSequencing.fas”, comprising 1397 positions (the longest sequence), includes 35 not aligned bacterial 16S rRNA sequences (16 phytoplasmas and 19 other bacterial strains) yielded using Sanger sequencing. The file “Trivellone_etal_AHEmethod1.fas” includes 34 not aligned bacterial 16S rRNA sequences (28 phytoplasmas and 6 other bacterial strains) and it contains 1530 positions (the longest sequence). Each sequence was assembled using assembled based on ABySS v2.1.0 pipeline. The file “Trivellone_etal_AHEmethod2.fas” includes 31 not aligned bacterial 16S rRNA sequences (27 phytoplasmas and 4 other bacterial strains) and it contains 1530 positions (the longest sequence). Each sequence was assembled based on the HybPiper v2.0.1 pipeline . Additional details in the "read_me_trivellone.txt" file attached below.
keywords: anchored hybrid enrichment; biodiversity, biorepository; nested PCR; Sanger sequencing
published: 2021-04-16
This dataset includes five files developed using the procedures described in the article 'Developing County-level Data of Nitrogen Fertilizer and Manure Inputs for Corn Production in the United States' and Supplemental Information published in the Journal of Cleaner Production in 2021. Citation: Xia, Yushu, Hoyoung Kwon, and Michelle Wander. "Developing county-level data of nitrogen fertilizer and manure inputs for corn production in the United States." Journal of Cleaner Production 309 (2021): e126957. Brief method: The fertilizer and manure inputs for corn were generated with a top-down approach by assigning county-level total N inputs reported by USGS to different crops using state- and county-level survey data. The corn N needs were estimated using empirical extension-based equations coupled with soil and environmental covariates. The estimates of fertilizer N inputs were further refined for corn grain and silage production at the county level and gap-filling (using state-level averages) was carried out to generate final files for U.S. county-level N inputs. The dataset is provided in an alternative format in Google Earth Engine: https://code.earthengine.google.com/13a0078e7ee727bc001e045ad0e8c6fc
keywords: Corn; Nitrogen Fertilizer; Manure; Conterminous U.S.
has sharing link
planned publication date: 2023-06-01
An online knowledge, attitudes, and practices survey on ticks and tick-borne diseases was distributed to medical professionals in Illinois during summer 2020 to fall 2021. These are the raw data associated with that survey and the survey questions used. Age, gender, and county of practice have been removed for identifiability. We have added calculated values (columns 165 to end), including: the tick knowledge score, TBD knowledge score, and total knowledge score, which are the sum of the total number of correct answers in each category, and score percent, which are the proportion of correct answers in each category; region, which is determined from the county of practice; TBD relevant practice, which separates the practice variable into TBD primary, secondary, and non-responders; and several variables which group categories.
keywords: ticks; medicine; tick-borne disease; survey
published: 2022-06-01
This dataset contain information for the paper "Changes in neuropeptide prohormone genes among Cetartio-dactyla livestock and wild species associated with evolution and domestication" Veterinary Sciences, MDPI. Protein sequences were predicted using GeneWise for 98 neuropeptide prohormone genes from publicly available genomes of 118 Cetartiodactyla species. All predictions (CetartiodactylaSequences2022.zip) were manually verified. Sequences were aligned within each prohormone using MAFFT (MDPImultalign2022.zip includes multiple sequence alignment of all species available for each prohormone). Phylogenetic gene trees were constructed using PhyML and the species tree was constructed using ASTRAL (MDPItree2022.zip). The data is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
keywords: prohormone; neuropeptide; Cetartiodactyla; Cetartiodactyla; phylogenetics; gene tree; species tree
published: 2022-05-16
This dataset is for the publication "Do Nearctic hover flies (Diptera: Syrphidae) engage in long-distance migration? An assessment of evidence and mechanisms." It consists of 11 Excel spreadsheets and 4 R scripts which correspond to the analyses which were conducted. Paper abstract: Long-distance insect migration is poorly understood despite its tremendous ecological and economic importance. As a group, Nearctic hover flies (Diptera: Syrphidae: Syrphinae), which are crucial pollinators as adults and biological control agents as larvae, are almost entirely unrecognized as migratory despite examples of highly migratory behavior among several Palearctic species. Here, we examined evidence and mechanisms of migration for four hover fly species (Allograpta obliqua, Eupeodes americanus, Syrphus rectus, and Syrphus ribesii) common throughout eastern North America using stable hydrogen isotope (δ2H) measurements of chitinous tissue, morphological assessments, abundance estimations, and cold-tolerance assays. While further studies are needed, non-local isotopic values obtained from hover fly specimens collected in central Illinois support the existence of long-distance fall migratory behavior in Eu. americanus, and to a lesser extent S. ribesii and S. rectus. Elevated abundance of Eu. americanus during the expected autumn migratory period further supports the existence of such behavior. Moreover, high phenotypic plasticity of morphology associated with dispersal coupled with significant differences between local and non-local specimens suggest that Eu. americanus exhibits a unique suite of morphological traits that decrease costs associated with long-distance flight. Finally, compared to the ostensibly non-migratory A. obliqua, Eu. americanus was less cold tolerant, a factor that may be associated with migratory behavior. Collectively, our findings imply that fall migration occurs in Nearctic hover flies, but we consider methodological limitations of our study in addition to potential ecological and economic consequences of these novel findings.
keywords: Insect migration; hover fly; Syrphidae; stable isotopes; deuterium; morphometrics; cold tolerance
published: 2022-05-04
This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript, "Refining the role of nitrogen mineralization in mycorrhizal nutrient syndromes". Please refer to that publication for details about methodologies used to generate these data and for the experimental design.
keywords: Nitrogen cycling; Ectomycorrhizal fungi; Arbuscular mycorrhizal fungi; Nitrogen fertilization; Gross mineralization
published: 2022-04-26
ICoastalDB, which was developed using Microsoft structured query language (SQL) Server, consists of water quality and related data in the Illinois coastal zone that were collected by various organizations. The information in the dataset includes, but is not limited to, sample data type, method of data sampling, location, time and date of sampling and data units.
keywords: Illinois Coastal Zone; Water Quality Data
published: 2022-04-20
This is the core data for Zinnen et al., "Functional traits and responses to nutrient and mycorrhizal addition are inconsistently related to wetland plant species’ coefficients of conservatism." This is submitted to Wetlands Ecology and Management. Two datasets are submitted here. The first is greenhouse-collected data of 9 plant traits and concurrent treatment responses of Illinois wetland plant species. The second are field-collected leaf trait data of Illinois wetland plant species. These data are analyzed in the paper. Please refer to the main manuscript to see how these data were produced and specific analyses.
keywords: ecological indicators; Floristic Quality Assessment; Floristic Quality Index; wetland degradation
published: 2022-04-19
List of differentially expressed genes in human endometrial stromal cells with knockdown of Basigin (BSG) gene expression during decidualization. The BSG siRNA or negative scrambled control siRNA were transfected into human endometrial stromal cells (HESCs) following the protocol of siLentFect™ Lipid (Bio-Rad, Hercules, CA. Following complete knock down of BSG in HESCs (72 hours after adding siRNA), HESCs were treated with medium containing estrogen, progesterone and cAMP to induce decidualization. BSG siRNA and negative control scrambled siRNA were added to the cells every four days (day 0, 4) over the course of the decidualization protocol. Total RNA was harvested at day 6 of the decidualization protocol for microarray analysis. Microarray analysis was performed at the University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center. Briefly, 0.2 micrograms of total RNA were labeled using the Agilent two color QuickAmp labeling kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s protocol. The optional spike-in controls were not used. Samples were hybridized to Human Gene Expression 4x44K v2 Microarray (Agilent Technologies, Santa Clara, CA) in an Agilent Hybridization Cassette according to standard protocols. The arrays were then scanned on an Axon GenePix 4000B scanner and the images were quantified using Axon GenePix 6.1. Microarray data pre-processing and statistical analyses were done in R (v3.6.2) using the limma package (3.42.0 (Ritchie et al., 2015). Median foreground and median background values from the 4 arrays were read into R and any spots that had been manually flagged (-100 values) were given a weight of zero. The background values were ignored because investigations showed that trying to use them to adjust for background fluorescence added more noise to the data; background was low and even for all arrays, therefore no background correction was done. The individual Cy5 and Cy3 fluorescence for each array were normalized together using the quantile method 3 (Yang and Thorne, 2003). Agilent's Human Gene Expression 4x44K v2 Microarray has a total of 45,220 probes: 1224 probes for positive controls, 153 negative control, 823 labeled “ignore” and 43,118 labeled “cDNA”. The pos+neg+ignore probes were used to ascertain the background level of fluorescence (6, on the log2 scale) then discarded. The cDNA probes comprise 34,127 unique 60mer probes, of which 999 probes are spotted 10 times each and the rest one time each. We averaged the replicate probes for those spotted 10 times and then fit a mixed model that had treatment and dye as fixed effects and array pairing as a random effect (Phipson et al., 2016; Smyth et al., 2005). After fitting the model but before False Discovery Rate (FDR) correction (Benjamini and Hochberg, 1995), probes were filtered out by the following criteria: 1) did not have at least 4/8 samples with expression values > 6 (14,105 probes removed), 2) no longer had an assigned Entrez Gene ID in Bioconductor’s HsAgilentDesign026652.db annotation package (v3.2.3; 2,152 probes removed) (Huber et al., 2015), 3) mapped to the same Entrez Gene ID as another probe but had a larger p-value for treatment effect (4,141 probes removed). This left 13,729 probes representing 13,729 unique genes. <b>*Please note: that there is a discrepancy between the file and the readme as this plain text is the actual data file of this dataset.</b>
keywords: Basigin; endometrium; decidualization; human
published: 2022-03-30
This dataset is associated with a larger manuscript published in 2022 in the Illinois Natural History Survey Bulletin to summarize all known records for nonindigenous aquatic mollusks in Illinois, and full sources are referenced within the manuscript. We examined museum holdings, literature accounts, publicly available databases sponsored by the U.S. Geological Survey (USGS) - Nonindigenous Aquatic Species program (http://nas.er.usgs.gov/.) and InvertEBase (invertebase.org). We also included sporadic field survey data of encounters of nonindigenous aquatic species from colleagues within the Illinois Natural History Survey, Illinois Department of Natural Resources, U.S. Fish and Wildlife Service, county forest preserve districts, and other natural resource agencies about their encounters with nonindigenous aquatic mollusk species. Lastly, we examined the role and utility of citizen-science data to document occurrences of nonindigenous aquatic mollusk species. We queried iNaturalist (www.inaturalist.org) for all available nonindigenous freshwater mollusk data for Illinois. Table heading descriptions (if not intuitive) are: “INHS verified” is whether an INHS staff member verified the record by observing vouchered specimen or photograph; “Source” is where a record was accessed or obtained; “individualCount” is number collected or observed in a record; “MuseumCode” is standard museum abbreviation or acronym; “Institution” is source that housed or reported a record, and this also includes the spelled-out museum code; “Collectors” typically indicates who collected the specimen or voucher; “Lat_Long determined by” denotes whether collection coordinates were stated by the collector or by a curator (using inference from data available); “fieldNumber” typically indicates a unique field number that a collector may have used in the field; “identifiedBy” typically explains who identified a specimen or verified a specimen identification.
keywords: Illinois; Exotic species; Non-native aquatic species; NAS; Aquatic Invasive Species; AIS; Mollusk
planned publication date: 2023-01-01
The following files were used to reconstruct the phylogeny of the leafhopper subfamily Typhlocybinae, using IQ-TREE v1.6.12 and ASTRAL v 4.10.5. <b>1) Taxon_sampling.csv:</b> contains the sample IDs (1st column) and the taxonomic information (2nd column). Sample IDs were used in the alignment files and partition files. <b>2) concatenated_nt_complete.phy:</b> a complete concatenated nucleotide dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 154,992 nucleotide positions (intron included) from 665 loci. Hyphens are used to represent gaps. <b>3) concatenated_nt_complete_partition.nex:</b> the partitioning schemes for concatenated_nt_complete.phy. The file partitions the 154,992 nucleotide characters into 426 character sets, and defines the best substitution model for each character set. <b>4) concatenated_cds_complete.phy:</b> a complete concatenated coding DNA sequence dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 153,525 nucleotide positions (intron excluded) from 665 loci. Hyphens are used to represent gaps. <b>5) concatenated_cds_complete_partition.nex:</b> the partitioning schemes for concatenated_cds_complete.phy. The file partitions the 153,525 nucleotide characters into 426 character sets, and defines the best substitution model for each character set. <b>6) concatenated_nt_reduced.phy:</b> a reduced concatenated nucleotide dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12. The file lists the sequences of 248 samples with 95,076 nucleotide positions (intron included) from 374 loci. Hyphens are used to represent gaps. <b>7) concatenated_nt_reduced_partition.nex:</b> the partitioning schemes for concatenated_nt_reduced.phy. The file partitions the 95,076 nucleotide characters into 312 character sets, and defines the best substitution model for each character set. <b>8) concatenated_aa_complete.phy:</b> a complete concatenated amino acid dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12, corresponding to concatenated_cds_complete.phy. The file lists the sequences of 248 samples with 51,175 amino acid positions from 665 loci. Hyphens are used to represent gaps. <b>9) concatenated_aa_complete_partition.nex:</b> the partitioning schemes for concatenated_aa_complete.phy. The file partitions the 51,175 amino acid characters into 426 character sets, and defines the best substitution model for each character set. <b>10) concatenated_aa_reduced.phy:</b> a reduced concatenated amino acid dataset used for the maximum likelihood analysis by IQ-TREE v1.6.12, corresponding to concatenated_nt_reduced.phy. The file lists the sequences of 248 samples with 31,384 amino acid positions from 374 loci. Hyphens are used to represent gaps. <b>11) concatenated_aa_reduced_partition.nex:</b> the partitioning schemes for concatenated_aa_reduced.phy. The file partitions the 31,384 amino acid characters into 312 character sets, and defines the best substitution model for each character set. <b>12) Individual_gene_alignment.zip:</b> contains 426 FASTA files, each one is an alignment for a gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v 4.10.5 based the consensus trees with a minimum average bootstrap value of 70.
keywords: Auchenorrhyncha, Cicadomorpha, Membracoidea, anchored hybrid enrichment
published: 2022-03-25
This upload includes the 16S.B.ALL in 100-HF condition (referred to as 16S.B.ALL-100-HF) used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). 100-HF condition refers to making sequences fragmentary with an average length of 100 bp and a standard deviation of 60 bp. Additionally, we enforced that all fragmentary sequences to have lengths > 50 bp. Thus, the final average length of the fragments is slightly higher than 100 bp (~120 bp). In this case (i.e., 16S.B.ALL-100-HF), 1,000 sequences with lengths 25% around the median length are retained as "backbone sequences", while the remaining sequences are considered "query sequences" and made fragmentary using the "100-HF" procedure. Backbone sequences are aligned using MAGUS (or we extract their reference alignment). Then, the fragmentary versions of the query sequences are added back to the backbone alignment using either MAGUS+UPP or WITCH. More details of the tar.gz file are described in README.txt.
keywords: MAGUS;UPP;Multiple Sequence Alignment;eHMMs
published: 2022-03-25
The data are original electron micrographs from the lab of the late Dr. Burt Endo of the USDA. These data were digitized from photographic prints and glass plate negatives at 600 DPI as 16 bit TIFF files. This third version added 12 new ZIP files from the Endo data collection. "Endo folder database.xlsx" is updated to reflect the addition. Information in "ReadmeFile name formatting.docx" remains the same as in V2.
keywords: nematode; ultrastructure; Endo; electron microscopy
published: 2022-03-19
Raw arthroscopic scores, histologic scores, cytokine measurements, and performance data for the study cohort described in the accompanying publication.
keywords: horse; metatarsophalangeal joint; arthroscopy; exercise; developmental orthopedic disease
published: 2022-03-11
Data sets relating to the manuscript “Long-term yields in annual and perennial bioenergy crops in the Midwestern USA” published in Global Change Biology Bioenergy. Field data, including annual peak biomass and harvest yields from maize/soy, miscanthus, switchgrass, and prairie field trials from 2008-2018 are included. Peak and harvest biomass for fertilized and unfertilized miscanthus are included from 2014-2018.
keywords: miscanthus; switchgrass; yield; drought; crop; perennial; bioenergy
published: 2022-03-01
The following files were used to reconstruct the phylogeny of the leafhopper subfamily Deltocephalinae, using IQ-TREE v1.6.12 and ASTRAL v 4.10.5. <b>1) taxon_sampling.csv:</b> contains the sequencing ids (1st column) and the taxonomic information (2nd column) of each sample. Sequencing ids were used in the alignment files and partition files. <b>2)concatenated_nt.phy:</b> concatenated nucleotide alignment used for the maximum likelihood analysis of Deltocephalinae by IQ-TREE v1.6.12. The file lists the sequences of 163,365 nucleotide positions from 429 genes in 730 samples. Hyphens are used to represent gaps. <b>3) concatenated_nt_partition.nex:</b> the partitions for the concatenated nucleotide alignment. The file partitions the 163,365 nucleotide characters into 429 character sets, and defines the best substitution model for each character set. <b>4) concatenated_aa.phy:</b> concatenated amino acid alignment used for the maximum likelihood analysis of Deltocephalinae by IQ-TREE v1.6.12. The file gives the sequences of 53,969 amino acids from 429 genes in 730 samples. Hyphens are used to represent gaps. <b>5) concatenated_aa_partition.nex:</b> the partitions for the concatenated amino acid alignment. The file partitions the 53,969 characters into 429 character sets, and defines the best substitution model for each character set. <b>6) concatenated_nt_106taxa.phy:</b> a reduced concatenated nucleotide alignment representing 107 samples x 86 genes. This alignment is used to estimate the divergence times of Deltocephalinae using MCMCTree in PAML v4.9. The file lists the sequences of 79,239 nucleotide positions from 86 genes in 107 samples. Hyphens are used to represent gaps. <b>7) concatenated_nt_106taxa_partition.nex:</b> the partitions for the nucleotide alignment concatenated_nt_106taxa.phy. The file partitions the 79,239 nucleotide characters into 86 character sets, and defines the best substitution model for each character set. <b>8) individual_gene_alignment.zip:</b> contains 429 FAS files, one for each of the partitioned nucleotide character sets in the concatenated_nt_partition.nex file. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v 4.10.5.
published: 2022-02-14
Dataset associated with Allen et al. (In Review): Food caching by a solitary large carnivore supports optimal foraging theory If using this dataset, please cite this manuscript.
published: 2022-02-11
The Culex_Trivellone_etal.fas fasta file contains the original final sequence alignment used in the haplotype analyses of Trivellone et al. (Frontiers in Public Health, under review). The 492 sequences (from specimens of Culex pipiens complex collected in different habitat types using a BG-sentinel traps) were aligned using PASTA v1.8.5 under default settings. The final dataset contains 686 positions of the cytochrome c oxidase subunit I (COI) mitochondrial gene. The data analyses are further described in the cited original paper.
keywords: Culex; Culicidae; COI; mosquito surveillance, species assemblages
published: 2022-02-11
Upon treatment removal, spontaneous and random reactivation of latently infected T cells remains a major barrier toward curing HIV. Due to its stochastic nature, fluctuations in gene expression (or “noise”) can bias HIV reactivation from latency, and conventional drug screens for mean gene expression neglect compounds that modulate noise. Here we present a time-lapse fluorescence microscopy image set obtained from a Jurkat T-cell line, infected with a minimal HIV gene circuit, treated with 1,806 small molecule compounds, and imaged for 48 hours. In addition, the single-cell time-dependent reporter dynamics (single-cell gene expression intensity and noise trajectories) extracted from the image dataset are included. Based on this dataset, a total of 5 latency promoting agents of HIV was found through further experimentation in Lu et al., PNAS 2021 (doi: 10.1073/pnas.2012191118). For a detailed description of the dataset, please refer to the readme file.
keywords: HIV; latency; drug screen; fluorescence microscopy; time-lapse; microscopy; single-cell data; noise; gene expression fluctuation;