Displaying 101 - 125 of 349 in total

Subject Area

Life Sciences (349)
Social Sciences (0)
Physical Sciences (0)
Technology and Engineering (0)
Arts and Humanities (0)
Uncategorized (0)


Other (127)
U.S. National Science Foundation (NSF) (92)
U.S. Department of Agriculture (USDA) (41)
U.S. Department of Energy (DOE) (39)
U.S. National Institutes of Health (NIH) (25)
Illinois Department of Natural Resources (IDNR) (17)
U.S. Geological Survey (USGS) (4)
Illinois Department of Transportation (IDOT) (3)
U.S. National Aeronautics and Space Administration (NASA) (2)
U.S. Army (2)

Publication Year

2021 (66)
2020 (60)
2022 (56)
2019 (42)
2023 (41)
2024 (26)
2018 (24)
2017 (19)
2016 (12)
2025 (3)
2009 (0)
2011 (0)
2012 (0)
2014 (0)
2015 (0)


CC0 (219)
CC BY (117)
custom (13)


published: 2022-07-08
Dataset for "Spatial drivers of wetland bird occupancy within an urbanized matrix in the Upper Midwestern United States" manuscript contains occupancy data for ten wetland bird species used in single-species occupancy models at four spatial scales and four wetland habitat types. Data were collected from 2017-2019 in NE Illinois and NW Indiana. Dataset includes wetland bird occupancy data, habitat parameter values for each survey location, and R code used to run analyses.
keywords: wetland birds; occupancy; emergent wetland; urbanization; Great Lakes region
published: 2022-08-22
This dataset contains Raman spectra, each acquired from an individual, living, primary murine cell belonging to one of the six most immature hematopoietic cell populations found in the body: hematopoietic stem cell (HSC), mutipotent progenitor 1 (MPP1), multipotent progenitor 2 (MPP2), multipotent progenitor 3 (MPP3), common lymphoid progenitor, common myeloid progenitor (CLP). These spectra are useful for identifying spectral signatures that are characteristic of each hematopoietic stem or early progenitor cell population. *NOTE: __MACOSX folder and files start with “._[file name]” found in "Raman spectra of single cells text files.zip" were created by the computer operation system, in unreadable format, which are not part of the data and can be removed/ignored when using the data.
keywords: Raman spectroscopy; single-cell spectrum; hematopoietic cell; hematopoietic stem cell; multipotent progenitor cell; common myeloid progenitor; common lymphoid progenitor
published: 2022-04-26
ICoastalDB, which was developed using Microsoft structured query language (SQL) Server, consists of water quality and related data in the Illinois coastal zone that were collected by various organizations. The information in the dataset includes, but is not limited to, sample data type, method of data sampling, location, time and date of sampling and data units.
keywords: Illinois Coastal Zone; Water Quality Data
published: 2021-10-24
This dataset contains daily and hourly temperature measurements in twenty different bat box designs deployed in central Indiana, USA from May to September 2018. Daily and hourly environmental data (temperature, solar radiation, wind speed and direction) are also included for days and hours sampled. Bat box temperature data were reclassified to cool (</= 30°C), permissive (30.1–39.9°C), and stressful (>/= 40°C) categories according to known temperature tolerances of temperate-zone bats.
keywords: bat box; design; environmental variables; microclimate; temperature
published: 2023-03-06
This dataset includes mass spectrometry, library screening, and gas chromatography data used for creating a high-throughput screening in metabolic engineering.
keywords: mass spectrometry; gas chromatography
published: 2019-12-10
The dataset consists of two types of data: the estimate of land productivity (the maximum productivity, MP) and the estimate of land that has low productivity for any major crops planted in the Contiguous United States and then may be available for growing bioenergy crops (the marginal land, ML). All data items are in GeoTiff format, under the World Geodetic System (WGS) 84 project, and with a resolution of 0.0020810045 degree (~250 m). The MP values are calculated based on machine learning model estimated yields of major crops in the CONUS, and its expected value (MP_mean.tif), and associated uncertainty (MP_IDP.tif). The ML availability data have two versions: a deterministic version and a version with uncertainty. The deterministic MLs are determined as the land pixels with expected MP values falling in the range defined in the following criteria, and the MLs with uncertainty are determined as the probability that the MP value of a land pixel falls in the range defined in the following criteria: Criteria_____Description S1________ Current crop and pasture land with MP <= P50 S2________ Current crop and pasture land with MP <= P25 S3________ S1 + current grass and shrub land with P25 < MP < P50 S4________ S2 + current grass and shrub land with P10 < MP < P25 Economic__ Current crop and pasture land with potential profitability < 0 Here P10, P25 and P50 are the 10th, 25th and 50th percentile of crop MP values
keywords: Land productivity;marginal land;land use
published: 2021-08-24
This repository includes datasets for the paper "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling" accepted for RECOMB2021 and submitted to Journal of Computational Biology. Each zipped file contains a README.
keywords: deep neural networks; heterotachy; GHOST; quartet estimation; phylogeny estimation
published: 2022-02-11
The Culex_Trivellone_etal.fas fasta file contains the original final sequence alignment used in the haplotype analyses of Trivellone et al. (Frontiers in Public Health, under review). The 492 sequences (from specimens of Culex pipiens complex collected in different habitat types using a BG-sentinel traps) were aligned using PASTA v1.8.5 under default settings. The final dataset contains 686 positions of the cytochrome c oxidase subunit I (COI) mitochondrial gene. The data analyses are further described in the cited original paper.
keywords: Culex; Culicidae; COI; mosquito surveillance, species assemblages
published: 2022-06-10
This dataset contains nucleotide sequences of 16S rRNA gene from phytoplasmas and other bacteria detected in phloem-feeding insects (Hemiptera, Auchenorrhyncha). The datasets were used to compare traditional Sanger sequencing with a next-generation sequencing method, Anchored Hybrid Enrichment (AHE) for detecting and characterizing phytoplasmas in insect DNA samples. The file “Trivellone_etal_SangerSequencing.fas”, comprising 1397 positions (the longest sequence), includes 35 not aligned bacterial 16S rRNA sequences (16 phytoplasmas and 19 other bacterial strains) yielded using Sanger sequencing. The file “Trivellone_etal_AHEmethod1.fas” includes 34 not aligned bacterial 16S rRNA sequences (28 phytoplasmas and 6 other bacterial strains) and it contains 1530 positions (the longest sequence). Each sequence was assembled using assembled based on ABySS v2.1.0 pipeline. The file “Trivellone_etal_AHEmethod2.fas” includes 31 not aligned bacterial 16S rRNA sequences (27 phytoplasmas and 4 other bacterial strains) and it contains 1530 positions (the longest sequence). Each sequence was assembled based on the HybPiper v2.0.1 pipeline . Additional details in the "read_me_trivellone.txt" file attached below.
keywords: anchored hybrid enrichment; biodiversity, biorepository; nested PCR; Sanger sequencing
published: 2023-03-04
These data represent the raw data from the paper “Evaluating the ability of wetland mitigation banks to replace plant species lost from destroyed wetlands” published in Journal of Applied Ecology in 2023 by Stephen C. Tillman and Jeffrey W. Matthews.
published: 2022-02-08
Matlab codes for the article "Phage-antibiotic synergy inhibited by temperate and chronic virus competition". Code can be used to reproduce the article figures, perform the parameter sensitivity analysis and simulate the model.
keywords: bacterium-phage-antibiotic model; ODEs; Matlab; sensitivity analysis
published: 2023-04-19
Supplemental data sets for the Manuscript entitled " Assembly of wood-inhabiting archaeal, bacterial and fungal communities along a salinity gradient: common taxa are broadly distributed but locally abundant in preferred habitats"
keywords: wood decomposition; aquatic fungi; aquatic bacteria; aquatic archaea; microbial succession; microbial life-history
published: 2021-05-21
Data sets from "Inferring Species Trees from Gene-Family with Duplication and Loss using Multi-Copy Gene-Family Tree Decomposition." It contains trees and sequences simulated with gene duplication and loss under a variety of different conditions. <b>Note:</b> - trees.tar.gz contains the simulated gene-family trees used in our experiments (both true trees from SimPhy as well as trees estimated from alignements). - sequences.tar.gz contains simulated sequence data used for estimating the gene-family trees as well as the concatenation analysis. - biological.tar.gz contains the gene trees used as inputs for the experiments we ran on empirical data sets as well as species trees outputted by the methods we tested on those data sets. - stats.txt list statistics (such as AD, MGTE, and average size) for our simulated model conditions.
keywords: gene duplication and loss; species-tree inference; simulated data;
published: 2021-06-25
Data associated with the manuscript "Do rusty crayfish invasions affect water clarity in north temperate lakes?" by Daniel K. Szydlowski, Melissa K. Daniels, and Eric R. lARSON
keywords: chlorophyll a; crayfish; Faxonius rusticus; invasive species; lakes; LandSat; remote sening; rusty crayfish; Secchi disc; water clarity
published: 2021-06-24
This dataset consists of the secondary ion mass spectrometry (SIMS) depth profiling data that was collected with a Cameca NanoSIMS 50 instrument from a 10 micron by 10 micron region on a Madin-Darby canine kidney (MDCK) cell that had been metabolically labeled so most of its sphingolipids and cholesterol contained the rare nitrogen-15 oxygen-18 isotopes, respectively.
keywords: secondary ion mass spectrometry; NanoSIMS; depth profiling; MDCK cell; sphingolipids; cholesterol
published: 2019-11-18
VCF files used to analyze a novel filtering tool VEF, presented in the article "VEF: a Variant Filtering tool based on Ensemble methods".
keywords: VCF files; filtering; VEF
published: 2021-07-10
This dataset containes the images of B73xMS71 RIL population used in QTL linkage mapping for maize epidermal traits in year 2016 and 2017. 2016RIL_all_mns.rar and 2017RIL_all_mns.rar: contain raw images produced by Nanofocus lsurf Explorer Optical Topometer (Oberhausen, Germany) at 20X magnification with 0.6 numerical aperture. Files were processed in Nanofocus μsurf analysis extended software (Oberhausen,Germany). 2016RIL_all_TIF.rar and 2017RIL_all_TIF.rar: contain images processed from the Topology layer in each nms file to strengthen the edges of cell outlines, and used in downstream cell detection. 2016RIL_all_detection_result.rar and 2017RIL_all_detection_result.rar: contain images with epidermal cells predicted using the Mask R-CNN model. training data.rar: contain images used for Mask R-CNN model training and validation.
keywords: stomata; Mask R-CNN; cell segmentation; water use efficiency
published: 2022-07-19
#### Details of Pseudomonas aeruginosa biofilm dataset #### ----------------*Folder Structure*------------------------------------- This dataset contains peak intensity tables extracted from mass spectrometry imaging (MSI) data using tools, SCiLS and MSI reader. There are 2 folders in "MSI-Data-Paeruginosa-biofilms-UIUC-DP-JVS-July2022.zip", each folder contains 3 sub-folders as listed below. 1. PellicleBiofilms-and-Supernatant [Pellicle biofilms collected from air-liquid interface and spend supernatant medium after 96 h incubation period]: (1) Full-Scan-Data-96h; (2) MSMS-data-from-C7-Quinolones-96h; and (3) MSMS-data-from-C9-Quinolones-96h 2. StaticBiofilms [Static biofilms grown on mucin surface]: (1) Full-Scan-Data; (2) MSMS-data-from-C7-Quinolones; and (3) MSMS-data-from-C9-Quinolones ----------------*File name*---------------------------------------------- Sample information is included in the file names for easy identification and processing. Attributes covered in file names are explained in the example below. *Example file name "Rep1-Stat-FRD1-mPat-48-FS"* ~ Each unit of information is separated by "-" ~Unit 1 - "Rep1" - Biological replicate ( Rep1, Rep2, and Rep3) ~Unit 2 - "Stat" - Sample type (Stat = Static Biofilm, Pel = Pellicle biofilm, Sup = Supernatant) ~Unit 3 - "FRD1" - Strain (FRD1 = Mucoid strain, PAO1C = Non-mucoid strain) ~Unit 4 - "mPat" - Type of mucin surface used (mPat = patterned mucin surface, mUni = uniform mucin surface) ~Unit 5 - "48" - Sample time point (hours = 48, 72, 96) ~Unit 6 - "FS" - Scan type used in MSI (FS = high resolution full-scan, 260 = targeted MS/MS of C7 quinolones (m/z 260), 288 = targeted MS/MS of C9 quinolones (m/z 288)) ----------------*File structure*------------------------------------------ All MSI data has been exported to CSV format. Each CSV files contains information about scan number, Coordinates (x,y,z), m/z values, extraction window (absolute), and corresponding intensities in the form of a matrix. ----------------*End of Information*--------------------------------------
keywords: mass spectrometry imaging (MSI); biofilm; antibiotic resistance; Pseudomonas aeruginosa; quorum sensing; rhamnolipids
published: 2021-11-03
This dataset contains re-estimated gene trees from the ASTRAL-II [1] simulated datasets. The re-estimated variants of the datasets are called MC6H and MC11H -- they are derived from the MC6 and MC11 conditions from the original data (the MC6 and MC11 names are given by ASTRID [2]). The uploaded files contain the sequence alignments (half-length their original alignments), and the re-estimated species trees using FastTree2. Note: - "mc6h.tar.gz" and "mc11h.tar.gz" contain the sequence alignments and the re-estimated gene trees for the two conditions - the sequence alignments are in the format "all-genes.phylip.splitted.[i].half" where i means that this alignment is for the i-th alignment of the original dataset, but truncating the alignment halving its length - "g1000.trees" under each replicate contains the newline-separated re-estimated gene trees. The gene trees were estimated from the above described alignments using FastTree2 (version 2.1.11) command "FastTree -nt -gtr" [1]: Mirarab, S., & Warnow, T. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44-i52. [2]: Vachaspati, P., & Warnow, T. (2015). ASTRID: accurate species trees from internode distances. BMC genomics, 16(10), 1-13.
keywords: simulated data; ASTRAL; alignments; gene trees
published: 2020-07-15
This repository includes scripts and datasets for Chapter 6 of my PhD dissertation, " Supertree-like methods for genome-scale species tree estimation," that had not been published previously. This chapter is based on the article: Molloy, E.K. and Warnow, T. "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Bioinformatics, In press. https://doi.org/10.1093/bioinformatics/btaa444. The results presented in my PhD dissertation differ from those in the Bioinformatics article, because I re-estimated species trees using FastMulRF and MulRF on the same datasets in the original repository (https://doi.org/10.13012/B2IDB-5721322_V1). To re-estimate species trees, (1) a seed was specified when running MulRF, and (2) a different script (specifically preprocess_multrees_v3.py from https://github.com/ekmolloy/fastmulrfs/releases/tag/v1.2.0) was used for preprocessing gene trees (which were then given as input to MulRF and FastMulRFS). Note that this preprocessing script is a re-implementation of the original algorithm for improved speed (a bug fix also was implemented). Finally, it was brought to my attention that the simulation in the Bioinformatics article differs from prior studies, because I scaled the species tree by 10 generations per year (instead of 0.9 years per generation, which is ~1.1 generations per year). I re-simulated datasets (true-trees-with-one-gen-per-year-psize-10000000.tar.gz and true-trees-with-one-gen-per-year-psize-50000000.tar.gz) using 0.9 years per generation to quantify the impact of this parameter change (see my PhD dissertation or the supplementary materials of Bioinformatics article for discussion).
keywords: Species tree estimation; gene duplication and loss; statistical consistency; MulRF, FastRFS
published: 2021-02-26
These data were used in the survival and cause-specific mortality analyses of translocated nuisance American black bear in Wisconsin published in Animal Conservation (Bauder, J.M., N.M. Roberts, D. Ruid, B. Kohn, and M.L. Allen. Accepted. Lower survival of nuisance American black bears (Ursus americanus) is not due to translocation. Animal Conservation). Included are CSV files including each bear's capture history and associated covariates and meta-data for each CSV file. Also included is an example R script of how to conduct the analyses (this R script is also included as supporting information with the published paper).
keywords: black bear; survival; translocation; nuisance wildlife management
published: 2021-03-08
These are abundance dynamics data and simulations for the paper "Higher-order interaction between species inhibits bacterial invasion of a phototroph-predator microbial community". In this V2, data were converted in Python, in addition to MATLAB and more information on how to work with the data was included in the Readme.
keywords: Microbial community; Higher order interaction; Invasion; Algae; Bacteria; Ciliate
published: 2021-03-10
The PhytoplasmasRef_Trivellone_etal.fas fasta file contains the original final sequence alignment used in the phylogenetic analyses of Trivellone et al. (Ecology and Evolution, in review). The 27 sequences (21 phytoplasma reference strains and 6 phytoplasmas strains from the present study) were aligned using the Muscle algorithm as implemented in MEGA 7.0 with default settings. The final dataset contains 952 positions of the F2n/R2 fragment of the 16S rRNA gene. The data analyses are further described in the cited original paper.
keywords: Hemiptera; Cicadellidae; Mollicutes; Phytoplasma; biorepository
published: 2022-02-10
The compiled datasets include plot level observations of energy crops (miscanthus and switchgrass) from recent experimental field trials in the US including dry biomass yield, location, state, region, harvest year, growing season degree days (GDD), winter season heating degree days (HDD), growing season cumulative precipitation, annual nitrogen application rate, age of the pant when harvested, National Commodity Crop Productivity Index (NCCPI) values, and cultivar type (switchgrass) from various published and unpublished sources. The stata codes include estimation procedures for four different specifications, i.e., Model A includes deterministic effect without interaction terms; Model B includes deterministic effect with interaction terms (N2, age2, N × age, GDD2, precip2, N × NCCPI); Model C includes deterministic effect with interaction terms, study, and location random effect; Model D includes deterministic effect with interaction terms, harvest year augmented study, and location random effect.
keywords: Age; Miscanthus; Nitrogen; Switchgrass; Yield; Center for Advanced Bioenergy and Bioproducts Innovation