Displaying datasets 176 - 200 of 550 in total

Subject Area

Life Sciences (292)
Social Sciences (123)
Physical Sciences (78)
Technology and Engineering (49)
Uncategorized (7)
Arts and Humanities (1)

Funder

U.S. National Science Foundation (NSF) (164)
Other (159)
U.S. Department of Energy (DOE) (56)
U.S. National Institutes of Health (NIH) (53)
U.S. Department of Agriculture (USDA) (30)
Illinois Department of Natural Resources (IDNR) (12)
U.S. National Aeronautics and Space Administration (NASA) (5)
U.S. Geological Survey (USGS) (5)
Illinois Department of Transportation (IDOT) (3)
U.S. Army (2)

Publication Year

2022 (111)
2021 (108)
2020 (96)
2019 (72)
2018 (59)
2023 (39)
2017 (35)
2016 (30)

License

CC0 (314)
CC BY (220)
custom (16)
published: 2021-10-27
 
This dataset consists of 16S sequencing data from microbial communities. Each community is composed of heterotrophic bacteria derived from one of two soil samples and the model alga Chlamydomonas reinhardtii. Each community was placed in a materially closed environment with an initial supply of carbon in the medium and subjected to light-dark cycles. The closed microbial ecosystems (CES) survived via carbon cycling. Each CES was subjected to rounds of dilution, after which the community was sequenced (data provided here). The shared dataset allowed us to conclude that CES consistently self-assembled to cycle carbon (data not provided) via conserved metabolic capabilities (data not provided) despite differences in taxonomic composition (data provided).
---------------------------
Naming convention: [soil sample = A or B][CES replicate = 1, 2, 3, or 4]_r[round number = 1, 2, 3, or 4]_[forward read = F or reverse read = R]_filt.fastq
Example: A1_r1_F_filt.fastq means soil sample A, CES replicate 1, end of round 1, forward read.
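Because every read file follows the naming convention above, the sample metadata can be recovered from the file name alone. The Python sketch below assumes names match the example exactly (soil sample A or B, replicate 1 to 4, an "r" before the round number, F or R, and the `_filt.fastq` suffix); the `CESRead` type and `parse_ces_filename` function are illustrative names, not part of the dataset.

```python
import re
from typing import NamedTuple


class CESRead(NamedTuple):
    soil_sample: str   # "A" or "B"
    replicate: int     # CES replicate, 1-4
    round_number: int  # dilution round, 1-4
    direction: str     # "forward" or "reverse"


# Pattern inferred from the naming convention and the example A1_r1_F_filt.fastq.
_CES_NAME = re.compile(r"^([AB])([1-4])_r([1-4])_([FR])_filt\.fastq$")


def parse_ces_filename(name: str) -> CESRead:
    """Split a CES read file name into its metadata fields."""
    match = _CES_NAME.match(name)
    if match is None:
        raise ValueError(f"not a CES read file name: {name!r}")
    soil, replicate, round_number, read = match.groups()
    direction = "forward" if read == "F" else "reverse"
    return CESRead(soil, int(replicate), int(round_number), direction)


if __name__ == "__main__":
    print(parse_ces_filename("A1_r1_F_filt.fastq"))
```

Raising on a non-matching name, rather than guessing, makes it easy to spot stray files when iterating over a directory of reads.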
keywords: 16S seq; .fastq; closed microbial ecosystems; carbon cycling
published: 2021-11-03
 
This dataset contains re-estimated gene trees from the ASTRAL-II [1] simulated datasets. The re-estimated variants of the datasets are called MC6H and MC11H; they are derived from the MC6 and MC11 conditions of the original data (the MC6 and MC11 names were given by ASTRID [2]). The uploaded files contain the sequence alignments (each half the length of its original alignment) and the gene trees re-estimated from them using FastTree2. Notes:
- "mc6h.tar.gz" and "mc11h.tar.gz" contain the sequence alignments and the re-estimated gene trees for the two conditions.
- The sequence alignments are named "all-genes.phylip.splitted.[i].half", where [i] indicates that the alignment is derived from the i-th alignment of the original dataset, truncated to half its length.
- "g1000.trees" under each replicate contains the newline-separated re-estimated gene trees. The gene trees were estimated from the alignments described above using FastTree2 (version 2.1.11) with the command "FastTree -nt -gtr".

[1]: Mirarab, S., & Warnow, T. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44-i52.
[2]: Vachaspati, P., & Warnow, T. (2015). ASTRID: accurate species trees from internode distances. BMC Genomics, 16(10), 1-13.
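A minimal Python sketch for working with these files follows. It assumes each non-empty line of a `g1000.trees` file holds one Newick tree terminated by ";" (the description says only "newline-separated"). The command helper reproduces the quoted "FastTree -nt -gtr" invocation; FastTree prints the tree to standard output, so the sketch redirects it to a file. All function names are illustrative.

```python
import subprocess
from pathlib import Path
from typing import List


def parse_gene_trees(text: str) -> List[str]:
    """Split newline-separated Newick text into one string per gene tree.

    Assumes one tree per non-empty line, each terminated by ';'.
    """
    trees = [line.strip() for line in text.splitlines() if line.strip()]
    for k, tree in enumerate(trees, start=1):
        if not tree.endswith(";"):
            raise ValueError(f"tree {k} does not end with ';'")
    return trees


def read_gene_trees(path: str) -> List[str]:
    """Read a g1000.trees file from disk."""
    return parse_gene_trees(Path(path).read_text())


def fasttree_command(alignment: str) -> List[str]:
    """Argument list for the command quoted in the description."""
    return ["FastTree", "-nt", "-gtr", alignment]


def run_fasttree(alignment: str, out_tree: str) -> None:
    """Run FastTree (must be on PATH) and save its Newick output to out_tree."""
    with open(out_tree, "w") as out:
        subprocess.run(fasttree_command(alignment), stdout=out, check=True)
```

For example, `len(read_gene_trees(path))` gives the number of gene trees in a replicate; the directory layout inside the tarballs is not described here, so `path` must be filled in by the user.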
keywords: simulated data; ASTRAL; alignments; gene trees
published: 2021-11-05
 
This data set contains survey results from a 2021 survey of University of Illinois University Library employees conducted as part of the Becoming A Trans Inclusive Library Project to evaluate the awareness of University of Illinois faculty, staff, and student employees regarding transgender identities, and to assess the professional development needs of library employees to better serve trans and gender non-conforming patrons. The survey instrument is available in the IDEALS repository: http://hdl.handle.net/2142/110080.
keywords: transgender awareness, academic library, gender identity awareness, professional development opportunities
published: 2021-11-05
 
This data set contains survey results from a 2021 survey of University of Illinois University Library patrons who identify as transgender or gender non-conforming, conducted as part of the Becoming a Trans Inclusive Library Project to assess the experiences of transgender patrons seeking information and services in the University Library. Survey instruments are available in the IDEALS repository: http://hdl.handle.net/2142/110081.
keywords: transgender awareness; academic library; gender identity awareness; patron experience
published: 2021-11-04
 
This dataset contains all the data for the results section of the paper entitled "Chemistry Across Multiple Phases (CAMP) version 1.0: An integrated multi-phase chemistry model", submitted to Geoscientific Model Development (GMD). In this paper, two sets of simulations were run to test CAMP, and their results are included here: (1) box model inputs and outputs presented in Section 4.2 for modal, binned, and particle-resolved simulations, comparing the application of identical chemical mechanisms to different aerosol representations, and (2) the 3D Eulerian output presented in Section 4.3.
keywords: Atmospheric chemistry; Aerosols and particles; Numerical Modeling
published: 2021-10-22
 
This dataset includes the source data for Figures 1-4 and supplementary figures 1-10 for the manuscript "Kinetic and structural mechanism for DNA unwinding by a non-hexameric helicase".
published: 2021-10-28
 
Bigheaded carp were collected from the Illinois and Des Plaines Rivers, parts of the Illinois Waterway, from May to November 2018. A total of 93 fish were collected for the study: 40 females, 41 males, and 12 unsexed fish. GC/MS metabolite profiling detected 180 compounds. Livers from carp at the leading edge showed differences in energy use and metabolism, and suppression of protective mechanisms, relative to downstream fish; these differences were consistent across time. This work provides evidence that water quality is linked to carp movement in the Illinois River. As water quality in this region continues to improve, considering its impact on carp spread is essential to protect the Great Lakes.
keywords: water quality; metabolites; range expansion; energy; contaminants
published: 2021-10-24
 
This dataset contains daily and hourly temperature measurements in twenty different bat box designs deployed in central Indiana, USA, from May to September 2018. Daily and hourly environmental data (temperature, solar radiation, wind speed, and wind direction) are also included for the days and hours sampled. Bat box temperature data were reclassified into cool (≤ 30°C), permissive (30.1–39.9°C), and stressful (≥ 40°C) categories according to known temperature tolerances of temperate-zone bats.
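The reclassification rule above can be written as a small Python function. This is a sketch, not the authors' code: the stated bins (≤ 30, 30.1–39.9, ≥ 40) imply readings at 0.1°C resolution, so the sketch assumes any value strictly between 30 and 40 (for example 30.05) counts as permissive.

```python
from collections import Counter
from typing import Iterable


def classify_bat_box_temp(temp_c: float) -> str:
    """Map a bat box temperature in °C to cool / permissive / stressful.

    cool: <= 30; stressful: >= 40; everything in between is treated as
    permissive (assumption for values outside the stated 30.1-39.9 range).
    """
    if temp_c <= 30.0:
        return "cool"
    if temp_c < 40.0:
        return "permissive"
    return "stressful"


def category_counts(temps: Iterable[float]) -> Counter:
    """Count how many readings (e.g. hourly values) fall into each category."""
    return Counter(classify_bat_box_temp(t) for t in temps)
```

Applied to the hourly series of one box, `category_counts` gives the number of cool, permissive, and stressful hours for that design.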
keywords: bat box; design; environmental variables; microclimate; temperature
published: 2021-10-15
 
Atomic oxygen data from SCIAMACHY for the mesosphere and lower thermosphere (MLT), 2002-2012, averaged over 26 14-day periods beginning January 1.
keywords: SCIAMACHY data
published: 2021-10-15
 
Atomic oxygen densities in the MLT, averaged over 2002-2018 for 26 14-day periods beginning January 1.
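Both atomic oxygen entries average over 26 14-day periods starting January 1, which covers days 1 to 364 of the year (26 × 14 = 364). The Python sketch below maps a date to its period index. How the datasets treat December 31 (and December 30 in leap years) is not stated, so the sketch returns `None` for those days as an assumption.

```python
from datetime import date
from typing import Optional

PERIOD_DAYS = 14
N_PERIODS = 26  # 26 periods x 14 days = 364 days, starting January 1


def period_index(d: date) -> Optional[int]:
    """0-based index of the 14-day period containing d (period 0 = Jan 1-14).

    Assumption: days with day-of-year > 364 fall outside the 26 periods,
    so None is returned for them.
    """
    day_of_year = d.timetuple().tm_yday  # 1 for January 1
    idx = (day_of_year - 1) // PERIOD_DAYS
    return idx if idx < N_PERIODS else None


def period_start(year: int, idx: int) -> date:
    """First calendar day of period idx in the given year."""
    return date.fromordinal(date(year, 1, 1).toordinal() + idx * PERIOD_DAYS)
```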
keywords: SABER data
published: 2021-10-15
 
Data on the location, dimensions, time of treefall or death, decay state, wood nutrients, wood pH, and wood density, as well as soil moisture, slope, distance from forest edge, and soil nutrient data, associated with the publication "Interspecific wood trait variation predicts decreased carbon residence time in changing forests" authored by Sierra Perez, Jennifer Fraterrigo, and James Dalling. Note: Blank cells indicate that no data were collected.
keywords: wood decay; carbon residence time; coarse woody debris; decomposition; temperate forests
published: 2021-10-15
 
This is the synthetic expression dataset (5 states, 5,000 cells) used to validate SimiC, a single-cell gene regulatory network (GRN) inference method with similarity constraints. Ground-truth GRNs are stored as NumPy arrays, and the expression profiles of all states combined are stored as a pandas DataFrame; both are saved as Pickle files.
keywords: Numpy array; GRNs; Pandas DataFrame
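Both kinds of files can be read with Python's standard `pickle` module. The sketch below is a hedged illustration: the actual file names are not listed in this description, so paths are left to the user, and NumPy and pandas must be installed for the arrays and the DataFrame to unpickle. Unpickling can execute arbitrary code, so load only files from a trusted source such as this repository.

```python
import io
import pickle
from pathlib import Path
from typing import Any, BinaryIO, Union


def load_pickle(source: Union[str, Path, BinaryIO]) -> Any:
    """Load one pickled object from a file path or an open binary file."""
    if hasattr(source, "read"):
        return pickle.load(source)
    with open(source, "rb") as fh:
        return pickle.load(fh)


def describe(obj: Any) -> str:
    """Type name plus .shape when present (NumPy arrays, pandas DataFrames)."""
    shape = getattr(obj, "shape", None)
    name = type(obj).__name__
    return name if shape is None else f"{name} {tuple(shape)}"


if __name__ == "__main__":
    # In-memory round trip as a stand-in for a real .pickle file path.
    buffer = io.BytesIO(pickle.dumps({"state_0": [[0.0, 1.0], [0.5, 0.0]]}))
    print(describe(load_pickle(buffer)))
```

Calling `describe` on each loaded ground-truth array and on the expression DataFrame is a quick way to check that their gene dimensions agree before running an inference method.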
published: 2021-10-13
 
Drainage network analysis is fundamental to understanding the characteristics of surface hydrology. Based on elevation data, drainage network analysis is often used to extract key hydrological features such as drainage networks and streamlines. Limited by raster-based data models, conventional drainage network algorithms typically allow water to flow from a raster cell in only 4 or 8 directions (to the surrounding cells). To resolve this limitation, this paper describes a new vector-based method for drainage network analysis that allows water to flow in any direction around each location. The method is enabled by rapid advances in Light Detection and Ranging (LiDAR) remote sensing and high-performance computing. The drainage network analysis is conducted on a high-density point cloud instead of on coarse-resolution Digital Elevation Models (DEMs). Our computational experiments show that the vector-based method can better capture water flows, because it is not restricted to the limited number of flow directions imposed by imprecise DEMs. Our case study applies the method to the Rowan County watershed in North Carolina, USA. After comparing the detected drainage networks and streamlines with corresponding reference data from the US Geological Survey generated with the Geonet software, we find that the new method performs well in capturing the characteristics of water flows on landscape surfaces to form an accurate drainage network. This dataset contains all the code, notebooks, and datasets used in the study conducted for the research publication titled "A Vector-Based Method for Drainage Network Analysis Based on LiDAR Data".
## What's Inside

A quick explanation of the components:

* `A Vector Approach to Drainage Network Analysis Based on LiDAR Data.ipynb` is a notebook for finding the drainage network based on LiDAR data.
* `Picture1.png` is a picture representing the pseudocode of our new algorithm.
* The `HPC` folder contains code for running the algorithm with sbatch on an HPC system:
  * `execute.sh` is a bash script that uses sbatch to conduct large-scale analysis with the algorithm.
  * `run.sh` is a bash script that calls `execute.sh` for large-scale calculation with the algorithm.
  * `run.py` contains the implementation of the algorithm.
* `Rowan Creek Data` includes the data used in the study:
  * `3_1.las` and `3_2.las` are the LiDAR data files used in the analysis presented in the paper. Users may use these data files to reproduce our results, or replace them with their own LiDAR files to run the method over different areas.
  * The `reference` folder includes reference data from USGS:
    * `reference_3_1.tif` and `reference_3_2.tif` are reference data for the drainage system analysis retrieved from USGS.
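The central idea of the vector-based method, letting water flow toward any nearby point rather than one of 4 or 8 grid neighbors, can be illustrated with a steepest-descent rule on a point cloud. The Python sketch below is an illustration under stated assumptions, not the authors' algorithm (their implementation is in `run.py` and the notebook): points are assumed to be `(x, y, z)` tuples, "neighbors" are points within a hypothetical horizontal search `radius`, and the neighbor search is brute force O(n²), which is only practical for small examples.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z); z is elevation


def steepest_descent_receivers(points: List[Point], radius: float) -> List[Optional[int]]:
    """For each point, pick the neighbor (horizontal distance <= radius) with
    the steepest downhill slope. The resulting flow direction is whatever
    bearing that neighbor lies at, so it is not limited to 4 or 8 directions.
    Returns None for points with no lower neighbor (local minima / outlets)."""
    receivers: List[Optional[int]] = []
    for i, (xi, yi, zi) in enumerate(points):
        best: Optional[int] = None
        best_slope = 0.0
        for j, (xj, yj, zj) in enumerate(points):
            if i == j:
                continue
            dist = math.hypot(xj - xi, yj - yi)
            if dist == 0.0 or dist > radius:
                continue
            slope = (zi - zj) / dist  # positive means downhill from i to j
            if slope > best_slope:
                best, best_slope = j, slope
        receivers.append(best)
    return receivers


def flow_bearing_deg(points: List[Point], i: int, j: int) -> float:
    """Flow direction from point i to point j, degrees counterclockwise from +x."""
    return math.degrees(math.atan2(points[j][1] - points[i][1], points[j][0] - points[i][0]))


def trace_flow(receivers: List[Optional[int]], start: int) -> List[int]:
    """Follow receivers downstream from start. Always terminates, because
    every step goes to a strictly lower point."""
    path = [start]
    nxt = receivers[start]
    while nxt is not None:
        path.append(nxt)
        nxt = receivers[nxt]
    return path
```

For a tiny cloud such as `[(0, 0, 3), (1, 0, 2), (2, 0, 1), (1, 1, 2.5)]` with `radius=1.5`, each point drains to its steepest lower neighbor and `trace_flow` walks the resulting streamline down to the lowest point. A real LiDAR cloud would need a spatial index (e.g. a k-d tree) in place of the double loop.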
keywords: CyberGIS; Drainage System Analysis; LiDAR
published: 2021-10-10
 
This data set contains temperature and dissolved oxygen profiles at 1-m intervals, along with Secchi depth, measured at the deepest point of each of 10 Illinois reservoirs between 1995 and 2016.
keywords: Water temperature; dissolved oxygen; Secchi depth; climate change
published: 2021-10-11
 
This dataset contains the ClonalKinetic dataset that was used in SimiC, together with its intermediate results for comparison. Detailed descriptions can be found in the text files 'clonalKinetics_Example_data_description.txt' and 'ClonalKinetics_filtered.DF_data_description.txt'. The required input data for SimiC consist of: 1. ClonalKinetics_filtered.clustAssign.txt => cluster assignment for each cell; 2. ClonalKinetics_filtered.DF.pickle => filtered scRNA-seq matrix; 3. ClonalKinetics_filtered.TFs.pickle => list of driver genes. The results of running SimiC consist of: 1. ClonalKinetics_filtered_L10.01_L20.01_Ws.pickle => inferred GRNs for each cluster; 2. ClonalKinetics_filtered_L10.01_L20.01_AUCs.pickle => regulon activity scores for each cell and each driver gene. NOTE: The "ClonalKinetics_filtered.rds" file mentioned in "ClonalKinetics_filtered.DF_data_description.txt" is an intermediate file; the authors have put all of the processed data in the pickle/txt files, as described in the filtered-data description file.
keywords: GRNs; SimiC; RDS; ClonalKinetic
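Before running SimiC it can help to confirm that the three required inputs listed above are present and to look at how cells are spread across clusters. The Python sketch below takes the file names from the description; the layout of `clustAssign.txt` is an assumption (one cluster label per line, no header), so check the dataset's description files before relying on `cluster_sizes`.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List

# Required SimiC inputs, as named in the dataset description.
REQUIRED_INPUTS = [
    "ClonalKinetics_filtered.clustAssign.txt",
    "ClonalKinetics_filtered.DF.pickle",
    "ClonalKinetics_filtered.TFs.pickle",
]


def missing_inputs(directory: str) -> List[str]:
    """Names of required input files not found in directory."""
    base = Path(directory)
    return [name for name in REQUIRED_INPUTS if not (base / name).is_file()]


def cluster_sizes(lines: Iterable[str]) -> Counter:
    """Count cells per cluster label.

    ASSUMPTION: one cluster label per line and no header line.
    """
    return Counter(line.strip() for line in lines if line.strip())


if __name__ == "__main__":
    missing = missing_inputs(".")
    print("missing inputs:", missing or "none")
```

With the files in place, `cluster_sizes(open("ClonalKinetics_filtered.clustAssign.txt"))` would give the number of cells per cluster under the one-label-per-line assumption.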
published: 2021-10-04
 
This dataset contains all the information necessary to recreate the study presented in the paper entitled "Learning coagulation processes with combinatorially-invariant neural networks". This consists of (1) the aggregated output files used for machine learning, (2) the machine learning code used to learn the presented models, (3) the PartMC model source code used to generate the simulation data, and (4) the Python scripts used to construct the scenario library for training and testing simulations. These data were used to investigate a method (combinatorially-invariant neural networks) for learning the aerosol process of coagulation, and may also be useful for applying other methods.
keywords: Machine learning; Atmospheric chemistry; Particle-resolved modeling; Coagulation; Atmospheric Science
published: 2021-09-17
 
We studied the robustness of vegetation metrics to environmental (season, interannual, and regional) and methodological (observer) variables, as well as adequate sample sizes for vegetation metrics, across four regions of the United States.
keywords: coefficients of conservatism; floristic quality assessment; restoration; vegetation metric
published: 2021-09-03
 
All of the files in this dataset pertain to the evaluation of a novel statistic, Hind/He, for distinguishing Mendelian loci from paralogs. They are derived from a RAD-seq genotyping dataset of diploid and tetraploid Miscanthus sacchariflorus.
published: 2021-08-24
 
This repository includes datasets for the paper "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling", accepted at RECOMB 2021 and submitted to the Journal of Computational Biology. Each zipped file contains a README.
keywords: deep neural networks; heterotachy; GHOST; quartet estimation; phylogeny estimation
published: 2021-08-20
 
In 2020, early-season extreme precipitation events occurred in central Illinois following the planting of Sorghum bicolor (L.) Moench and Zea mays L., causing ponding. Following the first rainfall event, 50-m transects were established to assess the effects of waterlogging on seedling emergence and crop yields. Soil moisture, emergence, stem and tiller counts, LAI, and yield were measured at various points in the season along these transects.
keywords: Sorghum; Maize; Emergence; Yield; LAI
published: 2021-08-15
 
This data set contains mass spectrometry data used for the publication "mspack: efficient lossless and lossy mass spectrometry data compression".
keywords: mass-spectrometry data; compression; proteomics
published: 2021-08-14
 
1. Rice H2 - Destructive Harvest - These data are for the destructive harvest (above-ground biomass) of 30 diverse indica rice genotypes that were grown to evaluate natural variation as well as the heritability of photosynthesis-related traits. Traits measured include plant height, leaf area, plant fresh and dry weights, and tiller number. 2. Rice H2 - ACi Response Summary - These data characterize the response of CO2 uptake to changes in intercellular CO2 concentration in 30 diverse indica rice genotypes. These measurements were taken to evaluate natural variation and the heritability of photosynthesis-related traits in rice. 3. Rice H2 - Survey Style Gas Exchange Measurements - These data document steady-state survey-style gas exchange measurements in 30 diverse indica rice genotypes. These measurements were taken to evaluate natural variation and the heritability of photosynthesis-related traits in rice.
keywords: photosynthesis, photosynthetic capacity, natural variation, heritability, food security, rice
published: 2021-08-12
 
This dataset contains images of a photoperiod-sensitive sorghum accession population used for a GWAS/TWAS study of leaf traits related to water use efficiency in 2016 and 2017. Note: New in this second version, JPG images exported from the nms files have been added. Accessions_2016.zip and Accessions_2017.zip contain the raw images produced by the Optical Topometer (nms files) for all sorghum accessions; these images can be opened with the Nanofocus μsurf analysis extended software (Oberhausen, Germany). Accessions_2016_jpg.zip and Accessions_2017_jpg.zip contain the JPG images exported from the nms files and used in the machine learning phenotyping.
keywords: stomata; segmentation; water use efficiency
published: 2021-08-05
 
This geodatabase serves two purposes: 1) to provide State of Illinois agencies with a fast resource for the preparation of maps and figures that require the use of shape or line files from federal agencies, the State of Illinois, or the City of Chicago, and 2) as a start for social scientists interested in exploring how geographic information systems (whether this is data visualization or geographically weighted regression) can bring new meaning to the interpretation of their data. All layer files included are relevant to the State of Illinois. Sources for this geodatabase include the U.S. Census Bureau, U.S. Geological Survey, City of Chicago, Chicago Public Schools, Chicago Transit Authority, Regional Transportation Authority, and Bureau of Transportation Statistics.
keywords: State of Illinois; City of Chicago; Chicago Public Schools; GIS; Statistical tabulation areas; hydrography
published: 2021-08-04
 
This dataset contains data derived from large-scale particle image velocimetry (LSPIV) measurements obtained at the confluence of the Saline Branch and an unnamed tributary in Illinois. The data were collected using two cameras positioned around the confluence, one mounted on a cable and the other mounted on a tripod. A description of the contents of the files can be found in "Description of Files.rtf".
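LSPIV estimates surface velocity from how far surface patterns move between two video frames. As a hedged illustration of that last conversion step only (the processing actually used for this dataset is not stated here), the Python sketch below turns a displacement in pixels into a velocity. It assumes orthorectified frames with a uniform ground scale in metres per pixel and a known time step between frames. The sign of the y component depends on the image axis convention, since image rows usually increase downward.

```python
import math
from typing import Tuple


def surface_velocity(dx_px: float, dy_px: float,
                     scale_m_per_px: float, dt_s: float) -> Tuple[float, float, float]:
    """Convert a between-frame displacement into surface velocity.

    Assumes an orthorectified image with uniform scale (m per pixel).
    Returns (u, v, speed) in m/s.
    """
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    u = dx_px * scale_m_per_px / dt_s
    v = dy_px * scale_m_per_px / dt_s
    return u, v, math.hypot(u, v)
```

For example, a pattern that moves 3 px right and 4 px across at 0.5 m/px over 1 s corresponds to a surface speed of 2.5 m/s.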
keywords: confluence; hydrodynamics; LSPIV; flow structure; stagnation