Displaying Dataset 1 - 25 of 139 in total

Subject Area

Life Sciences (64)
Social Sciences (37)
Technology and Engineering (19)
Physical Sciences (18)
Uncategorized (1)

Funder

U.S. National Science Foundation (NSF) (39)
Other (29)
U.S. National Institutes of Health (NIH) (18)
U.S. Department of Energy (DOE) (14)
U.S. Department of Agriculture (USDA) (6)
Illinois Department of Natural Resources (IDNR) (2)
U.S. National Aeronautics and Space Administration (NASA) (2)
U.S. Geological Survey (USGS) (2)

Publication Year

2018 (64)
2017 (35)
2016 (30)
2019 (10)

License

CC0 (80)
CC BY (57)
custom (2)
published: 2019-02-19
 
The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991 and 2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article (published prior to 2017), and the organizations operating the databases were determined through review of database websites.
keywords: databases; research infrastructure; sustainability; data sharing; molecular biology; bioinformatics; bibliometrics
planned publication date: 2019-03-10
 
Chronic contact exposure to realistic soil concentrations (0, 7.5, 15, and 100 ppb) of the neonicotinoid pesticide imidacloprid had species- and sex-specific effects on bee adult longevity, immature development speed, and mass. This dataset contains a life table tracking the development, mass, and deaths of a single cohort of Osmia lignaria and Megachile rotundata over the course of two summers. Other data files include files created for multi-event survival analysis to analyze the effect on development speed. Detected effects included decreased adult longevity for female O. lignaria at the highest concentration; a trend for a hormetic effect on female M. rotundata development speed and mass (longest development time and greatest mass in the 15 ppb treatment); and, for male M. rotundata, decreased adult longevity and increased development speed at high imidacloprid concentrations as well as a hormetic effect on mass (lowest in the 15 ppb treatment).
keywords: neonicotinoid; imidacloprid; bee; habitat restoration
published: 2019-02-02
 
The bee visitation data include the percentage of each bee pollinator group captured in bee bowls and recorded in observations. The data are referenced in the article with the following citation: Bennett, A.B., Lovell, S.T. 2019. Landscape and local site variables differentially influence pollinators and pollination services in urban agricultural sites. Accepted for publication in: PLOS ONE.
published: 2019-02-02
 
Landscape attributes of the nineteen sites as supplemental data for the following article: Bennett, A.B., Lovell, S.T. 2019. Landscape and local site variables differentially influence pollinators and pollination services in urban agricultural sites. Accepted for publication in: PLOS ONE.
published: 2019-01-07
 
Vendor transcription of the Catalogue of Copyright Entries, Part 1, Group 1, Books: New Series, Volume 29 for the Year 1932. This file contains all of the entries from the indicated volume.
keywords: copyright; Catalogue of Copyright Entries; Copyright Office
published: 2019-01-27
 
This repository includes datasets that are studied with INC/INC-ML/INC-NJ in the paper "Using INC within Divide-and-Conquer Phylogeny Estimation" that was submitted to AICoB 2019. Each dataset has its own readme.txt that further describes the creation process and the other parameters and software used in making these datasets. The latest implementation of INC/INC-ML/INC-NJ can be found at https://github.com/steven-le-thien/constraint_inc. Note: there may be files with DS_STORE as extension in the datasets; please ignore these files.
keywords: phylogenetics; gene tree estimation; divide-and-conquer; absolute fast converging
published: 2019-02-07
 
This dataset contains all data used in the two studies included in "PICAN-PI..." by Nute et al., other than the original raw sequences. That includes: 1) Supplementary information for the manuscript, including all the graphics that were created, 2) 16S Reference Alignment, Phylogeny and Taxonomic Annotation used by SEPP, and 3) Data used in the manuscript as input for the graphics generation (namely, SEPP outputs and sequence multiplicities).
keywords: microbiome; data visualization; graphics; phylogenetics; 16S
published: 2018-08-16
 
This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript, "Effects of an invasive perennial forb on gross soil nitrogen cycling and nitrous oxide fluxes," submitted to Ecology for peer-reviewed publication. Please refer to that publication for details about methodologies used to generate these data and for the experimental design.
keywords: pepperweed; nitrogen cycling; nitrous oxide; invasive species; Bay Delta
published: 2018-12-20
 
File Name: AllWords.csv
Data Preparation: Xiaoru Dong, Linh Hoang
Date of Preparation: 2018-12-12
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains lists of all words (all features) from the bag-of-words feature extraction.
Notes: To reproduce the data in this file, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
published: 2018-12-20
 
File Name: Error_Analysis.xslx
Data Preparation: Xiaoru Dong
Date of Preparation: 2018-12-12
Data Contributions: Xiaoru Dong, Linh Hoang, Jingyi Xie, Jodi Schneider
Data Source: The classification prediction results on the testing data set.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains lists of the wrong and correct predictions of inclusion criteria of Cochrane Systematic Reviews from the testing data set, and the length (number of words) of the inclusion criteria.
Notes: To reproduce the data relevant to this file, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
published: 2018-12-13
 
A 3D CNN method for land cover classification using LiDAR and multitemporal imagery
keywords: 3DCNN; land cover classification; LiDAR; multitemporal imagery
published: 2018-11-19
 
This repository includes scripts and datasets for the paper, "Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge." All data files in this repository are for analyses using the logdet distance matrix computed on the concatenated alignment. Data files for analyses using the average gene-tree internode distance matrix can be downloaded from the Illinois Data Bank (https://doi.org/10.13012/B2IDB-1424746_V1). The latest version of NJMerge can be downloaded from GitHub (https://github.com/ekmolloy/njmerge).
keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published: 2018-12-04
 
The text file contains the original data used in the phylogenetic analyses of Wang et al. (2017: Scientific Reports 7:45387). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 81 taxa (species) and 2905 characters, indicate that the first 2805 characters are DNA sequence and the last 100 are morphological, that the data may be interleaved (with data for one species on multiple rows), that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 100 morphological characters. The identity and positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; wingless; histone H3; cytochrome oxidase I; bayesian analysis
published: 2018-12-14
 
Spreadsheet with data about whether or not the indicated institutional repository website provides metadata documentation. See readme file for more information.
keywords: institutional repositories; metadata; best practices; metadata documentation
published: 2018-12-06
 
The text file contains the original DNA sequence data used in the phylogenetic analyses of Krishnankutty et al. (2016: Systematic Entomology 41: 580–595). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The file contains five separate data blocks, one for each character partition (28S, histone H3, 12S, indels, and morphology) for 53 taxa (species). Gaps inserted into the DNA sequence alignment are indicated by a dash, and missing data are indicated by a question mark. The separate "indels1" block includes 40 indels (insertions/deletions) from the 28S sequence alignment re-coded using the modified complex indel coding scheme, as described in the "Materials and methods" of the original publication. The DIMENSIONS statements near the beginning of each block indicate the numbers of taxa (NTax) and characters (NChar). The file contains aligned nucleotide sequence data for 3 gene regions and 40 morphological characters. The file is configured for use with the maximum likelihood-based phylogenetic program GARLI but can also be parsed by any other bioinformatics software that supports the NEXUS format. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supporting pdf file. More details on individual analyses are provided in the original publication.
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; histone H3; 12S mtDNA; maximum likelihood
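The entry above notes that the DIMENSIONS statement near the beginning of each NEXUS block gives the number of taxa (NTax) and characters (NChar). As a rough sketch of how those values could be read programmatically, the Python snippet below scans a text for DIMENSIONS statements. The function name `read_dimensions` and the sample fragment are hypothetical and written only for illustration; the fragment is not copied from the dataset file, and a full NEXUS parser would need to handle comments and other syntax that this sketch ignores.

```python
import re

# Matches a DIMENSIONS statement up to its terminating semicolon (case-insensitive).
DIMENSIONS_RE = re.compile(r"dimensions\s+([^;]*);", re.IGNORECASE)


def read_dimensions(nexus_text):
    """Return one dict of lower-cased key -> int per DIMENSIONS statement found."""
    results = []
    for match in DIMENSIONS_RE.finditer(nexus_text):
        # Pull out every KEY=NUMBER pair inside the statement, e.g. NTAX=53.
        pairs = re.findall(r"(\w+)\s*=\s*(\d+)", match.group(1))
        results.append({key.lower(): int(value) for key, value in pairs})
    return results


# Hypothetical fragment for illustration only; not taken from the dataset.
sample = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=53 NCHAR=40;
  FORMAT DATATYPE=STANDARD GAP=- MISSING=?;
END;
"""

if __name__ == "__main__":
    for block_number, dims in enumerate(read_dimensions(sample), start=1):
        print(block_number, dims)
```

For real analyses, an established NEXUS-aware program such as those named in the entry is the safer choice; the sketch is only meant to make the role of the DIMENSIONS statement concrete.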
published: 2018-12-20
 
This dataset contains data used to generate figures and tables in the corresponding paper.
keywords: Black carbon; Emission Inventory; Observations; Climate change; Diesel engine; Coal burning
published: 2018-12-20
 
File Name: WordsSelectedByManualAnalysis.csv
Data Preparation: Xiaoru Dong, Linh Hoang
Date of Preparation: 2018-12-14
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: This file contains the list of 407 informative words reselected from the 1655 words by manual analysis. In particular, starting from the 1655 words obtained from information gain feature selection, we manually read them and eliminated the domain-specific words. The remaining words were then selected as the "Manual Analysis Words".
Notes: Although the list of words in this file was selected manually, to reproduce the related data, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
published: 2018-12-20
 
File Name: WordsSelectedByInformationGain.csv
Data Preparation: Xiaoru Dong, Linh Hoang
Date of Preparation: 2018-12-12
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains a list of 1655 informative words selected by applying the information gain feature selection strategy. Information gain is one of the methods commonly used for feature selection; it tells us how many bits of information the presence of a word contributes to predicting the classes, and it can be computed with a specific formula [Jurafsky D, Martin JH. Speech and language processing. London: Pearson; 2014 Dec 30]. We ran information gain feature selection in Weka, a machine learning tool.
Notes: To reproduce the data in this file, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
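To make the information gain idea in the entry above more concrete, here is a minimal Python sketch of the standard entropy-based definition: the class entropy minus the expected class entropy after splitting documents on whether a word is present. It is not the authors' code (they report using Weka), and the function names and toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(docs, labels, word):
    """Reduction in class entropy from knowing whether `word` occurs in a document."""
    with_word = [label for doc, label in zip(docs, labels) if word in doc]
    without_word = [label for doc, label in zip(docs, labels) if word not in doc]
    n = len(labels)
    # Expected entropy after the split, weighted by the size of each side.
    conditional = (
        (len(with_word) / n) * entropy(with_word)
        + (len(without_word) / n) * entropy(without_word)
    )
    return entropy(labels) - conditional


# Hypothetical toy data: each document is represented as a set of words.
docs = [
    {"randomized", "trial"},
    {"randomized", "controlled"},
    {"cohort", "study"},
    {"observational", "study"},
]
labels = ["Only RCTs", "Only RCTs", "Others", "Others"]

if __name__ == "__main__":
    for candidate in ("randomized", "trial"):
        print(candidate, information_gain(docs, labels, candidate))
```

In this toy example "randomized" separates the two classes perfectly, so it recovers the full class entropy, while "trial" appears in only one document and carries less information. Ranking all vocabulary words by such a score and keeping the highest-scoring ones is the general shape of this kind of feature selection.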
published: 2018-12-20
 
File Name: Inclusion_Criteria_Annotation.csv
Data Preparation: Xiaoru Dong
Date of Preparation: 2018-12-14
Data Contributions: Jingyi Xie, Xiaoru Dong, Linh Hoang
Data Source: Cochrane systematic reviews published up to January 3, 2018 by 52 different Cochrane groups in 8 Cochrane group networks.
Associated Manuscript authors: Xiaoru Dong, Jingyi Xie, Linh Hoang, and Jodi Schneider.
Associated Manuscript, Working title: Machine classification of inclusion criteria from Cochrane systematic reviews.
Description: The file contains lists of inclusion criteria of Cochrane Systematic Reviews and the manual annotation results. 5420 inclusion criteria were annotated, out of 7158 inclusion criteria available. Annotations are either "Only RCTs" or "Others". There are 2 columns in the file:
- "Inclusion Criteria": Content of inclusion criteria of Cochrane Systematic Reviews.
- "Only RCTs": Manual annotation results, in which "x" means the inclusion criteria is classified as "Only RCTs" and blank means that the inclusion criteria is classified as "Others".
Notes:
1. "RCT" stands for Randomized Controlled Trial, which, by definition, is "a work that reports on a clinical trial that involves at least one test treatment and one control treatment, concurrent enrollment and follow-up of the test- and control-treated groups, and in which the treatments to be administered are selected by a random process, such as the use of a random-numbers table." [Randomized Controlled Trial publication type definition from https://www.nlm.nih.gov/mesh/pubtypes.html].
2. To reproduce the data relevant to this file, please get the project code published on GitHub at https://github.com/XiaoruDong/InclusionCriteria and run it following the instructions provided.
keywords: Inclusion criteria; Randomized controlled trials; Machine learning; Systematic reviews
published: 2018-12-31
 
Sixty undergraduate STEM lecture classes were observed across 14 departments at the University of Illinois Urbana-Champaign in 2015 and 2016. We selected the classes to observe using purposive sampling techniques with the objectives of (1) collecting classroom observations that were representative of the STEM courses offered; (2) conducting observations on non-test, typical class days; and (3) comparing these classroom observations using the Class Observation Protocol for Undergraduate STEM (COPUS) to record the presence and frequency of active learning practices utilized by Community of Practice (CoP) and non-CoP instructors. Decimal values are the result of combined observations. All COPUS codes listed are from the Smith (2013) paper "The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize STEM Classroom Practices". For more information on the data collection process, see "Evidence that communities of practice are associated with active learning in large STEM lectures" by Tomkin et al. (2019) in the International Journal of STEM Education.
keywords: COPUS; Community of Practice
published: 2018-12-19
 
This dataset contains genotypic and phenotypic data, R scripts, and the results of analysis pertaining to a multi-location field trial of Miscanthus sinensis. Genome-wide association and genomic prediction were performed for biomass yield and 14 yield-component traits across six field trial locations in Asia and North America, using 46,177 single-nucleotide polymorphism (SNP) markers mined from restriction site-associated DNA sequencing (RAD-seq) and 568 M. sinensis accessions. Genomic regions and candidate genes were identified that can be used for breeding improved varieties of M. sinensis, which in turn will be used to generate new M. xgiganteus clones for biomass.
keywords: miscanthus; genotyping-by-sequencing (GBS); genome-wide association studies (GWAS); genomic selection
published: 2018-10-17
 
This is the dataset used in the Ecological Applications publication of the same name. This dataset consists of the following files:
- Internal.Community.Data.txt
- Regional.Community.Data.txt
- Site.Attributes.txt
- Year.Of.Final.Bio.Monitoring.txt
Internal.Community.Data.txt is a site and plot by species matrix. Column labeled SITE consists of site IDs. Column labeled Plot consists of Plot numbers. All other columns represent species relative abundances per plot.
Regional.Community.Data.txt is a site by species matrix of relative abundances. Column labeled site consists of site IDs. All other columns represent species relative abundances per site.
Site.Attributes.txt is a matrix of site attributes. Column labeled SITE consists of site IDs. Column labeled Long represents longitude in decimal degrees. Column labeled Lat represents latitude in decimal degrees. Column labeled Richness represents species richness of sites calculated from Regional Community Data. Column labeled NAT_COMP_REST represents designation as a randomly selected natural wetland (NAT), compensation wetland (COMP) or reference quality natural wetland (REF). Column labeled HQ_LQ_COMP represents designation as high quality (HQ), low quality (LQ) or compensation wetland (COMP). Column labeled SAMPLING_YEAR_INTERNAL represents the year in which data used for analysis of internal β-diversity were gathered. Column labeled SAMPLING_YEAR_REGIONAL represents the year in which data used for analysis of regional β-diversity were gathered. Column labeled TRANSECT_LENGTH represents length in meters of initial sampling transect. INAI_GRADE represents Illinois Natural Areas Inventory grades assigned to each site. Grades range from A for highest quality natural areas to E for lowest quality natural areas.
Year.Of.Final.Bio.Monitoring.txt is a table representing years of final monitoring of compensation wetlands as mandated by the US Army Corps of Engineers. Column labeled Site consists of site IDs. Column labeled YR_FIN_BIO_MON consists of years of final monitoring. Entries of N/A represent dates that could not be located.
More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for data on naturally occurring wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage. Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
keywords: biodiversity; wetlands; wetland mitigation; biotic homogenization; beta diversity
published: 2018-11-20
 
A dataset of acoustic impulse responses for microphones worn on the body. Microphones were placed at 80 positions on the body of a human subject and a plastic mannequin. The impulse responses can be used to study the acoustic effects of the body and can be convolved with sound sources to simulate wearable audio devices and microphone arrays. The dataset also includes measurements with different articles of clothing covering some of the microphones and with microphones placed on different hats and accessories. The measurements were performed from 24 angles of arrival in an acoustically treated laboratory. All impulse responses are sampled at 48 kHz and truncated to 500 ms. The impulse response data is provided in WAVE audio and MATLAB data file formats. The microphone locations are provided in tab-separated-value files for each experiment and are also depicted graphically in the documentation. The file wearable_mic_dataset_full.zip contains both WAVE- and MATLAB-format impulse responses. The file wearable_mic_dataset_matlab.zip contains only MATLAB-format impulse responses. The file wearable_mic_dataset_wave.zip contains only WAVE-format impulse responses.
keywords: Acoustic impulse responses; microphone arrays; wearables; hearing aids; audio source separation
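The entry above mentions that the impulse responses can be convolved with sound sources to simulate wearable audio devices. As a minimal sketch of that operation, the Python snippet below implements direct-form linear convolution on plain lists of samples. The function name and the toy sample values are hypothetical and are not taken from the dataset; only the 48 kHz sample rate and 500 ms truncation come from the description.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample contributes a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


SAMPLE_RATE_HZ = 48_000  # sample rate stated in the dataset description
IR_DURATION_MS = 500     # truncation length stated in the dataset description
IR_LENGTH_SAMPLES = SAMPLE_RATE_HZ * IR_DURATION_MS // 1000

# Hypothetical toy values, far shorter than a real impulse response.
dry_signal = [1.0, 0.5]
toy_impulse_response = [1.0, 0.0, 0.25]

if __name__ == "__main__":
    print(IR_LENGTH_SAMPLES)
    print(convolve(dry_signal, toy_impulse_response))
```

The nested loop is quadratic in the signal lengths, so it is only suitable for illustration. With impulse responses of this length, an FFT-based routine such as `scipy.signal.fftconvolve` would normally be preferred.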
published: 2018-11-21
 
This set of scripts accompanies the manuscript describing the R package polyRAD, which uses DNA sequence read depth to estimate allele dosage in diploids and polyploids. Using several high-confidence SNP datasets from various species, allelic read depth from a typical RAD-seq dataset was simulated, then genotypes were estimated with polyRAD and other software and compared to the true genotypes, yielding error estimates.
keywords: R programming language; genotyping-by-sequencing (GBS); restriction site-associated DNA sequencing (RAD-seq); polyploidy; single nucleotide polymorphism (SNP); Bayesian genotype calling; simulation
published: 2018-10-24
 
This dataset was compiled between 2010 and 2011 from data published in the scientific literature, drawn from articles evaluating the influence of cropping systems and soil management practices on soil organic carbon. We searched the Thomson Reuters Web of Science database and reviewed the reference sections of key peer-reviewed articles. Articles included in the database presented results from field sites within the continental United States.
keywords: Cropping systems; soil management; soil organic carbon; soil quality