Displaying Dataset 1 - 25 of 72 in total

published: 2019-05-16
 
This repository includes scripts and datasets for the paper, "Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge." All data files in this repository are for analyses using the logdet distance matrix computed on the concatenated alignment. Data files for analyses using the average gene-tree internode distance matrix can be downloaded from the Illinois Data Bank (https://doi.org/10.13012/B2IDB-1424746_V1). The latest version of NJMerge can be downloaded from GitHub (https://github.com/ekmolloy/njmerge).
List of Changes:
• Updated timings for NJMerge pipelines to include the time required to estimate distance matrices; this impacted files in the following folder: data.zip
• Replaced "Robinson-Foulds" distance with "Symmetric Difference"; this impacted files in the following folders: tools.zip; data.zip; scripts.zip
• Added some additional information about the java command used to run ASTRAL-III; this impacted files in the following folders: data.zip; astral64-trees.tar.gz (new)
keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
planned publication date: 2020-04-22
 
Nest survival and fledgling production data for Bell's Vireo and Willow Flycatcher nests.
keywords: Bell's Vireo; Willow Flycatcher; habitat selection; fitness
planned publication date: 2019-05-31
 
This dataset includes all data presented in the manuscript entitled: "Dynamic controls on field-scale soil nitrous oxide hot spots and hot moments across a microtopographic gradient"
keywords: denitrification; depressions; microtopography; nitrous oxide; soil oxygen; soil temperature
published: 2018-07-29
 
This repository includes scripts, datasets, and supplementary materials for the study, "NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from GitHub: https://github.com/ekmolloy/njmerge.
***When downloading datasets, please note the following errors.***
In README.txt, lines 37 and 38 should read:
+ fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre
+ fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre
That is, the file names (fasttree-exon.tre and fasttree-intron.tre) are swapped in README.txt.
In tools.zip, the compare_trees.py and the compare_tree_lists.py scripts incorrectly refer to the "symmetric difference error rate" as the "Robinson-Foulds error rate". Because the normalized symmetric difference and the normalized Robinson-Foulds distance are equal for binary trees, this does not impact the species tree error rates reported in the study. This could impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the normalized symmetric difference is always greater than or equal to the normalized Robinson-Foulds distance, so the gene tree error rates reported in the study are more conservative.
In njmerge-supplementary-materials.pdf, the alpha parameter shown in Supplementary Table S2 is actually the divisor D, which is used to compute alpha for each gene as follows.
1. For each gene, a random value X between 0 and 1 is drawn from a uniform distribution.
2. Alpha is computed as -log(X) / D, where D is 4.2 for exons, 1.0 for UCEs, and 0.4 for introns (as stated in Table S2).
Note that because the mean of the uniform distribution (between 0 and 1) is 0.5, the mean alpha value is -log(0.5) / 4.2 = 0.16 for exons, -log(0.5) / 1.0 = 0.69 for UCEs, and -log(0.5) / 0.4 = 1.73 for introns.
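The two-step alpha computation described above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' script: the function names and the random seed are assumptions, and the natural logarithm is assumed because it reproduces the values quoted in the description.

```python
import math
import random

# Divisor D per gene type, as given in the dataset description (Table S2).
DIVISORS = {"exon": 4.2, "UCE": 1.0, "intron": 0.4}


def alpha_at(x, gene_type):
    """Return alpha = -log(x) / D for a given x in (0, 1] (natural log assumed)."""
    return -math.log(x) / DIVISORS[gene_type]


def draw_alpha(gene_type, rng):
    """Step 1: draw X uniformly; step 2: convert it to alpha."""
    x = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return alpha_at(x, gene_type)


rng = random.Random(0)  # hypothetical seed, for repeatability only
for gene_type in DIVISORS:
    print(gene_type, alpha_at(0.5, gene_type), draw_alpha(gene_type, rng))
```

The values printed by `alpha_at(0.5, ...)` correspond to the figures quoted in the description, that is, alpha evaluated at X = 0.5. Because -log is nonlinear, the average of many sampled alpha values need not equal this number; for uniform X the expectation of -log(X) is 1, so a sample average would be expected to sit closer to 1 / D.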
keywords: phylogenomics; species trees; incomplete lineage sorting; divide-and-conquer
published: 2019-03-19
 
This repository includes scripts and datasets for the paper, "TreeMerge: A new method for improving the scalability of species tree estimation methods." The latest version of TreeMerge can be downloaded from Github (https://github.com/ekmolloy/treemerge).
keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published: 2018-03-01
 
The data set consists of Illumina sequences derived from 48 sediment samples, collected in 2015 from Lake Michigan and Lake Superior for the purpose of inventorying the fungal diversity in these two lakes. DNA was extracted from ca. 0.5g of sediment using the MoBio PowerSoil DNA isolation kits following the Earth Microbiome protocol. PCR was completed with the fungal primers ITS1F and fITS7 using the Fluidigm Access Array. The resulting amplicons were sequenced using the Illumina Hi-Seq2500 platform with rapid 2 x 250nt paired-end reads. The enclosed data sets contain the forward read files for both primers, both fixed-header index files, and the associated map files needed to be processed in QIIME. In addition, enclosed are two rarefied OTU files used to evaluate fungal diversity. All decimal latitude and decimal longitude coordinates of our collecting sites are also included.
File descriptions:
Great_lakes_Map_coordinates.xlsx = coordinates of sample sites
QIIME Processing ITS1 region: These are the raw files used to process the ITS1 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS1_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS1F_Miller_Fludigm_I1_fixedheader.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS1F_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS1 region.
QIIME Processing ITS2 region: These are the raw files used to process the ITS2 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS2_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS7_Miller_Fludigm_I1_Fixedheaders.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS7_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS2 region.
Resulting OTU Table and OTU table with taxonomy
ITS1 Region
wahl_ITS1_R1_otu_table.csv = File contains Representative OTUs based on ITS1 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS1_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS1 region for all the R1 data and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
ITS2 Region
wahl_ITS2_R1_otu_table.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS2_R1_otu_table_w_tax.csv = File contains Representative OTUs based on ITS2 region for all the R1 data and the number of each OTU found in each sample along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
Rarefied Illumina dataset for each ITS Region
ITS1_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS1 region.
ITS2_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS2 region.
Column headings:
#SampleID = code including researcher initials and sequential run number
BarcodeSequence =
LinkerPrimerSequence = two sequences used CTTGGTCATTTAGAGGAAGTAA or GTGARTCATCGAATCTTTG
ReversePrimer = two sequences used GCTGCGTTCTTCATCGATGC or TCCTCCGCTTATTGATATGC
run_prefix = initials of run operator
Sample = location code, see thesis figures 1 and 2 for mapped locations and Great_lakes_Map_coordinates.xlsx for exact coordinates.
DepthGroup = S=shallow (50-100 m), MS=mid-shallow (101-150 m), MD=mid-deep (151-200 m), and D=deep (>200 m)
Depth_Meters = Depth in meters
Lake = lake name, Michigan or Superior
Nitrogen %
Carbon %
Date = mm/dd/yyyy
pH = acidity, potential of Hydrogen (pH) scale
SampleDescription = Sample or control
X = sequential run number
OTU ID = Operational taxonomic unit ID
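As a small illustration of the DepthGroup column definition above, the following Python sketch maps a Depth_Meters value to its group code. The function name is hypothetical, and the treatment of depths below 50 m or of fractional depths between the listed integer bounds is an assumption, since the description does not cover those cases.

```python
def depth_group(depth_m):
    """Map a depth in meters to the DepthGroup code from the column description.

    Assumption: depths below 50 m fall outside the listed ranges and return None;
    fractional depths are assigned to the next group up (e.g. 100.5 m -> "MS").
    """
    if depth_m < 50:
        return None
    if depth_m <= 100:
        return "S"   # shallow, 50-100 m
    if depth_m <= 150:
        return "MS"  # mid-shallow, 101-150 m
    if depth_m <= 200:
        return "MD"  # mid-deep, 151-200 m
    return "D"       # deep, >200 m


for depth in (75, 101, 200, 250):
    print(depth, depth_group(depth))
```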
keywords: Illumina; next-generation sequencing; ITS; fungi
published: 2019-03-25
 
This dataset contains genotypic and phenotypic data, R scripts, and the results of analysis pertaining to a multi-location field trial of Miscanthus sinensis. Genome-wide association and genomic prediction were performed for biomass yield and 14 yield-component traits across six field trial locations in Asia and North America, using 46,177 single-nucleotide polymorphism (SNP) markers mined from restriction site-associated DNA sequencing (RAD-seq) and 568 M. sinensis accessions. Genomic regions and candidate genes were identified that can be used for breeding improved varieties of M. sinensis, which in turn will be used to generate new M. xgiganteus clones for biomass.
keywords: miscanthus; genotyping-by-sequencing (GBS); genome-wide association studies (GWAS); genomic selection
published: 2019-03-22
 
This data publication provides example video clips related to research on the association between flight ability of juvenile songbirds at fledging and juvenile morphological traits (wing emergence, wing length, body condition, mass, and tarsus length). File names reflect the species dropped in each video. These videos are supplemental material for scientific publications by the authors and reflect an example subset of all videos collected from 2017-2018 as part of a larger study on the post-fledging ecology of grassland and shrubland birds in east-central Illinois, USA. No birds were harmed/injured in the production of these videos and procedures were approved by the Illinois Institutional Animal Care and Use Committee (IACUC), protocol no. 18221. Individuals depicted in the videos have given consent for the videos to be shared (talent/model release form; https://publicaffairs.illinois.edu/resources/release/).
keywords: songbirds; flight ability; wing development; wing length; wing emergence; nestling development; post-fledging
published: 2019-03-06
 
Chronic contact exposure to realistic soil concentrations (0, 7.5, 15, and 100 ppb) of the neonicotinoid pesticide imidacloprid had species- and sex-specific effects on bee adult longevity, immature development speed, and mass. This dataset contains a life table tracking the development, mass, and deaths of a single cohort of Osmia lignaria and Megachile rotundata over the course of two summers. Other data files include files created for multi-event survival analysis to analyze the effect on development speed. Detected effects included: decreased adult longevity for female O. lignaria at the highest concentration; a trend for a hormetic effect on female M. rotundata development speed and mass (longest development time and greatest mass in the 15 ppb treatment); and, for male M. rotundata, decreased adult longevity and increased development speed at high imidacloprid concentrations as well as a hormetic effect on mass (lowest in the 15 ppb treatment).
keywords: neonicotinoid; imidacloprid; bee; habitat restoration;
published: 2019-03-06
 
This dataset is provided to support the statements in Tarokh, A., and R.Y. Makhnenko. 2019. Remarks on the solid and bulk responses of fluid-filled porous rock, Geophysics. The unjacketed bulk modulus is a poroelastic parameter that can be directly measured in a laboratory test under a loading that preserves the difference between the mean stress and pore pressure constant. For a monomineralic rock, the measurement of the unjacketed bulk modulus is ignored because it is assumed to be equal to the bulk modulus of the solid phase. To examine this assumption, we tested porous sandstones (Berea and Dunnville) and limestones (Apulian and Indiana) mainly composed of quartz and calcite, respectively, under the unjacketed condition. The presence of microscale inhomogeneities, in the form of non-connected (occluded) pores, was shown to cause a considerable difference between the unjacketed bulk modulus and the bulk modulus of the solid phase. Furthermore, we found the unjacketed bulk modulus to be independent of the unjacketed pressure and Terzaghi effective pressure and therefore a constant.
keywords: Poroelasticity; anisotropic solid skeleton; unjacketed bulk modulus; non-connected porosity
published: 2019-02-26
 
We have recently created an approach for high throughput single cell measurements using matrix assisted laser desorption / ionization mass spectrometry (MALDI MS) (J Am Soc Mass Spectrom. 2017, 28, 1919-1928. doi: 10.1007/s13361-017-1704-1. Chemphyschem. 2018, 19, 1180-1191. doi: 10.1002/cphc.201701364). While chemical detail is obtained on individual cells, it has not been possible to correlate the chemical information with canonical cell types. Now we combine high-throughput single cell mass spectrometry with immunocytochemistry to determine lipid profiles of two known cell types, astrocytes and neurons from the rodent brain, with the work appearing as “Lipid heterogeneity between astrocytes and neurons revealed with single cell MALDI MS supervised by immunocytochemical classification” (DOI: 10.1002/anie.201812892). Here we provide the data collected for this study. The dataset provides the raw data and script files for the rodent cerebral cells described in the manuscript.
keywords: Single cell analysis; mass spectrometry; astrocyte; neuron; lipid analysis
published: 2019-02-02
 
The bee visitation data includes the percentage of each bee pollinator group in bee bowls and observed. The data are referenced in the article with the following citation: Bennett, A.B., Lovell, S.T. 2019. Landscape and local site variables differentially influence pollinators and pollination services in urban agricultural sites. Accepted for publication in: PLOS ONE.
published: 2019-02-02
 
Landscape attributes of the nineteen sites as supplemental data for the following article: Bennett, A.B., Lovell, S.T. 2019. Landscape and local site variables differentially influence pollinators and pollination services in urban agricultural sites. Accepted for publication in: PLOS ONE.
published: 2019-01-27
 
This repository includes the datasets studied with INC/INC-ML/INC-NJ in the paper "Using INC within Divide-and-Conquer Phylogeny Estimation", which was submitted to AICoB 2019. Each dataset has its own readme.txt that further describes the creation process and the other parameters and software used in making these datasets. The latest implementation of INC/INC-ML/INC-NJ can be found at https://github.com/steven-le-thien/constraint_inc. Note: there may be files with DS_STORE as extension in the datasets; please ignore these files.
keywords: phylogenetics; gene tree estimation; divide-and-conquer; absolute fast converging
published: 2019-02-07
 
This dataset contains all data used in the two studies included in "PICAN-PI..." by Nute, et al, other than the original raw sequences. That includes: 1) Supplementary information for the Manuscript, including all the graphics that were created, 2) 16S Reference Alignment, Phylogeny and Taxonomic Annotation used by SEPP, and 3) Data used in the manuscript as input for the graphics generation (namely, SEPP outputs and sequence multiplicities).
keywords: microbiome; data visualization; graphics; phylogenetics; 16S
published: 2018-08-16
 
This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript, "Effects of an invasive perennial forb on gross soil nitrogen cycling and nitrous oxide fluxes," submitted to Ecology for peer-reviewed publication. Please refer to that publication for details about methodologies used to generate these data and for the experimental design.
keywords: pepperweed; nitrogen cycling; nitrous oxide; invasive species; Bay Delta
published: 2018-12-04
 
The text file contains the original data used in the phylogenetic analyses of Wang et al. (2017: Scientific Reports 7:45387). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 81 taxa (species) and 2905 characters, indicate that the first 2805 characters are DNA sequence and the last 100 are morphological, that the data may be interleaved (with data for one species on multiple rows), that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 100 morphological characters. The identity and positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; wingless; histone H3; cytochrome oxidase I; bayesian analysis
published: 2018-12-06
 
The text file contains the original DNA sequence data used in the phylogenetic analyses of Krishnankutty et al. (2016: Systematic Entomology 41: 580–595). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The file contains five separate data blocks, one for each character partition (28S, histone H3, 12S, indels, and morphology) for 53 taxa (species). Gaps inserted into the DNA sequence alignment are indicated by a dash, and missing data are indicated by a question mark. The separate "indels1" block includes 40 indels (insertions/deletions) from the 28S sequence alignment re-coded using the modified complex indel coding scheme, as described in the "Materials and methods" of the original publication. The DIMENSIONS statements near the beginning of each block indicate the numbers of taxa (NTax) and characters (NChar). The file contains aligned nucleotide sequence data for 3 gene regions and 40 morphological characters. The file is configured for use with the maximum likelihood-based phylogenetic program GARLI but can also be parsed by any other bioinformatics software that supports the NEXUS format. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supporting pdf file. More details on individual analyses are provided in the original publication.
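To illustrate how the DIMENSIONS statements mentioned above can be read programmatically, here is a minimal Python sketch that extracts NTax and NChar from each block of a NEXUS-style text. The sample string is a hypothetical, heavily abbreviated stand-in (not the actual data file and not a complete NEXUS file), and the regular expressions are a simplification that assumes one DIMENSIONS statement per block, terminated by a semicolon.

```python
import re

# Hypothetical, abbreviated sample; block contents and the first NChar are invented.
SAMPLE_NEXUS = """#NEXUS
BEGIN CHARACTERS;
  DIMENSIONS NTax=53 NChar=10;
END;
BEGIN CHARACTERS;
  DIMENSIONS NTax=53 NChar=40;
END;
"""

DIM_RE = re.compile(r"dimensions\s+([^;]*);", re.IGNORECASE)
PAIR_RE = re.compile(r"(ntax|nchar)\s*=\s*(\d+)", re.IGNORECASE)


def dimensions(text):
    """Return one dict per DIMENSIONS statement, e.g. {'ntax': 53, 'nchar': 40}."""
    found = []
    for match in DIM_RE.finditer(text):
        pairs = PAIR_RE.findall(match.group(1))
        found.append({key.lower(): int(value) for key, value in pairs})
    return found


for block in dimensions(SAMPLE_NEXUS):
    print(block)
```

For real analyses, a dedicated NEXUS parser from an established phylogenetics library is likely a safer choice than regular expressions, since NEXUS allows comments and quoting that this sketch ignores.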
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; histone H3; 12S mtDNA; maximum likelihood
published: 2018-12-31
 
Sixty undergraduate STEM lecture classes were observed across 14 departments at the University of Illinois Urbana-Champaign in 2015 and 2016. We selected the classes to observe using purposive sampling techniques with the objectives of (1) collecting classroom observations that were representative of the STEM courses offered; (2) conducting observations on non-test, typical class days; and (3) comparing these classroom observations using the Class Observation Protocol for Undergraduate STEM (COPUS) to record the presence and frequency of active learning practices utilized by Community of Practice (CoP) and non-CoP instructors. Decimal values are the result of combined observations. All COPUS codes listed are from the paper by Smith (2013), "The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize STEM Classroom Practices". For more information on the data collection process, see "Evidence that communities of practice are associated with active learning in large STEM lectures" by Tomkin et al. (2019) in the International Journal of STEM Education.
keywords: COPUS, Community of Practice
published: 2018-10-17
 
This is the dataset used in the Ecological Applications publication of the same name. This dataset consists of the following files:
Internal.Community.Data.txt
Regional.Community.Data.txt
Site.Attributes.txt
Year.Of.Final.Bio.Monitoring.txt
Internal.Community.Data.txt is a site and plot by species matrix.
Column labeled SITE consists of site IDs.
Column labeled Plot consists of Plot numbers.
All other columns represent species relative abundances per plot.
Regional.Community.Data.txt is a site by species matrix of relative abundances.
Column labeled site consists of site IDs.
All other columns represent species relative abundances per site.
Site.Attributes.txt is a matrix of site attributes.
Column labeled SITE consists of site IDs.
Column labeled Long represents longitude in decimal degrees.
Column labeled Lat represents latitude in decimal degrees.
Column labeled Richness represents species richness of sites calculated from Regional Community Data.
Column labeled NAT_COMP_REST represents designation as a randomly selected natural wetland (NAT), compensation wetland (COMP) or reference quality natural wetland (REF).
Column labeled HQ_LQ_COMP represents designation as high quality (HQ), low quality (LQ) or compensation wetland (COMP).
Column labeled SAMPLING_YEAR_INTERNAL represents the year in which data used for analysis of internal β-diversity were gathered.
Column labeled SAMPLING_YEAR_REGIONAL represents the year in which data used for analysis of regional β-diversity were gathered.
Column labeled TRANSECT_LENGTH represents length in meters of initial sampling transect.
Column labeled INAI_GRADE represents Illinois Natural Areas Inventory grades assigned to each site. Grades range from A for highest quality natural areas to E for lowest quality natural areas.
Year.Of.Final.Bio.Monitoring.txt is a table representing years of final monitoring of compensation wetlands as mandated by the US Army Corps of Engineers.
Column labeled Site consists of site IDs.
Column labeled YR_FIN_BIO_MON consists of years of final monitoring. Entries of N/A represent dates that could not be located.
More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for data on naturally occurring wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage. Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
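The description states that the Richness column is calculated from the regional community data. One plausible way to do this, sketched below in Python, is to count the species with a nonzero relative abundance at each site. The sample rows, species names, site IDs, and the tab delimiter are all assumptions for illustration; the actual file layout and the authors' exact calculation may differ.

```python
import csv
import io

# Hypothetical tab-delimited excerpt shaped like Regional.Community.Data.txt:
# one "site" column followed by one relative-abundance column per species.
SAMPLE = (
    "site\tSpeciesA\tSpeciesB\tSpeciesC\n"
    "W01\t0.5\t0.5\t0\n"
    "W02\t0.2\t0.3\t0.5\n"
)


def richness_by_site(handle, site_column="site", delimiter="\t"):
    """Count species with relative abundance > 0 for each site row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    result = {}
    for row in reader:
        site = row.pop(site_column)
        result[site] = sum(1 for value in row.values() if float(value) > 0)
    return result


print(richness_by_site(io.StringIO(SAMPLE)))
```

To apply this to the real file, the `io.StringIO(SAMPLE)` argument would be replaced with an opened file handle, after checking the delimiter and the exact spelling of the site column header.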
keywords: biodiversity; wetlands; wetland mitigation; biotic homogenization; beta diversity
published: 2018-11-21
 
This set of scripts accompanies the manuscript describing the R package polyRAD, which uses DNA sequence read depth to estimate allele dosage in diploids and polyploids. Using several high-confidence SNP datasets from various species, allelic read depth from a typical RAD-seq dataset was simulated, then genotypes were estimated with polyRAD and other software and compared to the true genotypes, yielding error estimates.
keywords: R programming language; genotyping-by-sequencing (GBS); restriction site-associated DNA sequencing (RAD-seq); polyploidy; single nucleotide polymorphism (SNP); Bayesian genotype calling; simulation
published: 2018-10-24
 
This dataset was compiled between 2010 and 2011 from data published in scientific articles evaluating the influence of cropping systems and soil management practices on soil organic carbon. We searched the Thomson Reuters Web of Science database and reviewed the reference sections of key peer-reviewed articles. Articles included in the database presented results from field sites within the continental United States.
keywords: Cropping systems; soil management; soil organic carbon; soil quality.
published: 2016-08-16
 
This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the PFAM families used to build the HMMs and BLAST databases. The file structure is:
./X/Y/initial.fasttree
./X/Y/initial.fasta
where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Inside each folder are two files: initial.fasta, which is the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta. The query.tar archive contains the query sequences for each cross-fold set. The query sequences associated with cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory.
[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
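The naming scheme described above can be enumerated with a short Python sketch. The family name used in the example is hypothetical, as is the assumption that the fragment lengths appear in file names exactly as written in the description (for example "1" rather than "1.0"); the location of the splits directory relative to the archive root is not stated, so only file names are generated for the queries.

```python
FOLDS = (0, 1, 2, 3)
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")  # assumed to appear verbatim in file names


def reference_paths(family, fold):
    """Return the alignment and tree paths for one Pfam family and cross-fold set."""
    base = f"./{family}/{fold}"
    return (f"{base}/initial.fasta", f"{base}/initial.fasttree")


def query_names(fold):
    """Return the query file names (query.Y.Z.fas) for one cross-fold set."""
    return [f"query.{fold}.{length}.fas" for length in FRAGMENT_LENGTHS]


# "PF00001" is a placeholder; the archive may name families differently.
for fold in FOLDS:
    print(reference_paths("PF00001", fold), query_names(fold))
```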
keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
published: 2018-12-01
 
Ammonia flux measurement data using flux gradient and relaxed eddy accumulation methods, and ancillary environmental data collected during the 2014 corn-growing season in Central Illinois, USA. This Excel file contains two worksheets: one README sheet, and one sheet containing all data. These data were used in the development of the manuscript titled "Ammonia Flux Measurements above a Corn Canopy using Relaxed Eddy Accumulation and a Flux Gradient System."
keywords: Ammonia; Bi-directional Flux; Corn; Relaxed Eddy Accumulation; Flux Gradient; Urease Inhibitor