published: 2019-01-27
This repository includes the datasets studied with INC/INC-ML/INC-NJ in the paper "Using INC within Divide-and-Conquer Phylogeny Estimation", which was submitted to AICoB 2019. Each dataset has its own readme.txt that further describes the creation process and the other parameters and software used in making the dataset. The latest implementation of INC/INC-ML/INC-NJ can be found at https://github.com/steven-le-thien/constraint_inc. Note: the datasets may contain files with the extension DS_STORE; please ignore these files.
keywords: phylogenetics; gene tree estimation; divide-and-conquer; absolute fast converging
published: 2019-02-07
This dataset contains all data used in the two studies included in "PICAN-PI..." by Nute et al., other than the original raw sequences. It includes: 1) supplementary information for the manuscript, including all the graphics that were created, 2) the 16S reference alignment, phylogeny and taxonomic annotation used by SEPP, and 3) the data used in the manuscript as input for the graphics generation (namely, SEPP outputs and sequence multiplicities).
keywords: microbiome; data visualization; graphics; phylogenetics; 16S
published: 2018-08-16
This dataset includes data on soil properties, soil N pools, and soil N fluxes presented in the manuscript, "Effects of an invasive perennial forb on gross soil nitrogen cycling and nitrous oxide fluxes," submitted to Ecology for peer-reviewed publication. Please refer to that publication for details about methodologies used to generate these data and for the experimental design.
keywords: pepperweed; nitrogen cycling; nitrous oxide; invasive species; Bay Delta
published: 2018-12-04
The text file contains the original data used in the phylogenetic analyses of Wang et al. (2017: Scientific Reports 7:45387). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The first six lines of the file identify the file as NEXUS, indicate that the file contains data for 81 taxa (species) and 2905 characters, indicate that the first 2805 characters are DNA sequence and the last 100 are morphological, that the data may be interleaved (with data for one species on multiple rows), that gaps inserted into the DNA sequence alignment are indicated by a dash, and that missing data are indicated by a question mark. The file contains aligned nucleotide sequence data for 5 gene regions and 100 morphological characters. The identity and positions of data partitions are indicated in the mrbayes block of commands for the phylogenetic program MrBayes at the end of the file. The mrbayes block also contains instructions for MrBayes on various non-default settings for that program. These are explained in the original publication. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supplementary file.
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; wingless; histone H3; cytochrome oxidase I; bayesian analysis
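The header layout described for this NEXUS file can be checked programmatically. The following is a minimal sketch, assuming a header written in the mixed-datatype style accepted by MrBayes; the header text is reconstructed from the description above and is not copied from the file, and the function name `parse_nexus_header` is hypothetical.

```python
import re

# Hypothetical header reconstructed from the record's description;
# the exact wording and line layout of the real file may differ.
EXAMPLE_HEADER = """#NEXUS
begin data;
dimensions ntax=81 nchar=2905;
format datatype=mixed(DNA:1-2805,Standard:2806-2905) interleave=yes gap=- missing=?;
matrix
"""


def parse_nexus_header(text):
    """Extract taxon/character counts, partitions, and symbols from a NEXUS header."""
    info = {}
    dims = re.search(r"ntax\s*=\s*(\d+)\s+nchar\s*=\s*(\d+)", text, re.IGNORECASE)
    if dims:
        info["ntax"] = int(dims.group(1))
        info["nchar"] = int(dims.group(2))
    # Partitions of a mixed datatype, written as NAME:START-END
    info["partitions"] = [
        (name, int(start), int(end))
        for name, start, end in re.findall(r"(\w+):(\d+)-(\d+)", text)
    ]
    # Single-token settings such as gap=- or missing=?
    for key in ("gap", "missing", "interleave"):
        match = re.search(rf"{key}\s*=\s*(\S+?)[\s;]", text, re.IGNORECASE)
        if match:
            info[key] = match.group(1)
    return info


info = parse_nexus_header(EXAMPLE_HEADER)
# Partition lengths should add up to the declared number of characters.
total = sum(end - start + 1 for _, start, end in info["partitions"])
print(info["ntax"], info["nchar"], total, info["gap"], info["missing"])
```

For real analyses, an established NEXUS parser is likely a better choice than hand-written regular expressions; the sketch only illustrates what the header lines encode.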
published: 2018-12-06
The text file contains the original DNA sequence data used in the phylogenetic analyses of Krishnankutty et al. (2016: Systematic Entomology 41: 580–595). The text file is marked up according to the standard NEXUS format commonly used by various phylogenetic analysis software packages. The file will be parsed automatically by a variety of programs that recognize NEXUS as a standard bioinformatics file format. The file contains five separate data blocks, one for each character partition (28S, histone H3, 12S, indels, and morphology) for 53 taxa (species). Gaps inserted into the DNA sequence alignment are indicated by a dash, and missing data are indicated by a question mark. The separate "indels1" block includes 40 indels (insertions/deletions) from the 28S sequence alignment re-coded using the modified complex indel coding scheme, as described in the "Materials and methods" of the original publication. The DIMENSIONS statements near the beginning of each block indicate the numbers of taxa (NTax) and characters (NChar). The file contains aligned nucleotide sequence data for 3 gene regions and 40 morphological characters. The file is configured for use with the maximum likelihood-based phylogenetic program GARLI but can also be parsed by any other bioinformatics software that supports the NEXUS format. Descriptions of the morphological characters and more details on the species and specimens included in the dataset are provided in the supplementary document included as a separate pdf. The original raw DNA sequence data are available from NCBI GenBank under the accession numbers indicated in the supporting pdf file. More details on individual analyses are provided in the original publication.
keywords: phylogeny; DNA sequence; morphology; Insecta; Hemiptera; Cicadellidae; leafhopper; evolution; 28S rDNA; histone H3; 12S mtDNA; maximum likelihood
published: 2018-10-17
This is the dataset used in the Ecological Applications publication of the same name. This dataset consists of the following files:
Internal.Community.Data.txt
Regional.Community.Data.txt
Site.Attributes.txt
Year.Of.Final.Bio.Monitoring.txt
Internal.Community.Data.txt is a site and plot by species matrix. Column labeled SITE consists of site IDs. Column labeled Plot consists of plot numbers. All other columns represent species relative abundances per plot.
Regional.Community.Data.txt is a site by species matrix of relative abundances. Column labeled site consists of site IDs. All other columns represent species relative abundances per site.
Site.Attributes.txt is a matrix of site attributes. Column labeled SITE consists of site IDs. Column labeled Long represents longitude in decimal degrees. Column labeled Lat represents latitude in decimal degrees. Column labeled Richness represents species richness of sites calculated from Regional Community Data. Column labeled NAT_COMP_REST represents designation as a randomly selected natural wetland (NAT), compensation wetland (COMP) or reference quality natural wetland (REF). Column labeled HQ_LQ_COMP represents designation as high quality (HQ), low quality (LQ) or compensation wetland (COMP). Column labeled SAMPLING_YEAR_INTERNAL represents the year in which the data used for analysis of internal β-diversity were gathered. Column labeled SAMPLING_YEAR_REGIONAL represents the year in which the data used for analysis of regional β-diversity were gathered. Column labeled TRANSECT_LENGTH represents the length in meters of the initial sampling transect. Column labeled INAI_GRADE represents Illinois Natural Areas Inventory grades assigned to each site. Grades range from A for highest quality natural areas to E for lowest quality natural areas.
Year.Of.Final.Bio.Monitoring.txt is a table representing years of final monitoring of compensation wetlands as mandated by the US Army Corps of Engineers. Column labeled Site consists of site IDs. Column labeled YR_FIN_BIO_MON consists of years of final monitoring. Entries of N/A represent dates that could not be located.
More information about this dataset: Interested parties can request data from the Critical Trends Assessment Program, which was the source for data on naturally occurring wetlands in this study. More information on the program and data requests can be obtained by visiting the program webpage. Critical Trends Assessment Program, Illinois Natural History Survey. http://wwx.inhs.illinois.edu/research/ctap/
keywords: biodiversity; wetlands; wetland mitigation; biotic homogenization; beta diversity
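The record states that the Richness column was calculated from the regional site by species matrix but does not say how. One common reading, taken here as an assumption, is to count the species with non-zero relative abundance at each site. The sketch below uses an invented tab-delimited excerpt; the delimiter, site IDs, species names and values are all hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited excerpt; site IDs, species names, and values are
# invented for illustration and are not taken from the dataset.
EXAMPLE = (
    "site\tsp_a\tsp_b\tsp_c\n"
    "S1\t0.5\t0.5\t0\n"
    "S2\t0\t1.0\t0\n"
    "S3\t0.2\t0.3\t0.5\n"
)


def richness_by_site(handle, site_column="site", delimiter="\t"):
    """Count species with non-zero relative abundance for each site."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    result = {}
    for row in reader:
        site = row.pop(site_column)  # remaining columns are species
        result[site] = sum(1 for value in row.values() if float(value) > 0)
    return result


richness = richness_by_site(io.StringIO(EXAMPLE))
print(richness)
```

To apply the same function to the real file, the handle would come from `open(...)` instead of `io.StringIO`, with the delimiter and site column name adjusted to match the file.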
published: 2018-11-21
This set of scripts accompanies the manuscript describing the R package polyRAD, which uses DNA sequence read depth to estimate allele dosage in diploids and polyploids. Using several high-confidence SNP datasets from various species, allelic read depth from a typical RAD-seq dataset was simulated, then genotypes were estimated with polyRAD and other software and compared to the true genotypes, yielding error estimates.
keywords: R programming language; genotyping-by-sequencing (GBS); restriction site-associated DNA sequencing (RAD-seq); polyploidy; single nucleotide polymorphism (SNP); Bayesian genotype calling; simulation
published: 2018-10-24
This dataset was compiled between 2010 and 2011 from data published in scientific articles evaluating the influence of cropping systems and soil management practices on soil organic carbon. We searched the Thomson Reuters Web of Science database and reviewed the reference sections of key peer-reviewed articles. Articles included in the database presented results from field sites within the continental United States.
keywords: Cropping systems; soil management; soil organic carbon; soil quality
published: 2016-08-16
This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the PFAM families used to build the HMMs and BLAST databases. The file structure is: ./X/Y/initial.fasttree ./X/Y/initial.fasta where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder contains two files: initial.fasta, which is the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta. The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory. [1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
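The naming scheme described for the HIPPI archives can be enumerated in a few lines. This is a sketch under stated assumptions: the family name `PF00001` is only a placeholder (the record does not say how family folders are named), and the location of the splits directory relative to the extracted archive is assumed.

```python
from pathlib import PurePosixPath

FOLDS = (0, 1, 2, 3)                      # cross-fold sets Y
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")   # fragment lengths Z
REFERENCE_NAMES = ("initial.fasta", "initial.fasttree")


def reference_files(family):
    """Expected ./X/Y/ reference files for one Pfam family X."""
    return [
        PurePosixPath(family, str(fold), name)
        for fold in FOLDS
        for name in REFERENCE_NAMES
    ]


def query_files(splits_dir="splits"):
    """Expected query.Y.Z.fas files inside the splits directory."""
    return [
        PurePosixPath(splits_dir, f"query.{fold}.{length}.fas")
        for fold in FOLDS
        for length in FRAGMENT_LENGTHS
    ]


# "PF00001" is a hypothetical family folder name used only for illustration.
refs = [str(p) for p in reference_files("PF00001")]
queries = [str(p) for p in query_files()]
print(refs[0], queries[-1])
```

Comparing these expected paths against the contents of the extracted archives is one way to confirm that a download is complete.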
published: 2018-12-01
Ammonia flux measurement data using flux gradient and relaxed eddy accumulation methods, and ancillary environmental data collected during the 2014 corn-growing season in Central Illinois, USA. This Excel file contains two spreadsheets: one README sheet, and one sheet containing all data. These data were used in the development of the manuscript titled "Ammonia Flux Measurements above a Corn Canopy using Relaxed Eddy Accumulation and a Flux Gradient System."
keywords: Ammonia; Bi-directional Flux; Corn; Relaxed Eddy Accumulation; Flux Gradient; Urease Inhibitor
published: 2018-10-05
Supplementary material for the article entitled "Identifying marginal land for multifunctional perennial cropping systems in the Upper Sangamon River Watershed, Illinois". The material includes the methodology of the GIS RUSLE model and details of the suitability analysis variables.
keywords: RUSLE model; land use; agriculture
published: 2018-09-26
Nucleotide sequences from wild parsnip CYP71AJ4 (angelicin synthase; GenBank EF191021, https://www.ncbi.nlm.nih.gov/nuccore/EF191021) were obtained by Sanger sequencing. Seeds from individual plants from different populations were harvested to obtain the corresponding cDNA. The cDNA was cloned and directly sequenced. Amino acid translations were obtained using standard codon usage. The CYP71AJ4 sequences (involved in angular furanocoumarin biosynthesis) were aligned with the reference sequence. Consistent amino acid differences were found between some populations. The relationship between sequencing variability and selective pressure is not yet known.
keywords: Pastinaca sativa; parsnip; furanocoumarins; psoralen
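The translation step mentioned in this record ("standard codon usage") can be illustrated with the standard genetic code. The sketch below is not the authors' procedure; the input sequence is invented and is not a CYP71AJ4 sequence, and the function name `translate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG codon order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate a cDNA string in frame 1; unknown codons become 'X', stops '*'."""
    seq = cdna.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# Invented example sequence, used only to show the mechanics.
example_protein = translate("ATGGCCTTGTAA")
print(example_protein)
```

Comparing translated sequences position by position against the translated reference is then enough to list amino acid differences between populations.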
published: 2018-08-02
Weather data used in the survival (mark-recapture) analysis of Swainson's Thrushes crossing the Gulf of Mexico.
keywords: weather; Gulf of Mexico; Thrushes
published: 2018-08-02
Data used to estimate the survival of Swainson's Thrushes crossing the Gulf of Mexico.
keywords: capture history; thrush; survival
published: 2018-08-03
These data include information on a field experiment on Castilleja coccinea (L.) Spreng., scarlet Indian paintbrush (Orobanchaceae). There is intraspecific variation in scarlet Indian paintbrush in the color of the bracts surrounding the flowers. Two bract color morphs were included in this study, the scarlet and yellow morphs. The experiment was conducted at Illinois Beach State Park in 2012. The aim of the work was to compare the color morphs with regard to 1) self-compatibility, 2) response to pollinator exclusion, 3) cross-compatibility between the color morphs, and 4) relative female fertility and male fitness. Three files are attached with this record. The raw data are in "fruitSet.csv" and "seedSet.csv", while "readme.txt" has detailed explanations of the raw data files.
keywords: Castilleja coccinea; Orobanchaceae; floral color polymorphism; bract color polymorphism; breeding system; hand-pollination; self-compatibility; reproductive assurance
published: 2018-06-18
This repository contains datasets and R scripts that were used in a study of the population structure of Miscanthus sacchariflorus in its native range across East Asia. Notably, genotypes of 764 individuals at 34,605 SNPs, called from reduced-representation DNA sequencing using a non-reference bioinformatics pipeline, are provided. Two similar SNP datasets, used for identifying clonal duplicates and for determining the ancestry of ornamental and hybrid Miscanthus plants identified in previous studies respectively, are also provided. There is also a spreadsheet listing the provenance and ploidy of all individuals along with their plastid (chloroplast) haplotypes. Software output for Structure, Treemix, and DIYABC is also included. See README.txt for more information about individual files. Results of this study are described in a manuscript in revision in Annals of Botany by the same authors, "Population structure of Miscanthus sacchariflorus reveals two major polyploidization events, tetraploid-mediated unidirectional introgression from diploid Miscanthus sinensis, and diversity centered around the Yellow Sea."
keywords: Miscanthus; restriction site-associated DNA sequencing (RAD-seq); single nucleotide polymorphism (SNP); population genetics; Miscanthus xgiganteus; Miscanthus sacchariflorus; R scripts; germplasm; plastid haplotype
published: 2018-04-26
GBS data from soybean lines carrying introgressions from Glycine tomentella. This project is led by Dr. Randy Nelson, USDA scientist at the University of Illinois. Fastq files contain raw Illumina data. Txt files are keyfiles containing barcodes for each genetic entity.
published: 2018-04-05
GBS data from Phaseolus accessions, for a study led by Dr. Glen Hartman, UIUC. The (zipped) fastq file can be processed with the TASSEL GBS pipeline or other pipelines for SNP calling. The related article has been submitted and the methods section describes the data processing in detail.
published: 2018-05-01
GBS data for G. max x G. soja crosses, a project led by Dr. Randy Nelson.
published: 2018-05-06
This deposit contains all raw data and analysis from the paper "In-cell titration of small solutes controls protein stability and aggregation". The data are organized into several types:
1) analysis*.tar.gz contain the analysis scripts and the resulting data for each cell. The numbers correspond to the numbers shown in Fig. S1 of the publication.
2) scripts.tar.gz contains helper scripts to create the dataset in bash format.
3) input.tar.gz contains headers and other information that is fed into the bash scripts to create the dataset.
4) All rawData*.tar.gz are tarballs of the data of cells in different solutes, in .mat files readable by MATLAB, as follows:
- Each experiment included in the publication is represented by two MATLAB files: (1) a calibration jump under amber illumination (_calib.mat suffix) and (2) a full jump under blue illumination (FRET data).
- Each file contains the following fields:
    coordleft - coordinates of the cropped and aligned acceptor channel on the original image
    coordright - coordinates of the cropped and aligned donor channel on the original image
    dataleft - a 3D 12-bit integer matrix containing acceptor channel fluorescence for each pixel and time step. Not available in _calib files.
    dataright - a 3D 12-bit integer matrix containing donor channel fluorescence for each pixel and time step. This will be mCherry in _calib files and AcGFP in data files.
    frame1 - original image size
    imgstd - cropped dimensions
    numFrames - number of frames in dataleft and dataright
    videos - a structure containing camera data. Specifically, videos.TimeStamp includes the time of each frame.
keywords: Live cell; FRET microscopy; osmotic challenge; intracellular titrations; protein dynamics
published: 2017-12-04
Data used for Zaya et al. (2018), published in Invasive Plant Science and Management DOI 10.1017/inp.2017.37, are made available here. There are three spreadsheet files (CSV) available, as well as a text file that has detailed descriptions for each file ("readme.txt"). One spreadsheet file ("prices.csv") gives pricing information, associated with Figure 3 in Zaya et al. (2018). The other two spreadsheet files are associated with the genetic analysis, where one file contains raw data for biallelic microsatellite loci ("genotypes.csv") and the other ("structureResults.csv") contains the results of Bayesian clustering analysis with the program STRUCTURE. The genetic data may be especially useful for future researchers. The genetic data contain the genotypes of the horticultural samples that were the focus of the published article, and also genotypes of nearly 400 wild plants. More information on the location of the wild plant collections can be found in the Supplemental information for Zaya et al. (2015) Biological Invasions 17:2975–2988 DOI 10.1007/s10530-015-0926-z. See "readme.txt" for more information.
keywords: Horticultural industry; invasive species; microsatellite DNA; mislabeling; molecular testing
published: 2017-12-15
These are the results of an 8-month cohort study in two commercial dairy herds in Northwest Illinois. From each herd, 50 cows were selected at random, stratified over lactations 1 to 3. Serum from these animals was collected every two months and tested for antibodies to Bovine Leukosis Virus, Neospora caninum, and Mycobacterium avium subsp. paratuberculosis. Animals that left the herd during the study were replaced by another animal in the same herd and lactation. At the last sampling, serum neutralization assays were performed for Bovine Herpesvirus type 1 and Bovine Viral Diarrhea virus types 1 and 2. Production data before and after sampling were collected for the entire herd from PCdart.
keywords: serostatus; dairy; production; cohort
published: 2018-01-03
Concatenated sequence alignment, phylogenetic analysis files, and relevant software parameter files from a cophylogenetic study of Brueelia-complex lice and their avian hosts. The sequence alignment file includes a list of character blocks for each gene alignment and the parameters used for the MrBayes phylogenetic analysis. 1) Files from the MrBayes analyses: a) a file with 100 random post-burnin trees (50% burnin) used in the cophylogenetic analysis - analysisrandom100_trees_brueelia.tre b) a majority rule consensus tree - treeconsensus_tree_brueelia.tre c) a maximum clade credibility tree - mcc_tree_brueelia.tre The tree tips are labeled with louse voucher names, and can be referenced in Supplementary Table 1 of the associated publication. 2) Files related to a BEAST analysis with COI data: a) the XML file used as input for the BEAST run, including model parameters, MCMC chain length, and priors - beast_parameters_coi_brueelia.xml b) a file with 100 random post-burnin trees (10% burnin) from the BEAST posterior distribution of trees; used in OTU analysis - beast_100random_trees_brueelia.tre c) an ultrametric maximum clade credibility tree - mcc_tree_beast_brueelia.tre 3) A maximum clade credibility tree of Brueelia-complex host species generated from a distribution of trees downloaded from https://birdtree.org/subsets/ - mcc_tree_brueelia_hosts.tre 4) Concatenated sequence alignment - concatenated_alignment_brueelia.nex
keywords: bird lice; Brueelia-complex; passerines; multiple sequence alignment; phylogenetic tree; Bayesian phylogenetic analysis; MrBayes; BEAST
published: 2018-01-13
This dataset provides time series data (Aug. - Sep. 2016) of sun-induced chlorophyll fluorescence, photosynthesis, photosynthetically active radiation, and associated vegetation indices collected in a soybean field on the farm of the University of Illinois at Urbana-Champaign. The data contain 255 records and 6 variables (PPFD-IN: Photosynthetically active radiation; GPP: Gross Primary Production; SIF: Sun-Induced Fluorescence; NDVI: Normalized Difference Vegetation Index; Rededge: Rededge Index; Redege_NDVI: Rededge Normalized Difference Vegetation Index). Timestamps are given in standard time. Data are available from 8 am to 4 pm standard time (corresponding to 9 am to 5 pm local time) every day.
keywords: sun-induced chlorophyll fluorescence; photosynthesis; soybean
published: 2018-02-22
Datasets used in the study, "OCTAL: Optimal Completion of Gene Trees in Polynomial Time," under review at Algorithms for Molecular Biology. Note: DS_STORE file in 25gen-10M folder can be disregarded.
keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees