Dataset Search

Displaying 26 - 47 of 47 in total

Illinois Data Bank Dataset Search Results

Results

published: 2017-09-16

Data for 16S and 23S rRNA alignments

Mirarab, Siavash; Warnow, Tandy (2017)

This dataset contains the data for 16S and 23S rRNA alignments including their reference trees. The original alignments are from the Gutell Lab CRW, currently located at https://crw-site.chemistry.gatech.edu/DAT/3C/Alignment/.

published: 2009-06-19

Data for Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees

Liu, Kevin; Raghavan, Sindhu; Nelesen, Serita; Linder, C. Randall; Warnow, Tandy (2009)

This dataset contains the data for SATe-I. SATe-I data was used in the following article: K. Liu, S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow, "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.

published: 2022-06-07

RNASim-VS2

Chu, Gillian; Warnow, Tandy (2022)

Provides the RNASim-VS2 datasets used in Gillian Chu's master's thesis.

published: 2021-08-24

Data from "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling"

Zaharias, Paul; Grosshauser, Martin; Warnow, Tandy (2021)

This repository includes datasets for the paper "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling" accepted for RECOMB2021 and submitted to Journal of Computational Biology. Each zipped file contains a README.

keywords: deep neural networks; heterotachy; GHOST; quartet estimation; phylogeny estimation

published: 2021-11-03

Data from Scalable Species Tree Inference with External Constraints

Liu, Baqiao; Warnow, Tandy (2021)

This dataset contains re-estimated gene trees from the ASTRAL-II [1] simulated datasets. The re-estimated variants of the datasets are called MC6H and MC11H; they are derived from the MC6 and MC11 conditions of the original data (the MC6 and MC11 names are given by ASTRID [2]). The uploaded files contain the sequence alignments (truncated to half the length of the original alignments) and the gene trees re-estimated using FastTree2.

Note:
- "mc6h.tar.gz" and "mc11h.tar.gz" contain the sequence alignments and the re-estimated gene trees for the two conditions.
- The sequence alignments are named "all-genes.phylip.splitted.[i].half", where i indicates that the file is the i-th alignment of the original dataset, truncated to half its length.
- "g1000.trees" under each replicate contains the newline-separated re-estimated gene trees. The gene trees were estimated from the alignments described above using the FastTree2 (version 2.1.11) command "FastTree -nt -gtr".

[1] Mirarab, S., & Warnow, T. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44-i52.
[2] Vachaspati, P., & Warnow, T. (2015). ASTRID: accurate species trees from internode distances. BMC genomics, 16(10), 1-13.
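The two file conventions in this record (half-length alignments and a newline-separated tree file) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and keeping the first half of the columns is an assumption, since the description does not say which half of each alignment was retained.

```python
def halve_alignment(records):
    """Keep the first half of the columns of every aligned sequence.

    `records` maps taxon name -> aligned sequence (all the same length).
    Assumption: the first half of the columns is kept; the dataset
    description does not state which half was retained.
    """
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences in an alignment must have equal length")
    half = lengths.pop() // 2
    return {name: seq[:half] for name, seq in records.items()}


def read_gene_trees(text):
    """Split the contents of a newline-separated tree file into Newick strings."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Toy data, not taken from the dataset.
toy = {"A": "ACGTACGT", "B": "AC-TAC-T"}
halved = halve_alignment(toy)
trees = read_gene_trees("((A,B),C);\n((A,C),B);\n")
```

In practice the text passed to `read_gene_trees` would be the contents of a replicate's "g1000.trees" file.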

keywords: simulated data; ASTRAL; alignments; gene trees

published: 2021-06-28

MAGUS+eHMMs: Improved Multiple Sequence Alignment Accuracy for Fragmentary Sequences

Shen, Chengze; Zaharias, Paul; Warnow, Tandy (2021)

This dataset contains 1) the cleaned version of 11 CRW datasets, 2) RNASim10k dataset in high fragmentation and 3) three CRW datasets (16S.3, 16S.T, 16S.B.ALL) in high fragmentation.

keywords: MAGUS;UPP;Multiple Sequence Alignment;PASTA;eHMMs

published: 2016-08-16

HIPPI Dataset

Nguyen, Nam-phuong; Nute, Mike; Mirarab, Siavash; Warnow, Tandy (2016)

This archive contains all the alignments and trees used in the HIPPI paper [1].

The pfam.tar archive contains the PFAM families used to build the HMMs and BLAST databases. The file structure is:

./X/Y/initial.fasttree
./X/Y/initial.fasta

where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder holds two files: initial.fasta, the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta.

The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory.

[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
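The naming scheme described for this archive can be expressed as a small Python sketch that builds the expected paths. The helper names, the root directory names, and the family identifier "PF00001" are hypothetical placeholders; the sketch also assumes the fragment length appears in the file name exactly as written in the description ("1", "0.5", "0.25").

```python
from pathlib import PurePosixPath

FOLDS = (0, 1, 2, 3)                      # cross-fold sets Y
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")   # fragment lengths Z, assumed verbatim


def reference_files(root, family, fold):
    """Return the (alignment, tree) paths for Pfam family X and fold Y."""
    base = PurePosixPath(root) / family / str(fold)
    return base / "initial.fasta", base / "initial.fasttree"


def query_file(splits_dir, fold, fragment_length):
    """Return the path of the query file query.Y.Z.fas in the splits directory."""
    return PurePosixPath(splits_dir) / f"query.{fold}.{fragment_length}.fas"


# "pfam", "splits", and "PF00001" are illustrative names only.
alignment, tree = reference_files("pfam", "PF00001", 2)
queries = [query_file("splits", 2, z) for z in FRAGMENT_LENGTHS]
```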

keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam

published: 2021-01-23

Data From: "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss"

Willson, James; Roddur, Mrinmoy; Warnow, Tandy (2021)

Data sets from "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss." It contains data simulated with gene duplication and loss under a variety of different conditions.

keywords: gene duplication and loss; species-tree inference

published: 2021-04-30

Data from: Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy

Gupta, Maya; Zaharias, Paul; Warnow, Tandy (2021)

This repository includes scripts and datasets for the paper, "Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy" submitted to Bioinformatics.

keywords: BAli-Phy;Bayesian co-estimation;multiple sequence alignment

published: 2021-11-19

Seven ROSE datasets in high and low fragmentation conditions

Shen, Chengze; Park, Minhyuk; Warnow, Tandy (2021)

This is a general description of the datasets included in this upload; details of each dataset can be found in the individual README.txt in each compressed folder. We have:

1. ROSE-HF.tar.gz
2. ROSE-LF.tar.gz

HF (high fragmentary): 50% of the sequences are made fragmentary, with average lengths of 25% of the original lengths and a standard deviation of 60 bp.

LF (low fragmentary): 25% of the sequences are made fragmentary, with average lengths of 50% of the original lengths and a standard deviation of 60 bp.

The seven ROSE datasets made fragmentary are: 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2 and 1000S4. "ROSE-HF.tar.gz" contains HF versions of the seven ROSE datasets. "ROSE-LF.tar.gz" contains LF versions of the seven ROSE datasets.
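To make the HF and LF conditions concrete, here is a minimal Python sketch of one way such fragmentation could be simulated. It is not the authors' procedure: the normal distribution for fragment lengths, the uniformly random start position, the interpretation of "original length" as each sequence's own length, and the function name are all assumptions.

```python
import random


def fragment_sequences(seqs, fraction, mean_ratio, sd=60.0, seed=0):
    """Return a copy of `seqs` (name -> unaligned sequence) in which a
    `fraction` of the sequences is replaced by a contiguous fragment.

    Assumptions (not stated in the dataset description): fragment lengths
    are drawn from a normal distribution with mean `mean_ratio` times the
    sequence's own length and standard deviation `sd`, and the fragment
    start is chosen uniformly at random.
    """
    rng = random.Random(seed)
    names = sorted(seqs)
    chosen = set(rng.sample(names, round(len(names) * fraction)))
    out = {}
    for name in names:
        seq = seqs[name]
        if name not in chosen or not seq:
            out[name] = seq  # left at full length
            continue
        target = round(rng.gauss(mean_ratio * len(seq), sd))
        target = max(1, min(len(seq), target))  # clamp to a valid length
        start = rng.randint(0, len(seq) - target)
        out[name] = seq[start:start + target]
    return out


# Toy input: eight sequences of 1000 bp each.
toy = {f"seq{i}": "ACGT" * 250 for i in range(8)}
high_frag = fragment_sequences(toy, fraction=0.5, mean_ratio=0.25)  # HF-like
low_frag = fragment_sequences(toy, fraction=0.25, mean_ratio=0.5)   # LF-like
```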

keywords: ROSE; simulation; fragmentary

published: 2020-07-15

Data from: Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss

Legried, Brandon; Molloy, Erin K.; Warnow, Tandy; Roch, Sebastien (2020)

This repository includes scripts and datasets for the paper, "Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss."

keywords: Species tree estimation; gene duplication and loss; identifiability; statistical consistency; quartets; ASTRAL

published: 2025-01-27

TIPP3 Benchmark Data and Simulated Reads

Shen, Chengze; Wedell, Eleanor; Pop, Mihai; Warnow, Tandy (2025)

The zip file contains the benchmark data used for the TIPP3 simulation study. See the README file for more information.

keywords: TIPP3;abundance profile;reference database;taxonomic identification;simulation

published: 2012-07-01

Data for SEPP: SATé-Enabled Phylogenetic Placement.

Mirarab, Siavash; Nguyen, Nam-Phuong; Warnow, Tandy (2012)

This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.

published: 2019-07-29

Data from TRACTION: Fast non-parametric improvement of estimated gene trees

Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2019)

Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.

keywords: Gene tree correction; horizontal gene transfer; incomplete lineage sorting

published: 2019-03-19

Data from: TreeMerge: A new method for improving the scalability of species tree estimation methods

Molloy, Erin K.; Warnow, Tandy (2019)

This repository includes scripts and datasets for the paper, "TreeMerge: A new method for improving the scalability of species tree estimation methods." The latest version of TreeMerge can be downloaded from Github (https://github.com/ekmolloy/treemerge).

keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics

published: 2023-04-06

INDELible simulated datasets with sequence length heterogeneity

Warnow, Tandy; Park, Minhyuk (2023)

This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.

keywords: sequence length heterogeneity;indelible;computational biology;multiple sequence alignment

published: 2021-04-11

Disjoint Tree Mergers for Large-Scale Maximum Likelihood Tree Estimation

Park, Minhyuk; Zaharias, Paul; Warnow, Tandy (2021)

This dataset contains the RNASim1000 and Cox1-Het datasets, as well as analyses of RNASim1000, Cox1-Het, and 1000M1(HF).

keywords: phylogeny estimation; maximum likelihood; RAxML; IQ-TREE; FastTree; cox1; heterotachy; disjoint tree mergers; Tree of Life

published: 2011-09-20

Data for SuperFine, DACTAL, and BeeTLe

Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San (2011)

This page provides the data for SuperFine, DACTAL, and BeeTLe publications.

- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.

published: 2017-06-15

Datasets from the study: Optimal completion of incomplete gene trees in polynomial time using OCTAL

Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2017)

Datasets used in the study, "Optimal completion of incomplete gene trees in polynomial time using OCTAL," presented at WABI 2017.

keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees

published: 2025-04-21

TIPP3 Reference Package for Abundance Profiling

Shen, Chengze; Wedell, Eleanor; Warnow, Tandy (2025)

#Overview
These are reference packages for the TIPP3 software for abundance profiling and/or species detection from metagenomic reads (e.g., Illumina, PacBio, Nanopore, etc.). Different refpkg versions are listed.
TIPP3 software: https://github.com/c5shen/TIPP3

#Changelog
V1.2 (`tipp3-refpkg-1-2.zip`)
>>Fixed old typos in the file mapping text.
>>Added new files `taxonomy/species_to_marker.tsv` for new function `run_tipp3.py detection [...parameters]`. Please use the latest release of the TIPP3 software for this new function.
V1 (`tipp3-refpkg.zip`)
>>Initial release of the TIPP3 reference package.

#Usage
1. unzip the file to a local directory (will get a folder named "tipp3-refpkg").
2. use with TIPP3 software: `run_tipp3.py -r [path/to/tipp3-refpkg] [other parameters]`
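The two usage steps can be scripted. The following Python sketch is a hedged illustration, not part of the TIPP3 software. The helper names are hypothetical, and the only TIPP3 option used is the `-r` flag quoted in the record. Any other parameters must come from the TIPP3 documentation. The sketch also assumes the extracted folder is named "tipp3-refpkg", as the record states.

```python
import zipfile
from pathlib import Path


def unpack_refpkg(zip_path, dest_dir):
    """Step 1: unzip the reference package and return the extracted folder.

    Assumption: the archive unpacks to a folder named "tipp3-refpkg",
    as stated in the dataset description.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return Path(dest_dir) / "tipp3-refpkg"


def tipp3_command(refpkg_dir, other_parameters=()):
    """Step 2: build the command line `run_tipp3.py -r <refpkg> [...]`.

    `other_parameters` is passed through unchanged; no TIPP3 flags other
    than `-r` are assumed here.
    """
    return ["run_tipp3.py", "-r", str(refpkg_dir), *other_parameters]


# Illustrative path only; nothing is executed here.
cmd = tipp3_command("refs/tipp3-refpkg")
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` once TIPP3 is installed and the remaining parameters are filled in.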

keywords: TIPP3; abundance profile; reference database; taxonomic identification

published: 2017-09-19

Data from: The Performance of Coalescent-Based Species Tree Estimation Methods under Models of Missing Data

Nute, Michael; Chou, Jed; Molloy, Erin K.; Warnow, Tandy (2017)

published: 2018-02-22

Datasets from the study "OCTAL: Optimal Completion of Gene Trees in Polynomial Time"

Christensen, Sarah; Molloy, Erin K; Vachaspati, Pranjal; Warnow, Tandy (2018)

Datasets used in the study, "OCTAL: Optimal Completion of Gene Trees in Polynomial Time," under review at Algorithms for Molecular Biology. Note: DS_STORE file in 25gen-10M folder can be disregarded.

keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees