Illinois Data Bank Dataset Search Results
published:
2017-09-16
Mirarab, Siavash; Warnow, Tandy
(2017)
This dataset contains the data for 16S and 23S rRNA alignments including their reference trees.
The original alignments are from the Gutell Lab CRW, currently located at https://crw-site.chemistry.gatech.edu/DAT/3C/Alignment/.
published:
2009-06-19
Liu, Kevin; Raghavan, Sindhu; Nelesen, Serita; Linder, C. Randall; Warnow, Tandy
(2009)
This dataset contains the data for SATe-I.
SATe-I data was used in the following article:
K. Liu, S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow, "Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees," Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009.
published:
2022-06-07
Chu, Gillian; Warnow, Tandy
(2022)
Provides the RNASim-VS2 datasets used in Gillian Chu's Master's thesis.
published:
2021-08-24
Zaharias, Paul; Grosshauser, Martin; Warnow, Tandy
(2021)
This repository includes datasets for the paper "Re-evaluating Deep Neural Networks for Phylogeny Estimation: The issue of taxon sampling" accepted for RECOMB2021 and submitted to Journal of Computational Biology.
Each zipped file contains a README.
keywords:
deep neural networks; heterotachy; GHOST; quartet estimation; phylogeny estimation
published:
2021-11-03
Liu, Baqiao; Warnow, Tandy
(2021)
This dataset contains re-estimated gene trees from the ASTRAL-II [1] simulated datasets. The re-estimated variants of the datasets are called MC6H and MC11H; they are derived from the MC6 and MC11 conditions of the original data (the MC6 and MC11 names were given by ASTRID [2]). The uploaded files contain the sequence alignments (truncated to half the length of the original alignments) and the gene trees re-estimated from them using FastTree2.
Note:
- "mc6h.tar.gz" and "mc11h.tar.gz" contain the sequence alignments and the re-estimated gene trees for the two conditions
- the sequence alignments are named "all-genes.phylip.splitted.[i].half", where i indicates the i-th alignment of the original dataset, truncated to half its length
- "g1000.trees" under each replicate contains the newline-separated re-estimated gene trees. The gene trees were estimated from the alignments described above using FastTree2 (version 2.1.11) with the command "FastTree -nt -gtr"
[1]: Mirarab, S., & Warnow, T. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44-i52.
[2]: Vachaspati, P., & Warnow, T. (2015). ASTRID: accurate species trees from internode distances. BMC genomics, 16(10), 1-13.
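The file conventions in the notes above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset: the helper names are hypothetical, whether i starts at 0 or 1 is not stated in the description, and passing the alignment as the final FastTree argument (with the tree written to standard output) is an assumption about how the quoted command was run.

```python
from pathlib import Path


def alignment_name(i):
    """File name of the half-length alignment for the i-th gene.

    The pattern is taken from the dataset notes; the starting index
    (0 or 1) is not stated there.
    """
    return f"all-genes.phylip.splitted.{i}.half"


def read_gene_trees(path):
    """Return the non-empty lines of a newline-separated tree file
    such as g1000.trees, one Newick string per entry."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def fasttree_command(alignment):
    """Argument list for the FastTree call quoted in the notes.

    Appending the alignment path as the last argument is an assumption.
    """
    return ["FastTree", "-nt", "-gtr", str(alignment)]
```

For example, `read_gene_trees("g1000.trees")` would return a list of Newick strings for one replicate, assuming the file has been extracted into the working directory.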
keywords:
simulated data; ASTRAL; alignments; gene trees
published:
2021-06-28
Shen, Chengze; Zaharias, Paul; Warnow, Tandy
(2021)
This dataset contains 1) cleaned versions of 11 CRW datasets, 2) the RNASim10k dataset with high fragmentation, and 3) three CRW datasets (16S.3, 16S.T, 16S.B.ALL) with high fragmentation.
keywords:
MAGUS; UPP; Multiple Sequence Alignment; PASTA; eHMMs
published:
2016-08-16
Nguyen, Nam-phuong; Nute, Mike; Mirarab, Siavash; Warnow, Tandy
(2016)
This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the Pfam families used to build the HMMs and BLAST databases. The file structure is:
./X/Y/initial.fasttree
./X/Y/initial.fasta
where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder contains two files: initial.fasta, the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta.
The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory.
[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
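The directory layout described above can be expressed as a pair of path helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, the root and splits directory locations depend on where the archives are extracted, and Z is assumed to appear in file names exactly as written in the description ("1", "0.5", "0.25").

```python
from pathlib import Path

# Values listed in the dataset description.
FOLDS = (0, 1, 2, 3)
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")


def reference_files(root, family, fold):
    """Paths to ./X/Y/initial.fasta and ./X/Y/initial.fasttree.

    `root` is wherever pfam.tar was extracted (an assumption);
    `family` is X and `fold` is Y in the description.
    """
    base = Path(root) / family / str(fold)
    return base / "initial.fasta", base / "initial.fasttree"


def query_file(splits_dir, fold, fragment_length):
    """Path to query.Y.Z.fas inside the splits directory."""
    return Path(splits_dir) / f"query.{fold}.{fragment_length}.fas"
```

Iterating over `FOLDS` and `FRAGMENT_LENGTHS` with these helpers enumerates every reference and query file for one family, provided the archives follow the layout exactly as described.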
keywords:
HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
published:
2021-01-23
Willson, James; Roddur, Mrinmoy; Warnow, Tandy
(2021)
Data sets from "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss." It contains data simulated with gene duplication and loss under a variety of different conditions.
keywords:
gene duplication and loss; species-tree inference
published:
2021-04-30
Gupta, Maya; Zaharias, Paul; Warnow, Tandy
(2021)
This repository includes scripts and datasets for the paper, "Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy" submitted to Bioinformatics.
keywords:
BAli-Phy; Bayesian co-estimation; multiple sequence alignment
published:
2021-11-19
Shen, Chengze; Park, Minhyuk; Warnow, Tandy
(2021)
This is a general description of the datasets included in this upload; details of each dataset can be found in the individual README.txt in each compressed folder. We have:
1. ROSE-HF.tar.gz
2. ROSE-LF.tar.gz
HF (high fragmentary): 50% of the sequences are made fragmentary; the fragments have an average length of 25% of the original length, with a standard deviation of 60 bp.
LF (low fragmentary): 25% of the sequences are made fragmentary; the fragments have an average length of 50% of the original length, with a standard deviation of 60 bp.
The seven ROSE datasets made fragmentary are: 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2 and 1000S4.
"ROSE-HF.tar.gz" contains HF versions of the seven ROSE datasets.
"ROSE-LF.tar.gz" contains LF versions of the seven ROSE datasets.
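To make the HF and LF parameters concrete, here is a minimal sketch of one way such a fragmentation step could work. It is not the procedure used to build these datasets, which is documented in the README files: drawing lengths from a normal distribution, cutting a contiguous substring at a uniformly random position, and operating on unaligned sequences are all assumptions, and the function name is hypothetical.

```python
import random


def fragment_sequences(seqs, fraction, mean_ratio, sd=60.0, seed=0):
    """Return a copy of `seqs` with a fraction of sequences fragmented.

    seqs:       dict mapping sequence name to unaligned sequence string
    fraction:   share of sequences to fragment (0.5 for HF, 0.25 for LF)
    mean_ratio: mean fragment length as a share of the original length
                (0.25 for HF, 0.5 for LF)
    sd:         standard deviation of the fragment length, in bp
    """
    rng = random.Random(seed)
    names = sorted(seqs)
    chosen = set(rng.sample(names, round(fraction * len(names))))
    out = {}
    for name in names:
        seq = seqs[name]
        if name not in chosen:
            out[name] = seq  # left at full length
            continue
        # Assumed: normally distributed target length, clamped to [1, len].
        target = round(rng.gauss(mean_ratio * len(seq), sd))
        target = max(1, min(len(seq), target))
        # Assumed: fragment is a contiguous substring at a random offset.
        start = rng.randint(0, len(seq) - target)
        out[name] = seq[start:start + target]
    return out
```

Under these assumptions, `fragment_sequences(seqs, 0.5, 0.25)` corresponds to the HF setting and `fragment_sequences(seqs, 0.25, 0.5)` to the LF setting.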
keywords:
ROSE; simulation; fragmentary
published:
2020-07-15
Legried, Brandon; Molloy, Erin K.; Warnow, Tandy; Roch, Sebastien
(2020)
This repository includes scripts and datasets for the paper, "Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss."
keywords:
Species tree estimation; gene duplication and loss; identifiability; statistical consistency; quartets; ASTRAL
published:
2025-01-27
Shen, Chengze; Wedell, Eleanor; Pop, Mihai; Warnow, Tandy
(2025)
The zip file contains the benchmark data used for the TIPP3 simulation study. See the README file for more information.
keywords:
TIPP3; abundance profile; reference database; taxonomic identification; simulation
published:
2012-07-01
Mirarab, Siavash; Nguyen, Nam-Phuong; Warnow, Tandy
(2012)
This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.
published:
2019-07-29
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy
(2019)
Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.
keywords:
Gene tree correction; horizontal gene transfer; incomplete lineage sorting
published:
2019-03-19
Molloy, Erin K.; Warnow, Tandy
(2019)
This repository includes scripts and datasets for the paper, "TreeMerge: A new method for improving the scalability of species tree estimation methods." The latest version of TreeMerge can be downloaded from Github (https://github.com/ekmolloy/treemerge).
keywords:
divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published:
2023-04-06
Warnow, Tandy; Park, Minhyuk
(2023)
This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.
keywords:
sequence length heterogeneity; indelible; computational biology; multiple sequence alignment
published:
2021-04-11
Park, Minhyuk; Zaharias, Paul; Warnow, Tandy
(2021)
This dataset contains the RNASim1000 and Cox1-Het datasets, as well as analyses of RNASim1000, Cox1-Het, and 1000M1(HF).
keywords:
phylogeny estimation; maximum likelihood; RAxML; IQ-TREE; FastTree; cox1; heterotachy; disjoint tree mergers; Tree of Life
published:
2011-09-20
Swenson, M. Shel; Suri, Rahul; Linder, C. Randall; Warnow, Tandy; Nguyen, Nam-phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San
(2011)
This page provides the data for SuperFine, DACTAL, and BeeTLe publications.
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.
published:
2017-06-15
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy
(2017)
Datasets used in the study, "Optimal completion of incomplete gene trees in polynomial time using OCTAL," presented at WABI 2017.
keywords:
phylogenomics; missing data; coalescent-based species tree estimation; gene trees
published:
2025-04-21
Shen, Chengze; Wedell, Eleanor; Warnow, Tandy
(2025)
#Overview
These are reference packages for the TIPP3 software for abundance profiling and/or species detection from metagenomic reads (e.g., Illumina, PacBio, Nanopore, etc.). Different refpkg versions are listed.
TIPP3 software: https://github.com/c5shen/TIPP3
#Changelog
V1.2 (`tipp3-refpkg-1-2.zip`)
>>Fixed old typos in the file mapping text.
>>Added new files `taxonomy/species_to_marker.tsv` for new function `run_tipp3.py detection [...parameters]`. Please use the latest release of the TIPP3 software for this new function.
V1 (`tipp3-refpkg.zip`)
>>Initial release of the TIPP3 reference package.
#Usage
1. Unzip the file to a local directory (this produces a folder named "tipp3-refpkg").
2. Use it with the TIPP3 software: `run_tipp3.py -r [path/to/tipp3-refpkg] [other parameters]`
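The two usage steps can also be scripted. The following is a minimal Python sketch, not part of the TIPP3 distribution: the helper names are hypothetical, the extracted folder name is taken from step 1, and any arguments beyond `-r` are left to the caller because they are not listed here.

```python
import zipfile
from pathlib import Path


def extract_refpkg(zip_path, dest_dir):
    """Unzip the reference package and return the expected folder path.

    The folder name "tipp3-refpkg" follows the usage note; it is an
    assumption that every refpkg version extracts to this name.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return Path(dest_dir) / "tipp3-refpkg"


def tipp3_command(refpkg_dir, other_args=()):
    """Argument list for `run_tipp3.py -r <refpkg> [other parameters]`."""
    return ["run_tipp3.py", "-r", str(refpkg_dir), *other_args]
```

The returned list can be passed to `subprocess.run` once the TIPP3 software is installed and `run_tipp3.py` is on the PATH.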
keywords:
TIPP3; abundance profile; reference database; taxonomic identification
published:
2017-09-19
Nute, Michael; Chou, Jed; Molloy, Erin K.; Warnow, Tandy
(2017)
published:
2018-02-22
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy
(2018)
Datasets used in the study, "OCTAL: Optimal Completion of Gene Trees in Polynomial Time," under review at Algorithms for Molecular Biology. Note: the DS_STORE file in the 25gen-10M folder can be disregarded.
keywords:
phylogenomics; missing data; coalescent-based species tree estimation; gene trees