Illinois Data Bank Dataset Search Results

published: 2021-06-28

Shen, Chengze; Zaharias, Paul; Warnow, Tandy (2021): MAGUS+eHMMs: Improved Multiple Sequence Alignment Accuracy for Fragmentary Sequences. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2419626_V1

This dataset contains 1) the cleaned version of 11 CRW datasets, 2) the RNASim10k dataset in high fragmentation, and 3) three CRW datasets (16S.3, 16S.T, 16S.B.ALL) in high fragmentation.

keywords: MAGUS;UPP;Multiple Sequence Alignment;PASTA;eHMMs

published: 2016-08-16

Nguyen, Nam-phuong; Nute, Mike; Mirarab, Siavash; Warnow, Tandy (2016): HIPPI Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6795126_V1

This archive contains all the alignments and trees used in the HIPPI paper [1].

The pfam.tar archive contains the Pfam families used to build the HMMs and BLAST databases. The file structure is:
./X/Y/initial.fasttree
./X/Y/initial.fasta
where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder contains two files: initial.fasta, the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta.

The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory.

[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
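The layout described above can be navigated with a small path helper. This is a minimal sketch, not part of the HIPPI distribution: the function names, the extraction directories ("pfam", "splits"), and the family name PF00001 are hypothetical, and the fragment length is assumed to appear in the file name exactly as written in the description (1, 0.5, or 0.25).

```python
from pathlib import Path

# Cross-fold sets and fragment lengths as listed in the dataset description.
FOLDS = (0, 1, 2, 3)
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")


def family_files(root, family, fold):
    """Return (alignment, tree) paths for Pfam family X and cross-fold set Y.

    `root` is assumed to be the directory where pfam.tar was extracted.
    """
    base = Path(root) / family / str(fold)
    return base / "initial.fasta", base / "initial.fasttree"


def query_file(splits_dir, fold, fragment_length):
    """Return the path of query.Y.Z.fas inside the (assumed) splits directory."""
    return Path(splits_dir) / f"query.{fold}.{fragment_length}.fas"


if __name__ == "__main__":
    # "PF00001" is a placeholder family name used only for illustration.
    alignment, tree = family_files("pfam", "PF00001", 2)
    print(alignment, tree)
    for z in FRAGMENT_LENGTHS:
        print(query_file("splits", 2, z))
```

The helpers only build paths; whether a given family or query file exists should be checked against the extracted archives.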

keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam

published: 2021-04-30

Gupta, Maya; Zaharias, Paul; Warnow, Tandy (2021): Data from: Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7863273_V1

This repository includes scripts and datasets for the paper, "Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy" submitted to Bioinformatics.

keywords: BAli-Phy;Bayesian co-estimation;multiple sequence alignment

published: 2021-01-23

Willson, James; Roddur, Mrinmoy; Warnow, Tandy (2021): Data From: "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2418574_V1

Data sets from "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss." It contains data simulated with gene duplication and loss under a variety of different conditions.

keywords: gene duplication and loss; species-tree inference

published: 2021-11-19

Shen, Chengze; Park, Minhyuk; Warnow, Tandy (2021): Seven ROSE datasets in high and low fragmentation conditions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128941_V1

This is a general description of the datasets included in this upload; details of each dataset can be found in the individual README.txt in each compressed folder. We have:
1. ROSE-HF.tar.gz
2. ROSE-LF.tar.gz

HF (high fragmentary): 50% of the sequences are made fragmentary; the fragments have an average length of 25% of the original length, with a standard deviation of 60 bp.
LF (low fragmentary): 25% of the sequences are made fragmentary; the fragments have an average length of 50% of the original length, with a standard deviation of 60 bp.

The seven ROSE datasets made fragmentary are: 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2 and 1000S4.
"ROSE-HF.tar.gz" contains HF versions of the seven ROSE datasets.
"ROSE-LF.tar.gz" contains LF versions of the seven ROSE datasets.
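The HF and LF conditions can be illustrated with a short sketch. This is not the authors' fragmentation script: the description gives only the fraction of fragmented sequences, the mean fragment length, and the standard deviation, so the normal length distribution, the uniformly random start position, and the function name `fragment_sequences` are all assumptions made for illustration. Sequences are assumed to be unaligned (no gap characters).

```python
import random


def fragment_sequences(seqs, frac_fragmented, mean_frac, sd=60.0, seed=0):
    """Return a copy of `seqs` (name -> sequence) with some sequences fragmented.

    frac_fragmented: fraction of sequences to fragment (0.5 for HF, 0.25 for LF).
    mean_frac: mean fragment length as a fraction of the original length
               (0.25 for HF, 0.5 for LF).
    sd: standard deviation of the fragment length, in bp.
    Assumption: lengths are drawn from a normal distribution and the fragment
    is a contiguous substring starting at a uniformly random position.
    """
    rng = random.Random(seed)
    names = sorted(seqs)
    chosen = set(rng.sample(names, round(frac_fragmented * len(names))))
    out = {}
    for name in names:
        seq = seqs[name]
        if name not in chosen or not seq:
            out[name] = seq  # left at full length
            continue
        target = round(rng.gauss(mean_frac * len(seq), sd))
        target = max(1, min(len(seq), target))  # keep the length valid
        start = rng.randint(0, len(seq) - target)
        out[name] = seq[start:start + target]
    return out


if __name__ == "__main__":
    toy = {f"seq{i}": "ACGT" * 250 for i in range(8)}  # toy 1000 bp sequences
    hf = fragment_sequences(toy, frac_fragmented=0.5, mean_frac=0.25)
    lf = fragment_sequences(toy, frac_fragmented=0.25, mean_frac=0.5)
    print(sorted(len(s) for s in hf.values()))
    print(sorted(len(s) for s in lf.values()))
```

The README.txt files in each archive remain the authoritative description of how the fragments were actually generated.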

keywords: ROSE; simulation; fragmentary

published: 2020-07-15

Legried, Brandon; Molloy, Erin K.; Warnow, Tandy; Roch, Sebastien (2020): Data from: Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2626814_V3

This repository includes scripts and datasets for the paper, "Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss."

keywords: Species tree estimation; gene duplication and loss; identifiability; statistical consistency; quartets; ASTRAL

published: 2011-09-20

Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San (2011): Data for SuperFine, DACTAL, and BeeTLe. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2952208_V1

This page provides the data for SuperFine, DACTAL, and BeeTLe publications.
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.

published: 2012-07-01

Mirarab, Siavash; Nguyen, Nam-Phuong; Warnow, Tandy (2012): Data for SEPP: SATé-Enabled Phylogenetic Placement. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9316702_V1

This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.

published: 2019-07-29

Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2019): Data from TRACTION: Fast non-parametric improvement of estimated gene trees. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1747658_V1

Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.

keywords: Gene tree correction; horizontal gene transfer; incomplete lineage sorting

published: 2019-03-19

Molloy, Erin K.; Warnow, Tandy (2019): Data from: TreeMerge: A new method for improving the scalability of species tree estimation methods. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9570561_V1

This repository includes scripts and datasets for the paper, "TreeMerge: A new method for improving the scalability of species tree estimation methods." The latest version of TreeMerge can be downloaded from Github (https://github.com/ekmolloy/treemerge).

keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics

published: 2023-02-07

Willson, James; Tabatabaee, Yasamin; Liu, Baqiao; Warnow, Tandy (2023): Data from: DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5748609_V1

Data sets from "DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS." It contains trees and sequences simulated with gene duplication and loss under a variety of different conditions.
Note:
- trees.tar.gz contains the simulated gene-family trees used in our experiments (both true trees from SimPhy as well as trees estimated from alignments).
- alignments.tar.gz contains simulated sequence data used for estimating the gene-family trees.

keywords: evolution; computational biology; bioinformatics; phylogenetics

published: 2023-04-06

Warnow, Tandy; Park, Minhyuk (2023): INDELible simulated datasets with sequence length heterogeneity. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0900513_V1

This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.

keywords: sequence length heterogeneity;indelible;computational biology;multiple sequence alignment

published: 2021-04-11

Park, Minhyuk; Zaharias, Paul; Warnow, Tandy (2021): Disjoint Tree Mergers for Large-Scale Maximum Likelihood Tree Estimation. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7008049_V1

This dataset contains the RNASim1000 and Cox1-Het datasets, as well as analyses of RNASim1000, Cox1-Het, and 1000M1(HF).

keywords: phylogeny estimation; maximum likelihood; RAxML; IQ-TREE; FastTree; cox1; heterotachy; disjoint tree mergers; Tree of Life

published: 2017-09-19

Nute, Michael; Chou, Jed; Molloy, Erin K.; Warnow, Tandy (2017): Data from: The Performance of Coalescent-Based Species Tree Estimation Methods under Models of Missing Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7735354_V1

published: 2018-04-06

Collins, Kodi; Warnow, Tandy (2018): PASTA For Proteins Data (BALiBASE). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4074787_V1

keywords: protein; multiple sequence alignment; balibase

published: 2018-02-22

Christensen, Sarah; Molloy, Erin K; Vachaspati, Pranjal; Warnow, Tandy (2018): Datasets from the study "OCTAL: Optimal Completion of Gene Trees in Polynomial Time". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1616387_V1

Datasets used in the study, "OCTAL: Optimal Completion of Gene Trees in Polynomial Time," under review at Algorithms for Molecular Biology. Note: DS_STORE file in 25gen-10M folder can be disregarded.

keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees

published: 2017-06-15

Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2017): Datasets from the study: Optimal completion of incomplete gene trees in polynomial time using OCTAL. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8402610_V1

Datasets used in the study, "Optimal completion of incomplete gene trees in polynomial time using OCTAL," presented at WABI 2017.

keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees