Displaying 26 - 42 of 42 in total
Subject Area
Funder
Publication Year
License
Illinois Data Bank Dataset Search Results

Dataset Search Results

published: 2021-06-28
 
This dataset contains 1) the cleaned version of 11 CRW datasets, 2) RNASim10k dataset in high fragmentation and 3) three CRW datasets (16S.3, 16S.T, 16S.B.ALL) in high fragmentation.
keywords: MAGUS;UPP;Multiple Sequence Alignment;PASTA;eHMMs
published: 2016-08-16
 
This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the PFAM families used to build the HMMs and BLAST databases. The file structure is: ./X/Y/initial.fasttree ./X/Y/initial.fasta where X is a Pfam family, Y is the cross-fold set (0, 1, 2, or 3). Inside the folder are two files, initial.fasta which is the Pfam reference alignment with 1/4 of the seed alignment removed and initial.fasttree, the FastTree-2 ML tree estimated on the initial.fasta. The query.tar archive contains the query sequences for each cross-fold set. The associated query sequences for a cross-fold Y is labeled as query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory. [1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
published: 2021-04-30
 
This repository includes scripts and datasets for the paper, "Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy" submitted to Bioinformatics.
keywords: BAli-Phy;Bayesian co-estimation;multiple sequence alignment
published: 2021-01-23
 
Data sets from "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss." It contains data simulated with gene duplication and loss under a variety of different conditions.
keywords: gene duplication and loss; species-tree inference;
published: 2021-11-19
 
This is a general description of the datasets included in this upload; details of each dataset can be found in the individual README.txt in each compressed folder. We have: 1. ROSE-HF.tar.gz 2. ROSE-LF.tar.gz HF (high fragmentary): 50% of the sequences are made fragmentary, which have average lengths of 25% of the original lengths with a standard deviation of 60 bp. LF (low fragmentary): 25% of the sequences are made fragmentary, which have average lengths of 50% of the original lengths with a standard deviation of 60 bp. The seven ROSE datasets made fragmentary are: 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2 and 1000S4. "ROSE-HF.tar.gz" contains HF versions of the seven ROSE datasets. "ROSE-LF.tar.gz" contains LF versions of the seven ROSE datasets.
keywords: ROSE; simulation; fragmentary
published: 2020-07-15
 
This repository includes scripts and datasets for the paper, "Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss."
keywords: Species tree estimation; gene duplication and loss; identifiability; statistical consistency; quartets; ASTRAL
published: 2011-09-20
 
This page provides the data for SuperFine, DACTAL, and BeeTLe publications. - Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214. - Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13. - Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012. - Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282. - Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.
published: 2012-07-01
 
This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.
published: 2019-07-29
 
Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.
keywords: Gene tree correction; horizontal gene transfer; incomplete lineage sorting
published: 2019-03-19
 
This repository includes scripts and datasets for the paper, "TreeMerge: A new method for improving the scalability of species tree estimation methods." The latest version of TreeMerge can be downloaded from Github (https://github.com/ekmolloy/treemerge).
keywords: divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published: 2023-02-07
 
Data sets from "DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS." It contains trees and sequences simulated with gene duplication and loss under a variety of different conditions. Note: - trees.tar.gz contains the simulated gene-family trees used in our experiments (both true trees from SimPhy as well as trees estimated from alignments). - alignments.tar.gz contains simulated sequence data used for estimating the gene-family trees
keywords: evolution; computational biology; bioinformatics; phylogenetics
published: 2023-04-06
 
This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.
keywords: sequence length heterogeneity;indelible;computational biology;multiple sequence alignment
published: 2021-04-11
 
This dataset contains RNASim1000, Cox1-Het datasets as well as analyses of RNASim1000, Cox1-Het, and 1000M1(HF).
keywords: phylogeny estimation; maximum likelihood; RAxML; IQ-TREE; FastTree; cox1; heterotachy; disjoint tree mergers; Tree of Life
published: 2018-02-22
 
Datasets used in the study, "OCTAL: Optimal Completion of Gene Trees in Polynomial Time," under review at Algorithms for Molecular Biology. Note: DS_STORE file in 25gen-10M folder can be disregarded.
keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees
published: 2017-06-15
 
Datasets used in the study, "Optimal completion of incomplete gene trees in polynomial time using OCTAL," presented at WABI 2017.
keywords: phylogenomics; missing data; coalescent-based species tree estimation; gene trees