Illinois Data Bank
Displaying 26 - 42 of 42 in total
Illinois Data Bank Dataset Search Results
published: 2021-06-28
Shen, Chengze; Zaharias, Paul; Warnow, Tandy (2021): MAGUS+eHMMs: Improved Multiple Sequence Alignment Accuracy for Fragmentary Sequences. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2419626_V1
This dataset contains 1) the cleaned version of 11 CRW datasets, 2) the RNASim10k dataset in high fragmentation, and 3) three CRW datasets (16S.3, 16S.T, 16S.B.ALL) in high fragmentation.
keywords:
MAGUS;UPP;Multiple Sequence Alignment;PASTA;eHMMs
published: 2016-08-16
Nguyen, Nam-phuong; Nute, Mike; Mirarab, Siavash; Warnow, Tandy (2016): HIPPI Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6795126_V1
This archive contains all the alignments and trees used in the HIPPI paper [1].
The pfam.tar archive contains the Pfam families used to build the HMMs and BLAST databases. The file structure is:
./X/Y/initial.fasttree
./X/Y/initial.fasta
where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder holds two files: initial.fasta, the Pfam reference alignment with 1/4 of the seed alignment removed, and initial.fasttree, the FastTree-2 ML tree estimated on initial.fasta.
The query.tar archive contains the query sequences for each cross-fold set. The query sequences for cross-fold set Y are labeled query.Y.Z.fas, where Z is the fragment length (1, 0.5, or 0.25). The query files are found in the splits directory.
[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. To appear in BMC Genomics.
keywords:
HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
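The naming scheme described for the HIPPI archives can be sketched in Python as follows. This is only an illustration of the layout stated in the description, not code from the dataset: the helper names and the example family identifier are hypothetical, and the location of the splits directory inside query.tar is left out because the description does not give it.

```python
from pathlib import PurePosixPath

# Values taken from the dataset description.
FOLDS = (0, 1, 2, 3)                     # cross-fold sets Y
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")  # fragment lengths Z


def reference_paths(family, fold):
    """Return (alignment, tree) paths for Pfam family X and cross-fold set Y.

    Follows the stated layout ./X/Y/initial.fasta and ./X/Y/initial.fasttree.
    """
    base = PurePosixPath(family) / str(fold)
    return base / "initial.fasta", base / "initial.fasttree"


def query_filename(fold, fragment_length):
    """Return the query file name query.Y.Z.fas for fold Y and length Z."""
    return f"query.{fold}.{fragment_length}.fas"


if __name__ == "__main__":
    # "PF00001" is a placeholder family name used only for illustration.
    for fold in FOLDS:
        alignment, tree = reference_paths("PF00001", fold)
        queries = [query_filename(fold, z) for z in FRAGMENT_LENGTHS]
        print(alignment, tree, queries)
```

Keeping the fragment lengths as strings avoids any float formatting surprises when they are embedded in file names.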
published: 2021-04-30
Gupta, Maya; Zaharias, Paul; Warnow, Tandy (2021): Data from: Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7863273_V1
This repository includes scripts and datasets for the paper, "Accurate Large-scale Phylogeny-Aware Alignment using BAli-Phy" submitted to Bioinformatics.
keywords:
BAli-Phy;Bayesian co-estimation;multiple sequence alignment
published: 2021-01-23
Willson, James; Roddur, Mrinmoy; Warnow, Tandy (2021): Data From: "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2418574_V1
Data sets from "Comparing Methods for Species Tree Estimation With Gene Duplication and Loss." It contains data simulated with gene duplication and loss under a variety of different conditions.
keywords:
gene duplication and loss; species-tree inference
published: 2021-11-19
Shen, Chengze; Park, Minhyuk; Warnow, Tandy (2021): Seven ROSE datasets in high and low fragmentation conditions. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128941_V1
This is a general description of the datasets included in this upload; details of each dataset can be found in the individual README.txt in each compressed folder. We have:
1. ROSE-HF.tar.gz
2. ROSE-LF.tar.gz
HF (high fragmentary): 50% of the sequences are made fragmentary, with average lengths of 25% of the original lengths and a standard deviation of 60 bp.
LF (low fragmentary): 25% of the sequences are made fragmentary, with average lengths of 50% of the original lengths and a standard deviation of 60 bp.
The seven ROSE datasets made fragmentary are: 1000L1, 1000L3, 1000L4, 1000M3, 1000S1, 1000S2 and 1000S4.
"ROSE-HF.tar.gz" contains HF versions of the seven ROSE datasets.
"ROSE-LF.tar.gz" contains LF versions of the seven ROSE datasets.
keywords:
ROSE; simulation; fragmentary
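The HF and LF conditions described above can be illustrated with a small Python sketch. It is not the authors' procedure: the description gives only the fraction of sequences affected, the mean fragment length, and the standard deviation, so the normal length distribution, the uniformly random start position, the clamping of lengths, and the function name are all assumptions made here for illustration.

```python
import random


def fragment_sequences(seqs, fraction, mean_ratio, sd=60.0, seed=0):
    """Return a copy of `seqs` in which a fraction of sequences are fragments.

    seqs       -- dict mapping sequence name to unaligned sequence string
    fraction   -- share of sequences to fragment (0.5 for HF, 0.25 for LF)
    mean_ratio -- mean fragment length relative to the original length
                  (0.25 for HF, 0.5 for LF)
    sd         -- standard deviation of fragment length in bp (60 in the text)

    Assumptions (not stated in the dataset description): lengths are drawn
    from a normal distribution, fragments are contiguous substrings, and the
    start position is chosen uniformly at random.
    """
    rng = random.Random(seed)
    names = sorted(seqs)
    chosen = set(rng.sample(names, round(fraction * len(names))))
    result = {}
    for name in names:
        seq = seqs[name]
        if name not in chosen or not seq:
            result[name] = seq  # left at full length
            continue
        target = round(rng.gauss(mean_ratio * len(seq), sd))
        target = max(1, min(len(seq), target))  # keep the length valid
        start = rng.randint(0, len(seq) - target)
        result[name] = seq[start:start + target]
    return result


if __name__ == "__main__":
    toy = {f"seq{i}": "ACGT" * 250 for i in range(8)}  # toy input, 1000 bp each
    hf = fragment_sequences(toy, fraction=0.5, mean_ratio=0.25)
    print({name: len(s) for name, s in hf.items()})
```

The README.txt files in each compressed folder remain the reference for how the published fragments were actually generated.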
published: 2020-07-15
Legried, Brandon; Molloy, Erin K.; Warnow, Tandy; Roch, Sebastien (2020): Data from: Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2626814_V3
This repository includes scripts and datasets for the paper, "Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss."
keywords:
Species tree estimation; gene duplication and loss; identifiability; statistical consistency; quartets; ASTRAL
published: 2011-09-20
Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy; Nguyen, Nam-phuong; Mirarab, Siavash; Neves, Diogo Telmo; Sobral, João Luís; Pingali, Keshav; Nelesen, Serita; Liu, Kevin; Wang, Li-San (2011): Data for SuperFine, DACTAL, and BeeTLe. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-2952208_V1
This page provides the data for SuperFine, DACTAL, and BeeTLe publications.
- Swenson, M. Shel, et al. "SuperFine: fast and accurate supertree estimation." Systematic biology 61.2 (2012): 214.
- Nguyen, Nam, Siavash Mirarab, and Tandy Warnow. "MRL and SuperFine+ MRL: new supertree methods." Algorithms for Molecular Biology 7 (2012): 1-13.
- Neves, Diogo Telmo, et al. "Parallelizing superfine." Proceedings of the 27th Annual ACM Symposium on Applied Computing. 2012.
- Nelesen, Serita, et al. "DACTAL: divide-and-conquer trees (almost) without alignments." Bioinformatics 28.12 (2012): i274-i282.
- Liu, Kevin, and Tandy Warnow. "Treelength optimization for phylogeny estimation." PLoS One 7.3 (2012): e33104.
published: 2012-07-01
Mirarab, Siavash; Nguyen, Nam-Phuong; Warnow, Tandy (2012): Data for SEPP: SATé-Enabled Phylogenetic Placement. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9316702_V1
This dataset provides the data for Mirarab, Siavash, Nam Nguyen, and Tandy Warnow. "SEPP: SATé-enabled phylogenetic placement." Biocomputing 2012. 2012. 247-258.
published: 2019-07-29
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2019): Data from TRACTION: Fast non-parametric improvement of estimated gene trees. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1747658_V1
Datasets used in the study, "TRACTION: Fast non-parametric improvement of estimated gene trees," accepted at the Workshop on Algorithms in Bioinformatics (WABI) 2019.
keywords:
Gene tree correction; horizontal gene transfer; incomplete lineage sorting
published: 2019-03-19
Molloy, Erin K.; Warnow, Tandy (2019): Data from: TreeMerge: A new method for improving the scalability of species tree estimation methods. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9570561_V1
This repository includes scripts and datasets for the paper, "TreeMerge: A new method for improving the scalability of species tree estimation methods." The latest version of TreeMerge can be downloaded from Github (https://github.com/ekmolloy/treemerge).
keywords:
divide-and-conquer; statistical consistency; species trees; incomplete lineage sorting; phylogenomics
published: 2023-02-07
Willson, James; Tabatabaee, Yasamin; Liu, Baqiao; Warnow, Tandy (2023): Data from: DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5748609_V1
Data sets from "DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS." It contains trees and sequences simulated with gene duplication and loss under a variety of different conditions. Note:
- trees.tar.gz contains the simulated gene-family trees used in our experiments (both true trees from SimPhy as well as trees estimated from alignments).
- alignments.tar.gz contains simulated sequence data used for estimating the gene-family trees.
keywords:
evolution; computational biology; bioinformatics; phylogenetics
published: 2023-04-06
Warnow, Tandy; Park, Minhyuk (2023): INDELible simulated datasets with sequence length heterogeneity. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-0900513_V1
This is a simulated sequence dataset generated using INDELible and processed via a sequence fragmentation procedure.
keywords:
sequence length heterogeneity;indelible;computational biology;multiple sequence alignment
published: 2021-04-11
Park, Minhyuk; Zaharias, Paul; Warnow, Tandy (2021): Disjoint Tree Mergers for Large-Scale Maximum Likelihood Tree Estimation. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7008049_V1
This dataset contains the RNASim1000 and Cox1-Het datasets, as well as analyses of RNASim1000, Cox1-Het, and 1000M1(HF).
keywords:
phylogeny estimation; maximum likelihood; RAxML; IQ-TREE; FastTree; cox1; heterotachy; disjoint tree mergers; Tree of Life
published: 2017-09-19
Nute, Michael; Jed, Chou; Molloy, Erin K.; Warnow, Tandy (2017): Data from: The Performance of Coalescent-Based Species Tree Estimation Methods under Models of Missing Data. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-7735354_V1
published: 2018-04-06
Collins, Kodi; Warnow, Tandy (2018): PASTA For Proteins Data (BALiBASE). University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4074787_V1
keywords:
protein; multiple sequence alignment; balibase
published: 2018-02-22
Christensen, Sarah; Molloy, Erin K; Vachaspati, Pranjal; Warnow, Tandy (2018): Datasets from the study "OCTAL: Optimal Completion of Gene Trees in Polynomial Time". University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1616387_V1
Datasets used in the study, "OCTAL: Optimal Completion of Gene Trees in Polynomial Time," under review at Algorithms for Molecular Biology. Note: The DS_STORE file in the 25gen-10M folder can be disregarded.
keywords:
phylogenomics; missing data; coalescent-based species tree estimation; gene trees
published: 2017-06-15
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy (2017): Datasets from the study: Optimal completion of incomplete gene trees in polynomial time using OCTAL. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-8402610_V1
Datasets used in the study, "Optimal completion of incomplete gene trees in polynomial time using OCTAL," presented at WABI 2017.
keywords:
phylogenomics; missing data; coalescent-based species tree estimation; gene trees