Illinois Data Bank

Data from: Supertree-like methods for genome-scale species tree estimation

This repository includes previously unpublished scripts and datasets for Chapter 6 of my PhD dissertation, "Supertree-like methods for genome-scale species tree estimation." This chapter is based on the article: Molloy, E.K. and Warnow, T. "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Bioinformatics, In press. https://doi.org/10.1093/bioinformatics/btaa444.

The results presented in my PhD dissertation differ from those in the Bioinformatics article because I re-estimated species trees using FastMulRFS and MulRF on the same datasets from the original repository (https://doi.org/10.13012/B2IDB-5721322_V1). When re-estimating species trees, (1) a seed was specified when running MulRF, and (2) a different script (specifically preprocess_multrees_v3.py from https://github.com/ekmolloy/fastmulrfs/releases/tag/v1.2.0) was used to preprocess the gene trees, which were then given as input to MulRF and FastMulRFS. Note that this preprocessing script is a re-implementation of the original algorithm for improved speed; it also includes a bug fix.

Finally, it was brought to my attention that the simulation in the Bioinformatics article differs from prior studies because I scaled the species tree by 10 generations per year (instead of 0.9 years per generation, which is ~1.1 generations per year). I re-simulated datasets (true-trees-with-one-gen-per-year-psize-10000000.tar.gz and true-trees-with-one-gen-per-year-psize-50000000.tar.gz) using 0.9 years per generation to quantify the impact of this parameter change (see my PhD dissertation or the supplementary materials of the Bioinformatics article for discussion).
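The unit conversion behind this parameter change can be checked with a short sketch. The function and variable names below are illustrative only and do not come from the repository's scripts; the sketch shows the arithmetic, not how the simulation applies the scaling.

```python
def generations_per_year(years_per_generation: float) -> float:
    """Convert a generation time (years per generation) to a rate (generations per year)."""
    if years_per_generation <= 0:
        raise ValueError("years_per_generation must be positive")
    return 1.0 / years_per_generation


# Value used in prior studies, as stated in the description: 0.9 years per generation.
prior_rate = generations_per_year(0.9)  # roughly 1.1 generations per year

# Value used in the Bioinformatics article, as stated in the description.
article_rate = 10.0  # generations per year

# Factor by which the two scalings differ (hypothetical helper variable).
scaling_ratio = article_rate / prior_rate

print(f"prior studies: {prior_rate:.2f} generations per year")
print(f"article / prior ratio: {scaling_ratio:.1f}")
```

Under these stated values, the article's scaling is about nine times the rate used in prior studies, which is the difference the re-simulated datasets are meant to quantify.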

Life Sciences
Species tree estimation; gene duplication and loss; statistical consistency; MulRF, FastRFS
CC0
Erin K. Molloy
Version: 1
DOI: 10.13012/B2IDB-4004605_V1
Publication Date: 2020-07-15

