Data from: Supertree-like methods for genome-scale species tree estimation
Dataset Description
This repository includes previously unpublished scripts and datasets for Chapter 6 of my PhD dissertation, "Supertree-like methods for genome-scale species tree estimation." This chapter is based on the article: Molloy, E.K. and Warnow, T. "FastMulRFS: Fast and accurate species tree estimation under generic gene duplication and loss models." Bioinformatics, In press. https://doi.org/10.1093/bioinformatics/btaa444. The results presented in my PhD dissertation differ from those in the Bioinformatics article because I re-estimated species trees using FastMulRFS and MulRF on the same datasets as in the original repository (https://doi.org/10.13012/B2IDB-5721322_V1). To re-estimate species trees, (1) a seed was specified when running MulRF, and (2) a different script (specifically preprocess_multrees_v3.py from https://github.com/ekmolloy/fastmulrfs/releases/tag/v1.2.0) was used to preprocess the gene trees, which were then given as input to MulRF and FastMulRFS. Note that this preprocessing script is a re-implementation of the original algorithm for improved speed, and it also includes a bug fix. Finally, it was brought to my attention that the simulation in the Bioinformatics article differs from prior studies because I scaled the species tree by 10 generations per year (instead of 0.9 years per generation, which is ~1.1 generations per year). I re-simulated datasets (true-trees-with-one-gen-per-year-psize-10000000.tar.gz and true-trees-with-one-gen-per-year-psize-50000000.tar.gz) using 0.9 years per generation to quantify the impact of this parameter change (see my PhD dissertation or the supplementary materials of the Bioinformatics article for discussion).
Subject
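To make the size of the scaling difference concrete, the short sketch below converts a branch length given in years into generations under both settings. It assumes branch lengths are converted by simple multiplication by generations per year; the 1,000,000-year branch length and the function name `years_to_generations` are hypothetical and chosen only for illustration.

```python
def years_to_generations(years: float, generations_per_year: float) -> float:
    """Convert a branch length in years to a branch length in generations."""
    return years * generations_per_year

# Scaling used in the Bioinformatics article: 10 generations per year.
ARTICLE_GENS_PER_YEAR = 10.0
# Scaling used in prior studies: 0.9 years per generation (~1.1 generations per year).
PRIOR_GENS_PER_YEAR = 1.0 / 0.9

# Hypothetical branch length of 1,000,000 years (illustrative only).
branch_years = 1_000_000.0
article_gens = years_to_generations(branch_years, ARTICLE_GENS_PER_YEAR)
prior_gens = years_to_generations(branch_years, PRIOR_GENS_PER_YEAR)

print(f"article scaling: {article_gens:,.0f} generations")
print(f"prior scaling:   {prior_gens:,.0f} generations")
print(f"ratio:           {article_gens / prior_gens:.1f}")
```

Under these assumptions, every branch spans 10 × 0.9 = 9 times as many generations with the article's scaling as with the 0.9 years per generation used in prior studies, independent of the branch length chosen.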
Life Sciences |
Keywords
Species tree estimation; gene duplication and loss; statistical consistency; MulRF; FastRFS
License
CC0
Corresponding Creator
Erin K. Molloy
Downloaded
1562 times |
| Version | DOI | Comment | Publication Date |
|---|---|---|---|
| 1 | 10.13012/B2IDB-4004605_V1 | | 2020-07-15 |
Contact the Research Data Service for help interpreting this log.
| Type | Change | Timestamp |
|---|---|---|
| RelatedMaterial | destroy: {"material_type"=>"Code", "availability"=>nil, "link"=>"https://github.com/ekmolloy/fastmulrfs", "uri"=>"", "uri_type"=>"", "citation"=>"", "dataset_id"=>1394, "selected_type"=>"Code", "datacite_list"=>"", "note"=>nil, "feature"=>nil} | 2025-01-08T23:48:42Z |
| RelatedMaterial | destroy: {"material_type"=>"Dataset", "availability"=>nil, "link"=>"https://doi.org/10.13012/B2IDB-5721322_V1", "uri"=>"", "uri_type"=>"", "citation"=>"Molloy, Erin K.; Warnow, Tandy (2019): Data from: FastMulRFS: Statistically consistent polynomial time species tree estimation under gene duplication. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5721322_V1", "dataset_id"=>1394, "selected_type"=>"Dataset", "datacite_list"=>"", "note"=>nil, "feature"=>nil} | 2025-01-08T23:48:42Z |
| RelatedMaterial | update: {"uri"=>[nil, ""], "uri_type"=>[nil, ""], "datacite_list"=>[nil, ""]} | 2020-11-02T18:08:46Z |
| RelatedMaterial | update: {"uri"=>[nil, ""], "uri_type"=>[nil, ""], "datacite_list"=>[nil, ""]} | 2020-11-02T18:08:46Z |
| Dataset | update: {"version_comment"=>[nil, ""], "subject"=>[nil, "Life Sciences"]} | 2020-11-02T18:08:46Z |