NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Molloy, Erin K.; Warnow, Tandy

doi:10.13012/B2IDB-1424746_V1

Illinois Data Bank - Dataset

Cite this dataset:

Molloy, Erin K.; Warnow, Tandy (2018): NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1424746_V1

Dataset Description	This repository includes scripts, datasets, and supplementary materials for the study "NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from GitHub: https://github.com/ekmolloy/njmerge.

When downloading datasets, please note the following errors.

In README.txt, lines 37 and 38 should read:
 + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre
 + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre
That is, the file names (fasttree-exon.tre and fasttree-intron.tre) are swapped in the README.txt distributed with the dataset.

In tools.zip, the compare_trees.py and compare_tree_lists.py scripts incorrectly refer to the "symmetric difference error rate" as the "Robinson-Foulds error rate". Because the normalized symmetric difference and the normalized Robinson-Foulds distance are equal for binary trees, this does not impact the species tree error rates reported in the study. It could impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the normalized symmetric difference is always greater than or equal to the normalized Robinson-Foulds distance, so the gene tree error rates reported in the study are more conservative.

In njmerge-supplementary-materials.pdf, the alpha parameter shown in Supplementary Table S2 is actually the divisor D, which is used to compute alpha for each gene as follows.
1. For each gene, a random value X between 0 and 1 is drawn from a uniform distribution.
2. Alpha is computed as -log(X) / D, where D is 4.2 for exons, 1.0 for UCEs, and 0.4 for introns (as stated in Table S2).
Note that at X = 0.5, the mean (and median) of the uniform distribution between 0 and 1, alpha is -log(0.5) / 4.2 ≈ 0.165 for exons, -log(0.5) / 1.0 ≈ 0.69 for UCEs, and -log(0.5) / 0.4 ≈ 1.73 for introns, where log is the natural logarithm. These values are the medians of alpha; the mean of -log(X) / D is 1 / D, or about 0.24 for exons, 1.0 for UCEs, and 2.5 for introns.
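The two-step recipe for alpha described above can be sketched in a few lines of Python. This is only an illustration and not the authors' simulation script: the function names, the seed, and the number of draws are hypothetical choices, and log is assumed to be the natural logarithm (consistent with -log(0.5) / 1.0 = 0.69). The sketch reports the value of alpha at X = 0.5 next to the sample mean and sample median of the drawn alpha values.

```python
import math
import random
import statistics

# Divisors D as given in the dataset description (Supplementary Table S2).
DIVISORS = {"exon": 4.2, "UCE": 1.0, "intron": 0.4}


def draw_alpha(divisor, rng):
    """Draw one alpha value as -log(X) / D with X uniform on (0, 1]."""
    # rng.random() returns a value in [0, 1); using 1 - value avoids log(0).
    x = 1.0 - rng.random()
    return -math.log(x) / divisor


def alpha_at_half(divisor):
    """Alpha obtained by plugging X = 0.5 into -log(X) / D."""
    return -math.log(0.5) / divisor


def summarize(divisor, n_genes=100_000, seed=0):
    """Return (sample mean, sample median) of alpha over n_genes draws."""
    rng = random.Random(seed)
    alphas = [draw_alpha(divisor, rng) for _ in range(n_genes)]
    return statistics.fmean(alphas), statistics.median(alphas)


if __name__ == "__main__":
    for name, d in DIVISORS.items():
        mean, median = summarize(d)
        print(f"{name}: alpha(X=0.5)={alpha_at_half(d):.3f} "
              f"sample mean={mean:.3f} sample median={median:.3f}")
```

With enough draws, the sample median should approach -log(0.5) / D and the sample mean should approach 1 / D, because -log(X) for uniform X follows an exponential distribution with mean 1.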
Subject	Life Sciences
Keywords	phylogenomics; species trees; incomplete lineage sorting; divide-and-conquer
License	CC0
Funder	U.S. National Science Foundation (NSF)-Grant:CCF-1535977
Funder	U.S. National Science Foundation (NSF)-Grant:DGE-1144245
Corresponding Creator	Tandy Warnow
Downloaded	6381 times
Related Materials (1) Conference paper Molloy E.K., Warnow T. (2018) NJMerge: A Generic Technique for Scaling Phylogeny Estimation Methods and Its Application to Species Trees. In: Blanchette M., Ouangraoua A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol 11183. Springer, Cham

Versions

Version	DOI	Comment	Publication Date
1	10.13012/B2IDB-1424746_V1		2018-07-29

Change Log

Contact the Research Data Service for help interpreting this log.

Dataset	update: description (README correction reworded from "In the README, the file names on lines 37/38 should be switched so that the README reads:" to "In README.txt, lines 37 and 38 should read:", with the note "Note that the file names (fasttree-exon.tre and fasttree-intron.tre) are swapped." added; Table S2 sentence reworded to "In njmerge-supplementary-materials.pdf, the alpha parameter shown in Supplementary Table S2 is actually the divisor D")	2019-04-21T03:23:26Z
Dataset	update: description ("Finally, in the supplement, we refer to alpha in Table S2; however, this parameter is actually the divisor D" changed to "In Supplementary Table S2 (njmerge-supplementary-materials.pdf), the alpha parameter is actually the divisor D")	2019-04-21T03:18:43Z
Dataset	update: description ("For each gene a random value" changed to "For each gene, a random value"; "the mean of a uniform distribution" changed to "the mean of the uniform distribution")	2019-04-21T03:16:43Z
Dataset	update: description ("the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate" changed to "the normalized symmetric difference is always greater than or equal to the normalized Robinson-Foulds distance")	2019-04-21T03:15:14Z
Dataset	update: description ("This can impact the gene tree error rates" changed to "This could impact the gene tree error rates")	2019-04-21T03:14:18Z
Dataset	update: description (quotation marks added around "symmetric difference error rate" and "Robinson-Foulds error rate"; "Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees" changed to "Because the normalized symmetric difference and the normalized Robinson-Foulds distance are equal for binary trees")	2019-04-21T03:13:42Z
Dataset	update: description (stray space removed before the period in "drawn from a uniform distribution .")	2019-04-20T16:20:56Z
Dataset	update: {"description"=>["This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge.\r\n\r\n*When downloading datasets, please note that the following errors.\r\n\r\nIn the README, the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre\r\n\r\nIn tools.zip, the compare_trees.py and the compare_tree_lists.py scripts incorrectly refer to the symmetric difference rate as the Robinson-Foulds error rate. Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees, this does not impact the species tree error rates reported in the study. This can impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate, so the gene tree error rates reported in the study are more conservative.", "This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. 
The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge.\r\n\r\nWhen downloading datasets, please note that the following errors.*\r\n\r\nIn the README, the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre\r\n\r\nIn tools.zip, the compare_trees.py and the compare_tree_lists.py scripts incorrectly refer to the symmetric difference rate as the Robinson-Foulds error rate. Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees, this does not impact the species tree error rates reported in the study. This can impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate, so the gene tree error rates reported in the study are more conservative.\r\n\r\nFinally, in the supplement, we refer to alpha in Table S2; however, this parameter is actually the divisor D, which is used to compute alpha for each gene as follows.\r\n1. For each gene a random value X between 0 and 1 is drawn from a uniform distribution .\r\n2. Alpha is computed as -log(X) / D, where D is 4.2 for exons, 1.0 for UCEs, and 0.4 for introns (as stated in Table S2).\r\nNote that because the mean of a uniform distribution (between 0 and 1) is 0.5, the mean alpha value is -log(0.5) / 4.2 = 0.16 for exons, -log(0.5) / 1.0 = 0.69 for UCEs, and -log(0.5) / 0.4 = 1.73 for introns."]}	2019-04-20T16:14:57Z
Dataset	update: {"description"=>["This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge.\r\n\r\n*When downloading datasets, please note that the following errors.\r\n\r\nIn the README, the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre\r\n\r\nIn tools.zip, the compare_trees.py script incorrectly refers to the symmetric difference rate as the Robinson-Foulds error rate. Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees, this does not impact the species tree error rates reported in the study. This can impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate, so the gene tree error rates reported in the study are more conservative.", "This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. 
The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge.\r\n\r\nWhen downloading datasets, please note that the following errors.*\r\n\r\nIn the README, the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre\r\n\r\nIn tools.zip, the compare_trees.py and the compare_tree_lists.py scripts incorrectly refer to the symmetric difference rate as the Robinson-Foulds error rate. Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees, this does not impact the species tree error rates reported in the study. This can impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate, so the gene tree error rates reported in the study are more conservative."]}	2019-03-12T15:10:42Z
Dataset	update: {"description"=>["This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge.\r\n\r\n*When downloading datasets, please note that the following errors.\r\n\r\nIn the README, the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre\r\n\r\nIn tools.zip, the compare_trees.py script incorrectly refers to the symmetric difference rate as the Robinson-Foulds error rate. Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees, this does not impact the species tree error rates reported in the study. This might impact the gene tree error rates reported in the study, as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate, so the gene tree error rates reported in the study are more conservative.", "This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. 
The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge.\r\n\r\nWhen downloading datasets, please note that the following errors.*\r\n\r\nIn the README, the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre\r\n\r\nIn tools.zip, the compare_trees.py script incorrectly refers to the symmetric difference rate as the Robinson-Foulds error rate. Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees, this does not impact the species tree error rates reported in the study. This can impact the gene tree error rates reported in the study (see data-gene-trees.csv in data.zip), as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate, so the gene tree error rates reported in the study are more conservative."]}	2019-03-11T12:41:29Z
Dataset	update: {"description"=>["This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge. When downloading datasets, please note that there is an error in the README; the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre", "This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge.\r\n\r\n*When downloading datasets, please note that the following errors.*\r\n\r\nIn the README, the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre\r\n\r\nIn tools.zip, the compare_trees.py script incorrectly refers to the symmetric difference rate as the Robinson-Foulds error rate. Because the symmetric difference rate and the Robinson-Foulds error rate are equal for binary trees, this does not impact the species tree error rates reported in the study. 
This might impact the gene tree error rates reported in the study, as FastTree-2 returns trees with polytomies whenever 3 or more sequences in the input alignment are identical. Note that the symmetric difference rate is always greater than or equal to the Robinson-Foulds error rate, so the gene tree error rates reported in the study are more conservative."]}	2019-03-11T12:37:29Z
Funder	create: {"name"=>"U.S. National Science Foundation (NSF)", "identifier"=>"10.13039/100000001", "identifier_scheme"=>"DOI", "grant"=>"DGE-1144245", "dataset_id"=>628, "code"=>"NSF"}	2019-02-07T17:47:44Z
Funder	create: {"name"=>"U.S. National Science Foundation (NSF)", "identifier"=>"10.13039/100000001", "identifier_scheme"=>"DOI", "grant"=>"CCF-1535977", "dataset_id"=>628, "code"=>"NSF"}	2019-02-07T17:47:44Z
RelatedMaterial	create: {"material_type"=>"Conference paper", "availability"=>nil, "link"=>"https://doi.org/10.1007/978-3-030-00834-5_15", "uri"=>"10.1007/978-3-030-00834-5_15", "uri_type"=>"DOI", "citation"=>"Molloy E.K., Warnow T. (2018) NJMerge: A Generic Technique for Scaling Phylogeny Estimation Methods and Its Application to Species Trees. In: Blanchette M., Ouangraoua A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol 11183. Springer, Cham", "dataset_id"=>628, "selected_type"=>"Other", "datacite_list"=>"IsSupplementTo"}	2018-10-09T18:23:17Z
Dataset	update: {"keywords"=>["phylogenomics, species trees, incomplete lineage sorting, divide-and-conquer", "phylogenomics; species trees; incomplete lineage sorting; divide-and-conquer"], "version_comment"=>[nil, ""], "subject"=>[nil, "Life Sciences"]}	2018-07-30T21:44:20Z
Dataset	update: {"description"=>["", "This repository includes scripts, datasets, and supplementary materials for the study, \"NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees\", presented at RECOMB-CG 2018. The supplementary figures and tables referenced in the main paper can be found in njmerge-supplementary-materials.pdf. The latest version of NJMerge can be downloaded from Github: https://github.com/ekmolloy/njmerge. When downloading datasets, please note that there is an error in the README; the file names on lines 37/38 should be switched so that the README reads:\r\n + fasttree-exon.tre contains lines 1-25, 1-100, or 1-1000 of fasttree-total.tre\r\n + fasttree-intron.tre contains lines 26-50, 101-200, or 1001-2000 of fasttree-total.tre"]}	2018-07-30T17:05:25Z
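The alpha procedure recorded in the description entries above (draw X uniformly between 0 and 1, then compute alpha = -log(X) / D) can be sketched as follows. This is a minimal illustration, not the authors' simulation script: the names `DIVISOR` and `sample_alpha` and the fixed seed are assumptions, and only the formula and the three D values come from the dataset description. One point worth checking numerically: for X uniform on (0, 1), -log(X) follows an exponential distribution with mean 1, so the sampled alpha values have mean 1/D (about 0.24 for exons, 1.0 for UCEs, and 2.5 for introns), while -log(0.5) / D (about 0.17, 0.69, and 1.73) is their median.

```python
import math
import random
import statistics

# Divisor D per gene type, as stated for Supplementary Table S2.
DIVISOR = {"exon": 4.2, "UCE": 1.0, "intron": 0.4}


def sample_alpha(gene_type, rng):
    """Draw X uniformly from (0, 1] and return alpha = -log(X) / D."""
    # rng.random() lies in [0, 1); subtracting from 1 avoids log(0).
    x = 1.0 - rng.random()
    return -math.log(x) / DIVISOR[gene_type]


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for repeatability
    for gene_type, d in DIVISOR.items():
        draws = [sample_alpha(gene_type, rng) for _ in range(100_000)]
        print(
            gene_type,
            "mean:", round(statistics.mean(draws), 3),
            "(1/D =", round(1.0 / d, 3), ")",
            "median:", round(statistics.median(draws), 3),
            "(ln(2)/D =", round(math.log(2) / d, 3), ")",
        )
```

Running the script prints the empirical mean and median next to 1/D and ln(2)/D for each gene type, which makes the distinction between the two summary values easy to see.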


README.txt 3.85 KB File
alignments-1000tax-10M-01-05.tar.gz 2.05 GB File
alignments-1000tax-10M-06-09.tar.gz 1.92 GB File
alignments-1000tax-10M-10-10.tar.gz 528 MB File
alignments-1000tax-10M-11-15.tar.gz 2.11 GB File
alignments-1000tax-10M-16-19.tar.gz 1.77 GB File
alignments-1000tax-10M-20-20.tar.gz 530 MB File
alignments-1000tax-500K-01-05.tar.gz 945 MB File
alignments-1000tax-500K-06-09.tar.gz 580 MB File
alignments-1000tax-500K-10-10.tar.gz 306 MB File
alignments-1000tax-500K-11-15.tar.gz 1.28 GB File
alignments-1000tax-500K-16-19.tar.gz 1.07 GB File
alignments-1000tax-500K-20-20.tar.gz 211 MB File
alignments-100tax-10M-01-20.tar.gz 859 MB File
alignments-100tax-500K-01-20.tar.gz 432 MB File
astral-trees.tar.gz 5.67 MB File
data.zip 4.03 MB File
gene-trees-1000tax-10M-01-05.tar.gz 1.01 GB File
gene-trees-1000tax-10M-06-09.tar.gz 864 MB File
gene-trees-1000tax-10M-10-10.tar.gz 217 MB File
gene-trees-1000tax-10M-11-15.tar.gz 1.04 GB File
gene-trees-1000tax-10M-16-19.tar.gz 835 MB File
gene-trees-1000tax-10M-20-20.tar.gz 219 MB File
gene-trees-1000tax-500K-01-05.tar.gz 918 MB File
gene-trees-1000tax-500K-06-09.tar.gz 703 MB File
gene-trees-1000tax-500K-10-10.tar.gz 205 MB File
gene-trees-1000tax-500K-11-15.tar.gz 992 MB File
gene-trees-1000tax-500K-16-19.tar.gz 796 MB File
gene-trees-1000tax-500K-20-20.tar.gz 194 MB File
gene-trees-100tax-10M-01-20.tar.gz 543 MB File
gene-trees-100tax-500K-01-20.tar.gz 496 MB File
njmerge-supplementary-materials.pdf 234 KB View File
paup-svdq-trees.tar.gz 4.48 MB File
raxml-caml-trees.tar.gz 1.14 MB File
scripts.zip 717 KB File
tools.zip 23.4 KB File
true-constraint-trees.tar.gz 1.67 MB File
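
The corrected README text in the description states which lines of fasttree-total.tre make up fasttree-exon.tre and fasttree-intron.tre. A minimal sketch of recovering those subsets from the combined file is given below. The helper names `select_lines` and `gene_type_trees` are hypothetical, and keying the ranges by 25, 100, or 1000 assumes those values are the number of genes per type in a given dataset; only the line ranges themselves come from the description.

```python
from itertools import islice

# Line ranges (1-indexed, inclusive) taken from the corrected README text.
# The keys 25/100/1000 are assumed to be the number of genes per type.
RANGES = {
    "exon": {25: (1, 25), 100: (1, 100), 1000: (1, 1000)},
    "intron": {25: (26, 50), 100: (101, 200), 1000: (1001, 2000)},
}


def select_lines(lines, first, last):
    """Return lines first..last (1-indexed, inclusive) from an iterable."""
    return list(islice(lines, first - 1, last))


def gene_type_trees(path, gene_type, n_genes):
    """Read the trees for one gene type out of a combined tree file."""
    first, last = RANGES[gene_type][n_genes]
    with open(path) as handle:
        return select_lines(handle, first, last)
```

For example, `gene_type_trees("fasttree-total.tre", "intron", 25)` would return lines 26-50 of that file, assuming it has one tree per line.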