RelatedMaterial
|
create: {"material_type"=>"Article", "availability"=>nil, "link"=>"https://doi.org/10.1186/s13015-023-00247-x", "uri"=>"10.1186/s13015-023-00247-x", "uri_type"=>"DOI", "citation"=>"Shen, C., Liu, B., Williams, K.P. et al. EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment. Algorithms Mol Biol 18, 21 (2023). https://doi.org/10.1186/s13015-023-00247-x", "dataset_id"=>2370, "selected_type"=>"Article", "datacite_list"=>"IsSupplementTo", "note"=>nil, "feature"=>nil}
|
2023-12-13T21:24:35Z
|
Dataset
|
update: {"description"=>["This upload contains all datasets used in Experiment 2 of the EMMA paper (to appear in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences.", "This upload contains all datasets used in Experiment 2 of the EMMA paper (appeared in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences."]}
|
2023-09-13T17:29:08Z
|
Creator
|
update: {"given_name"=>["Kelly", "Kelly P."]}
|
2023-09-13T17:23:29Z
|
Dataset
|
update: {"description"=>["This upload contains all datasets used in Experiments 2 and 3 of the EMMA paper (to appear in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences.", "This upload contains all datasets used in Experiment 2 of the EMMA paper (to appear in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences."]}
|
2023-09-13T17:23:29Z
|
RelatedMaterial
|
create: {"material_type"=>"Article", "availability"=>nil, "link"=>"https://doi.org/10.1101/2023.06.12.544642", "uri"=>"10.1101/2023.06.12.544642", "uri_type"=>"DOI", "citation"=>"Chengze Shen, Baqiao Liu, Kelly P. Williams, Tandy Warnow\r\nbioRxiv 2023.06.12.544642; doi: https://doi.org/10.1101/2023.06.12.544642", "dataset_id"=>2370, "selected_type"=>"Article", "datacite_list"=>"IsSupplementTo", "note"=>"", "feature"=>false}
|
2023-08-23T16:33:08Z
|
Dataset
|
update: {"description"=>["This upload contains all datasets used in Experiments 2 and 3 of the EMMA paper (to appear in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"{EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences.", "This upload contains all datasets used in Experiments 2 and 3 of the EMMA paper (to appear in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences."]}
|
2023-07-26T19:05:34Z
|
Dataset
|
update: {"title"=>["Datasets for SALMA: Scalable ALignment using MAFFT-add", "Datasets for EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment"], "description"=>["This upload contains all datasets used in Experiments 2 and 3 of the SALMA paper (pending submission): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"SALMA: Scalable ALignment using MAFFT-Add\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences.", "This upload contains all datasets used in Experiments 2 and 3 of the EMMA paper (to appear in WABI 2023): Shen, Chengze, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. \"{EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment\".\r\n\r\nThe zip file has the following structure (presented as an example):\r\nsalma_paper_datasets/\r\n|_README.md\r\n|_10aa/\r\n|_crw/\r\n|_homfam/\r\n |_aat/\r\n | |_...\r\n |_...\r\n|_het/\r\n |_5000M2-het/\r\n | |_...\r\n |_5000M3-het/\r\n ...\r\n|_rec_res/\r\n\r\n\r\nGenerally, the structure can be viewed as:\r\n[category]/[dataset]/[replicate]/[alignment files]\r\n\r\n# Categories:\r\n1. 10aa: There are 10 small biological protein datasets within the `10aa` directory, each with just one replicate.\r\n2. crw: There are 5 selected CRW datasets, namely 5S.3, 5S.E, 5S.T, 16S.3, and 16S.T, each with one replicate. These are the cleaned version from Shen et. al. 2022 (MAGUS+eHMM).\r\n3. homfam: There are the 10 largest Homfam datasets, each with one replicate.\r\n4. het: There are three newly simulated nucleotide datasets from this study, 5000M2-het, 5000M3-het, and 5000M4-het, each with 10 replicates.\r\n5. rec\\_res: It contains the Rec and Res datasets. Detailed dataset generation can be found in the supplementary materials of the paper.\r\n\r\n# Alignment files\r\nThere are at most 6 `.fasta` files in each sub-directory:\r\n1. `all.unaln.fasta`: All unaligned sequences.\r\n2. `all.aln.fasta`: Reference alignments of all sequences. If not all sequences have reference alignments, only the sequences that have will be included.\r\n3. `all-queries.unaln.fasta`: All unaligned query sequences. Query sequences are sequences that do not have lengths within 25% of the median length (i.e., not full-length sequences).\r\n4. `all-queries.aln.fasta`: Reference alignments of query sequences. If not all queries have reference alignments, only the sequences that have will be included.\r\n5. `backbone.unaln.fasta`: All unaligned backbone sequences. Backbone sequences are sequences that have lengths within 25% of the median length (i.e., full-length sequences).\r\n6. `backbone.aln.fasta`: Reference alignments of backbone sequences. If not all backbone sequences have reference alignments, only the sequences that have will be included.\r\n\r\n>If all sequences are full-length sequences, then `all-queries.unaln.fasta` will be missing.\r\n>If fewer than two query sequences have reference alignments, then `all-queries.aln.fasta` will be missing.\r\n>If fewer than two backbone sequences have reference alignments, then `backbone.aln.fasta` will be missing.\r\n\r\n# Additional file(s)\r\n1. `350378genomes.txt`: the file contains all 350,378 bacterial and archaeal genome names that were used by Prodigal (Hyatt et. al. 2010) to search for protein sequences."]}
|
2023-07-26T19:05:17Z
|
Dataset
|
update: {"version_comment"=>[nil, ""], "subject"=>[nil, "Technology and Engineering"]}
|
2022-09-28T17:42:53Z
|