The 16S.B.ALL dataset in 100-HF condition
Dataset Description
This upload contains the 16S.B.ALL dataset in the 100-HF condition (referred to as 16S.B.ALL-100-HF), which was used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). The 100-HF condition makes sequences fragmentary, with an average fragment length of 100 bp and a standard deviation of 60 bp. In addition, we required all fragmentary sequences to be longer than 50 bp, so the final average fragment length is slightly higher than 100 bp (~120 bp).

For 16S.B.ALL-100-HF, 1,000 sequences whose lengths are within 25% of the median length are retained as "backbone sequences", while the remaining sequences are treated as "query sequences" and made fragmentary using the 100-HF procedure. The backbone sequences are aligned using MAGUS (or, alternatively, their reference alignment is extracted). The fragmentary query sequences are then added to the backbone alignment using either MAGUS+UPP or WITCH. The contents of the tar.gz file are described in more detail in README.txt.
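The backbone selection and the 100-HF fragmentation described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' actual scripts: fragment lengths are drawn from a normal distribution with mean 100 bp and standard deviation 60 bp and redrawn until they exceed 50 bp; each fragment is a contiguous substring starting at a uniformly random position; and the 1,000 backbone sequences are sampled at random from those whose lengths fall within 25% of the median length. The function names `split_backbone`, `fragment`, and `truncated_normal_mean` are hypothetical. Under the redraw assumption, fragment lengths follow a normal distribution truncated below at 50 bp, whose mean is μ + σφ(α)/(1 − Φ(α)) with α = (50 − 100)/60, or about 121 bp, which is consistent with the ~120 bp average stated above.

```python
import math
import random
import statistics

# Hypothetical sketch; the exact procedure used for the dataset may differ.


def split_backbone(seqs, rng, n_backbone=1000, tol=0.25):
    """Split {name: sequence} into (backbone, queries).

    Assumption: backbone sequences are sampled uniformly at random from the
    sequences whose length lies within `tol` (25%) of the median length.
    """
    median_len = statistics.median(len(s) for s in seqs.values())
    lo, hi = (1 - tol) * median_len, (1 + tol) * median_len
    eligible = sorted(n for n, s in seqs.items() if lo <= len(s) <= hi)
    chosen = set(rng.sample(eligible, min(n_backbone, len(eligible))))
    backbone = {n: s for n, s in seqs.items() if n in chosen}
    queries = {n: s for n, s in seqs.items() if n not in chosen}
    return backbone, queries


def fragment(seq, rng, mean=100.0, sd=60.0, min_len=50):
    """Return a random contiguous fragment of `seq` (100-HF style).

    Assumptions: the fragment length is drawn from Normal(mean, sd) and
    redrawn until it exceeds `min_len`; the start position is uniform.
    """
    while True:
        length = round(rng.gauss(mean, sd))
        if length > min_len:
            break
    if length >= len(seq):
        return seq  # sequence shorter than the drawn length: keep it whole
    start = rng.randrange(len(seq) - length + 1)
    return seq[start:start + length]


def truncated_normal_mean(mean=100.0, sd=60.0, lower=50.0):
    """Mean of Normal(mean, sd) conditioned on exceeding `lower`."""
    alpha = (lower - mean) / sd
    pdf = math.exp(-alpha * alpha / 2) / math.sqrt(2 * math.pi)
    tail = 0.5 * math.erfc(alpha / math.sqrt(2))  # P(Z > alpha)
    return mean + sd * pdf / tail


if __name__ == "__main__":
    rng = random.Random(42)
    lengths = [len(fragment("ACGT" * 400, rng)) for _ in range(10000)]
    print(min(lengths) > 50)          # True: every fragment exceeds 50 bp
    print(statistics.mean(lengths))   # empirical average fragment length
    print(truncated_normal_mean())    # analytic average, about 121 bp
```

Redrawing short lengths (rather than clamping them to 51 bp) is what shifts the average upward while keeping the shape of the distribution above 50 bp; clamping would instead pile roughly 20% of the fragments at exactly 51 bp.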
Subject
Technology and Engineering
Keywords
MAGUS; UPP; Multiple Sequence Alignment; eHMMs
License
CC0
Funder
U.S. National Science Foundation (NSF), Grant 2006069
Corresponding Creator
Chengze Shen
Downloaded
475 times
| Version | DOI | Comment | Publication Date |
|---|---|---|---|
| 1 | 10.13012/B2IDB-6604429_V1 | | 2022-03-25 |
Contact the Research Data Service for help interpreting this log.

| Changed item | Change | Timestamp |
|---|---|---|
| Dataset | update: {"subject"=>["Life Sciences", "Technology and Engineering"]} | 2022-08-08T19:53:35Z |
| RelatedMaterial | update: {"link"=>["", "https://doi.org/10.1089/cmb.2021.0585"], "uri"=>["", "10.1089/cmb.2021.0585"], "uri_type"=>["", "DOI"], "citation"=>["Shen, C., Park, M., and Warnow, T. WITCH: improved multiple sequence alignment through weighted consensus HMM alignment. Accepted by the Journal of Computational Biology on Mar. 2022. (forthcoming)", "Shen, Chengze, Minhyuk Park, and Tandy Warnow. 2022. “WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment.” Journal of Computational Biology : A Journal of Computational Molecular Cell Biology, May. doi:10.1089/cmb.2021.0585."]} | 2022-05-23T16:27:31Z |
| RelatedMaterial | update: {"link"=>["TBD", ""], "uri"=>[nil, ""], "uri_type"=>[nil, ""], "citation"=>["Shen, C., Park, M., and Warnow, T. WITCH: improved multiple sequence alignment through weighted consensus HMM alignment. Accepted by the Journal of Computational Biology on Mar. 2022.", "Shen, C., Park, M., and Warnow, T. WITCH: improved multiple sequence alignment through weighted consensus HMM alignment. Accepted by the Journal of Computational Biology on Mar. 2022. (forthcoming)"], "datacite_list"=>[nil, "IsSupplementTo"]} | 2022-03-28T21:02:16Z |
| Dataset | update: {"version_comment"=>[nil, ""], "subject"=>[nil, "Life Sciences"]} | 2022-03-28T21:02:15Z |