Illinois Data Bank

The 16S.B.ALL dataset in 100-HF condition

This upload includes the 16S.B.ALL dataset in the 100-HF condition (referred to as 16S.B.ALL-100-HF), used in Experiment 3 of the WITCH paper (currently accepted in principle by the Journal of Computational Biology). The 100-HF condition refers to making sequences fragmentary with an average length of 100 bp and a standard deviation of 60 bp. Additionally, we enforced that all fragmentary sequences have lengths > 50 bp. Thus, the final average length of the fragments is slightly higher than 100 bp (~120 bp).
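A minimal sketch of how a fragmentation step like this could look, to show why the length floor raises the average. It is not the script used to build the dataset: the normal distribution, the redraw-on-rejection rule, the uniformly random start position, and the names sample_fragment_length and make_fragment are all assumptions made for illustration.

```python
import random


def sample_fragment_length(rng, mean=100.0, sd=60.0, min_len=50):
    """Draw a length from a normal distribution (assumed), redrawing
    until the rounded value is strictly greater than min_len."""
    while True:
        length = round(rng.gauss(mean, sd))
        if length > min_len:
            return length


def make_fragment(seq, rng, mean=100.0, sd=60.0, min_len=50):
    """Return a random contiguous substring of seq with a sampled length.

    The start position is uniform over all valid offsets (assumed).
    A sequence shorter than the sampled length is returned whole.
    """
    length = min(sample_fragment_length(rng, mean, sd, min_len), len(seq))
    start = rng.randint(0, len(seq) - length)
    return seq[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = [sample_fragment_length(rng) for _ in range(20000)]
    # Rejecting short draws shifts the empirical mean above 100 bp.
    print(min(lengths), sum(lengths) / len(lengths))
```

Under these assumptions, discarding draws of 50 bp or less truncates the lower tail of the distribution, so the mean of the retained lengths ends up above the nominal 100 bp, in line with the ~120 bp reported above.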

In this case (i.e., 16S.B.ALL-100-HF), 1,000 sequences with lengths within 25% of the median length are retained as "backbone sequences", while the remaining sequences are considered "query sequences" and made fragmentary using the "100-HF" procedure. Backbone sequences are aligned using MAGUS (or we extract their reference alignment). Then, the fragmentary versions of the query sequences are added back to the backbone alignment using either MAGUS+UPP or WITCH.
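The backbone/query split described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual code: the description does not say how the 1,000 backbone sequences are chosen among the eligible ones, so uniform random sampling is assumed, as are the inclusive length bounds and the hypothetical function name split_backbone_query.

```python
import random
from statistics import median


def split_backbone_query(seqs, n_backbone=1000, tolerance=0.25, seed=0):
    """Split sequences into backbone and query sets.

    seqs maps a sequence name to its unaligned sequence string.
    Sequences whose length lies within `tolerance` of the median length
    are eligible for the backbone; up to n_backbone of them are sampled
    uniformly at random (assumed). Everything else becomes a query.
    """
    med = median(len(s) for s in seqs.values())
    lo, hi = (1 - tolerance) * med, (1 + tolerance) * med
    # Sort so that the sample depends only on the seed, not on dict order.
    eligible = sorted(name for name, s in seqs.items() if lo <= len(s) <= hi)
    rng = random.Random(seed)
    backbone = set(rng.sample(eligible, min(n_backbone, len(eligible))))
    query = [name for name in seqs if name not in backbone]
    return sorted(backbone), query
```

The query sequences returned here would then be fragmented before being added to the backbone alignment; the alignment steps themselves (MAGUS, MAGUS+UPP, WITCH) are external tools and are not modeled in this sketch.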

More details of the tar.gz file are described in README.txt.

Technology and Engineering
MAGUS;UPP;Multiple Sequence Alignment;eHMMs
CC0
U.S. National Science Foundation (NSF)-Grant:2006069
Chengze Shen
475 times
Version DOI Comment Publication Date
1 10.13012/B2IDB-6604429_V1 2022-03-25

2.4 KB File
32.7 MB File

Contact the Research Data Service for help interpreting this log.

Dataset update: {"subject"=>["Life Sciences", "Technology and Engineering"]} 2022-08-08T19:53:35Z
RelatedMaterial update: {"link"=>["", "https://doi.org/10.1089/cmb.2021.0585"], "uri"=>["", "10.1089/cmb.2021.0585"], "uri_type"=>["", "DOI"], "citation"=>["Shen, C., Park, M., and Warnow, T. WITCH: improved multiple sequence alignment through weighted consensus HMM alignment. Accepted by the Journal of Computational Biology on Mar. 2022. (forthcoming)", "Shen, Chengze, Minhyuk Park, and Tandy Warnow. 2022. “WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment.” Journal of Computational Biology : A Journal of Computational Molecular Cell Biology, May. doi:10.1089/cmb.2021.0585."]} 2022-05-23T16:27:31Z
RelatedMaterial update: {"link"=>["TBD", ""], "uri"=>[nil, ""], "uri_type"=>[nil, ""], "citation"=>["Shen, C., Park, M., and Warnow, T. WITCH: improved multiple sequence alignment through weighted consensus HMM alignment. Accepted by the Journal of Computational Biology on Mar. 2022.", "Shen, C., Park, M., and Warnow, T. WITCH: improved multiple sequence alignment through weighted consensus HMM alignment. Accepted by the Journal of Computational Biology on Mar. 2022. (forthcoming)"], "datacite_list"=>[nil, "IsSupplementTo"]} 2022-03-28T21:02:16Z
Dataset update: {"version_comment"=>[nil, ""], "subject"=>[nil, "Life Sciences"]} 2022-03-28T21:02:15Z