Illinois Data Bank

HIPPI Dataset

This archive contains all the alignments and trees used in the HIPPI paper [1]. The pfam.tar archive contains the Pfam families
used to build the HMMs and BLAST databases. The file structure is:

./X/Y/initial.fasttree
./X/Y/initial.fasta

where X is a Pfam family and Y is the cross-fold set (0, 1, 2, or 3). Each folder
contains two files: initial.fasta, the Pfam reference alignment with 1/4 of the
seed alignment removed, and initial.fasttree, the FastTree-2 maximum likelihood
(ML) tree estimated on initial.fasta.
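For readers who want to iterate over the X/Y folders programmatically, here is a minimal Python sketch. It assumes the archive has already been extracted under some root directory. The function names (iter_family_folds, read_fasta, is_aligned, leaf_names) are hypothetical helpers, not part of the dataset. The Newick parser handles only unquoted labels, which is an assumption about how the FastTree-2 output is written.

```python
import re
from pathlib import Path

# Cross-fold set names, as listed in the description above.
FOLDS = {"0", "1", "2", "3"}


def read_fasta(path):
    """Return {sequence name: aligned sequence}; gap characters are kept.

    A later record with a duplicate name overwrites an earlier one.
    """
    records = {}
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                # Keep only the first whitespace-delimited token as the name.
                name, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def is_aligned(records):
    """True if every row has the same length, as rows of an alignment should."""
    return len({len(seq) for seq in records.values()}) <= 1


def leaf_names(newick):
    """Collect leaf labels from a Newick string (unquoted labels only).

    Leaf labels follow '(' or ','; internal labels such as support values
    follow ')' and are therefore skipped.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def iter_family_folds(root):
    """Yield (family, fold, fasta_path, tree_path) for each root/X/Y/ folder."""
    root = Path(root)
    for family_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for fold_dir in sorted(p for p in family_dir.iterdir() if p.is_dir()):
            fasta = fold_dir / "initial.fasta"
            tree = fold_dir / "initial.fasttree"
            if fold_dir.name in FOLDS and fasta.is_file() and tree.is_file():
                yield family_dir.name, int(fold_dir.name), fasta, tree
```

A possible sanity check: for each yielded tuple, compare leaf_names(tree.read_text()) with set(read_fasta(fasta)). Since each tree was estimated on its initial.fasta, one would expect the two name sets to match, but that is an assumption worth verifying on the actual files.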

The query.tar archive contains the query sequences for each cross-fold set.
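The description does not say which paths are stored inside the tar archives, so listing their contents first can help locate the query files. The sketch below uses only Python's standard tarfile module; list_archive is a hypothetical helper name.

```python
import tarfile


def list_archive(path, suffix=""):
    """Return sorted names of regular files in a tar archive.

    Only names ending in `suffix` are kept (all files when suffix is "").
    Mode "r" lets tarfile detect gzip/bz2/xz compression automatically.
    """
    with tarfile.open(path, "r") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(suffix)
        )
```

For example, list_archive("query.tar", ".fas") would show whether the query files sit under a splits/ prefix inside the archive. If you then extract with extractall, Python 3.12 and later accept filter="data" to reject unsafe member paths.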

The query sequences associated with cross-fold set Y are named query.Y.Z.fas,
where Z is the fragment length (1, 0.5, or 0.25). The query files are located
in the splits directory.
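The query.Y.Z.fas naming scheme can be built and parsed mechanically. The following is a minimal sketch that assumes file names follow the pattern exactly as described, with Y in 0 to 3 and Z one of 1, 0.5, or 0.25. The helper names are hypothetical. Z is kept as a string so names round-trip without float formatting issues.

```python
import re
from pathlib import Path

# Fragment lengths kept as strings so names round-trip exactly ("0.5", not 0.50).
FRAGMENT_LENGTHS = ("1", "0.5", "0.25")
QUERY_NAME = re.compile(r"^query\.([0-3])\.(1|0\.5|0\.25)\.fas$")


def query_filename(fold, fragment):
    """Build query.Y.Z.fas for cross-fold set Y and fragment length Z."""
    if not isinstance(fold, int) or fold not in range(4):
        raise ValueError(f"cross-fold set must be 0-3, got {fold!r}")
    if fragment not in FRAGMENT_LENGTHS:
        raise ValueError(f"fragment length must be one of {FRAGMENT_LENGTHS}")
    return f"query.{fold}.{fragment}.fas"


def parse_query_filename(name):
    """Return (fold, fragment) for a query.Y.Z.fas name, or None if it does not match."""
    match = QUERY_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def find_queries(splits_dir):
    """Map (fold, fragment) -> path for every matching query file in splits_dir."""
    found = {}
    for path in sorted(Path(splits_dir).glob("query.*.fas")):
        key = parse_query_filename(path.name)
        if key is not None:
            found[key] = path
    return found
```

For instance, query_filename(2, "0.5") gives "query.2.0.5.fas", and find_queries on the splits directory would pair each cross-fold set Y with its three fragment-length files, which can then be matched against the ./X/Y/ folders.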

[1] Nguyen, Nam-Phuong D, Mike Nute, Siavash Mirarab, and Tandy Warnow. (2016) HIPPI: Highly Accurate Protein Family Classification with Ensembles of HMMs. BMC Genomics. doi:10.1186/s12864-016-3097-0

Subject: Life Sciences
Keywords: HIPPI dataset; ensembles of profile Hidden Markov models; Pfam
License: CC0
Funders:
  U.S. National Science Foundation (NSF), Grant DBI-1461364
  U.S. National Science Foundation (NSF), Grant ABI-1458652
  U.S. National Science Foundation (NSF), Grant III:AF:1513629
Publisher: University of Illinois at Urbana-Champaign
Corresponding Creator: Tandy Warnow
Version  DOI                        Comment  Publication Date
1        10.13012/B2IDB-6795126_V1           2016-08-16

File size: 521 MB
