fbpx
Wikipedia

HMMER

HMMER is a free and commonly used software package for sequence analysis written by Sean Eddy.[2] Its general usage is to identify homologous protein or nucleotide sequences, and to perform sequence alignments. It detects homology by comparing a profile-HMM (a Hidden Markov model constructed explicitly for a particular search) to either a single sequence or a database of sequences. Sequences that score significantly better to the profile-HMM compared to a null model are considered to be homologous to the sequences that were used to construct the profile-HMM. Profile-HMMs are constructed from a multiple sequence alignment in the HMMER package using the hmmbuild program. The profile-HMM implementation used in the HMMER software was based on the work of Krogh and colleagues.[3] HMMER is a console utility ported to every major operating system, including different versions of Linux, Windows, and Mac OS.

HMMER
Developer(s)Sean Eddy, Travis Wheeler, HMMER development team
Stable release
3.3.2[1] / 27 November 2020; 2 years ago (27 November 2020)
Repository
  • github.com/EddyRivasLab/hmmer
Written inC
Available inEnglish
TypeBioinformatics tool
LicenseBSD-3
Websitehmmer.org
A profile HMM modelling a multiple sequence alignment

HMMER is the core utility that protein family databases such as Pfam and InterPro are based upon. Some other bioinformatics tools such as UGENE also use HMMER.

HMMER3 also makes extensive use of vector instructions for increasing computational speed. This work is based upon earlier publication showing a significant acceleration of the Smith-Waterman algorithm for aligning two sequences.[4]

Profile HMMs

A profile HMM is a variant of an HMM relating specifically to biological sequences. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system, which can be used to align sequences and search databases for remotely homologous sequences.[5] They capitalise on the fact that certain positions in a sequence alignment tend to have biases in which residues are most likely to occur, and are likely to differ in their probability of containing an insertion or a deletion. Capturing this information gives them a better ability to detect true homologs than traditional BLAST-based approaches, which penalise substitutions, insertions and deletions equally, regardless of where in an alignment they occur.[6]

 
The core profile HMM architecture used by HMMER.

Profile HMMs center around a linear set of match (M) states, with one state corresponding to each consensus column in a sequence alignment. Each M state emits a single residue (amino acid or nucleotide). The probability of emitting a particular residue is determined largely by the frequency at which that residue has been observed in that column of the alignment, but also incorporates prior information on patterns of residues that tend to co-occur in the same columns of sequence alignments. This string of match states emitting amino acids at particular frequencies are analogous to position specific score matrices or weight matrices.[5]

A profile HMM takes this modelling of sequence alignments further by modelling insertions and deletions, using I and D states, respectively. D states do not emit a residue, while I states do emit a residue. Multiple I states can occur consecutively, corresponding to multiple residues between consensus columns in an alignment. M, I and D states are connected by state transition probabilities, which also vary by position in the sequence alignment, to reflect the different frequencies of insertions and deletions across sequence alignments.[5]

The HMMER2 and HMMER3 releases used an architecture for building profile HMMs called the Plan 7 architecture, named after the seven states captured by the model. In addition to the three major states (M, I and D), six additional states capture non-homologous flanking sequence in the alignment. These 6 states collectively are important for controlling how sequences are aligned to the model e.g. whether a sequence can have multiple consecutive hits to the same model (in the case of sequences with multiple instances of the same domain).[7]

Programs in the HMMER package

The HMMER package consists of a collection of programs for performing functions using profile hidden Markov models.[8] The programs include:

Profile HMM building

  • hmmbuild - construct profile HMMs from multiple sequence alignments

Homology searching

  • hmmscan - search protein sequences against a profile HMM database
  • hmmsearch - search profile HMMs against a sequence database
  • jackhmmer - iteratively search sequences against a protein database
  • nhmmer - search DNA/RNA queries against a DNA/RNA sequence database
  • nhmmscan - search nucleotide sequences against a nucleotide profile
  • phmmer - search protein sequences against a protein database

Other functions

  • hmmalign - align sequences to a profile HMM
  • hmmemit - produce sample sequences from a profile HMM
  • hmmlogo - produce data for an HMM logo from an HMM file

The package contains numerous other specialised functions.

The HMMER web server

In addition to the software package, the HMMER search function is available in the form of a web server.[9] The service facilitates searches across a range of databases, including sequence databases such as UniProt, SwissProt, and the Protein Data Bank, and HMM databases such as Pfam, TIGRFAMs and SUPERFAMILY. The four search types phmmer, hmmsearch, hmmscan and jackhmmer are supported (see Programs). The search function accepts single sequences as well as sequence alignments or profile HMMs.

The search results are accompanied by a report on the taxonomic breakdown, and the domain organisation of the hits. Search results can then be filtered according to either parameter.

The web service is currently run out of the European Bioinformatics Institute (EBI) in the United Kingdom, while development of the algorithm is still performed by Sean Eddy's team in the United States.[9] Major reasons for relocating the web service were to leverage the computing infrastructure at the EBI, and to cross-link HMMER searches with relevant databases that are also maintained by the EBI.

The HMMER3 release

The latest stable release of HMMER is version 3.0. HMMER3 is complete rewrite of the earlier HMMER2 package, with the aim of improving the speed of profile-HMM searches. Major changes are outlined below:

Improvements in speed

A major aim of the HMMER3 project, started in 2004 was to improve the speed of HMMER searches. While profile HMM-based homology searches were more accurate than BLAST-based approaches, their slower speed limited their applicability.[8] The main performance gain is due to a heuristic filter that finds high-scoring un-gapped matches within database sequences to a query profile. This heuristic results in a computation time comparable to BLAST with little impact on accuracy. Further gains in performance are due to a log-likelihood model that requires no calibration for estimating E-values, and allows the more accurate forward scores to be used for computing the significance of a homologous sequence.[10][6]

HMMER still lags behind BLAST in speed of DNA-based searches; however, DNA-based searches can be tuned such that an improvement in speed comes at the expense of accuracy.[11]

Improvements in remote homology searching

The major advance in speed was made possible by the development of an approach for calculating the significance of results integrated over a range of possible alignments.[10] In discovering remote homologs, alignments between query and hit proteins are often very uncertain. While most sequence alignment tools calculate match scores using only the best scoring alignment, HMMER3 calculates match scores by integrating across all possible alignments, to account for uncertainty in which alignment is best. HMMER sequence alignments are accompanied by posterior probability annotations, indicating which portions of the alignment have been assigned high confidence and which are more uncertain.

DNA sequence comparison

A major improvement in HMMER3 was the inclusion of DNA/DNA comparison tools. HMMER2 only had functionality to compare protein sequences.

Restriction to local alignments

While HMMER2 could perform local alignment (align a complete model to a subsequence of the target) and global alignment (align a complete model to a complete target sequence), HMMER3 only performs local alignment. This restriction is due to the difficulty in calculating the significance of hits when performing local/global alignments using the new algorithm.

See also

Several implementations of profile HMM methods and related position-specific scoring matrix methods are available. Some are listed below:

References

  1. ^ "Release 3.3.2". 27 November 2020. Retrieved 11 December 2020.
  2. ^ Durbin, Richard; Sean R. Eddy; Anders Krogh; Graeme Mitchison (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. ISBN 0-521-62971-3.
  3. ^ Krogh A, Brown M, Mian IS, Sjölander K, Haussler D (February 1994). "Hidden Markov models in computational biology. Applications to protein modeling". J. Mol. Biol. 235 (5): 1501–31. doi:10.1006/jmbi.1994.1104. PMID 8107089.
  4. ^ Farrar M (January 2007). "Striped Smith-Waterman speeds database searches six times over other SIMD implementations". Bioinformatics. 23 (2): 156–61. doi:10.1093/bioinformatics/btl582. PMID 17110365.
  5. ^ a b c Eddy, SR (1998). "Profile hidden Markov models". Bioinformatics. 14 (9): 755–63. doi:10.1093/bioinformatics/14.9.755. PMID 9918945.
  6. ^ a b Eddy, Sean R.; Pearson, William R. (20 October 2011). "Accelerated Profile HMM Searches". PLOS Computational Biology. 7 (10): e1002195. Bibcode:2011PLSCB...7E2195E. CiteSeerX 10.1.1.290.1476. doi:10.1371/journal.pcbi.1002195. PMC 3197634. PMID 22039361.
  7. ^ Eddy, Sean. "HMMER2 User's Guide" (PDF).
  8. ^ a b Sean R. Eddy; Travis J. Wheeler. "HMMER User's Guide" (PDF). and the HMMER development team. Retrieved 23 July 2017.
  9. ^ a b Finn, Robert D.; Clements, Jody; Arndt, William; Miller, Benjamin L.; Wheeler, Travis J.; Schreiber, Fabian; Bateman, Alex; Eddy, Sean R. (1 July 2015). "HMMER web server: 2015 update". Nucleic Acids Research. 43 (W1): W30–W38. doi:10.1093/nar/gkv397. PMC 4489315. PMID 25943547.
  10. ^ a b Eddy SR (2008). Rost, Burkhard (ed.). "A probabilistic model of local sequence alignment that simplifies statistical significance estimation". PLOS Comput Biol. 4 (5): e1000069. Bibcode:2008PLSCB...4E0069E. doi:10.1371/journal.pcbi.1000069. PMC 2396288. PMID 18516236.
  11. ^ Sean R. Eddy; Travis J. Wheeler. "HMMER3.1b2 Release Notes". and the HMMER development team. Retrieved 23 July 2017.

External links

  • Official website
  • HMMER3 announcement
  • A blog posting on HMMER policy on trademark, copyright, patents, and licensing

hmmer, free, commonly, used, software, package, sequence, analysis, written, sean, eddy, general, usage, identify, homologous, protein, nucleotide, sequences, perform, sequence, alignments, detects, homology, comparing, profile, hidden, markov, model, construc. HMMER is a free and commonly used software package for sequence analysis written by Sean Eddy 2 Its general usage is to identify homologous protein or nucleotide sequences and to perform sequence alignments It detects homology by comparing a profile HMM a Hidden Markov model constructed explicitly for a particular search to either a single sequence or a database of sequences Sequences that score significantly better to the profile HMM compared to a null model are considered to be homologous to the sequences that were used to construct the profile HMM Profile HMMs are constructed from a multiple sequence alignment in the HMMER package using the hmmbuild program The profile HMM implementation used in the HMMER software was based on the work of Krogh and colleagues 3 HMMER is a console utility ported to every major operating system including different versions of Linux Windows and Mac OS HMMERDeveloper s Sean Eddy Travis Wheeler HMMER development teamStable release3 3 2 1 27 November 2020 2 years ago 27 November 2020 Repositorygithub wbr com wbr EddyRivasLab wbr hmmerWritten inCAvailable inEnglishTypeBioinformatics toolLicenseBSD 3Websitehmmer wbr orgA profile HMM modelling a multiple sequence alignment HMMER is the core utility that protein family databases such as Pfam and InterPro are based upon Some other bioinformatics tools such as UGENE also use HMMER HMMER3 also makes extensive use of vector instructions for increasing computational speed This work is based upon earlier publication showing a significant acceleration of the Smith Waterman algorithm for aligning two sequences 4 Contents 1 Profile HMMs 2 Programs in the HMMER package 2 1 Profile HMM building 2 2 Homology searching 2 3 Other functions 3 The HMMER web server 4 The HMMER3 release 4 1 Improvements in speed 4 2 Improvements in remote homology searching 4 3 DNA sequence comparison 4 4 Restriction to local alignments 5 See also 6 References 7 External linksProfile HMMs EditA profile HMM is a variant of an HMM relating specifically to biological sequences Profile HMMs turn a multiple sequence alignment into a position specific scoring system which can be used to align sequences and search databases for remotely homologous sequences 5 They capitalise on the fact that certain positions in a sequence alignment tend to have biases in which residues are most likely to occur and are likely to differ in their probability of containing an insertion or a deletion Capturing this information gives them a better ability to detect true homologs than traditional BLAST based approaches which penalise substitutions insertions and deletions equally regardless of where in an alignment they occur 6 The core profile HMM architecture used by HMMER Profile HMMs center around a linear set of match M states with one state corresponding to each consensus column in a sequence alignment Each M state emits a single residue amino acid or nucleotide The probability of emitting a particular residue is determined largely by the frequency at which that residue has been observed in that column of the alignment but also incorporates prior information on patterns of residues that tend to co occur in the same columns of sequence alignments This string of match states emitting amino acids at particular frequencies are analogous to position specific score matrices or weight matrices 5 A profile HMM takes this modelling of sequence alignments further by modelling insertions and deletions using I and D states respectively D states do not emit a residue while I states do emit a residue Multiple I states can occur consecutively corresponding to multiple residues between consensus columns in an alignment M I and D states are connected by state transition probabilities which also vary by position in the sequence alignment to reflect the different frequencies of insertions and deletions across sequence alignments 5 The HMMER2 and HMMER3 releases used an architecture for building profile HMMs called the Plan 7 architecture named after the seven states captured by the model In addition to the three major states M I and D six additional states capture non homologous flanking sequence in the alignment These 6 states collectively are important for controlling how sequences are aligned to the model e g whether a sequence can have multiple consecutive hits to the same model in the case of sequences with multiple instances of the same domain 7 Programs in the HMMER package EditThe HMMER package consists of a collection of programs for performing functions using profile hidden Markov models 8 The programs include Profile HMM building Edit hmmbuild construct profile HMMs from multiple sequence alignmentsHomology searching Edit hmmscan search protein sequences against a profile HMM database hmmsearch search profile HMMs against a sequence database jackhmmer iteratively search sequences against a protein database nhmmer search DNA RNA queries against a DNA RNA sequence database nhmmscan search nucleotide sequences against a nucleotide profile phmmer search protein sequences against a protein databaseOther functions Edit hmmalign align sequences to a profile HMM hmmemit produce sample sequences from a profile HMM hmmlogo produce data for an HMM logo from an HMM fileThe package contains numerous other specialised functions The HMMER web server EditIn addition to the software package the HMMER search function is available in the form of a web server 9 The service facilitates searches across a range of databases including sequence databases such as UniProt SwissProt and the Protein Data Bank and HMM databases such as Pfam TIGRFAMs and SUPERFAMILY The four search types phmmer hmmsearch hmmscan and jackhmmer are supported see Programs The search function accepts single sequences as well as sequence alignments or profile HMMs The search results are accompanied by a report on the taxonomic breakdown and the domain organisation of the hits Search results can then be filtered according to either parameter The web service is currently run out of the European Bioinformatics Institute EBI in the United Kingdom while development of the algorithm is still performed by Sean Eddy s team in the United States 9 Major reasons for relocating the web service were to leverage the computing infrastructure at the EBI and to cross link HMMER searches with relevant databases that are also maintained by the EBI The HMMER3 release EditThe latest stable release of HMMER is version 3 0 HMMER3 is complete rewrite of the earlier HMMER2 package with the aim of improving the speed of profile HMM searches Major changes are outlined below Improvements in speed Edit A major aim of the HMMER3 project started in 2004 was to improve the speed of HMMER searches While profile HMM based homology searches were more accurate than BLAST based approaches their slower speed limited their applicability 8 The main performance gain is due to a heuristic filter that finds high scoring un gapped matches within database sequences to a query profile This heuristic results in a computation time comparable to BLAST with little impact on accuracy Further gains in performance are due to a log likelihood model that requires no calibration for estimating E values and allows the more accurate forward scores to be used for computing the significance of a homologous sequence 10 6 HMMER still lags behind BLAST in speed of DNA based searches however DNA based searches can be tuned such that an improvement in speed comes at the expense of accuracy 11 Improvements in remote homology searching Edit The major advance in speed was made possible by the development of an approach for calculating the significance of results integrated over a range of possible alignments 10 In discovering remote homologs alignments between query and hit proteins are often very uncertain While most sequence alignment tools calculate match scores using only the best scoring alignment HMMER3 calculates match scores by integrating across all possible alignments to account for uncertainty in which alignment is best HMMER sequence alignments are accompanied by posterior probability annotations indicating which portions of the alignment have been assigned high confidence and which are more uncertain DNA sequence comparison Edit A major improvement in HMMER3 was the inclusion of DNA DNA comparison tools HMMER2 only had functionality to compare protein sequences Restriction to local alignments Edit While HMMER2 could perform local alignment align a complete model to a subsequence of the target and global alignment align a complete model to a complete target sequence HMMER3 only performs local alignment This restriction is due to the difficulty in calculating the significance of hits when performing local global alignments using the new algorithm See also EditHidden Markov model Sequence alignment software Pfam UGENESeveral implementations of profile HMM methods and related position specific scoring matrix methods are available Some are listed below HH suite SAM PSI BLAST MMseqs2 PFTOOLS GENEWISE PROBE permanent dead link META MEME BLOCKS GPU HMMER DeCypherHMMReferences Edit Release 3 3 2 27 November 2020 Retrieved 11 December 2020 Durbin Richard Sean R Eddy Anders Krogh Graeme Mitchison 1998 Biological Sequence Analysis Probabilistic Models of Proteins and Nucleic Acids Cambridge University Press ISBN 0 521 62971 3 Krogh A Brown M Mian IS Sjolander K Haussler D February 1994 Hidden Markov models in computational biology Applications to protein modeling J Mol Biol 235 5 1501 31 doi 10 1006 jmbi 1994 1104 PMID 8107089 Farrar M January 2007 Striped Smith Waterman speeds database searches six times over other SIMD implementations Bioinformatics 23 2 156 61 doi 10 1093 bioinformatics btl582 PMID 17110365 a b c Eddy SR 1998 Profile hidden Markov models Bioinformatics 14 9 755 63 doi 10 1093 bioinformatics 14 9 755 PMID 9918945 a b Eddy Sean R Pearson William R 20 October 2011 Accelerated Profile HMM Searches PLOS Computational Biology 7 10 e1002195 Bibcode 2011PLSCB 7E2195E CiteSeerX 10 1 1 290 1476 doi 10 1371 journal pcbi 1002195 PMC 3197634 PMID 22039361 Eddy Sean HMMER2 User s Guide PDF a b Sean R Eddy Travis J Wheeler HMMER User s Guide PDF and the HMMER development team Retrieved 23 July 2017 a b Finn Robert D Clements Jody Arndt William Miller Benjamin L Wheeler Travis J Schreiber Fabian Bateman Alex Eddy Sean R 1 July 2015 HMMER web server 2015 update Nucleic Acids Research 43 W1 W30 W38 doi 10 1093 nar gkv397 PMC 4489315 PMID 25943547 a b Eddy SR 2008 Rost Burkhard ed A probabilistic model of local sequence alignment that simplifies statistical significance estimation PLOS Comput Biol 4 5 e1000069 Bibcode 2008PLSCB 4E0069E doi 10 1371 journal pcbi 1000069 PMC 2396288 PMID 18516236 Sean R Eddy Travis J Wheeler HMMER3 1b2 Release Notes and the HMMER development team Retrieved 23 July 2017 External links EditOfficial website HMMER3 announcement A blog posting on HMMER policy on trademark copyright patents and licensing Retrieved from https en wikipedia org w index php title HMMER amp oldid 1090926305, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.