
Rfam

Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. It is an annotated, open access database originally developed at the Wellcome Trust Sanger Institute in collaboration with Janelia Farm,[1][2][3][4] and currently hosted at the European Bioinformatics Institute.[5] Rfam is designed to be similar to the Pfam database for annotating protein families.

Rfam (database summary)

Content
  • Description: The Rfam database provides alignments, consensus secondary structures and covariance models for RNA families.
  • Data types captured: RNA families
  • Organisms: all

Contact
  • Research center: EBI
  • Primary citation: PMID 33211869

Access
  • Data format: Stockholm format
  • Website: rfam.org
  • Download URL: FTP

Miscellaneous
  • License: Public domain
  • Bookmarkable entities: yes

Unlike proteins, ncRNAs often have similar secondary structure without sharing much similarity in the primary sequence. Rfam divides ncRNAs into families based on evolution from a common ancestor. Producing multiple sequence alignments (MSA) of these families can provide insight into their structure and function, similar to the case of protein families. These MSAs become more useful with the addition of secondary structure information. Rfam researchers also contribute to Wikipedia's RNA WikiProject.[4][6]
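The point about shared structure without shared sequence can be illustrated with a minimal sketch. The two sequences and the hairpin structure below are invented for illustration and are not Rfam data; the function names are likewise hypothetical. The sketch assumes dot-bracket notation and treats G-U wobble pairs as valid.

```python
# Hypothetical example: two made-up RNA sequences that share no identical
# positions, yet both can fold into the same hairpin structure.

# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a nested dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def percent_identity(a, b):
    """Percentage of aligned positions with identical residues."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_compatible(seq, pairs):
    """True if every paired position in seq can form a canonical base pair."""
    return all((seq[i], seq[j]) in CANONICAL for i, j in pairs)


STRUCTURE = "((((....))))"   # invented hairpin: 4-bp stem, 4-nt loop
SEQ_1 = "GGCAGAAAUGCC"       # invented sequence
SEQ_2 = "CAUGUUCGCAUG"       # invented sequence

if __name__ == "__main__":
    pairs = pairs_from_dot_bracket(STRUCTURE)
    print(percent_identity(SEQ_1, SEQ_2))                            # 0.0
    print(is_compatible(SEQ_1, pairs), is_compatible(SEQ_2, pairs))  # True True
```

A sequence-only comparison would find nothing in common between these two sequences, whereas a comparison that takes the base pairs into account treats them as equally good matches to the structure.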

Uses

The Rfam database can be used for a variety of functions. For each ncRNA family, the interface allows users to: view and download multiple sequence alignments; read annotation; and examine species distribution of family members. There are also links provided to literature references and other RNA databases. Rfam also provides links to Wikipedia so that entries can be created or edited by users.

The interface at the Rfam website allows users to search ncRNAs by keyword, family name, or genome, as well as by ncRNA sequence or EMBL accession number.[7] The database can also be downloaded and used locally with the INFERNAL software package.[8][9][10] Together with Rfam, INFERNAL can annotate sequences (including complete genomes) with homologues of known ncRNAs.

Methods

 
Figure: A theoretical ncRNA alignment from six species. Secondary structure base pairs are coloured in blocks and identified in the secondary structure consensus sequence (bottom line) by the < and > symbols.

In the database, the secondary structure and the primary sequence information, represented by the MSA, are combined in statistical models called profile stochastic context-free grammars (SCFGs), also known as covariance models. These are analogous to the hidden Markov models used for protein family annotation in the Pfam database.[1] Each family in the database is represented by two multiple sequence alignments in Stockholm format and an SCFG.
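As a rough illustration of how a Stockholm alignment carries both kinds of information, the sketch below reads a tiny, invented alignment and recovers the paired columns from its consensus structure line. It is a simplification: it assumes the structure is annotated on a `#=GC SS_cons` line, handles only properly nested bracket characters, and ignores pseudoknot annotation. The alignment and the function names are hypothetical, not taken from Rfam.

```python
# Invented two-sequence alignment in Stockholm format, for illustration only.
EXAMPLE = """# STOCKHOLM 1.0
seqA         GGCAGAAAUGCC
seqB         CAUGUUCGCAUG
#=GC SS_cons <<<<....>>>>
//
"""

OPEN, CLOSE = "<([{", ">)]}"  # bracket characters treated as base pairs here


def parse_stockholm(text):
    """Return (sequences, ss_cons) from a minimal Stockholm alignment."""
    sequences, ss_cons = {}, ""
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "//":
            continue
        if line.startswith("#=GC SS_cons"):
            ss_cons += line.split()[2]       # consensus structure annotation
        elif line.startswith("#"):
            continue                         # header and other markup lines
        else:
            name, aligned = line.split()
            # Concatenate so that interleaved blocks are also supported.
            sequences[name] = sequences.get(name, "") + aligned
    return sequences, ss_cons


def paired_columns(ss_cons):
    """Return sorted (i, j) alignment columns that are base paired."""
    stack, pairs = [], []
    for col, ch in enumerate(ss_cons):
        if ch in OPEN:
            stack.append(col)
        elif ch in CLOSE:
            pairs.append((stack.pop(), col))
    return sorted(pairs)


def observed_pairs(sequences, pairs):
    """For each paired column pair, collect the base pairs seen in the alignment."""
    return {(i, j): {s[i] + s[j] for s in sequences.values()} for i, j in pairs}


if __name__ == "__main__":
    seqs, ss = parse_stockholm(EXAMPLE)
    pairs = paired_columns(ss)
    print(pairs)                                  # [(0, 11), (1, 10), (2, 9), (3, 8)]
    print(observed_pairs(seqs, pairs)[(0, 11)])   # {'GC', 'CG'} in some order
```

Columns 0 and 11 hold G-C in one sequence and C-G in the other. Both residues changed, but the pair is preserved, and this correlated change between paired columns is the signal a covariance model is built to capture.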

The first MSA is the "seed" alignment. It is a hand-curated alignment that contains representative members of the ncRNA family and is annotated with structural information. The seed alignment is used to build the SCFG, which the INFERNAL software then uses to identify additional family members and add them to the alignment. A family-specific score threshold is chosen so that false positives are avoided.
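The role of the family-specific threshold can be sketched as a simple filter. Everything in this example is hypothetical: the family names, the scores, and the threshold values are invented, and the sketch assumes that each hit carries a score for which higher means a better match to the family model.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    seq_id: str     # identifier of the matched sequence (invented)
    family: str     # family whose model produced the hit (invented name)
    score: float    # assumed: higher score = better match


def filter_hits(hits, thresholds):
    """Keep only hits that score at or above their own family's threshold."""
    return [h for h in hits if h.score >= thresholds[h.family]]


# Invented per-family thresholds: each family has its own cutoff.
THRESHOLDS = {"FAM_A": 40.0, "FAM_B": 55.0}

HITS = [
    Hit("seq1", "FAM_A", 62.3),   # kept: above the FAM_A threshold
    Hit("seq2", "FAM_A", 38.9),   # dropped: below the FAM_A threshold
    Hit("seq3", "FAM_B", 50.0),   # dropped: would pass FAM_A, but not FAM_B
    Hit("seq4", "FAM_B", 55.0),   # kept: exactly at the FAM_B threshold
]

if __name__ == "__main__":
    for hit in filter_hits(HITS, THRESHOLDS):
        print(hit.seq_id, hit.family, hit.score)
```

The third hit shows why a single global cutoff is not enough: a score of 50.0 would pass the invented threshold for one family but is rejected for the other.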

Until release 12, Rfam used an initial BLAST filtering step because searching with profile SCFGs alone was too computationally expensive. Recent versions of INFERNAL are fast enough[11] that the BLAST step is no longer necessary.[12]

The second MSA is the "full" alignment, created by searching the sequence database with the covariance model. All detected homologs are aligned to the model, giving the automatically produced full alignment.

History

Version 1.0 of Rfam was launched in 2003; it contained 25 ncRNA families and annotated about 50,000 ncRNA genes. Version 6.1, released in 2005, contained 379 families annotating over 280,000 genes. In August 2012, version 11.0 contained 2,208 RNA families. Version 14.6, released in July 2021, annotates 4,070 families.[13]

Major releases and publications

  • 2003 - Rfam: an RNA family database. [14]
  • 2005 - Rfam: annotating non-coding RNAs in complete genomes. [15]
  • 2008 - The RNA WikiProject: community annotation of RNA families. [16]
  • 2008 - Rfam: updates to the RNA families database. [17]
  • 2011 - Rfam: Wikipedia, clans and the "decimal" release. [18]
  • 2012 - Rfam 11.0: 10 years of RNA families. [19]
  • 2014 - Rfam 12.0: updates to the RNA families database. [12]
  • 2017 - Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. [20]
  • 2020 - Rfam 14: expanded coverage of metagenomic, viral and microRNA families. [21]

Problems

  1. The genomes of higher eukaryotes contain many ncRNA-derived pseudogenes and repeats. Distinguishing these non-functional copies from functional ncRNA is a formidable challenge.[2]
  2. Introns are not modeled by covariance models.

References

  1. ^ a b Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR (2003). "Rfam: an RNA family database". Nucleic Acids Res. 31 (1): 439–41. doi:10.1093/nar/gkg006. PMC 165453. PMID 12520045.
  2. ^ a b Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A (2005). "Rfam: annotating non-coding RNAs in complete genomes". Nucleic Acids Res. 33 (Database issue): D121–4. doi:10.1093/nar/gki081. PMC 540035. PMID 15608160.
  3. ^ Gardner PP, Daub J, Tate JG, et al. (October 2008). "Rfam: updates to the RNA families database". Nucleic Acids Research. 37 (Database issue): D136–D140. doi:10.1093/nar/gkn766. PMC 2686503. PMID 18953034.
  4. ^ a b Gardner PP, Daub J, Tate J, Moore BL, Osuch IH, Griffiths-Jones S, Finn RD, Nawrocki EP, Kolbe DL, Eddy SR, Bateman A (2011). "Rfam: Wikipedia, clans and the "decimal" release". Nucleic Acids Res. 39 (Database issue): D141–5. doi:10.1093/nar/gkq1129. PMC 3013711. PMID 21062808.
  5. ^ "Moving to xfam.org". Xfam Blog. Retrieved 3 May 2014.
  6. ^ Daub J, Gardner PP, Tate J, et al. (October 2008). "The RNA WikiProject: Community annotation of RNA families". RNA. 14 (12): 2462–4. doi:10.1261/rna.1200508. PMC 2590952. PMID 18945806.
  7. ^ Rfam website. http://rfam.xfam.org
  8. ^ Eddy SR, Durbin R (June 1994). "RNA sequence analysis using covariance models". Nucleic Acids Research. 22 (11): 2079–88. doi:10.1093/nar/22.11.2079. PMC 308124. PMID 8029015.
  9. ^ Eddy SR (2002). "A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure". BMC Bioinformatics. 3: 18. doi:10.1186/1471-2105-3-18. PMC 119854. PMID 12095421.
  10. ^ Nawrocki EP, Eddy SR (2013). "Infernal 1.1: 100-fold faster RNA homology searches". Bioinformatics. 29 (22): 2933–5. doi:10.1093/bioinformatics/btt509. PMC 3810854. PMID 24008419.
  11. ^ Nawrocki, Eric P.; Eddy, Sean R. (15 November 2013). "Infernal 1.1: 100-fold faster RNA homology searches". Bioinformatics. 29 (22): 2933–2935. doi:10.1093/bioinformatics/btt509. ISSN 1367-4811. PMC 3810854. PMID 24008419.
  12. ^ Nawrocki, Eric P.; Burge, Sarah W.; Bateman, Alex; Daub, Jennifer; Eberhardt, Ruth Y.; Eddy, Sean R.; Floden, Evan W.; Gardner, Paul P.; Jones, Thomas A. (January 2015). "Rfam 12.0: updates to the RNA families database". Nucleic Acids Research. 43 (Database issue): D130–137. doi:10.1093/nar/gku1063. ISSN 1362-4962. PMC 4383904. PMID 25392425.
  13. ^ Rfam website. https://rfam.xfam.org/
  14. ^ Griffiths-Jones, S. (1 January 2003). "Rfam: an RNA family database". Nucleic Acids Research. 31 (1): 439–441. doi:10.1093/nar/gkg006. ISSN 1362-4962.
  15. ^ Griffiths-Jones, S. (17 December 2004). "Rfam: annotating non-coding RNAs in complete genomes". Nucleic Acids Research. 33 (Database issue): D121–D124. doi:10.1093/nar/gki081. ISSN 1362-4962.
  16. ^ Daub, Jennifer; Gardner, Paul P.; Tate, John; Ramsköld, Daniel; Manske, Magnus; Scott, William G.; Weinberg, Zasha; Griffiths-Jones, Sam; Bateman, Alex (22 October 2008). "The RNA WikiProject: Community annotation of RNA families". RNA. 14 (12): 2462–2464. doi:10.1261/rna.1200508. ISSN 1355-8382.
  17. ^ a b Gardner, P. P.; Daub, J.; Tate, J. G.; Nawrocki, E. P.; Kolbe, D. L.; Lindgreen, S.; Wilkinson, A. C.; Finn, R. D.; Griffiths-Jones, S.; Eddy, S. R.; Bateman, A. (1 January 2009). "Rfam: updates to the RNA families database". Nucleic Acids Research. 37 (Database): D136–D140. doi:10.1093/nar/gkn766. ISSN 0305-1048.
  18. ^ Gardner, P. P.; Daub, J.; Tate, J.; Moore, B. L.; Osuch, I. H.; Griffiths-Jones, S.; Finn, R. D.; Nawrocki, E. P.; Kolbe, D. L.; Eddy, S. R.; Bateman, A. (9 November 2010). "Rfam: Wikipedia, clans and the "decimal" release". Nucleic Acids Research. 39 (Database): D141–D145. doi:10.1093/nar/gkq1129. ISSN 0305-1048.
  19. ^ Burge, Sarah W.; Daub, Jennifer; Eberhardt, Ruth; Tate, John; Barquist, Lars; Nawrocki, Eric P.; Eddy, Sean R.; Gardner, Paul P.; Bateman, Alex (2 November 2012). "Rfam 11.0: 10 years of RNA families". Nucleic Acids Research. 41 (D1): D226–D232. doi:10.1093/nar/gks1005. ISSN 0305-1048.
  20. ^ Bujnicki, Janusz; Ghosh, Pritha (4 March 2018). "Faculty Opinions recommendation of Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families". Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature. Retrieved 1 November 2022.
  21. ^ Kalvari, Ioanna; Nawrocki, Eric P; Ontiveros-Palacios, Nancy; Argasinska, Joanna; Lamkiewicz, Kevin; Marz, Manja; Griffiths-Jones, Sam; Toffano-Nioche, Claire; Gautheret, Daniel; Weinberg, Zasha; Rivas, Elena; Eddy, Sean R; Finn, Robert D; Bateman, Alex; Petrov, Anton I (19 November 2020). "Rfam 14: expanded coverage of metagenomic, viral and microRNA families". Nucleic Acids Research. 49 (D1): D192–D200. doi:10.1093/nar/gkaa1047. ISSN 0305-1048.

External links

  • Rfam website at the European Bioinformatics Institute
  • INFERNAL software package
  • miRBase
