Phylogenetic Assignment of Named Global Outbreak Lineages

The Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) is a software tool developed by Dr. Áine O'Toole[2] and members of the Andrew Rambaut laboratory, with an associated web application developed by the Centre for Genomic Pathogen Surveillance in South Cambridgeshire.[3] Its purpose is to implement a dynamic nomenclature (known as the PANGO nomenclature) to classify genetic lineages for SARS-CoV-2, the virus that causes COVID-19.[4] A user with a full genome sequence of a sample of SARS-CoV-2 can use the tool to submit that sequence, which is then compared with other genome sequences, and assigned the most likely lineage (PANGO lineage).[5] Single or multiple runs are possible, and the tool can return further information regarding the known history of the assigned lineage.[5] Additionally, it interfaces with Microreact, to show a time sequence of the location of reports of sequenced samples of the same lineage.[5] This latter feature draws on publicly available genomes obtained from the COVID-19 Genomics UK Consortium and from those submitted to GISAID.[5] It is named after the pangolin.

Initial release: 30 April 2020
Stable release: 4.3.1[1] / 26 July 2023
Repository: github.com/cov-lineages/pangolin
Written in: Python
License: GNU General Public License v3.0
Website: pangolin.cog-uk.io

Context

PANGOLIN is a key component underpinning the PANGO nomenclature system.[6]

Rambaut et al. (2020)[4] define a PANGO lineage as a cluster of sequences associated with an epidemiological event, for instance an introduction of the virus into a distinct geographic area with evidence of onward spread. Lineages are designed to capture the emerging edge of the pandemic, at a fine-grained resolution suited to genomic epidemiological surveillance and outbreak investigation.[citation needed]

Both the tool and the PANGO nomenclature system have been used extensively during the COVID-19 pandemic.[4][7][8]

Description

Lineage designation

Separately from the PANGOLIN tool, Pango lineages are curated manually and regularly, based on the diversity currently circulating worldwide. A large phylogenetic tree is constructed from an alignment of publicly available SARS-CoV-2 genomes, and sub-clusters of sequences in this tree are manually examined and cross-referenced against epidemiological information to designate new lineages. Lineages can be designated by data producers, and lineage suggestions can be submitted to the Pango team via a GitHub issue request.[9][10][further explanation needed]

Model training

These manually curated lineage designations, together with the associated genome sequences, are the input used to train a machine learning model. The model, covering both training and assignment, has been termed 'pangoLEARN'. The current version of pangoLEARN uses a classification tree, based on the scikit-learn implementation[11] of a decision tree classifier.
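The idea of training a decision tree on labelled genomes can be illustrated with a toy sketch. It is not the pangoLEARN code: the one-hot encoding of aligned sites, the eight-base sequences, and the labels below are all assumptions made up for illustration, and only scikit-learn's `DecisionTreeClassifier` is taken from the description above.

```python
from sklearn.tree import DecisionTreeClassifier

# Symbols allowed at each aligned site (an assumption for this sketch).
ALPHABET = "ACGT-N"


def one_hot(seq):
    """Encode an aligned sequence as a flat 0/1 vector, one slot per (site, symbol)."""
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == sym else 0 for sym in ALPHABET)
    return vec


# Toy aligned sequences with illustrative labels (not real SARS-CoV-2 data).
training = [
    ("ACGTACGT", "A"),
    ("ACGTACGA", "A"),
    ("ACTTACGT", "B"),
    ("ACTTACGA", "B"),
    ("ACTTGCGT", "B.1"),
    ("ACTTGCGA", "B.1"),
]

X = [one_hot(seq) for seq, _ in training]
y = [label for _, label in training]

# Fit a decision tree on the curated (sequence, lineage) pairs.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)


def assign(seq):
    """Return the lineage label the trained tree predicts for one aligned sequence."""
    return model.predict([one_hot(seq)])[0]
```

Calling `assign` on a new aligned sequence of the same length returns one of the labels seen in training. A real model would be trained on full-length genomes and many more lineages.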

Lineage assignment

Originally, PANGOLIN used a maximum-likelihood-based algorithm to assign the most likely lineage to each query SARS-CoV-2 sequence. Since the release of version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based algorithm to assign lineages to new SARS-CoV-2 genomes.[12] This approach is fast and can assign large numbers of SARS-CoV-2 genomes in a relatively short time.[13]

Availability

PANGOLIN is available as a command-line-based tool, downloadable from Conda and from a GitHub repository,[12] and as a web-application[14] with a drag-and-drop graphical user interface. The PANGOLIN web application has assigned more than 512,000 unique SARS-CoV-2 sequences as of January 2021.[citation needed]
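As a rough illustration of working with the command-line tool's results, the sketch below tallies lineages from a report. It assumes the report is a CSV file with `taxon` and `lineage` columns, which should be checked against the header of the actual output. The function name `count_lineages` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_lineages(report_text):
    """Count how many sequences were assigned to each lineage in a CSV report."""
    reader = csv.DictReader(io.StringIO(report_text))
    return Counter(row["lineage"] for row in reader)


# Made-up report contents; the column names are an assumption about the output format.
example_report = """taxon,lineage
sample_1,B.1.1.7
sample_2,B.1.1.7
sample_3,B.1.351
"""

counts = count_lineages(example_report)
for lineage, n in counts.most_common():
    print(lineage, n)
```

Reading the report through `csv.DictReader` keeps the sketch independent of column order, so any extra columns in a real report are simply ignored.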

Creators and developers

PANGOLIN was created by Áine O'Toole and the Rambaut lab and released on 5 April 2020. The main developers of PANGOLIN are Áine O'Toole and Emily Scher; many others have contributed to various aspects of the tool, including Ben Jackson, J.T. McCrone, Verity Hill, and Rachel Colquhoun of the Rambaut Lab.[5]

The PANGOLIN web application was developed by the Centre for Genomic Pathogen Surveillance,[14] namely Anthony Underwood, Ben Taylor, Corin Yeats, Khalil Abu-Dahab, and David Aanensen.[5]

See also

  • Colloquial names of COVID-19 variants
  • Variants of SARS-CoV-2
  • Nextstrain
  • INSDC

References

  1. ^ "Release 4.3.1". 26 July 2023. Retrieved 1 August 2023.
  2. ^ O’Toole, Áine; Scher, Emily; Underwood, Anthony; Jackson, Ben; Hill, Verity; McCrone, John T; Colquhoun, Rachel; Ruis, Chris; Abu-Dahab, Khalil; Taylor, Ben; Yeats, Corin; Du Plessis, Louis; Maloney, Daniel; Medd, Nathan; Attwood, Stephen W; Aanensen, David M; Holmes, Edward C; Pybus, Oliver G; Rambaut, Andrew (5 July 2021). "Assignment of Epidemiological Lineages in an Emerging Pandemic Using the Pangolin Tool". Virus Evolution. 7 (2): veab064. doi:10.1093/ve/veab064. PMC 8344591. PMID 34527285.
  3. ^ "Real-Time Epidemiology for COVID-19". www.pathogensurveillance.net. Archived from the original on 17 January 2021. Retrieved 22 January 2021.
  4. ^ a b c Rambaut, A.; Holmes, E.C.; O’Toole, Á.; et al. (2020). "A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology". Nature Microbiology. 5 (11): 1403–1407. doi:10.1038/s41564-020-0770-5. PMC 7610519. PMID 32669681. S2CID 220544096.
  5. ^ a b c d e f "Pangolin web application release". virological.org. May 2020. Archived from the original on 10 February 2021. Retrieved 18 February 2021.
  6. ^ Rambaut, Andrew; Holmes, Edward C.; O'Toole, Áine; Hill, Verity; McCrone, John T.; Ruis, Christopher; Du Plessis, Louis; Pybus, Oliver G. (15 July 2020). "Addendum: A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology". Nature Microbiology. 6 (3): 415. doi:10.1038/s41564-021-00872-5. PMC 7845574. PMID 33514928.
  7. ^ Pipes, Lenore; Wang, Hongru; Huelsenbeck, John P; Nielsen, Rasmus (9 December 2020). Malik, Harmit (ed.). "Assessing Uncertainty in the Rooting of the SARS-CoV-2 Phylogeny". Molecular Biology and Evolution. Oxford University Press (OUP). 38 (4): 1537–1543. doi:10.1093/molbev/msaa316. ISSN 0737-4038. PMC 7798932. PMID 33295605. Archived from the original on 10 December 2020. Retrieved 22 January 2021.
  8. ^ Jacob, Jobin John; Vasudevan, Karthick; Pragasam, Agila Kumari; Gunasekaran, Karthik; Kang, Gagandeep; Veeraraghavan, Balaji; Mutreja, Ankur (22 December 2020). "Evolutionary tracking of SARS-CoV-2 genetic variants highlights intricate balance of stabilizing and destabilizing mutations". bioRxiv 10.1101/2020.12.22.423920. Phylogenetic Assignment of Named Global Outbreak LINeages tool (PANGOLIN) has been the most widely used tool for lineage assignment to newly emerging variants.
  9. ^ "pangoLEARN Store of the trained model for PANGOLIN to access". GitHub: cov-lineages/pangoLEARN. Archived from the original on 3 January 2021. Retrieved 13 February 2021.
  10. ^ "PANGO lineages". cov-lineages.org. Archived from the original on 28 February 2021. Retrieved 4 March 2021.
  11. ^ "sklearn.tree.DecisionTreeClassifier". scikit-learn.org. Archived from the original on 19 February 2021. Retrieved 13 February 2021.
  12. ^ a b "cov-lineages/pangolin". GitHub: cov-lineages/pangolin. Archived from the original on 15 February 2021. Retrieved 13 February 2021.
  13. ^ "pangoLEARN PANGOLIN 2.0: pangoLEARN description". cov-lineages.org. Archived from the original on 4 November 2021. Retrieved 19 November 2021. The model was trained using ~60,000 SARS-CoV-2 sequences from GISAID... training this model takes approximately 30 minutes on our hardware
  14. ^ a b "Pangolin COVID-19 Lineage Assigner". pangolin.cog-uk.io. Archived from the original on 10 February 2021. Retrieved 13 February 2021.

External links

  • pangolin on GitHub
  • Official website  
