SIRIUS (software)

SIRIUS is Java-based open-source software for the identification of small molecules from fragmentation mass spectrometry data without the use of spectral libraries. It combines the analysis of isotope patterns in MS1 spectra with the analysis of fragmentation patterns in MS2 spectra. SIRIUS is the umbrella application comprising CSI:FingerID, CANOPUS, COSMIC and ZODIAC.

SIRIUS
Developer(s): Böcker Group at FSU Jena & Bright Giant GmbH
Initial release: 2009
Stable release: 5.8.5 / 8 November 2023
Repository: https://github.com/boecker-lab/sirius
Written in: Java
Operating system: Linux, Windows, macOS
Available in: English
Type: mass spectrometry, structure elucidation, chemistry, bioinformatics
License: GNU Affero General Public License v3.0 for client software; web services free for non-commercial use; commercial subscription offered by Bright Giant GmbH
Website: https://bio.informatik.uni-jena.de/software/sirius/

SIRIUS, including its web services for structural elucidation, is freely available to use for academic research. Bright Giant GmbH offers subscription-based access to the SIRIUS web services for commercial users.

SIRIUS is not suitable for analyzing proteomics MS data.

History

The SIRIUS software is developed by the group of Sebastian Böcker at the Friedrich Schiller University Jena, Germany, and, since 2019, together with Bright Giant GmbH. SIRIUS development started in 2009 as software for identifying the molecular formula by decomposing high-resolution isotope patterns (also called MS1 data).[1] The name is an acronym derived from this original purpose: Sum formula Identification by Ranking Isotope patterns Using mass Spectrometry.

In 2008, the group introduced the concept of fragmentation trees[2] for identifying the molecular formula from fragmentation mass spectrometry data, also called tandem MS or MS2 data. At that time, small molecules were identified by searching a reference spectral library.[3] Examples of such libraries include MassBank,[4] METLIN,[5] and the NIST/EPA/NIH EI-MS Library.[6] However, this approach is limited to known molecules for which standards are available, have been measured, and have been deposited in a reference spectral library. For unknown molecules, identification of the molecular formula is a crucial step.[2] In 2011/2012, the group extended the use of fragmentation trees to structural elucidation by automatically comparing the trees of different molecules.[7][8] Fragmentation pattern similarities are strongly correlated with the chemical similarity of molecules.[8] Thus, aligning the fragmentation tree of an unknown molecule to those of a set of known molecules helps to elucidate its structure. Fragmentation trees were introduced in SIRIUS 2.[7]

Also in 2012, the group of Juho Rousu at the University of Helsinki, Finland, introduced a machine learning method to predict molecular properties from tandem MS data.[9] This concept was combined with the fragmentation tree concept in 2015, resulting in CSI:FingerID,[10] which was introduced in SIRIUS 3. The fragmentation tree is used to predict a molecular fingerprint of the unknown molecule using machine learning, and this fingerprint is in turn used to search a molecular structure database such as PubChem. Molecular structure databases are orders of magnitude larger than reference spectral libraries (PubChem contained ~111 million compounds in 2021,[11] compared to ~50,000 compounds in the NIST Tandem Mass Spectral Library in 2023[12]). This kind of structure identification refers to the identity and connectivity (with bond multiplicities) of the atoms, but not to stereochemistry. Elucidation of stereochemistry is currently beyond the power of automated search engines.

SIRIUS 3 also introduced the Graphical User Interface (GUI).

In 2020, in cooperation with the group of Pieter C. Dorrestein at UC San Diego, USA, molecular formula identification was improved by using derivative networks from complete biological datasets to rank molecular formula candidates.[13] This method is called ZODIAC and has been integrated into SIRIUS 4.[14]

Also in 2020, in cooperation with Rousu's and Dorrestein's groups, CANOPUS for systematic compound class annotation was introduced to SIRIUS 4.[15]

In 2022, the COSMIC confidence score was added to the CSI:FingerID structure identification workflow in SIRIUS 4, allowing users to determine the trustworthiness of the identification.[16]

Data

SIRIUS uses data from liquid chromatography tandem mass spectrometry (LC-MS/MS). It requires high-resolution MS1 and MS2 data with high mass accuracy as input. LC is not mandatory for SIRIUS, but it is often required to separate individual compounds in complex samples.

SIRIUS expects both MS1 and MS2 spectra as input. Omitting the MS1 data is possible, but makes the analysis more time-consuming and can lead to poorer results.

SIRIUS and CSI:FingerID have been trained on a wide variety of data, including data from different instrument types. Certain aspects of the mass spectra are important to successfully process the data:

  • High mass accuracy: The mass deviation of the input spectra should be within 20 ppm. Mass spectrometry devices such as TOF, Orbitrap and FT-ICR usually provide data with high mass accuracy, as do coupled devices such as Q-TOF, IT-TOF or IT-Orbitrap. Spectra measured with a quadrupole or linear trap do not provide the required accuracy for data analysis with SIRIUS.
  • Rich fragmentation spectra: It is not possible to deduce the structure or even the molecular formula from an MS2 spectrum that contains almost no peaks. Prior noise filtering of the spectra is neither necessary nor advisable. SIRIUS considers up to 60 peaks in the fragmentation spectrum and decides for itself which of these peaks are regarded as noise.
  • Centroided MS data: SIRIUS does not contain routines for peak picking from profile-mode spectra. msConvert in ProteoWizard can be used to convert to centroided data. Additionally, there are several tools specialized for the preprocessing task, such as OpenMS, MZmine or XCMS. OpenMS[18] and MZmine 3[19] both provide export functions tailored to the needs for SIRIUS.
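
The 20 ppm criterion in the list above can be made concrete with a short calculation. The following is a minimal Python sketch, not part of SIRIUS; the function names and the example m/z values are chosen purely for illustration.

```python
def ppm_deviation(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass deviation of a measured peak in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     max_ppm: float = 20.0) -> bool:
    """True if the absolute deviation does not exceed max_ppm.

    The default of 20 ppm mirrors the limit quoted in the text.
    """
    return abs(ppm_deviation(measured_mz, theoretical_mz)) <= max_ppm


if __name__ == "__main__":
    # Illustrative values: 0.002 Da off at m/z 200 is about 10 ppm.
    print(ppm_deviation(200.002, 200.0))
    print(within_tolerance(200.002, 200.0))
    print(within_tolerance(200.010, 200.0))
```

Because the deviation is relative, the same absolute error in Da weighs more heavily at low m/z than at high m/z.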

Common MS file formats, such as .csv, .ms or .mgf files, can be imported into SIRIUS. SIRIUS can import full LC-MS runs (.mzML) or single compounds. At present, SIRIUS only handles singly charged compounds.[17]

Features

SIRIUS identifies small molecules in a two-step approach:[17]

  • First, the molecular formula of the molecule is determined.
  • Second, a molecular fingerprint is predicted to search against a structure database to identify the most likely candidate.

The following algorithms are implemented in SIRIUS:

SIRIUS: Molecular formula identification

SIRIUS is the name of the umbrella application, but (for historic reasons) also the name for the identification of the molecular formula. Molecular formula refers to the elemental composition of the molecule. The mere mass of a molecule is not sufficient to determine the correct molecular formula.[17] Even with very high mass accuracy, many molecular formulas can explain a mass measured in a spectrum, in particular in higher mass regions. In SIRIUS, molecular formula identification is done using isotope pattern analysis on the MS1 data as well as fragmentation tree computation on the MS2 data. The score of a molecular formula candidate is a combination of the isotope pattern score and the fragmentation tree score.

To identify the molecular formula, SIRIUS considers all possible molecular formulas for a set of elements. The elements most abundant in living beings are hydrogen (H), carbon (C), nitrogen (N), oxygen (O), and phosphorus (P). This is the default set of elements in SIRIUS. Some less common elements result in very characteristic isotope pattern changes and can be automatically detected.[20] Detectable elements are sulfur (S), chlorine (Cl), bromine (Br), boron (B) and selenium (Se). The current version of SIRIUS uses a deep neural network for auto-detection of elements from the isotope and fragmentation pattern of the query molecule.[14]

For very large molecules or in case of missing data (e.g., a missing isotope pattern), it is possible to restrict SIRIUS to molecular formulas found in a database, such as PubChem.

Decomposition of mass

To quickly generate a manageable number of molecular formula candidates, the monoisotopic mass is decomposed into all possible molecular formulas that would lead to this mass. There are two definitions of the monoisotopic mass:[21] (1) the sum of the masses of the most abundant naturally occurring stable isotope of each atom (i.e. the highest peak of the isotope pattern), and (2) the sum of the masses of the lightest naturally occurring stable isotope of each atom (i.e. the peak of the isotope pattern with the lowest mass). For small molecules, the lightest peak is usually also the highest peak of the isotope pattern. In the computational context of SIRIUS, the second definition is used.

Decomposing the monoisotopic mass into all possible molecular formulas requires a mass interval that takes the measurement inaccuracy of the instrument into account. This real-valued decomposition is transformed into a problem instance with integer masses by using a blowup factor. The resulting problem is known as the change-making problem, which is well studied and can be solved in runtime linear in the size of the output.[22]
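
The transformation to integer masses can be illustrated with a minimal Python sketch. It is not the algorithm of [22]: for brevity it enumerates candidates by plain recursion, so its running time is not linear in the output size. The element masses are rounded values, and the blowup factor of 1000, the 10 ppm window, the function name decompose and the rounding scheme (every integer mass is rounded up and the upper integer bound is widened accordingly, so that no candidate is lost) are assumptions made for this illustration.

```python
import math

# Masses of the lightest stable isotope of each element in Da (rounded).
MONOISOTOPIC_MASS = {
    "C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915, "P": 30.973762,
}


def decompose(mass, ppm=10.0, blowup=1000, masses=MONOISOTOPIC_MASS):
    """Return all element-count dicts whose mass lies within +/- ppm of mass."""
    lo = mass * (1 - ppm * 1e-6)
    hi = mass * (1 + ppm * 1e-6)
    elements = sorted(masses, key=masses.get, reverse=True)

    # Integer masses: scale by the blowup factor and round up.
    int_masses = [math.ceil(blowup * masses[e]) for e in elements]
    # Largest relative rounding error; used to widen the upper bound.
    delta = max((a - blowup * masses[e]) / masses[e]
                for a, e in zip(int_masses, elements))
    int_lo = math.ceil(blowup * lo)
    int_hi = math.floor(blowup * hi + delta * hi)

    results = []

    def recurse(idx, rem_lo, rem_hi, counts):
        a = int_masses[idx]
        if idx == len(elements) - 1:
            # Last element: every count c with rem_lo <= c * a <= rem_hi.
            c_min = max(0, -(-rem_lo // a))  # ceiling division
            for c in range(c_min, rem_hi // a + 1):
                full = counts + [c]
                real = sum(n * masses[e] for n, e in zip(full, elements))
                # Final check against the real-valued mass interval.
                if lo <= real <= hi:
                    results.append(dict(zip(elements, full)))
            return
        for c in range(rem_hi // a + 1):
            recurse(idx + 1, rem_lo - c * a, rem_hi - c * a, counts + [c])

    recurse(0, int_lo, int_hi, [])
    return results


if __name__ == "__main__":
    # 180.06339 Da is the monoisotopic mass of C6H12O6 with the masses above.
    for formula in decompose(180.06339):
        print(formula)
```

The sketch applies no chemical plausibility filter, so it also returns formulas that cannot correspond to real molecules.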

Isotope pattern analysis

Isotope patterns of the candidate molecular formulas are simulated starting with the isotopic distributions of the individual elements, and then combining these distributions by folding.[23][1]
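
The folding step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SIRIUS implementation: it tracks only intensities per nominal mass offset (+0, +1, +2, ...) and ignores the exact masses of the isotope peaks, the natural abundances are rounded values, and the names fold and isotope_pattern are hypothetical.

```python
# Natural isotope abundances indexed by nominal mass offset (+0, +1, +2).
# Rounded values; treated as assumptions for this sketch.
ISOTOPE_ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def fold(p, q):
    """Discrete convolution of two intensity distributions."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def isotope_pattern(formula, n_peaks=4):
    """Fold the element distributions atom by atom.

    Truncating to n_peaks after each fold is safe because a peak at
    offset k only depends on peaks at offsets <= k.
    """
    pattern = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            pattern = fold(pattern, ISOTOPE_ABUNDANCE[element])[:n_peaks]
    return pattern


if __name__ == "__main__":
    glucose = {"C": 6, "H": 12, "O": 6}
    for offset, intensity in enumerate(isotope_pattern(glucose)):
        print(f"M+{offset}: {intensity:.5f}")
```

Folding atom by atom is the simplest variant; folding a distribution with itself repeatedly would need fewer convolutions for large atom counts.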

The simulated isotope pattern is compared with the measured pattern by assigning probabilities to the observed masses and intensities.[1]

Fragmentation tree computation

A fragmentation tree is a representation of the fragmentation process similar to “fragmentation diagrams” created by experts. The fragmentation tree annotates the MS2 spectrum by providing a molecular formula for each fragment peak. Peaks that do not receive an annotation are considered noise peaks. The fragmentation tree also predicts the fragmentation reactions (called losses) leading to the fragment peaks. Fragmentation trees are a valuable tool for deducing information about the fragmentation but are not a precise depiction of the actual fragmentation process.[7]

To identify the molecular formula of an unknown molecule, a separate fragmentation tree is computed for every molecular formula candidate. In other words, the method attempts to reconstruct the fragmentation process that led to this MS2 spectrum for each candidate molecular formula. This makes it possible to compare the hypotheses that each candidate is actually the correct molecular formula. The best-scoring fragmentation tree (i.e. the fragmentation process that best explains the spectrum) corresponds to the most likely molecular formula.

ZODIAC: Improved molecular formula identification

ZODIAC improves the ranking of the formula candidates provided by SIRIUS.[13] Organisms produce related metabolites derived from multiple but limited biosynthetic pathways. For a full LC-MS/MS run derived from a biological sample, or from any other set of derivatives, the relationship between the metabolites is reflected in their similarity. These similarities are in turn reflected in shared fragments and losses between the fragmentation trees and can be leveraged to improve molecular formula identification of the individual molecules.

ZODIAC uses the top X molecular formula candidates for each molecule from SIRIUS to build a similarity network, and uses Bayesian statistics to re-rank those candidates. Prior probabilities are derived from fragmentation tree similarity. Finding an optimal solution to the resulting computational problem is NP-hard; Gibbs sampling is therefore used.

ZODIAC is a recursive acronym for ZODIAC: Organic compound Determination by Integral Assignment of elemental Compositions.

CSI:FingerID: Structure database search

CSI:FingerID identifies the structure of a molecule by predicting its molecular fingerprint and using this fingerprint to search in a molecular structure database.[10]

Molecular fingerprints

A molecular fingerprint is a binary vector, where each position corresponds to a specific molecular property. In this representation, a given position X may encode the presence or absence of a particular substructure, with '1' indicating presence and '0' indicating absence. Various types of molecular fingerprints exist, including PubChem CACTVS fingerprints, Klekota-Roth fingerprints,[24] MACCS fingerprints, and Extended-Connectivity Fingerprints (ECFP).[25] A molecular fingerprint can be deterministically computed from a given molecular structure. Different molecular structures may yield the same molecular fingerprint.

Predicting molecular fingerprints

CSI:FingerID predicts a probabilistic fingerprint with a variety of molecular properties from several fingerprint types. The fingerprint is predicted from the given spectrum and its corresponding fragmentation tree using deep kernel learning,[26][10] which is a combination of kernel methods and deep neural networks. Not only the top scoring molecular formula but multiple high-scoring molecular formula candidates are considered.

Comparing molecular fingerprints

Searching a molecular structure database requires a metric to compare and score molecular fingerprints. Tanimoto similarity (Jaccard index) is a commonly employed metric. A similarity value of 1 signifies identical fingerprints, while a value of 0 indicates structures that do not share any molecular properties. The calculated similarity value depends on the choice of fingerprint type.
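
For binary fingerprints, the Tanimoto similarity is the number of positions set in both fingerprints divided by the number of positions set in at least one of them. A minimal Python sketch follows; the function name and the example vectors are illustrative, and returning 1.0 for two all-zero fingerprints is a convention chosen here, not something prescribed by the source.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity (Jaccard index) of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Assumed convention: two empty fingerprints count as identical.
    return both / either if either else 1.0


if __name__ == "__main__":
    # Two shared properties out of three present in total.
    print(tanimoto([1, 1, 0, 1], [1, 0, 0, 1]))
```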

CSI:FingerID employs a logarithmic posterior probability to rank the structure candidates, where scores are represented as negative numbers, and zero is the optimum.[27] This scoring function results in a higher number of correct identifications.[10] Tanimoto similarities are also given.

COSMIC: Identification confidence

The COSMIC confidence score assigns a confidence to CSI:FingerID structure identifications.[16] The idea is similar to False Discovery Rates: All molecules in a large dataset are analysed using CSI:FingerID, the top-ranked hit for each molecule will be evaluated by COSMIC and the most trustworthy identifications can be selected for further analysis. COSMIC does not re-rank structure candidates of a particular molecule nor does it discard any identifications.

COSMIC employs a confidence score that combines E-value estimation and a linear support vector machine (SVM) with enforced directionality. Calibration of CSI:FingerID scores is achieved using E-value estimates.[28] Generating decoys for small molecule structures is a non-trivial task, which is why candidates in PubChem serve as a proxy for decoys.

The score distribution is modeled as a mixture distribution of log-normal distributions, and the P-value and E-value of a hit score are estimated using the kernel density estimate of PubChem candidate scores. The SVM is employed to classify whether a hit is correct, utilizing features such as the calibrated score, score differences to other candidates, the total peak intensity explained by the fragmentation tree, and the cardinality of molecular fingerprints. Learning is constrained to a linear SVM to mitigate the risk of overfitting, and the directionality of features is enforced. This involves making upfront decisions about whether high or low values of a feature should enhance the confidence in an identification. For instance, a high CSI:FingerID score of a hit should increase but never decrease the confidence that the hit is correct. Some features necessitate the existence of at least two candidates for comparison, and separate SVMs are trained for single instances. The decision values of the SVM are mapped to posterior probability estimates using Platt scaling.[29] This comprehensive approach ensures a robust and nuanced assessment of the confidence in molecule identifications.[16]
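
Platt scaling maps an SVM decision value f to a probability estimate through a sigmoid, p = 1 / (1 + exp(A·f + B)), where A and B are fitted on held-out data. The Python sketch below only shows the mapping itself; the function name and the parameter values in the example are hypothetical and are not the values used by COSMIC.

```python
import math


def platt_probability(decision_value: float, a: float, b: float) -> float:
    """Map an SVM decision value to a probability: 1 / (1 + exp(a*f + b)).

    a and b are the fitted Platt parameters (a is typically negative, so
    that larger decision values give larger probabilities).
    """
    z = a * decision_value + b
    # Numerically stable evaluation that avoids overflow for large |z|.
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for f in (-1.0, 0.0, 1.0):
        print(f, platt_probability(f, a=-2.0, b=0.0))
```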

CANOPUS: compound class prediction

CANOPUS is short for class assignment and ontology prediction using mass spectrometry.[15] It predicts the compound classes from the molecular fingerprint predicted by CSI:FingerID. This approach is completely database-free, i.e. it is not even limited to molecules that are listed in structure databases.

CANOPUS employs a deep neural network (DNN)[30] to predict 2,497 compound classes. The DNN was trained on 4.10 million compound structures with compound classes assigned by ClassyFire.[31] No MS/MS data was used for training, but instead simulated ‘realistic’ probabilistic fingerprints for the training molecular structures were used. The DNN predicts all compound classes simultaneously.

For full biological datasets, CANOPUS provides a comprehensive overview of compound classes present in the sample and allows for comparisons between different cohorts at compound class level.

Areas of application

Small molecules are essential components found throughout nature, playing a significant role in various fields such as drug discovery, diagnostics, food science, environmental monitoring, and more. Effectively addressing many global challenges hinges on the comprehensive identification of small molecules in complex samples. These complex mixtures contain thousands of different molecules measurable in a single mass spectrometry run.

The identification of unknown small molecules is considered a critical bottleneck in metabolomics, natural product research, and related fields, given that well over 90% of all small molecules remain unknown.[32][33] Traditionally, analyses were based on targeted approaches that are limited to the rediscovery of known molecules. In contrast, untargeted analysis is a top-down strategy that avoids the need for a prior specific hypothesis on expected small molecules. The focus shifts from asking, "Is molecule X present in the sample?" to "Which (unknown) molecules are present in the sample and might be relevant for downstream analysis?"

SIRIUS is designed for the untargeted structural elucidation of unknown molecules, addressing various challenges:

  • The correct molecular structure is prominently ranked from an extensive list of candidates. This can be compared to a Google search where the optimal answer is expected to be among the top three.[10]
  • It can be assessed whether the top candidate is indeed correct.[16]
  • Structural information is available even for molecules absent in extensive structure databases, including details on compound class and substructure information.[15]

Examples of application

  • Neonatal dried blood spots are important for newborn screening and a powerful source for investigating the potential metabolic etiologies of various diseases using untargeted LC-MS-based metabolomics. Researchers used SIRIUS to investigate the stability of metabolites and classes of molecules in neonatal dried blood spot biobanks.[34]
  • Marine microorganisms offer a rich source of bioactive compounds with unique structures and remarkable biological activity. This makes them an important resource for the search for new therapeutic compounds. Researchers are using SIRIUS to narrow down the search to the most promising microorganisms.[35]
  • Pediatric asthma poses diagnostic challenges due to its variable presentation. Breath analysis could be a game-changer in pediatric allergic asthma management. By identifying unique exhaled metabolic signatures using SIRIUS, researchers developed an approach to diagnose children with allergic asthma.[36]
  • Thiacloprid is a first-generation, widely used, neonicotinoid insecticide. Its persistence in the environment and potential adverse effects on human health have raised significant concerns. Elucidating the impurity profile of pesticides is crucial for assessing their environmental impact and potential risks, and setting acceptable limits for impurities. Using SIRIUS, researchers demonstrated an approach for identifying structurally related impurities in pesticides.[37]
  • Under certain conditions, two bacterial species can thrive together in a dual-species biofilm. The cooperation between P. aeruginosa and S. aureus in cystic fibrosis leads to increased disease severity. Using SIRIUS, researchers identified a metabolite that could be related to the increased pathogenesis of this dual-species biofilm in cystic fibrosis.[38]
  • Our skin hosts a diverse community of microorganisms known as the skin microbiota. Using SIRIUS, researchers identified changes in the skin metabolome that are more pronounced than changes in the microbial composition, suggesting that even subtle shifts in microbial abundance can lead to significant effects on the skin.[39]

Limitations

Limitation of the measurement method

Mass spectra alone lack sufficient information to unambiguously identify every molecule. Some molecules produce almost indistinguishable spectra – even more similar than the same molecule measured on two different instruments.[21] Extensive follow-up experiments are required for unambiguous identification.

It is therefore impossible to always identify a molecular structure correctly from a mass spectrum alone. CSI:FingerID, like other methods for structure database search, cannot guarantee that the correct molecular structure is the first hit. That is why it is important to have the correct structure ranked very high in an extensive list of candidates and to assess the confidence in the top hit.

Limitation of structure databases

Structure databases are orders of magnitude larger than spectral libraries but still incomplete.[40] It is understood that not every existing biomolecule is or will be contained in structure databases.

For these instances, SIRIUS offers several solutions:

  • SIRIUS can search in databases of hypothetical structures.[16] This can be useful, for example, for finding derivatives.
  • The predicted molecular fingerprint offers structural information about, e.g., substructures.[10]
  • CANOPUS predicts the compound classes of a molecule without searching in a database.[15]

Independent evaluation of the software

CASMI (Critical Assessment of Small Molecule Identification)[41] is an open contest on the identification of small molecules from mass spectrometry data, and was launched in 2012 by Emma Schymanski and Steffen Neumann.[42]

In CASMI 2016, CSI:FingerID and a derivative of CSI:FingerID, in which the Böcker Group was also involved, won first and second place in the category “Best Automatic Structural Identification - In Silico Fragmentation Only”. Also, CSI:FingerID had the best result for ranking the correct molecule structure at position one (70 out of 127, positive mode).[43][44]

In CASMI 2017, SIRIUS plus CSI:FingerID won in 3 of 4 categories: “Best Structure Identification on Natural Products”, “Best Automatic Structural Identification - In Silico Fragmentation Only”, “Best Automatic Candidate Ranking”.[45]

In CASMI 2022, six out of 16 contestants used SIRIUS in their workflow to identify the best molecular structure candidates. SIRIUS won in the categories “Correct elemental formulas”, “Correct compound structure classes” and “Correct 2D chemical structures”. CASMI 2022 included compounds that were not even contained in PubChem.[46]

Awards and recognition

Sebastian Böcker's group at FSU Jena won the 2022 Thuringian Research Award in the Applied Research category for SIRIUS and the underlying methods.[47][48]

SIRIUS was recognized as a "method to watch" by Nature Methods in 2020.[49]

Licences

SIRIUS is developed by the group of Sebastian Böcker at FSU Jena in close collaboration with Bright Giant GmbH. SIRIUS is provided as a software-as-a-service solution. The client software is open-source and installed on the users' computers. Molecular formula annotation using fragmentation trees and isotope pattern analysis is performed on the user's local computer and requires no subscription.

The SIRIUS web services for structural elucidation, including molecular fingerprint prediction, structure database search, confidence score assessment and compound class prediction, require a user account. The web services are free for academic/non-commercial use and are hosted by FSU Jena. Academic institutions are identified by their email domain, and access is granted automatically. In some cases, further validation might be required.

Bright Giant GmbH offers subscription-based access to the SIRIUS web services for structural elucidation for commercial users.

Alternatives

Other algorithms and software for searching in structure databases are CFM-ID,[50][51] ICEBERG,[52] MetFrag,[53] MS-FINDER,[54][55] MetaboScape® (Bruker), MassHunter (Agilent) or Compound Discoverer™ (Thermo Fisher Scientific).

References

  1. ^ a b c d Böcker, Sebastian; Letzel, Matthias C.; Lipták, Zsuzsanna; Pervukhin, Anton (15 January 2009). "SIRIUS: decomposing isotope patterns for metabolite identification". Bioinformatics. 25 (2): 218–224. doi:10.1093/bioinformatics/btn603. PMC 2639009. PMID 19015140.
  2. ^ a b Böcker, Sebastian; Rasche, Florian (15 August 2008). "Towards de novo identification of metabolites by analyzing tandem mass spectra". Bioinformatics. 24 (16): i49–i55. doi:10.1093/bioinformatics/btn270. PMID 18689839.
  3. ^ Scheubert, Kerstin; Hufsky, Franziska; Böcker, Sebastian (December 2013). "Computational mass spectrometry for small molecules". Journal of Cheminformatics. 5 (1): 12. doi:10.1186/1758-2946-5-12. PMC 3648359. PMID 23453222.
  4. ^ Horai, Hisayuki; Arita, Masanori; Kanaya, Shigehiko; Nihei, Yoshito; Ikeda, Tasuku; Suwa, Kazuhiro; Ojima, Yuya; Tanaka, Kenichi; Tanaka, Satoshi; Aoshima, Ken; Oda, Yoshiya; Kakazu, Yuji; Kusano, Miyako; Tohge, Takayuki; Matsuda, Fumio; Sawada, Yuji; Hirai, Masami Yokota; Nakanishi, Hiroki; Ikeda, Kazutaka; Akimoto, Naoshige; Maoka, Takashi; Takahashi, Hiroki; Ara, Takeshi; Sakurai, Nozomu; Suzuki, Hideyuki; Shibata, Daisuke; Neumann, Steffen; Iida, Takashi; Tanaka, Ken; Funatsu, Kimito; Matsuura, Fumito; Soga, Tomoyoshi; Taguchi, Ryo; Saito, Kazuki; Nishioka, Takaaki (July 2010). "MassBank: a public repository for sharing mass spectral data for life sciences". Journal of Mass Spectrometry. 45 (7): 703–714. Bibcode:2010JMSp...45..703H. doi:10.1002/jms.1777. PMID 20623627.
  5. ^ Smith, Colin A; O'Maille, Grace; Want, Elizabeth J; Qin, Chuan; Trauger, Sunia A; Brandon, Theodore R; Custodio, Darlene E; Abagyan, Ruben; Siuzdak, Gary (December 2005). "METLIN: A Metabolite Mass Spectral Database". Therapeutic Drug Monitoring. 27 (6): 747–751. doi:10.1097/01.ftd.0000179845.53213.39. PMID 16404815. S2CID 14774455.
  6. ^ "Mass Spectrometry Data Center, NIST". chemdata.nist.gov.
  7. ^ a b c Rasche, Florian; Svatoš, Aleš; Maddula, Ravi Kumar; Böttcher, Christoph; Böcker, Sebastian (15 February 2011). "Computing Fragmentation Trees from Tandem Mass Spectrometry Data". Analytical Chemistry. 83 (4): 1243–1251. doi:10.1021/ac101825k. PMID 21182243.
  8. ^ a b Rasche, Florian; Scheubert, Kerstin; Hufsky, Franziska; Zichner, Thomas; Kai, Marco; Svatoš, Aleš; Böcker, Sebastian (3 April 2012). "Identifying the Unknowns by Aligning Fragmentation Trees". Analytical Chemistry. 84 (7): 3417–3426. doi:10.1021/ac300304u. PMID 22390817.
  9. ^ Heinonen, Markus; Shen, Huibin; Zamboni, Nicola; Rousu, Juho (15 September 2012). "Metabolite identification and molecular fingerprint prediction through machine learning". Bioinformatics. 28 (18): 2333–2341. doi:10.1093/bioinformatics/bts437. hdl:20.500.11850/55584. PMID 22815355.
  10. ^ a b c d e f Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian (13 October 2015). "Searching molecular structure databases with tandem mass spectra using CSI:FingerID". Proceedings of the National Academy of Sciences. 112 (41): 12580–12585. Bibcode:2015PNAS..11212580D. doi:10.1073/pnas.1509788112. PMC 4611636. PMID 26392543.
  11. ^ Kim, Sunghwan; Chen, Jie; Cheng, Tiejun; Gindulyte, Asta; He, Jia; He, Siqian; Li, Qingliang; Shoemaker, Benjamin A; Thiessen, Paul A; Yu, Bo; Zaslavsky, Leonid; Zhang, Jian; Bolton, Evan E (8 January 2021). "PubChem in 2021: new data content and improved web interfaces". Nucleic Acids Research. 49 (D1): D1388–D1395. doi:10.1093/nar/gkaa971. PMC 7778930. PMID 33151290.
  12. ^ "2023 Release of the NIST EI and Tandem Libraries" (PDF). National Institute of Standards and Technology (NIST). Retrieved 12 January 2023.
  13. ^ a b Ludwig, Marcus; Nothias, Louis-Félix; Dührkop, Kai; Koester, Irina; Fleischauer, Markus; Hoffmann, Martin A.; Petras, Daniel; Vargas, Fernando; Morsy, Mustafa; Aluwihare, Lihini; Dorrestein, Pieter C.; Böcker, Sebastian (13 October 2020). "Database-independent molecular formula annotation using Gibbs sampling through ZODIAC". Nature Machine Intelligence. 2 (10): 629–641. doi:10.1038/s42256-020-00234-6.
  14. ^ a b Dührkop, Kai; Fleischauer, Markus; Ludwig, Marcus; Aksenov, Alexander A.; Melnik, Alexey V.; Meusel, Marvin; Dorrestein, Pieter C.; Rousu, Juho; Böcker, Sebastian (April 2019). "SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information". Nature Methods. 16 (4): 299–302. doi:10.1038/s41592-019-0344-8. PMID 30886413. S2CID 81985235.
  15. ^ a b c d Dührkop, Kai; Nothias, Louis-Félix; Fleischauer, Markus; Reher, Raphael; Ludwig, Marcus; Hoffmann, Martin A.; Petras, Daniel; Gerwick, William H.; Rousu, Juho; Dorrestein, Pieter C.; Böcker, Sebastian (April 2021). "Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra". Nature Biotechnology. 39 (4): 462–471. doi:10.1038/s41587-020-0740-8. PMID 33230292.
  16. ^ a b c d e Hoffmann, Martin A.; Nothias, Louis-Félix; Ludwig, Marcus; Fleischauer, Markus; Gentry, Emily C.; Witting, Michael; Dorrestein, Pieter C.; Dührkop, Kai; Böcker, Sebastian (March 2022). "High-confidence structural annotation of metabolites absent from spectral libraries". Nature Biotechnology. 40 (3): 411–421. doi:10.1038/s41587-021-01045-9. PMC 8926923. PMID 34650271.
  17. ^ a b c d Ludwig, Marcus; Fleischauer, Markus; Dührkop, Kai; Hoffmann, Martin A.; Böcker, Sebastian (2020). "De Novo Molecular Formula Annotation and Structure Elucidation Using SIRIUS 4". Computational Methods and Data Analysis for Metabolomics. Methods in Molecular Biology. Vol. 2104. pp. 185–207. doi:10.1007/978-1-0716-0239-3_11. ISBN 978-1-0716-0238-6. PMID 31953819. S2CID 210709539.
  18. ^ Röst, Hannes L; Sachsenberg, Timo; Aiche, Stephan; Bielow, Chris; Weisser, Hendrik; Aicheler, Fabian; Andreotti, Sandro; Ehrlich, Hans-Christian; Gutenbrunner, Petra; Kenar, Erhan; Liang, Xiao; Nahnsen, Sven; Nilse, Lars; Pfeuffer, Julianus; Rosenberger, George; Rurik, Marc; Schmitt, Uwe; Veit, Johannes; Walzer, Mathias; Wojnar, David; Wolski, Witold E; Schilling, Oliver; Choudhary, Jyoti S; Malmström, Lars; Aebersold, Ruedi; Reinert, Knut; Kohlbacher, Oliver (September 2016). "OpenMS: a flexible open-source software platform for mass spectrometry data analysis" (PDF). Nature Methods. 13 (9): 741–748. doi:10.1038/nmeth.3959. PMID 27575624. S2CID 873670.
  19. ^ Schmid, Robin; Heuckeroth, Steffen; Korf, Ansgar; Smirnov, Aleksandr; Myers, Owen; Dyrlund, Thomas S.; Bushuiev, Roman; Murray, Kevin J.; Hoffmann, Nils; Lu, Miaoshan; Sarvepalli, Abinesh; Zhang, Zheng; Fleischauer, Markus; Dührkop, Kai; Wesner, Mark; Hoogstra, Shawn J.; Rudt, Edward; Mokshyna, Olena; Brungs, Corinna; Ponomarov, Kirill; Mutabdžija, Lana; Damiani, Tito; Pudney, Chris J.; Earll, Mark; Helmer, Patrick O.; Fallon, Timothy R.; Schulze, Tobias; Rivas-Ubach, Albert; Bilbao, Aivett; Richter, Henning; Nothias, Louis-Félix; Wang, Mingxun; Orešič, Matej; Weng, Jing-Ke; Böcker, Sebastian; Jeibmann, Astrid; Hayen, Heiko; Karst, Uwe; Dorrestein, Pieter C.; Petras, Daniel; Du, Xiuxia; Pluskal, Tomáš (April 2023). "Integrative analysis of multimodal mass spectrometry data in MZmine 3". Nature Biotechnology. 41 (4): 447–449. doi:10.1038/s41587-023-01690-2. PMC 10496610. PMID 36859716.
  20. ^ Meusel, Marvin; Hufsky, Franziska; Panter, Fabian; Krug, Daniel; Müller, Rolf; Böcker, Sebastian (2 August 2016). "Predicting the Presence of Uncommon Elements in Unknown Biomolecules from Isotope Patterns". Analytical Chemistry. 88 (15): 7556–7566. doi:10.1021/acs.analchem.6b01015. PMID 27398867.
  21. ^ a b Böcker, Sebastian (29 April 2022). Algorithmic Mass Spectrometry (PDF) (Version 0.8.4 ed.). Retrieved 12 January 2024.
  22. ^ Böcker, Sebastian; Lipták, Zsuzsanna (August 2007). "A Fast and Simple Algorithm for the Money Changing Problem". Algorithmica. 48 (4): 413–432. doi:10.1007/s00453-007-0162-8. S2CID 17652643.
  23. ^ Kubinyi, Hugo (June 1991). "Calculation of isotope distributions in mass spectrometry. A trivial solution for a non-trivial problem". Analytica Chimica Acta. 247 (1): 107–119. Bibcode:1991AcAC..247..107K. doi:10.1016/S0003-2670(00)83059-7.
  24. ^ Klekota, Justin; Roth, Frederick P. (1 November 2008). "Chemical substructures that enrich for biological activity". Bioinformatics. 24 (21): 2518–2525. doi:10.1093/bioinformatics/btn479. PMC 2732283. PMID 18784118.
  25. ^ Rogers, David; Hahn, Mathew (24 May 2010). "Extended-Connectivity Fingerprints". Journal of Chemical Information and Modeling. 50 (5): 742–754. doi:10.1021/ci100050t. PMID 20426451.
  26. ^ Dührkop, Kai (24 June 2022). "Deep kernel learning improves molecular fingerprint prediction from tandem mass spectra". Bioinformatics. 38 (Supplement_1): i342–i349. doi:10.1093/bioinformatics/btac260. PMC 9235503. PMID 35758813.
  27. ^ Ludwig, Marcus; Dührkop, Kai; Böcker, Sebastian (1 July 2018). "Bayesian networks for mass spectrometric metabolite identification via molecular fingerprints". Bioinformatics. 34 (13): i333–i340. doi:10.1093/bioinformatics/bty245. PMC 6022630. PMID 29949965.
  28. ^ Keich, Uri; Noble, William Stafford (6 February 2015). "On the Importance of Well-Calibrated Scores for Identifying Shotgun Proteomics Spectra". Journal of Proteome Research. 14 (2): 1147–1160. doi:10.1021/pr5010983. PMC 4324453. PMID 25482958.
  29. ^ Platt, John C. (29 September 2000). "Probabilities for SV Machines". Advances in Large-Margin Classifiers: 61–74. doi:10.7551/mitpress/1113.003.0008. ISBN 978-0-262-28397-7.
  30. ^ LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (28 May 2015). "Deep learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
  31. ^ Djoumbou Feunang, Yannick; Eisner, Roman; Knox, Craig; Chepelev, Leonid; Hastings, Janna; Owen, Gareth; Fahy, Eoin; Steinbeck, Christoph; Subramanian, Shankar; Bolton, Evan; Greiner, Russell; Wishart, David S. (December 2016). "ClassyFire: automated chemical classification with a comprehensive, computable taxonomy". Journal of Cheminformatics. 8 (1): 61. doi:10.1186/s13321-016-0174-y. PMC 5096306. PMID 27867422.
  32. ^ da Silva, Ricardo R.; Dorrestein, Pieter C.; Quinn, Robert A. (13 October 2015). "Illuminating the dark matter in metabolomics". Proceedings of the National Academy of Sciences. 112 (41): 12549–12550. doi:10.1073/pnas.1516878112. PMC 4611607. PMID 26430243.
  33. ^ Hulleman, Tobias; Turkina, Viktoriia; O’Brien, Jake W.; Chojnacka, Aleksandra; Thomas, Kevin V.; Samanipour, Saer (26 September 2023). "Critical Assessment of the Chemical Space Covered by LC–HRMS Non-Targeted Analysis". Environmental Science & Technology. 57 (38): 14101–14112. Bibcode:2023EnST...5714101H. doi:10.1021/acs.est.3c03606. PMC 10537454. PMID 37704971.
  34. ^ Ottosson, Filip; Russo, Francesco; Abrahamsson, Anna; MacSween, Nadia; Courraud, Julie; Nielsen, Zaki Krag; Hougaard, David M.; Cohen, Arieh S.; Ernst, Madeleine (5 April 2023). "Effects of Long-Term Storage on the Biobanked Neonatal Dried Blood Spot Metabolome". Journal of the American Society for Mass Spectrometry. 34 (4): 685–694. doi:10.1021/jasms.2c00358. PMC 10080689. PMID 36913955.
  35. ^ Le Loarer, Alexandre; Marcellin-Gros, Rémy; Dufossé, Laurent; Bignon, Jérôme; Frédérich, Michel; Ledoux, Allison; Queiroz, Emerson Ferreira; Wolfender, Jean-Luc; Gauvin-Bialecki, Anne; Fouillaud, Mireille (8 March 2023). "Prioritization of Microorganisms Isolated from the Indian Ocean Sponge Scopalina hapalia Based on Metabolomic Diversity and Biological Activity for the Discovery of Natural Products". Microorganisms. 11 (3): 697. doi:10.3390/microorganisms11030697. PMC 10057949. PMID 36985270.
  36. ^ Weber, Ronja; Streckenbach, Bettina; Welti, Lara; Inci, Demet; Kohler, Malcolm; Perkins, Nathan; Zenobi, Renato; Micic, Srdjan; Moeller, Alexander (31 March 2023). "Online breath analysis with SESI/HRMS for metabolic signatures in children with allergic asthma". Frontiers in Molecular Biosciences. 10. doi:10.3389/fmolb.2023.1154536. PMC 10102578. PMID 37065443.
  37. ^ Li, Xianjiang; Tu, Mengling; Yang, Bingxin; Ma, Wen; Li, Hongmei (October 2023). "Structurally related impurity profiling of thiacloprid by orbitrap and de novo identification tool". Microchemical Journal. 193: 109123. doi:10.1016/j.microc.2023.109123. S2CID 260123222.
  38. ^ Uzi-Gavrilov, S.; Tik, Z.; Sabti, O.; Meijler, M. M. (17 July 2023). "Chemical Modification of a Bacterial Siderophore by a Competitor in Dual-Species Biofilms". Angewandte Chemie International Edition. 62 (29): e202300585. doi:10.1002/anie.202300585. PMID 37211536.
  39. ^ Li, Min; Mao, Junhong; Diaz, Isabel; Kopylova, Evguenia; Melnik, Alexey V.; Aksenov, Alexander A.; Tipton, Craig D.; Soliman, Nadia; Morgan, Andrea M.; Boyd, Thomas (18 July 2023). "Multi-omic approach to decipher the impact of skincare products with pre/postbiotics on skin microbiome and metabolome". Frontiers in Medicine. 10. doi:10.3389/fmed.2023.1165980. PMC 10392128. PMID 37534320.
  40. ^ Hufsky, Franziska; Böcker, Sebastian (September 2017). "Mining molecular structure databases: Identification of small molecules based on fragmentation mass spectrometry data". Mass Spectrometry Reviews. 36 (5): 624–633. Bibcode:2017MSRv...36..624H. doi:10.1002/mas.21489. PMID 26763615.
  41. ^ "Critical Assessment of Small Molecule Identification". Retrieved 12 January 2023.
  42. ^ Schymanski, Emma; Neumann, Steffen (25 June 2013). "The Critical Assessment of Small Molecule Identification (CASMI): Challenges and Solutions". Metabolites. 3 (3): 517–538. doi:10.3390/metabo3030517. PMC 3901296. PMID 24958137.
  43. ^ Schymanski, Emma L.; Ruttkies, Christoph; Krauss, Martin; Brouard, Céline; Kind, Tobias; Dührkop, Kai; Allen, Felicity; Vaniya, Arpana; Verdegem, Dries; Böcker, Sebastian; Rousu, Juho; Shen, Huibin; Tsugawa, Hiroshi; Sajed, Tanvir; Fiehn, Oliver; Ghesquière, Bart; Neumann, Steffen (December 2017). "Critical Assessment of Small Molecule Identification 2016: automated methods". Journal of Cheminformatics. 9 (1): 22. doi:10.1186/s13321-017-0207-1. PMC 5368104. PMID 29086042.
  44. ^ "CASMI 2016 Results". Retrieved 12 January 2023.
  45. ^ "CASMI 2017 Results". Retrieved 12 January 2023.
  46. ^ "CASMI 2022 Results". Retrieved 12 January 2023.
  47. ^ "Thüringer Forschungspreis 2022". YouTube. Thüringer Wirtschafts- & Wissenschaftsministerium. Retrieved 12 January 2023.
  48. ^ Schönfelder, Ute (6 April 2022). "Artificial Intelligence identifies small molecules: Bioinformatics team awarded 2022 Thuringian Research Prize in the category Applied Research". Friedrich Schiller University Jena. Retrieved 12 January 2023.
  49. ^ Singh, Arunima (January 2020). "Tools for metabolomics". Nature Methods. 17 (1): 24. doi:10.1038/s41592-019-0710-6. PMID 31907484.
  50. ^ Allen, Felicity; Pon, Allison; Wilson, Michael; Greiner, Russ; Wishart, David (1 July 2014). "CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra". Nucleic Acids Research. 42 (W1): W94–W99. doi:10.1093/nar/gku436. PMC 4086103. PMID 24895432.
  51. ^ Wang, Fei; Allen, Dana; Tian, Siyang; Oler, Eponine; Gautam, Vasuk; Greiner, Russell; Metz, Thomas O.; Wishart, David S. (5 July 2022). "CFM-ID 4.0 - a web server for accurate MS-based metabolite identification". Nucleic Acids Research. 50 (W1): W165–W174. doi:10.1093/nar/gkac383. PMC 9252813. PMID 35610037.
  52. ^ Goldman, Samuel; Li, Janet; Coley, Connor W. (2023). "Generating Molecular Fragmentation Graphs with Autoregressive Neural Networks". arXiv:2304.13136 [q-bio.QM].
  53. ^ Ruttkies, Christoph; Schymanski, Emma L.; Wolf, Sebastian; Hollender, Juliane; Neumann, Steffen (December 2016). "MetFrag relaunched: incorporating strategies beyond in silico fragmentation". Journal of Cheminformatics. 8 (1): 3. doi:10.1186/s13321-016-0115-9. PMC 4732001. PMID 26834843.
  54. ^ Tsugawa, Hiroshi; Kind, Tobias; Nakabayashi, Ryo; Yukihira, Daichi; Tanaka, Wataru; Cajka, Tomas; Saito, Kazuki; Fiehn, Oliver; Arita, Masanori (16 August 2016). "Hydrogen Rearrangement Rules: Computational MS/MS Fragmentation and Structure Elucidation Using MS-FINDER Software". Analytical Chemistry. 88 (16): 7946–7958. doi:10.1021/acs.analchem.6b00770. PMC 7063832. PMID 27419259.
  55. ^ Lai, Zijuan; Tsugawa, Hiroshi; Wohlgemuth, Gert; Mehta, Sajjan; Mueller, Matthew; Zheng, Yuxuan; Ogiwara, Atsushi; Meissen, John; Showalter, Megan; Takeuchi, Kohei; Kind, Tobias; Beal, Peter; Arita, Masanori; Fiehn, Oliver (January 2018). "Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics". Nature Methods. 15 (1): 53–56. doi:10.1038/nmeth.4512. PMC 6358022. PMID 29176591.

Matsuura, Fumito; Soga, Tomoyoshi; Taguchi, Ryo; Saito, Kazuki; Nishioka, Takaaki (July 2010). "MassBank: a public repository for sharing mass spectral data for life sciences". Journal of Mass Spectrometry. 45 (7): 703-714. Bibcode:2010JMSp...45..703H. doi:10.1002/jms.1777. PMID 20623627.
5. Smith, Colin A.; Maille, Grace O'; Want, Elizabeth J.; Qin, Chuan; Trauger, Sunia A.; Brandon, Theodore R.; Custodio, Darlene E.; Abagyan, Ruben; Siuzdak, Gary (December 2005). "METLIN: A Metabolite Mass Spectral Database". Therapeutic Drug Monitoring. 27 (6): 747-751. doi:10.1097/01.ftd.0000179845.53213.39. PMID 16404815. S2CID 14774455.
6. "Mass Spectrometry Data Center, NIST". chemdata.nist.gov.
7. Rasche, Florian; Svatoš, Aleš; Maddula, Ravi Kumar; Böttcher, Christoph; Böcker, Sebastian (15 February 2011). "Computing Fragmentation Trees from Tandem Mass Spectrometry Data". Analytical Chemistry. 83 (4): 1243-1251. doi:10.1021/ac101825k. PMID 21182243.
8. Rasche, Florian; Scheubert, Kerstin; Hufsky, Franziska; Zichner, Thomas; Kai, Marco; Svatoš, Aleš; Böcker, Sebastian (3 April 2012). "Identifying the Unknowns by Aligning Fragmentation Trees". Analytical Chemistry. 84 (7): 3417-3426. doi:10.1021/ac300304u. PMID 22390817.
9. Heinonen, Markus; Shen, Huibin; Zamboni, Nicola; Rousu, Juho (15 September 2012). "Metabolite identification and molecular fingerprint prediction through machine learning". Bioinformatics. 28 (18): 2333-2341. doi:10.1093/bioinformatics/bts437. hdl:20.500.11850/55584. PMID 22815355.
10. Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian (13 October 2015). "Searching molecular structure databases with tandem mass spectra using CSI:FingerID". Proceedings of the National Academy of Sciences. 112 (41): 12580-12585. Bibcode:2015PNAS..11212580D. doi:10.1073/pnas.1509788112. PMC 4611636. PMID 26392543.
11. Kim, Sunghwan; Chen, Jie; Cheng, Tiejun; Gindulyte, Asta; He, Jia; He, Siqian; Li, Qingliang; Shoemaker, Benjamin A.; Thiessen, Paul A.; Yu, Bo; Zaslavsky, Leonid; Zhang, Jian; Bolton, Evan E. (8 January 2021). "PubChem in 2021: new data content and improved web interfaces". Nucleic Acids Research. 49 (D1): D1388-D1395. doi:10.1093/nar/gkaa971. PMC 7778930. PMID 33151290.
12. "2023 Release of the NIST EI and Tandem Libraries" (PDF). National Institute of Standards and Technology (NIST). Retrieved 12 January 2023.
13. Ludwig, Marcus; Nothias, Louis-Félix; Dührkop, Kai; Koester, Irina; Fleischauer, Markus; Hoffmann, Martin A.; Petras, Daniel; Vargas, Fernando; Morsy, Mustafa; Aluwihare, Lihini; Dorrestein, Pieter C.; Böcker, Sebastian (13 October 2020). "Database-independent molecular formula annotation using Gibbs sampling through ZODIAC". Nature Machine Intelligence. 2 (10): 629-641. doi:10.1038/s42256-020-00234-6.
14. Dührkop, Kai; Fleischauer, Markus; Ludwig, Marcus; Aksenov, Alexander A.; Melnik, Alexey V.; Meusel, Marvin; Dorrestein, Pieter C.; Rousu, Juho; Böcker, Sebastian (April 2019). "SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information". Nature Methods. 16 (4): 299-302. doi:10.1038/s41592-019-0344-8. PMID 30886413. S2CID 81985235.
15. Dührkop, Kai; Nothias, Louis-Félix; Fleischauer, Markus; Reher, Raphael; Ludwig, Marcus; Hoffmann, Martin A.; Petras, Daniel; Gerwick, William H.; Rousu, Juho; Dorrestein, Pieter C.; Böcker, Sebastian (April 2021). "Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra". Nature Biotechnology. 39 (4): 462-471. doi:10.1038/s41587-020-0740-8. PMID 33230292.
16. Hoffmann, Martin A.; Nothias, Louis-Félix; Ludwig, Marcus; Fleischauer, Markus; Gentry, Emily C.; Witting, Michael; Dorrestein, Pieter C.; Dührkop, Kai; Böcker, Sebastian (March 2022). "High-confidence structural annotation of metabolites absent from spectral libraries". Nature Biotechnology. 40 (3): 411-421. doi:10.1038/s41587-021-01045-9. PMC 8926923. PMID 34650271.
17. Ludwig, Marcus; Fleischauer, Markus; Dührkop, Kai; Hoffmann, Martin A.; Böcker, Sebastian (2020). "De Novo Molecular Formula Annotation and Structure Elucidation Using SIRIUS 4". Computational Methods and Data Analysis for Metabolomics. Methods in Molecular Biology. Vol. 2104. pp. 185-207. doi:10.1007/978-1-0716-0239-3_11. ISBN 978-1-0716-0238-6. PMID 31953819. S2CID 210709539.
18. Röst, Hannes L.; Sachsenberg, Timo; Aiche, Stephan; Bielow, Chris; Weisser, Hendrik; Aicheler, Fabian; Andreotti, Sandro; Ehrlich, Hans-Christian; Gutenbrunner, Petra; Kenar, Erhan; Liang, Xiao; Nahnsen, Sven; Nilse, Lars; Pfeuffer, Julianus; Rosenberger, George; Rurik, Marc; Schmitt, Uwe; Veit, Johannes; Walzer, Mathias; Wojnar, David; Wolski, Witold E.; Schilling, Oliver; Choudhary, Jyoti S.; Malmström, Lars; Aebersold, Ruedi; Reinert, Knut; Kohlbacher, Oliver (September 2016). "OpenMS: a flexible open-source software platform for mass spectrometry data analysis" (PDF). Nature Methods. 13 (9): 741-748. doi:10.1038/nmeth.3959. PMID 27575624. S2CID 873670.
19. Schmid, Robin; Heuckeroth, Steffen; Korf, Ansgar; Smirnov, Aleksandr; Myers, Owen; Dyrlund, Thomas S.; Bushuiev, Roman; Murray, Kevin J.; Hoffmann, Nils; Lu, Miaoshan; Sarvepalli, Abinesh; Zhang, Zheng; Fleischauer, Markus; Dührkop, Kai; Wesner, Mark; Hoogstra, Shawn J.; Rudt, Edward; Mokshyna, Olena; Brungs, Corinna; Ponomarov, Kirill; Mutabdžija, Lana; Damiani, Tito; Pudney, Chris J.; Earll, Mark; Helmer, Patrick O.; Fallon, Timothy R.; Schulze, Tobias; Rivas-Ubach, Albert; Bilbao, Aivett; Richter, Henning; Nothias, Louis-Félix; Wang, Mingxun; Orešič, Matej; Weng, Jing-Ke; Böcker, Sebastian; Jeibmann, Astrid; Hayen, Heiko; Karst, Uwe; Dorrestein, Pieter C.; Petras, Daniel; Du, Xiuxia; Pluskal, Tomáš (April 2023). "Integrative analysis of multimodal mass spectrometry data in MZmine 3". Nature Biotechnology. 41 (4): 447-449. doi:10.1038/s41587-023-01690-2. PMC 10496610. PMID 36859716.
20. Meusel, Marvin; Hufsky, Franziska; Panter, Fabian; Krug, Daniel; Müller, Rolf; Böcker, Sebastian (2 August 2016). "Predicting the Presence of Uncommon Elements in Unknown Biomolecules from Isotope Patterns". Analytical Chemistry. 88 (15): 7556-7566. doi:10.1021/acs.analchem.6b01015. PMID 27398867.
21. Böcker, Sebastian (29 April 2022). Algorithmic Mass Spectrometry (PDF) (Version 0.8.4 ed.). Retrieved 12 January 2024.
22. Böcker, Sebastian; Lipták, Zsuzsanna (August 2007). "A Fast and Simple Algorithm for the Money Changing Problem". Algorithmica. 48 (4): 413-432. doi:10.1007/s00453-007-0162-8. S2CID 17652643.
23. Kubinyi, Hugo (June 1991). "Calculation of isotope distributions in mass spectrometry. A trivial solution for a non-trivial problem". Analytica Chimica Acta. 247 (1): 107-119. Bibcode:1991AcAC..247..107K. doi:10.1016/S0003-2670(00)83059-7.
24. Klekota, Justin; Roth, Frederick P. (1 November 2008). "Chemical substructures that enrich for biological activity". Bioinformatics. 24 (21): 2518-2525. doi:10.1093/bioinformatics/btn479. PMC 2732283. PMID 18784118.
25. Rogers, David; Hahn, Mathew (24 May 2010). "Extended-Connectivity Fingerprints". Journal of Chemical Information and Modeling. 50 (5): 742-754. doi:10.1021/ci100050t. PMID 20426451.
26. Dührkop, Kai (24 June 2022). "Deep kernel learning improves molecular fingerprint prediction from tandem mass spectra". Bioinformatics. 38 (Supplement_1): i342-i349. doi:10.1093/bioinformatics/btac260. PMC 9235503. PMID 35758813.
27. Ludwig, Marcus; Dührkop, Kai; Böcker, Sebastian (1 July 2018). "Bayesian networks for mass spectrometric metabolite identification via molecular fingerprints". Bioinformatics. 34 (13): i333-i340. doi:10.1093/bioinformatics/bty245. PMC 6022630. PMID 29949965.
28. Keich, Uri; Noble, William Stafford (6 February 2015). "On the Importance of Well-Calibrated Scores for Identifying Shotgun Proteomics Spectra". Journal of Proteome Research. 14 (2): 1147-1160. doi:10.1021/pr5010983. PMC 4324453. PMID 25482958.
29. Platt, John C. (29 September 2000). "Probabilities for SV Machines". Advances in Large-Margin Classifiers. pp. 61-74. doi:10.7551/mitpress/1113.003.0008. ISBN 978-0-262-28397-7.
30. LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (28 May 2015). "Deep learning" (PDF). Nature. 521 (7553): 436-444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
31. Djoumbou Feunang, Yannick; Eisner, Roman; Knox, Craig; Chepelev, Leonid; Hastings, Janna; Owen, Gareth; Fahy, Eoin; Steinbeck, Christoph; Subramanian, Shankar; Bolton, Evan; Greiner, Russell; Wishart, David S. (December 2016). "ClassyFire: automated chemical classification with a comprehensive, computable taxonomy". Journal of Cheminformatics. 8 (1): 61. doi:10.1186/s13321-016-0174-y. PMC 5096306. PMID 27867422.
32. da Silva, Ricardo R.; Dorrestein, Pieter C.; Quinn, Robert A. (13 October 2015). "Illuminating the dark matter in metabolomics". Proceedings of the National Academy of Sciences. 112 (41): 12549-12550. doi:10.1073/pnas.1516878112. PMC 4611607. PMID 26430243.
33. Hulleman, Tobias; Turkina, Viktoriia; O'Brien, Jake W.; Chojnacka, Aleksandra; Thomas, Kevin V.; Samanipour, Saer (26 September 2023). "Critical Assessment of the Chemical Space Covered by LC-HRMS Non-Targeted Analysis". Environmental Science & Technology. 57 (38): 14101-14112. Bibcode:2023EnST...5714101H. doi:10.1021/acs.est.3c03606. PMC 10537454. PMID 37704971.
34. Ottosson, Filip; Russo, Francesco; Abrahamsson, Anna; MacSween, Nadia; Courraud, Julie; Nielsen, Zaki Krag; Hougaard, David M.; Cohen, Arieh S.; Ernst, Madeleine (5 April 2023). "Effects of Long-Term Storage on the Biobanked Neonatal Dried Blood Spot Metabolome". Journal of the American Society for Mass Spectrometry. 34 (4): 685-694. doi:10.1021/jasms.2c00358. PMC 10080689. PMID 36913955.
35. Le Loarer, Alexandre; Marcellin-Gros, Rémy; Dufossé, Laurent; Bignon, Jérôme; Frédérich, Michel; Ledoux, Allison; Queiroz, Emerson Ferreira; Wolfender, Jean-Luc; Gauvin-Bialecki, Anne; Fouillaud, Mireille (8 March 2023). "Prioritization of Microorganisms Isolated from the Indian Ocean Sponge Scopalina hapalia Based on Metabolomic Diversity and Biological Activity for the Discovery of Natural Products". Microorganisms. 11 (3): 697. doi:10.3390/microorganisms11030697. PMC 10057949. PMID 36985270.
36. Weber, Ronja; Streckenbach, Bettina; Welti, Lara; Inci, Demet; Kohler, Malcolm; Perkins, Nathan; Zenobi, Renato; Micic, Srdjan; Moeller, Alexander (31 March 2023). "Online breath analysis with SESI/HRMS for metabolic signatures in children with allergic asthma". Frontiers in Molecular Biosciences. 10. doi:10.3389/fmolb.2023.1154536. PMC 10102578. PMID 37065443.
37. Li, Xianjiang; Tu, Mengling; Yang, Bingxin; Ma, Wen; Li, Hongmei (October 2023). "Structurally related impurity profiling of thiacloprid by orbitrap and de novo identification tool". Microchemical Journal. 193: 109123. doi:10.1016/j.microc.2023.109123. S2CID 260123222.
38. Uzi-Gavrilov, S.; Tik, Z.; Sabti, O.; Meijler, M. M. (17 July 2023). "Chemical Modification of a Bacterial Siderophore by a Competitor in Dual-Species Biofilms". Angewandte Chemie (International ed. in English). 62 (29): e202300585. doi:10.1002/anie.202300585. PMID 37211536.
39. Li, Min; Mao, Junhong; Diaz, Isabel; Kopylova, Evguenia; Melnik, Alexey V.; Aksenov, Alexander A.; Tipton, Craig D.; Soliman, Nadia; Morgan, Andrea M.; Boyd, Thomas (18 July 2023). "Multi-omic approach to decipher the impact of skincare products with pre/postbiotics on skin microbiome and metabolome". Frontiers in Medicine. 10. doi:10.3389/fmed.2023.1165980. PMC 10392128. PMID 37534320.
40. Hufsky, Franziska; Böcker, Sebastian (September 2017). "Mining molecular structure databases: Identification of small molecules based on fragmentation mass spectrometry data". Mass Spectrometry Reviews. 36 (5): 624-633. Bibcode:2017MSRv...36..624H. doi:10.1002/mas.21489. PMID 26763615.
41. "Critical Assessment of Small Molecule Identification". Retrieved 12 January 2023.
42. Schymanski, Emma; Neumann, Steffen (25 June 2013). "The Critical Assessment of Small Molecule Identification (CASMI): Challenges and Solutions". Metabolites. 3 (3): 517-538. doi:10.3390/metabo3030517. PMC 3901296. PMID 24958137.
43. Schymanski, Emma L.; Ruttkies, Christoph; Krauss, Martin; Brouard, Céline; Kind, Tobias; Dührkop, Kai; Allen, Felicity; Vaniya, Arpana; Verdegem, Dries; Böcker, Sebastian; Rousu, Juho; Shen, Huibin; Tsugawa, Hiroshi; Sajed, Tanvir; Fiehn, Oliver; Ghesquière, Bart; Neumann, Steffen (December 2017). "Critical Assessment of Small Molecule Identification 2016: automated methods". Journal of Cheminformatics. 9 (1): 22. doi:10.1186/s13321-017-0207-1. PMC 5368104. PMID 29086042.
44. "CASMI 2016 Results". Retrieved 12 January 2023.
45. "CASMI 2017 Results". Retrieved 12 January 2023.
46. "CASMI 2022 Results". Retrieved 12 January 2023.
47. "Thüringer Forschungspreis 2022". YouTube. Thüringer Wirtschafts- & Wissenschaftsministerium. Retrieved 12 January 2023.
48. Schönfelder, Ute (6 April 2022). "Artificial Intelligence identifies small molecules. Bioinformatics team awarded 2022 Thuringian Research Prize in the category 'Applied Research'". Friedrich Schiller University Jena. Retrieved 12 January 2023.
49. Singh, Arunima (January 2020). "Tools for metabolomics". Nature Methods. 17 (1): 24. doi:10.1038/s41592-019-0710-6. PMID 31907484.
50. Allen, Felicity; Pon, Allison; Wilson, Michael; Greiner, Russ; Wishart, David (1 July 2014). "CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra". Nucleic Acids Research. 42 (W1): W94-W99. doi:10.1093/nar/gku436. PMC 4086103. PMID 24895432.
51. Wang, Fei; Allen, Dana; Tian, Siyang; Oler, Eponine; Gautam, Vasuk; Greiner, Russell; Metz, Thomas O.; Wishart, David S. (5 July 2022). "CFM-ID 4.0: a web server for accurate MS-based metabolite identification". Nucleic Acids Research. 50 (W1): W165-W174. doi:10.1093/nar/gkac383. PMC 9252813. PMID 35610037.
52. Goldman, Samuel; Li, Janet; Coley, Connor W. (2023). "Generating Molecular Fragmentation Graphs with Autoregressive Neural Networks". arXiv:2304.13136 [q-bio.QM].
53. Ruttkies, Christoph; Schymanski, Emma L.; Wolf, Sebastian; Hollender, Juliane; Neumann, Steffen (December 2016). "MetFrag relaunched: incorporating strategies beyond in silico fragmentation". Journal of Cheminformatics. 8 (1): 3. doi:10.1186/s13321-016-0115-9. PMC 4732001. PMID 26834843.
54. Tsugawa, Hiroshi; Kind, Tobias; Nakabayashi, Ryo; Yukihira, Daichi; Tanaka, Wataru; Cajka, Tomas; Saito, Kazuki; Fiehn, Oliver; Arita, Masanori (16 August 2016). "Hydrogen Rearrangement Rules: Computational MS/MS Fragmentation and Structure Elucidation Using MS-FINDER Software". Analytical Chemistry. 88 (16): 7946-7958. doi:10.1021/acs.analchem.6b00770. PMC 7063832. PMID 27419259.
55. Lai, Zijuan; Tsugawa, Hiroshi; Wohlgemuth, Gert; Mehta, Sajjan; Mueller, Matthew; Zheng, Yuxuan; Ogiwara, Atsushi; Meissen, John; Showalter, Megan; Takeuchi, Kohei; Kind, Tobias; Beal, Peter; Arita, Masanori; Fiehn, Oliver (January 2018). "Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics". Nature Methods. 15 (1): 53-56. doi:10.1038/nmeth.4512. PMC 6358022. PMID 29176591.