SMILES arbitrary target specification

SMILES arbitrary target specification (SMARTS) is a language for specifying substructural patterns in molecules. The SMARTS line notation is expressive and allows extremely precise and transparent substructural specification and atom typing.

SMARTS is related to the SMILES line notation that is used to encode molecular structures and, like SMILES, was originally developed by David Weininger and colleagues at Daylight Chemical Information Systems. The most comprehensive descriptions of the SMARTS language can be found in Daylight's SMARTS theory manual,[1] tutorial[2] and examples.[3] OpenEye Scientific Software has developed its own version of SMARTS, which differs from the original Daylight version in how the R descriptor (see Cyclicity below) is defined.

SMARTS Syntax

Atomic properties

Atoms can be specified by symbol or atomic number. Aliphatic carbon is matched by [C], aromatic carbon by [c] and any carbon by [#6] or [C,c]. The wild card symbols *, A and a match any atom, any aliphatic atom and any aromatic atom respectively. Implicit hydrogens are considered to be a characteristic of atoms and the SMARTS for an amino group can be written as [NH2]. Charge is specified by the descriptors + and - as exemplified by the SMARTS [nH+] (protonated aromatic nitrogen atom) and [O-]C(=O)c (deprotonated aromatic carboxylic acid).

Bonds

A number of bond types can be specified: - (single), = (double), # (triple), : (aromatic) and ~ (any).

Connectivity

The X and D descriptors specify, respectively, the total number of connections (including implicit hydrogen atoms) and the number of connections to explicit atoms. Thus [CX4] matches carbon atoms with bonds to any four other atoms, while [CD4] matches quaternary carbon.
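The difference between X and D can be illustrated with a small sketch. The dictionary layout, the atom labels and the helper names below are assumptions made for this example and are not part of any SMARTS toolkit; hydrogens are stored as implicit counts, as in the text above.

```python
# Toy representation of neopentane, C(C)(C)(C)C. Each atom lists its
# explicit (heavy-atom) neighbours and its implicit hydrogen count.
# This layout is hypothetical and only serves the illustration.
NEOPENTANE = {
    "C_center": {"neighbors": ["C1", "C2", "C3", "C4"], "implicit_h": 0},
    "C1": {"neighbors": ["C_center"], "implicit_h": 3},
    "C2": {"neighbors": ["C_center"], "implicit_h": 3},
    "C3": {"neighbors": ["C_center"], "implicit_h": 3},
    "C4": {"neighbors": ["C_center"], "implicit_h": 3},
}


def degree_d(atom):
    """D: number of connections to explicit atoms."""
    return len(atom["neighbors"])


def connectivity_x(atom):
    """X: total connections, i.e. explicit neighbours plus implicit hydrogens."""
    return len(atom["neighbors"]) + atom["implicit_h"]


def atoms_matching(mol, descriptor, value):
    """Return the labels of atoms whose X or D value equals `value`."""
    func = {"X": connectivity_x, "D": degree_d}[descriptor]
    return sorted(label for label, atom in mol.items() if func(atom) == value)


if __name__ == "__main__":
    print("[CX4]:", atoms_matching(NEOPENTANE, "X", 4))  # all five carbons
    print("[CD4]:", atoms_matching(NEOPENTANE, "D", 4))  # only the central carbon
```

Every carbon in neopentane has four connections in total, so all five satisfy X4, but only the central carbon has four explicit neighbours and satisfies D4.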

Cyclicity

As originally defined by Daylight, the R descriptor is used to specify ring membership. In the Daylight model for cyclic systems, the smallest set of smallest rings (SSSR)[4] is used as a basis for ring membership. For example, indole is perceived as a 5-membered ring fused with a 6-membered ring rather than a 9-membered ring. The two carbon atoms that make up the ring fusion would match [cR2] and the other carbon atoms would match [cR1].

The SSSR model has been criticised by OpenEye[5] who, in their implementation of SMARTS, use R to denote the number of ring bonds for an atom. The two carbon atoms in the ring fusion match [cR3] and the other carbons match [cR2] in the OpenEye implementation of SMARTS. Used without a number, R specifies an atom in a ring in both implementations, for example [CR] (aliphatic carbon atom in ring).

Lower case r specifies the size of the smallest ring of which the atom is a member. The carbon atoms of the ring fusion would both match [cr5]. Bonds can be specified as cyclic, for example C@C matches directly bonded atoms in a ring.
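The indole example can be worked through in a few lines of Python. This is a sketch under stated assumptions: the two rings are hard-coded as the SSSR rather than perceived from the structure, the atom labels follow the usual indole ring positions, and the function names are invented for the illustration.

```python
# Indole rings, listed in cyclic order. Hard-coding them as the SSSR is an
# assumption of this sketch; no ring perception is performed.
FIVE_RING = ["N1", "C2", "C3", "C3a", "C7a"]
SIX_RING = ["C3a", "C4", "C5", "C6", "C7", "C7a"]
RINGS = [FIVE_RING, SIX_RING]


def ring_bonds(ring):
    """Bonds around one ring, each as a frozenset of two atom labels."""
    return {frozenset((ring[i], ring[(i + 1) % len(ring)])) for i in range(len(ring))}


def daylight_r_count(atom):
    """Daylight-style R: number of SSSR rings that contain the atom."""
    return sum(atom in ring for ring in RINGS)


def openeye_r_count(atom):
    """OpenEye-style R: number of ring bonds on the atom."""
    all_bonds = set().union(*(ring_bonds(ring) for ring in RINGS))
    return sum(atom in bond for bond in all_bonds)


def smallest_ring_size(atom):
    """r: size of the smallest ring that contains the atom."""
    return min(len(ring) for ring in RINGS if atom in ring)


if __name__ == "__main__":
    for atom in ["C3a", "C7a", "C2", "C5"]:
        print(atom, daylight_r_count(atom), openeye_r_count(atom), smallest_ring_size(atom))
```

For the fusion carbons C3a and C7a this gives 2 under the Daylight definition, 3 under the OpenEye definition and a smallest ring size of 5, in line with [cR2], [cR3] and [cr5] above.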

Logical operators

Four logical operators allow atom and bond descriptors to be combined. The low-priority 'and' operator ; can be used to define a protonated primary amine as [N;H3;+][C;X4]. The 'or' operator , has a higher priority than ; so [c,n;H] defines (aromatic carbon or aromatic nitrogen) with implicit hydrogen. The high-priority 'and' operator & has a higher priority than , so [c,n&H] defines aromatic carbon or (aromatic nitrogen with implicit hydrogen).

The 'not' operator ! can be used to define unsaturated aliphatic carbon as [C;!X4] and acyclic bonds as *-!@*.
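The operator priorities can be made concrete with a toy evaluator. It is only a sketch: it handles a handful of primitives (element symbols, H, X and +), requires every operator to be written explicitly, and does not support recursive SMARTS or juxtaposed primitives such as NH2. The atom dictionary and the function names are assumptions for this example, and a bare H is taken to mean a hydrogen count of exactly one.

```python
def _primitive(token, atom):
    """Evaluate one primitive, with optional leading '!' (highest priority)."""
    if not token:
        raise ValueError("empty primitive")
    if token.startswith("!"):
        return not _primitive(token[1:], atom)
    if token in ("C", "N", "O"):  # aliphatic element
        return atom["element"] == token and not atom["aromatic"]
    if token in ("c", "n", "o"):  # aromatic element
        return atom["element"] == token.upper() and atom["aromatic"]
    if token == "+":  # formal charge of +1
        return atom["charge"] == 1
    if token[0] == "H":  # hydrogen count, bare H means 1
        return atom["h"] == int(token[1:] or 1)
    if token[0] == "X":  # total connections, bare X means 1
        return atom["x"] == int(token[1:] or 1)
    raise ValueError(f"unsupported primitive: {token}")


def matches(expr, atom):
    """Evaluate the inside of a bracket atom: ';' binds loosest, then ',', then '&'."""
    return all(
        any(
            all(_primitive(tok, atom) for tok in conj.split("&"))
            for conj in disj.split(",")
        )
        for disj in expr.split(";")
    )


# Hypothetical atoms used to exercise the expressions from the text.
AROMATIC_C_NO_H = {"element": "C", "aromatic": True, "h": 0, "x": 3, "charge": 0}
AROMATIC_CH = {"element": "C", "aromatic": True, "h": 1, "x": 3, "charge": 0}
PYRIDINE_N = {"element": "N", "aromatic": True, "h": 0, "x": 2, "charge": 0}
ALKENE_C = {"element": "C", "aromatic": False, "h": 2, "x": 3, "charge": 0}

if __name__ == "__main__":
    print(matches("c,n;H", AROMATIC_C_NO_H))  # False: the ';H' applies to both alternatives
    print(matches("c,n&H", AROMATIC_C_NO_H))  # True: the '&H' applies only to n
    print(matches("c,n&H", PYRIDINE_N))       # False: aromatic nitrogen without hydrogen
    print(matches("C;!X4", ALKENE_C))         # True: aliphatic carbon that is not X4
```

Splitting on ; first, then on , and finally on & mirrors the priority order, since the operator split last binds most tightly.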

Recursive SMARTS

Recursive SMARTS allow detailed specification of an atom's environment. For example, the more reactive (with respect to electrophilic aromatic substitution) ortho and para carbon atoms of phenol can be defined as [$(c1c([OH])cccc1),$(c1ccc([OH])cc1)].
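Because each $(...) environment is itself a SMARTS pattern, the parentheses nest, and it can help to pull the environments apart when reading or checking a long expression. The helper below is a minimal sketch written for this purpose; its name and behaviour are assumptions of the example. It only tracks parentheses, does not validate the SMARTS inside each environment, and does not detect stray closing parentheses outside an environment.

```python
def recursive_environments(atom_expr):
    """Return the top-level $(...) environments found in a SMARTS atom expression.

    Raises ValueError if an environment is opened but never closed.
    """
    environments = []
    i = 0
    while i < len(atom_expr):
        if atom_expr.startswith("$(", i):
            start = i + 2
            depth = 1
            j = start
            # Walk forward until the parenthesis opened by "$(" is closed.
            while j < len(atom_expr) and depth:
                if atom_expr[j] == "(":
                    depth += 1
                elif atom_expr[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unclosed $( environment")
            environments.append(atom_expr[start:j - 1])
            i = j  # skip past the environment, so nested ones are not listed separately
        else:
            i += 1
    return environments


if __name__ == "__main__":
    phenol_ortho_para = "[$(c1c([OH])cccc1),$(c1ccc([OH])cc1)]"
    for env in recursive_environments(phenol_ortho_para):
        print(env)
```

For the phenol pattern this yields the two environments c1c([OH])cccc1 and c1ccc([OH])cc1, the ortho and para cases respectively.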

Examples of SMARTS

A number of illustrative examples of SMARTS have been assembled by Daylight.

The definitions of hydrogen bond donors and acceptors used to apply Lipinski's Rule of Five[6] are easily coded in SMARTS. Donors are defined as nitrogen or oxygen atoms that have at least one directly bonded hydrogen atom:

[N,n,O;!H0] or [#7,#8;!H0] (aromatic oxygen cannot have a bonded hydrogen)

Acceptors are defined as nitrogen or oxygen:

[N,n,O,o] or [#7,#8]
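The logic of these two definitions can be sketched without a cheminformatics toolkit. In the example below, each atom is reduced to a hypothetical (atomic number, hydrogen count) pair, and the function name is an assumption of the illustration; the conditions mirror [#7,#8;!H0] and [#7,#8].

```python
def count_donors_acceptors(atoms):
    """Count donor and acceptor atoms from (atomic_number, total_h) pairs."""
    # Donor: nitrogen or oxygen with at least one hydrogen, as in [#7,#8;!H0].
    donors = sum(1 for z, h in atoms if z in (7, 8) and h > 0)
    # Acceptor: any nitrogen or oxygen, as in [#7,#8].
    acceptors = sum(1 for z, h in atoms if z in (7, 8))
    return donors, acceptors


# 2-Methoxyethanol, COCCO: the ether oxygen has no hydrogen, the hydroxyl has one.
METHOXYETHANOL = [(6, 3), (8, 0), (6, 2), (6, 2), (8, 1)]

if __name__ == "__main__":
    print(count_donors_acceptors(METHOXYETHANOL))
```

Here the hydroxyl oxygen counts as both donor and acceptor, while the ether oxygen counts only as an acceptor, giving one donor and two acceptors.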

A simple definition of aliphatic amines that are likely to protonate at physiological pH can be written as the following recursive SMARTS:

[$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])]

In real applications the CX4 atoms would need to be defined more precisely to prevent matching against electron withdrawing groups such as CF3 that would render the amine insufficiently basic to protonate at physiological pH.

SMARTS can be used to encode pharmacophore elements such as anionic centers. In the following example, recursive SMARTS notation is used to combine acid oxygen and tetrazole nitrogen in a definition of atoms that are likely to be anionic under normal physiological conditions:

[$([OH][C,S,P]=O),$([nH]1nnnc1)]

The SMARTS above would only match the acid hydroxyl and the tetrazole N−H. When a carboxylic acid deprotonates, the negative charge is delocalised over both oxygen atoms and it may be desirable to designate both as anionic. This can be achieved using the following SMARTS:

[$([OH]C=O),$(O=C[OH])]

Applications of SMARTS

The precise and transparent substructural specification that SMARTS allows has been exploited in a number of applications.

Substructural filters defined in SMARTS have been used[7] to identify undesirable compounds when performing strategic pooling of compounds for high-throughput screening. The REOS (rapid elimination of swill)[8] procedure uses SMARTS to filter out reactive, toxic and otherwise undesirable moieties from databases of chemical structures.

RECAP (Retrosynthetic Combinatorial Analysis Procedure)[9] uses SMARTS to define bond types. RECAP is a molecule editor which generates fragments of structures by breaking bonds of defined types; the original link points in the fragments are specified using isotopic labels. Searching databases of biologically active compounds for occurrences of fragments allows privileged structural motifs to be identified. The Molecular Slicer[10] is similar to RECAP and has been used to identify fragments that are commonly found in marketed oral drugs.

The Leatherface program[11] is a general-purpose molecule editor which allows automated modification of a number of substructural features of molecules in databases, including protonation state, hydrogen count, formal charge, isotopic weight and bond order. The molecular editing rules used by Leatherface are defined in SMARTS. Leatherface can be used to standardise tautomeric and ionization states and to set and enumerate these when preparing databases[12] for virtual screening. Leatherface has been used in matched molecular pair analysis, which enables the effects of structural changes (e.g. substitution of hydrogen with chlorine) to be quantified[13] over a range of structural types.

ALADDIN[14] is a pharmacophore matching program that uses SMARTS to define recognition points (e.g. neutral hydrogen bond acceptor) of pharmacophores. A key problem in pharmacophore matching is that functional groups that are likely to be ionised at physiological pH are typically registered in their neutral forms in structural databases. The ROCS shape matching program allows atom types to be defined using SMARTS.[15]

Notes and references

  1. ^ SMARTS Theory Manual, Daylight Chemical Information Systems, Santa Fe, New Mexico
  2. ^ SMARTS Tutorial, Daylight Chemical Information Systems, Santa Fe, New Mexico
  3. ^ SMARTS Examples, Daylight Chemical Information Systems, Santa Fe, New Mexico
  4. ^ Downs, G.M.; Gillet, V.J.; Holliday, J.D.; Lynch, M.F. (1989). "A Review of Ring Perception Algorithms for Chemical Graphs". J. Chem. Inf. Comput. Sci. 29 (3): 172–187. doi:10.1021/ci00063a007.
  5. ^ "Smallest Set of Smallest Rings (SSSR) considered Harmful". Archived from the original on October 14, 2007. Retrieved 2017-02-08. OEChem - C++ Manual, Version 1.5.1, OpenEye Scientific Software, Santa Fe, New Mexico
  6. ^ Lipinski, Christopher A.; Lombardo, Franco; Dominy, Beryl W.; Feeney, Paul J. (2001). "Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings". Advanced Drug Delivery Reviews. 46 (1–3): 3–26. doi:10.1016/S0169-409X(00)00129-0. PMID 11259830.
  7. ^ Hann, Mike; Hudson, Brian; Lewell, Xiao; Lifely, Rob; Miller, Luke; Ramsden, Nigel (1999). "Strategic Pooling of Compounds for High-Throughput Screening". Journal of Chemical Information and Computer Sciences. 39 (5): 897–902. doi:10.1021/ci990423o. PMID 10529988.
  8. ^ Walters, W.Patrick; Murcko, Mark A. (2002). "Prediction of 'drug-likeness'". Advanced Drug Delivery Reviews. 54 (3): 255–271. doi:10.1016/S0169-409X(02)00003-0. PMID 11922947.
  9. ^ Lewell, Xiao Qing; Judd, Duncan B.; Watson, Stephen P.; Hann, Michael M. (1998). "RECAP - Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry". Journal of Chemical Information and Computer Sciences. 38 (3): 511–522. doi:10.1021/ci970429i. PMID 9611787.
  10. ^ Vieth, Michal; Siegel, Miles G.; Higgs, Richard E.; Watson, Ian A.; Robertson, Daniel H.; Savin, Kenneth A.; Durst, Gregory L.; Hipskind, Philip A. (2004). "Characteristic Physical Properties and Structural Fragments of Marketed Oral Drugs". Journal of Medicinal Chemistry. 47 (1): 224–232. doi:10.1021/jm030267j. PMID 14695836.
  11. ^ Kenny, Peter W.; Sadowski, Jens (2005). "Structure Modification in Chemical Databases". Chemoinformatics in Drug Discovery. Methods and Principles in Medicinal Chemistry. pp. 271–285. doi:10.1002/3527603743.ch11. ISBN 9783527307531.
  12. ^ Lyne, Paul D.; Kenny, Peter W.; Cosgrove, David A.; Deng, Chun; Zabludoff, Sonya; Wendoloski, John J.; Ashwell, Susan (2004). "Identification of Compounds with Nanomolar Binding Affinity for Checkpoint Kinase-1 Using Knowledge-Based Virtual Screening". Journal of Medicinal Chemistry. 47 (8): 1962–1968. doi:10.1021/jm030504i. PMID 15055996.
  13. ^ Leach, Andrew G.; Jones, Huw D.; Cosgrove, David A.; Kenny, Peter W.; Ruston, Linette; MacFaul, Philip; Wood, J. Matthew; Colclough, Nicola; Law, Brian (2006). "Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure". Journal of Medicinal Chemistry. 49 (23): 6672–6682. doi:10.1021/jm0605233. PMID 17154498.
  14. ^ Van Drie, John H.; Weininger, David; Martin, Yvonne C. (1989). "ALADDIN: An integrated tool for computer-assisted molecular design and pharmacophore recognition from geometric, steric, and substructure searching of three-dimensional molecular structures". Journal of Computer-Aided Molecular Design. 3 (3): 225–251. doi:10.1007/BF01533070. PMID 2573695. S2CID 206795998.
  15. ^ OpenEye Scientific Software | ROCS
