
Ontology learning

Ontology learning (ontology extraction, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical[1] or symbolic[2][3] techniques are used to extract relation signatures, often based on pattern-based[4] or definition-based[5] hypernym extraction techniques.

Procedure

Ontology learning (OL) is used to (semi-)automatically extract whole ontologies from natural language text.[6][7] The process is usually split into the following eight tasks, which are not all necessarily applied in every ontology learning system.

Domain terminology extraction

During the domain terminology extraction step, domain-specific terms are extracted; these are used in the following step (concept discovery) to derive concepts. Relevant terms can be determined, e.g., by calculating TF-IDF values or by applying the C-value/NC-value method. The resulting list of terms has to be filtered by a domain expert. Subsequently, similarly to coreference resolution in information extraction, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept. The most common methods for this are clustering and the application of statistical similarity measures.
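The TF-IDF weighting mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of any particular OL system: the regex tokenizer, the unsmoothed `log(N / df)` variant of IDF, and the toy corpus are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens are good enough for the sketch.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical three-document corpus.
corpus = [
    "the enzyme binds the substrate",
    "the enzyme catalyses the reaction",
    "the market closed higher today",
]
scores = tf_idf(corpus)
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:2]
```

Terms that occur in every document (here "the") receive a score of zero, while terms concentrated in one document rank highest; a domain expert would still review the resulting list.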

Concept discovery

In the concept discovery step, terms are grouped into meaning-bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are the domain-specific terms and their synonyms that were identified in the domain terminology extraction step.
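One simple way to picture this grouping is to merge terms that are linked by synonym pairs. The sketch below assumes that the synonym pairs have already been produced by the previous step and that every term in a pair appears in the term list; the function name and the sample data are hypothetical.

```python
def group_into_concepts(terms, synonym_pairs):
    """Merge terms connected by synonym pairs into concept groups (union-find)."""
    parent = {term: term for term in terms}

    def find(term):
        # Follow parent links to the group representative, shortening the path.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in synonym_pairs:
        parent[find(a)] = find(b)

    concepts = {}
    for term in terms:
        concepts.setdefault(find(term), set()).add(term)
    return list(concepts.values())


# Hypothetical terms and synonym pairs.
terms = ["car", "automobile", "motorcar", "bicycle", "bike"]
pairs = [("car", "automobile"), ("automobile", "motorcar"), ("bicycle", "bike")]
concepts = group_into_concepts(terms, pairs)
```

Each resulting set stands for one candidate concept; synonymy is treated as transitive here, which is a simplification.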

Concept hierarchy derivation

In the concept hierarchy derivation step, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved with unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g., user evaluation, is added. A further method for deriving a concept hierarchy uses lexical patterns that indicate a sub- or supersumption relationship. Patterns like “X, that is a Y” or “X is a Y” indicate that X is a subclass of Y. Such patterns can be analyzed efficiently, but they often occur too infrequently to extract enough sub- or supersumption relationships. Instead, bootstrapping methods have been developed that learn these patterns automatically and therefore provide broader coverage.
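The two patterns quoted above can be approximated with regular expressions. This is a rough sketch that only handles single-word X and Y; real systems typically match over noun phrases produced by a chunker. The pattern list and the sample text are assumptions for illustration.

```python
import re

# Each pattern captures a single-word subclass (X) and superclass (Y).
PATTERNS = [
    re.compile(r"\b(\w+), that is an? (\w+)", re.IGNORECASE),
    # The lookahead stops "that is a Y" from being read as X = "that".
    re.compile(r"\b(?!that\b)(\w+) is an? (\w+)", re.IGNORECASE),
]


def extract_subclass_pairs(text):
    """Return a set of (subclass, superclass) candidates found in the text."""
    pairs = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            pairs.add((match.group(1).lower(), match.group(2).lower()))
    return pairs


text = "A sparrow is a bird. The kiwi, that is a bird, cannot fly. An oak is a tree."
pairs = extract_subclass_pairs(text)
```

Because such literal patterns match only a small share of sentences, the extracted pairs are best treated as high-precision, low-recall seeds.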

Learning of non-taxonomic relations

In the learning of non-taxonomic relations step, relationships are extracted that do not express any sub- or supersumption. Such relationships are, e.g., works-for or located-in. There are two common approaches to this subtask. The first is based upon the extraction of anonymous associations, which are named appropriately in a second step. The second approach extracts verbs, which indicate a relationship between entities, represented by the surrounding words. The results of both approaches need to be evaluated by an ontologist to ensure accuracy.
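The first approach can be illustrated with plain co-occurrence counting: concept pairs that appear together in enough sentences become unnamed candidate associations. This is a simplified stand-in for the statistical association measures used in practice; the threshold, the sentence splitter, and the sample text are assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def candidate_associations(text, concepts, min_count=2):
    """Count sentence-level co-occurrences of known concepts."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        # Sort so that each unordered pair is always counted under the same key.
        present = sorted(c for c in concepts if c in words)
        counts.update(combinations(present, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


text = (
    "The engineer joined the company. The company promoted the engineer. "
    "The engineer visited the museum."
)
associations = candidate_associations(text, {"engineer", "company", "museum"})
```

The surviving pair carries no label yet; naming it (for example as works-for) is the separate second step described above.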

Rule discovery

During rule discovery,[8] axioms (formal descriptions of concepts) are generated for the extracted concepts. This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and applying transformation rules to the resulting dependency tree. The result of this process is a list of axioms, which is afterwards combined into a concept description. This output is then evaluated by an ontologist.
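As an illustration of what such an axiom might look like, consider the hypothetical definition "A reservoir is an artificial lake that stores water". A transformation of its parse tree could plausibly yield the description-logic axiom below; the class and property names are invented for this example and are not taken from the cited work.

```latex
\[
\mathit{Reservoir} \equiv \mathit{ArtificialLake} \sqcap \exists\,\mathit{stores}.\mathit{Water}
\]
```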

Ontology population

At this step, the ontology is augmented with instances of concepts and properties. For the augmentation with instances of concepts, methods based on the matching of lexico-syntactic patterns are used. Instances of properties are added through the application of bootstrapping methods, which collect relation tuples.
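A lexico-syntactic pattern for finding concept instances can be sketched as follows. The "such as" pattern, the capitalization test for instance names, and the sample text are assumptions chosen for the example; the concept key is simply the surface form found in the text.

```python
import re

# "<concept> such as <Name>, <Name> and <Name>" up to the end of the clause.
SUCH_AS = re.compile(r"\b(\w+) such as ([^.;]+)")


def extract_instances(text):
    """Map each concept word to the capitalized names listed after 'such as'."""
    instances = {}
    for match in SUCH_AS.finditer(text):
        concept = match.group(1).lower()
        names = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2))
        # Assumption: instance names start with an uppercase letter.
        found = [n.strip() for n in names if n.strip()[:1].isupper()]
        instances.setdefault(concept, []).extend(found)
    return instances


text = (
    "She toured cities such as Paris, Berlin and Rome. "
    "He studied rivers such as Danube and Rhine."
)
instances = extract_instances(text)
```

A fuller pipeline would normalize the concept word (for example "cities" to "city") before attaching the instances to the ontology.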

Concept hierarchy extension

In this step, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be performed in a supervised manner with a trained classifier or in an unsupervised manner via the application of similarity measures.
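The unsupervised variant can be sketched with cosine similarity over context-word counts: a new concept is attached where its most similar existing concept already sits. The context vectors, the concept names, and the parent table below are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar_concept(new_vector, existing):
    """Return the name of the existing concept closest to the new one."""
    return max(existing, key=lambda name: cosine(new_vector, existing[name]))


# Hypothetical context-word counts for concepts already in the ontology.
existing = {
    "dog": Counter({"bark": 3, "pet": 2, "tail": 1}),
    "car": Counter({"engine": 3, "road": 2, "wheel": 2}),
}
parents = {"dog": "animal", "car": "vehicle"}

wolf = Counter({"howl": 2, "tail": 1, "pet": 1})
nearest = most_similar_concept(wolf, existing)
suggested_parent = parents[nearest]
```

Here the new concept shares context words only with "dog", so it would be proposed as a sibling under "animal"; the suggestion would normally still be reviewed.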

Frame and Event detection

During frame/event detection, the OL system tries to extract complex relationships from text, e.g., who departed from where to what place and when. Approaches range from applying SVM with kernel methods to semantic role labeling (SRL)[9] to deep semantic parsing techniques.[10]
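To make the target structure concrete, the departure example can be represented as a record with one slot per role. The sketch below fills it with a single hand-written pattern, which is far simpler than the SVM, SRL, or semantic parsing approaches named above; the class, its field names, and the sentence template are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DepartureFrame:
    traveler: str
    origin: str
    destination: str
    time: str


# Toy template: "<Who> departed from <Where> to <Place> on <When>".
DEPARTURE = re.compile(
    r"(?P<traveler>[A-Z]\w+) departed from (?P<origin>[A-Z]\w+) "
    r"to (?P<destination>[A-Z]\w+) on (?P<time>\w+)"
)


def detect_departure(sentence: str) -> Optional[DepartureFrame]:
    """Return a filled frame if the sentence matches the template, else None."""
    match = DEPARTURE.search(sentence)
    return DepartureFrame(**match.groupdict()) if match else None


frame = detect_departure("Alice departed from Lisbon to Oslo on Monday.")
```

The point of the sketch is the output shape: one event with several named roles, rather than a binary relation between two concepts.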

Tools

DOG4DAG (Dresden Ontology Generator for Directed Acyclic Graphs) is an ontology generation plugin for Protégé 4.1 and OBO-Edit 2.1. It supports term generation, sibling generation, definition generation, and relationship induction. Because it is integrated into both editors, DOG4DAG allows ontology extension for all common ontology formats (e.g., OWL and OBO). It is limited largely to EBI and BioPortal lookup service extensions.[11]

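See also

  • Automatic taxonomy construction
  • Computational linguistics
  • Domain ontology
  • Information extraction
  • Natural language understanding
  • Semantic Web
  • Text mining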

Bibliography

  • P. Buitelaar, P. Cimiano (Eds.). Ontology Learning and Population: Bridging the Gap between Text and Knowledge, Frontiers in Artificial Intelligence and Applications, IOS Press, 2008.
  • P. Buitelaar, P. Cimiano, and B. Magnini (Eds.). Ontology Learning from Text: Methods, Evaluation and Applications, Frontiers in Artificial Intelligence and Applications, IOS Press, 2005.
  • Wong, W. (2009), "Learning Lightweight Ontologies from Text across Different Domains using the Web as Background Knowledge". Doctor of Philosophy thesis, University of Western Australia.
  • Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
  • Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926

References

  1. A. Maedche and S. Staab. Learning ontologies for the semantic web. In Semantic Web Workshop 2001.
  2. Roberto Navigli and Paola Velardi. Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites, Computational Linguistics, 30(2), MIT Press, 2004, pp. 151–179.
  3. P. Velardi, S. Faralli, R. Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665–707.
  4. Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 539–545, Nantes, France, July 1992.
  5. R. Navigli, P. Velardi. Learning Word-Class Lattices for Definition and Hypernym Extraction. Proc. of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, July 11–16, 2010, pp. 1318–1327.
  6. Cimiano, Philipp; Völker, Johanna; Studer, Rudi (2006). "Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text", Information, Wissenschaft und Praxis, 57, pp. 315–320, http://people.aifb.kit.edu/pci/Publications/iwp06.pdf (retrieved: 18.06.2012).
  7. Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
  8. Johanna Völker; Pascal Hitzler; Cimiano, Philipp (2007). "Acquisition of OWL DL Axioms from Lexical Resources", Proceedings of the 4th European conference on The Semantic Web, pp. 670–685, http://smartweb.dfki.de/Vortraege/lexo_2007.pdf (retrieved: 18.06.2012).
  9. Coppola B.; Gangemi A.; Gliozzo A.; Picca D.; Presutti V. (2009). "Frame Detection over the Semantic Web", Proceedings of the European Semantic Web Conference (ESWC2009), Springer, 2009.
  10. Presutti V.; Draicchio F.; Gangemi A. (2012). "Knowledge extraction based on Discourse Representation Theory and Linguistic Frames", Proceedings of the Conference on Knowledge Engineering and Knowledge Management (EKAW2012), LNCS, Springer, 2012.
  11. Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926 http://www.biotec.tu-dresden.de/research/schroeder/dog4dag/

