
Automatic taxonomy construction

Automatic taxonomy construction (ATC) is the use of software programs to generate taxonomical classifications from a body of texts called a corpus. ATC is a branch of natural language processing, which in turn is a branch of artificial intelligence.

A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types.[1][2][3][4][5][6] Among other things, a taxonomy can be used to organize and index knowledge (stored as documents, articles, videos, etc.), such as in the form of a library classification system or a search engine taxonomy, so that users can more easily find the information they are searching for. Many taxonomies are hierarchies (and thus have an intrinsic tree structure), but not all are.

Manually developing and maintaining a taxonomy is a labor-intensive task requiring significant time and resources, including familiarity with or expertise in the taxonomy's domain (scope, subject, or field), which drives up costs and limits the scope of such projects. Also, domain modelers have their own points of view, which inevitably, even if unintentionally, work their way into the taxonomy. ATC uses artificial intelligence techniques to generate a taxonomy for a domain quickly and automatically, in order to avoid these problems and remove these limitations.

Approaches

There are several approaches to ATC. One approach is to use rules to detect patterns in the corpus and use those patterns to infer relations such as hyponymy. Other approaches use machine learning techniques such as Bayesian inference and artificial neural networks.[7]

Keyword extraction

One approach to building a taxonomy is to automatically gather the keywords from a domain using keyword extraction, then analyze the relationships between them (see Hyponymy, below), and then arrange them as a taxonomy based on those relationships.
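The first step of that pipeline, gathering keywords, can be sketched as follows. This is a minimal illustration that assumes raw term frequency is an acceptable stand-in for keyword extraction; the function name `extract_keywords`, the tiny stopword list, and the sample corpus are all hypothetical and do not come from any particular ATC system.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "in", "to", "such", "as"}


def extract_keywords(corpus, top_n=5):
    """Return the top_n most frequent non-stopword tokens across the corpus."""
    counts = Counter()
    for document in corpus:
        # Lowercase the text and keep runs of letters as tokens.
        tokens = re.findall(r"[a-z]+", document.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical three-document corpus.
corpus = [
    "A dog is a mammal.",
    "A cat is a mammal.",
    "Mammals such as the dog and the cat are animals.",
]
keywords = extract_keywords(corpus, top_n=3)
print(keywords)
```

On this corpus the three most frequent terms are dog, mammal, and cat, each occurring twice. The sketch does no stemming, so "mammal" and "mammals" are counted as different terms; a practical extractor would normalize such variants before the relationship analysis.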

Hyponymy and "is-a" relations

In ATC programs, one of the most important tasks is the discovery of hypernym and hyponym relations among words. One way to do that from a body of text is to search for certain phrases like "is a" and "such as".
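The phrase search described above can be sketched with two regular expressions. This is only an illustration under simplifying assumptions: the patterns, the names `IS_A`, `SUCH_AS`, and `extract_is_a_pairs`, and the sample sentence are hypothetical, and the code handles single-word terms only, with no parsing or plural normalization.

```python
import re

# Separators allowed inside an enumeration: ", ", ", and ", " and ", " or ", ...
LIST_SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")

# "X is a Y" / "X is an Y": X is taken as the hyponym, Y as the hypernym.
IS_A = re.compile(r"\b(\w+) is an? (\w+)", re.IGNORECASE)

# "Y such as X1, X2 and X3": Y is the hypernym, each Xi a hyponym.
SUCH_AS = re.compile(
    r"\b(\w+) such as (\w+(?:(?:" + LIST_SEPARATOR.pattern + r")\w+)*)",
    re.IGNORECASE,
)


def extract_is_a_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the two patterns, lowercased."""
    pairs = []
    for hyponym, hypernym in IS_A.findall(text):
        pairs.append((hyponym.lower(), hypernym.lower()))
    for hypernym, list_text in SUCH_AS.findall(text):
        for hyponym in LIST_SEPARATOR.split(list_text):
            pairs.append((hyponym.lower(), hypernym.lower()))
    return pairs


text = (
    "Fido is a dog. A dog is a mammal. "
    "Mammals such as dogs, cats and whales nurse their young."
)
pairs = extract_is_a_pairs(text)
print(pairs)
```

For the sample text this yields the pairs (fido, dog), (dog, mammal), (dogs, mammals), (cats, mammals), and (whales, mammals). Note that "dog" and "dogs" remain distinct strings; merging them would require lemmatization, which this sketch deliberately leaves out.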

In linguistics, is-a relations are called hyponymy. Words that describe categories are called hypernyms and words that are examples of categories are hyponyms. For example, dog is a hypernym and Fido is one of its hyponyms. A word can be both a hyponym and a hypernym. So, dog is a hyponym of mammal and also a hypernym of Fido.

Taxonomies are often represented as is-a hierarchies in which each level is more specific than (in mathematical language, "a subset of") the level above it. For example, a basic biology taxonomy would have concepts such as mammal, which is a subset of animal, and dog and cat, which are subsets of mammal. This kind of taxonomy is called an is-a model because the specific objects are considered instances of a concept. For example, Fido is an instance of the concept dog (Fido is-a dog) and Fluffy is-a cat.[8]
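Once is-a pairs are known, arranging them into a hierarchy is straightforward. The sketch below uses the animal example from this section and assumes every term has at most one parent and that the pairs contain no cycles; the function names `build_taxonomy`, `ancestors`, and `render` are hypothetical.

```python
from collections import defaultdict


def build_taxonomy(is_a_pairs):
    """Map each hypernym to the sorted list of its direct hyponyms."""
    children = defaultdict(set)
    for hyponym, hypernym in is_a_pairs:
        children[hypernym].add(hyponym)
    return {parent: sorted(kids) for parent, kids in children.items()}


def ancestors(term, is_a_pairs):
    """Follow is-a links upward (assumes one parent per term and no cycles)."""
    parent_of = {hyponym: hypernym for hyponym, hypernym in is_a_pairs}
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def render(taxonomy, root, depth=0):
    """Return the subtree under root as indented lines, one term per line."""
    lines = ["  " * depth + root]
    for child in taxonomy.get(root, []):
        lines.extend(render(taxonomy, child, depth + 1))
    return lines


# (hyponym, hypernym) pairs taken from the example in the text.
pairs = [
    ("mammal", "animal"),
    ("dog", "mammal"),
    ("cat", "mammal"),
    ("Fido", "dog"),
    ("Fluffy", "cat"),
]
taxonomy = build_taxonomy(pairs)
tree_lines = render(taxonomy, "animal")
print("\n".join(tree_lines))
print(ancestors("Fido", pairs))
```

Here `ancestors("Fido", pairs)` walks up through dog and mammal to animal, which mirrors the statement that each level is a subset of the level above it. Taxonomies that are not strict trees would need a parent list per term instead of the single-parent dictionary used here.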

Applications

ATC can be used to build taxonomies for search engines, to improve search results.

ATC systems are a key component of ontology learning (also known as automatic ontology construction), and have been used to automatically generate large ontologies for domains such as insurance and finance. They have also been used to enhance existing large networks such as WordNet to make them more complete and consistent.[9][10][11]

Other names

Other names for automatic taxonomy construction include:

  • Automated outline building
  • Automated outline construction
  • Automated outline creation
  • Automated outline extraction
  • Automated outline generation
  • Automated outline induction
  • Automated outline learning
  • Automated outlining
  • Automated taxonomy building
  • Automated taxonomy construction
  • Automated taxonomy creation
  • Automated taxonomy extraction
  • Automated taxonomy generation
  • Automated taxonomy induction
  • Automated taxonomy learning
  • Automatic outline building
  • Automatic outline construction
  • Automatic outline creation
  • Automatic outline extraction
  • Automatic outline generation
  • Automatic outline induction
  • Automatic outline learning
  • Automatic taxonomy building
  • Automatic taxonomy creation
  • Automatic taxonomy extraction
  • Automatic taxonomy generation
  • Automatic taxonomy induction
  • Automatic taxonomy learning
  • Outline automation
  • Outline building
  • Outline construction
  • Outline creation
  • Outline extraction
  • Outline generation
  • Outline induction
  • Outline learning
  • Semantic taxonomy building
  • Semantic taxonomy construction
  • Semantic taxonomy creation
  • Semantic taxonomy extraction
  • Semantic taxonomy generation
  • Semantic taxonomy induction
  • Semantic taxonomy learning
  • Taxonomy automation
  • Taxonomy building
  • Taxonomy construction
  • Taxonomy creation
  • Taxonomy extraction
  • Taxonomy generation
  • Taxonomy induction
  • Taxonomy learning

See also

  • Document classification
  • Information extraction

References

  1. "Taxonomy". 10 October 2021.
  2. "Taxonomy Definition & Meaning". Dictionary.com. Retrieved 2022-05-13.
  3. "What is Taxonomy?". 14 August 2017.
  4. "TAXONOMY Meaning & Definition for UK English". Lexico.com. Archived from the original on March 2, 2021. Retrieved 2022-05-13.
  5. "What is Taxonomy?". 20 August 2003.
  6. "TAXONOMY (Noun) definition and synonyms | Macmillan Dictionary".
  7. Neshati, Mahmood; Alijamaat, Ali; Abolhassani, Hassan; Rahimi, Afshin; Hoseini, Mehdi (2007). "Taxonomy Learning Using Compound Similarity Measure". IEEE/WIC/ACM International Conference on Web Intelligence (WI'07). pp. 487–490. doi:10.1109/WI.2007.135. ISBN 978-0-7695-3026-0. S2CID 14206314.
  8. Brachman, Ronald (October 1983). "What IS-A is and isn't. An Analysis of Taxonomic Links in Semantic Networks". IEEE Computer. 16 (10): 30–36. doi:10.1109/MC.1983.1654194. OSTI 5363562. S2CID 16650410.
  9. Velardi, Paola; Faralli, Stefano; Navigli, Roberto (10 October 2012). "OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction". Computational Linguistics. Association for Computational Linguistics. CiteSeerX 10.1.1.278.5674.
  10. Liu, Xueqing; Song, Yangqiu; Liu, Shixia; Wang, Haixun (12–16 August 2012). "Automatic taxonomy construction from keywords". Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (PDF). ACM. p. 1433. doi:10.1145/2339530.2339754. ISBN 9781450314626. S2CID 9100603. Retrieved 7 March 2017.
  11. Snow, Rion; Jurafsky, Daniel; Ng, Andrew. "Semantic Taxonomy Induction from Heterogenous Evidence" (PDF). Stanford University. Retrieved 8 March 2017.

Further reading

  • Automatic Taxonomy Construction from Keywords (2012)
  • Domain taxonomy learning from text: The subsumption method versus hierarchical clustering from Data & Knowledge Engineering, Volume 83, January 2013, Pages 54–69
  • Learning taxonomic relations from a set of text documents
  • Learning Taxonomic Relations from Heterogeneous Sources of Evidence
  • A Metric-based Framework for Automatic Taxonomy Induction
  • A New Method for Evaluating Automatically Learned Terminological Taxonomies
  • Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia
  • Structured Learning for Taxonomy Induction with Belief Propagation
  • Taxonomy Learning Using Word Sense Induction

External links

  • Taxonomy 101: The Basics and Getting Started with Taxonomies – shows where ATC fits into the general activity of managing taxonomies for a business enterprise in need of knowledge management.
