
Semantic compression

In natural language processing, semantic compression is a process of compacting a lexicon used to build a textual document (or a set of documents) by reducing language heterogeneity, while maintaining text semantics. As a result, the same ideas can be represented using a smaller set of words.

In most applications, semantic compression is lossy: the increased prolixity of the generalized text does not compensate for the lexical compression, and the original document cannot be reconstructed by a reverse process.

By generalization

Semantic compression is achieved in two steps, using frequency dictionaries and a semantic network:

  1. determining cumulated term frequencies to identify the target lexicon,
  2. replacing less frequent terms with their hypernyms (generalization) from the target lexicon.[1]

Step 1 requires assembling word frequencies and information on semantic relationships, specifically hyponymy. Moving upwards in the word hierarchy, a cumulated concept frequency is calculated by adding the sum of the hyponyms' frequencies to the frequency of their hypernym: cumf(k_i) = f(k_i) + Σ_j cumf(k_j), where k_i is a hypernym of k_j. Then, a desired number of words with the top cumulated frequencies are chosen to build the target lexicon.

In the second step, compression mapping rules are defined for the remaining words, so that every occurrence of a less frequent hyponym is handled as its hypernym in the output text.
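
A minimal, self-contained Python sketch of both steps is given below. The hyponymy tree, term frequencies, and lexicon size are all invented for illustration; in practice the hyponymy relation would come from a semantic network (such as WordNet) and the counts from a frequency dictionary.

    # Toy hyponymy tree (child -> hypernym) and toy term frequencies,
    # both invented purely for illustration.
    hypernym_of = {"wasp": "insect", "bee": "insect", "insect": "animal", "dog": "animal"}
    frequency = {"wasp": 3, "bee": 5, "insect": 2, "dog": 4, "animal": 1}

    def cumulated_frequency(term):
        """Step 1: cumf(k_i) = f(k_i) + sum of cumf(k_j) over direct hyponyms k_j."""
        hyponyms = [child for child, parent in hypernym_of.items() if parent == term]
        return frequency[term] + sum(cumulated_frequency(h) for h in hyponyms)

    # Keep the terms with the highest cumulated frequencies as the target lexicon.
    LEXICON_SIZE = 2  # arbitrary size, chosen only for this example
    ranking = sorted(frequency, key=cumulated_frequency, reverse=True)
    target_lexicon = set(ranking[:LEXICON_SIZE])  # {'animal', 'insect'}

    def compress(term):
        """Step 2: map a term outside the lexicon to its nearest in-lexicon hypernym."""
        while term not in target_lexicon and term in hypernym_of:
            term = hypernym_of[term]
        return term

    print([compress(t) for t in ["wasp", "bee", "dog", "insect"]])
    # -> ['insect', 'insect', 'animal', 'insect']

In this toy run, "wasp" and "bee" generalize to "insect" and "dog" to "animal", mirroring the kind of replacements shown in the example below.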

Example

The fragment of text below has been processed by semantic compression. Words in bold have been replaced by their hypernyms.

They are both nest building social insects, but paper wasps and honey bees organize their colonies in very different ways. In a new study, researchers report that despite their differences, these insects rely on the same network of genes to guide their social behavior. The study appears in the Proceedings of the Royal Society B: Biological Sciences. Honey bees and paper wasps are separated by more than 100 million years of evolution, and there are striking differences in how they divvy up the work of maintaining a colony.

The procedure outputs the following text:

They are both facility building insect, but insects and honey insects arrange their biological groups in very different structure. In a new study, researchers report that despite their difference of opinions, these insects act the same network of genes to steer their party demeanor. The study appears in the proceeding of the institution bacteria Biological Sciences. Honey insects and insect are separated by more than hundred million years of organic processes, and there are impinging differences of opinions in how they divvy up the work of affirming a biological group.

Implicit semantic compression

The natural tendency to keep natural language expressions concise can be perceived as a form of implicit semantic compression: meaningless words, or redundant meaningful words, are omitted (especially to avoid pleonasms).[2]

Applications and advantages

In the vector space model, compacting the lexicon reduces dimensionality, which lowers computational complexity and has a positive influence on efficiency.
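
As a small illustration (with an invented vocabulary and mapping, not drawn from any real lexicon), the following Python sketch shows how collapsing hyponyms onto their hypernyms shrinks a bag-of-words vector:

    from collections import Counter

    # Hypothetical compression mapping: hyponym -> hypernym in the target lexicon.
    mapping = {"wasp": "insect", "bee": "insect", "colony": "group", "hive": "group"}

    def bag_of_words(tokens, vocabulary):
        counts = Counter(tokens)
        return [counts[term] for term in vocabulary]

    doc = ["bee", "wasp", "colony", "hive", "bee"]

    # Before compression: one dimension per surface word.
    vocab_before = sorted(set(doc))                # ['bee', 'colony', 'hive', 'wasp']
    print(bag_of_words(doc, vocab_before))         # [2, 1, 1, 1]

    # After compression: hyponyms collapse onto hypernyms, so fewer dimensions remain.
    compressed = [mapping.get(t, t) for t in doc]
    vocab_after = sorted(set(compressed))          # ['group', 'insect']
    print(bag_of_words(compressed, vocab_after))   # [2, 3]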

Semantic compression is advantageous in information retrieval tasks, improving their effectiveness in terms of both precision and recall.[3] This is because the descriptors become more precise: the effect of language diversity is reduced, redundancy is limited, and the lexicon moves a step towards a controlled dictionary.

As in the example above, it is possible to display the output as natural text (re-applying inflexion, adding stop words).

See also

  • Controlled natural language
  • Information theory
  • Lexical substitution
  • Quantities of information
  • Text simplification

References

  1. ^ Ceglarek, D.; Haniewicz, K.; Rutkowski, W. (2010). "Semantic Compression for Specialised Information Retrieval Systems". Advances in Intelligent Information and Database Systems. Studies in Computational Intelligence. Vol. 283. pp. 111–121. doi:10.1007/978-3-642-12090-9_10. ISBN 978-3-642-12089-3.
  2. ^ Percova, N.N. (1982). "On the types of semantic compression of text". COLING '82 Proceedings of the 9th Conference on Computational Linguistics. Vol. 2. pp. 229–231. doi:10.3115/990100.990155. ISBN 0-444-86393-1. S2CID 33742593.
  3. ^ Ceglarek, D.; Haniewicz, K.; Rutkowski, W. (2010). "Quality of semantic compression in classification". Proceedings of the 2nd International Conference on Computational Collective Intelligence: Technologies and Applications. Vol. 1. Springer. pp. 162–171. ISBN 978-3-642-16692-1.

External links

  • Semantic compression on Project SENECA (Semantic Networks and Categorization) website
