fbpx
Wikipedia

Sayre's paradox

Sayre's paradox is a dilemma encountered in the design of automated handwriting recognition systems. A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized.[1][2] The paradox was first articulated in a 1973 publication by Kenneth M. Sayre, after whom it was named.[3]

Nature of the problem edit

It is relatively easy to design automated systems capable of recognizing words inscribed in a printed format. Such words are segmented into letters by the very act of writing them on the page. Given templates matching typical letter shapes in a given language, individual letters can be identified with a high degree of probability. In cases of ambiguity, probable letter sequences can be compared with a selection of properly spelled words in that language (called a lexicon).[4] If necessary, syntactic features of the language can be applied to render a generally accurate identification of the words in question.[5] Printed-character recognition systems of this sort are commonly used in processing standardized government forms, in sorting mail by zip code, and so forth.

In cursive writing, however, letters comprising a given word typically flow sequentially without gaps between them. Unlike a sequence of printed letters, cursively connected letters are not segmented in advance. Here is where Sayre's Paradox comes into play. Unless the word is already segmented into letters, template-matching techniques like those described above cannot be applied. That is, segmentation is a prerequisite for word recognition. But there are no reliable techniques for segmenting a word into letters unless the word itself has been identified. Word recognition requires letter segmentation, and letter segmentation requires word recognition. There is no way a cursive writing recognition system employing standard template-matching techniques can do both simultaneously.

Advantages to be gained by use of automated cursive writing recognition systems include routing mail with handwritten addresses, reading handwritten bank checks, and automated digitalization of hand-written documents.[1] These are practical incentives for finding ways of circumventing Sayre's Paradox.

Avoiding the paradox edit

One way of ameliorating the adverse effects of the paradox is to normalize the word inscriptions to be recognized. Normalization amounts to eliminating idiosyncrasies in the penmanship of the writer, such as unusual slope of the letters and unusual slant of the cursive line.[4] This procedure can increase the probability of a correct match with a letter template, resulting in an incremental improvement in the success rate of the system. Since improvement of this sort still depends on accurate segmentation, however, it remains subject to the limitations of Sayre's Paradox. Researchers have come to realize that the only way to circumvent the paradox is by use of procedures that do not rely on accurate segmentation.[1]

Directions of current research edit

Segmentation is accurate to the extent that it matches distinctions among letters in the actual inscriptions presented to the system for recognition (the input data). This is sometimes referred to as “explicit segmentation”.[4] “Implicit segmentation,” by contrast, is division of the cursive line into more parts than the number of actual letters in the cursive line itself. Processing these “implicit parts” to achieve eventual word identification requires specific statistical procedures involving Hidden Markov Models (HMM).

A Markov model is a statistical representation of a random process, which is to say a process in which future states are independent of states occurring before the present. In such a process, a given state is dependent only on the conditional probability of its following the state immediately before it. An example is a series of outcomes from successive casts of a die. An HMM is a Markov model, individual states of which are not fully known. Conditional probabilities between states are still determinate, but the identities of individual states are not fully disclosed.

Recognition proceeds by matching HMMs of words to be recognized with previously prepared HMMs of words in the lexicon. The best match in a given case is taken to indicate the identity of the handwritten word in question. As with systems based on explicit segmentation, automated recognition systems based on implicit segmentation are judged more or less successful according to the percentage of correct identifications they accomplish.

Instead of explicit segmentation techniques, most automated handwriting recognition systems today employ implicit segmentation in conjunction with HMM-based matching procedures.[1] The constraints epitomized by Sayre's Paradox are largely responsible for this shift in approach.

References edit

  1. ^ a b c d Vinciarelli, Alessandro (April 2003). Offline Cursive Handwriting: From Word To Text Recognition (PhD). IDIAP.
  2. ^ Fischer, Andreas; Frinken, Volkmar; Bunke, Horst (2013). "Chapter 17 - Hidden Markov Models for Off-Line Cursive Handwriting Recognition". In Rao, C.R.; Govindaraju, Venu (eds.). Handbook of Statistics. Elsevier. pp. 421–442. doi:10.1016/B978-0-444-53859-8.00017-5. ISBN 9780444538598. ISSN 0169-7161.
  3. ^ Sayre, Kenneth M. (1973). "Machine recognition of handwritten words: A project report". Pattern Recognition. Pergamon Press. 5 (3): 213–228. Bibcode:1973PatRe...5..213S. doi:10.1016/0031-3203(73)90044-7. ISSN 0031-3203.
  4. ^ a b c Vinciarelli, Alessandro (July 2002). "A survey on off-line Cursive Word Recognition". Pattern Recognition. 35 (7): 1433–1446. Bibcode:2002PatRe..35.1433V. doi:10.1016/S0031-3203(01)00129-7. ISSN 0031-3203.
  5. ^ Maroneze, André O.; Coüasnon, Bertrand; Lemaitre, Aurélie (2011-01-24). Agam, Gady; Viard-Gaudin, Christian (eds.). Introduction of statistical information in a syntactic analyzer for document image recognition. Document Recognition and Retrieval XVIII. Vol. 7874. SPIE. pp. 28–38. doi:10.1117/12.873393.

External links edit

  • Kenneth M. Sayre and the Philosophic Institute.

sayre, paradox, this, article, needs, additional, citations, verification, please, help, improve, this, article, adding, citations, reliable, sources, unsourced, material, challenged, removed, find, sources, news, newspapers, books, scholar, jstor, february, 2. This article needs additional citations for verification Please help improve this article by adding citations to reliable sources Unsourced material may be challenged and removed Find sources Sayre s paradox news newspapers books scholar JSTOR February 2020 template removal help Sayre s paradox is a dilemma encountered in the design of automated handwriting recognition systems A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized 1 2 The paradox was first articulated in a 1973 publication by Kenneth M Sayre after whom it was named 3 Contents 1 Nature of the problem 2 Avoiding the paradox 3 Directions of current research 4 References 5 External linksNature of the problem editIt is relatively easy to design automated systems capable of recognizing words inscribed in a printed format Such words are segmented into letters by the very act of writing them on the page Given templates matching typical letter shapes in a given language individual letters can be identified with a high degree of probability In cases of ambiguity probable letter sequences can be compared with a selection of properly spelled words in that language called a lexicon 4 If necessary syntactic features of the language can be applied to render a generally accurate identification of the words in question 5 Printed character recognition systems of this sort are commonly used in processing standardized government forms in sorting mail by zip code and so forth In cursive writing however letters comprising a given word typically flow sequentially without gaps between them Unlike a sequence of printed letters cursively connected letters are not segmented in advance Here is where Sayre s Paradox comes into play Unless the word is already segmented into letters template matching techniques like those described above cannot be applied That is segmentation is a prerequisite for word recognition But there are no reliable techniques for segmenting a word into letters unless the word itself has been identified Word recognition requires letter segmentation and letter segmentation requires word recognition There is no way a cursive writing recognition system employing standard template matching techniques can do both simultaneously Advantages to be gained by use of automated cursive writing recognition systems include routing mail with handwritten addresses reading handwritten bank checks and automated digitalization of hand written documents 1 These are practical incentives for finding ways of circumventing Sayre s Paradox Avoiding the paradox editOne way of ameliorating the adverse effects of the paradox is to normalize the word inscriptions to be recognized Normalization amounts to eliminating idiosyncrasies in the penmanship of the writer such as unusual slope of the letters and unusual slant of the cursive line 4 This procedure can increase the probability of a correct match with a letter template resulting in an incremental improvement in the success rate of the system Since improvement of this sort still depends on accurate segmentation however it remains subject to the limitations of Sayre s Paradox Researchers have come to realize that the only way to circumvent the paradox is by use of procedures that do not rely on accurate segmentation 1 Directions of current research editSegmentation is accurate to the extent that it matches distinctions among letters in the actual inscriptions presented to the system for recognition the input data This is sometimes referred to as explicit segmentation 4 Implicit segmentation by contrast is division of the cursive line into more parts than the number of actual letters in the cursive line itself Processing these implicit parts to achieve eventual word identification requires specific statistical procedures involving Hidden Markov Models HMM A Markov model is a statistical representation of a random process which is to say a process in which future states are independent of states occurring before the present In such a process a given state is dependent only on the conditional probability of its following the state immediately before it An example is a series of outcomes from successive casts of a die An HMM is a Markov model individual states of which are not fully known Conditional probabilities between states are still determinate but the identities of individual states are not fully disclosed Recognition proceeds by matching HMMs of words to be recognized with previously prepared HMMs of words in the lexicon The best match in a given case is taken to indicate the identity of the handwritten word in question As with systems based on explicit segmentation automated recognition systems based on implicit segmentation are judged more or less successful according to the percentage of correct identifications they accomplish Instead of explicit segmentation techniques most automated handwriting recognition systems today employ implicit segmentation in conjunction with HMM based matching procedures 1 The constraints epitomized by Sayre s Paradox are largely responsible for this shift in approach References edit a b c d Vinciarelli Alessandro April 2003 Offline Cursive Handwriting From Word To Text Recognition PhD IDIAP Fischer Andreas Frinken Volkmar Bunke Horst 2013 Chapter 17 Hidden Markov Models for Off Line Cursive Handwriting Recognition In Rao C R Govindaraju Venu eds Handbook of Statistics Elsevier pp 421 442 doi 10 1016 B978 0 444 53859 8 00017 5 ISBN 9780444538598 ISSN 0169 7161 Sayre Kenneth M 1973 Machine recognition of handwritten words A project report Pattern Recognition Pergamon Press 5 3 213 228 Bibcode 1973PatRe 5 213S doi 10 1016 0031 3203 73 90044 7 ISSN 0031 3203 a b c Vinciarelli Alessandro July 2002 A survey on off line Cursive Word Recognition Pattern Recognition 35 7 1433 1446 Bibcode 2002PatRe 35 1433V doi 10 1016 S0031 3203 01 00129 7 ISSN 0031 3203 Maroneze Andre O Couasnon Bertrand Lemaitre Aurelie 2011 01 24 Agam Gady Viard Gaudin Christian eds Introduction of statistical information in a syntactic analyzer for document image recognition Document Recognition and Retrieval XVIII Vol 7874 SPIE pp 28 38 doi 10 1117 12 873393 External links editKenneth M Sayre and the Philosophic Institute Retrieved from https en wikipedia org w index php title Sayre 27s paradox amp oldid 1142295954, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.