
Christopher D. Paice

Christopher D Paice was one of the pioneers of research into stemming. The Paice-Husk stemmer was published in 1990, and his method of evaluating stemmer performance by means of the Error Rate Relative to Truncation (ERRT) was the first direct method of comparing under-stemming and over-stemming errors. Apart from his pioneering work on stemming algorithms and evaluation methods, he made other research contributions in the areas of information retrieval, anaphora resolution and automatic abstracting.[1][2]

Teaching career

Christopher D Paice was a member of the School of Computing and Communications (SCC) at Lancaster University, United Kingdom, for around forty years. He joined the then Department of Computer Studies as a Research Associate in 1969-70 and later moved on to a Lectureship. He was acting Head of Department in 1977-78 and Head of Department in 1979-82, and he retired in 2009.[3]

The Paice-Husk Stemming Algorithm

The Paice-Husk Stemmer was developed by Chris D Paice with the assistance of Gareth Husk in the Computing Department at Lancaster University in the late 1980s. It features an externally stored set of stemming rules, and this flexibility compared with the Porter stemmer made it of interest to several researchers.[4]

The stemmer was originally implemented in the Pascal programming language; further implementations have been made in ANSI C and Java. A Perl version was implemented by Mary Taffet at the Center for Natural Language Processing at Syracuse University, USA.[5]

The stemmer consists of a stemming algorithm and a separate set of stemming rules. The standard set of rules provides a 'strong' stemmer. Stemmer strength is advantageous for index compression; however, it produces a larger number of overstemming errors relative to the number of understemming errors. Users who need a lighter stemmer can develop their own set of rules.

The stemmer is iterative (i.e. endings are removed piecemeal in an indefinite number of stages), and each rule may specify either the removal or the replacement of an ending. The replacement technique avoids the need for a separate stage in the process to recode or provide partial matching; this helps maintain the efficiency of the algorithm. The rules are indexed by the last letter of the ending to allow efficient searching.[6]
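The iterative, rule-driven process described above can be illustrated with a minimal Python sketch. It is not the published implementation: the `Rule` fields, the five toy rules in `RULES`, and the `acceptable` stem check are all assumptions made for illustration, and the standard rule table is far larger. The sketch only shows rules that remove or replace an ending, a lookup table keyed by the last letter of the ending, and repeated application until no rule fires or a rule says stop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    ending: str      # ending the word must have for the rule to fire
    remove: int      # number of trailing letters to delete
    append: str      # replacement text ("" means pure removal)
    continue_: bool  # True: keep stemming afterwards; False: stop


# Hypothetical toy rule set (an assumption, not the standard rules).
RULES = [
    Rule("ies", 3, "y", True),   # replacement: "ponies" -> "pony"
    Rule("ss", 0, "", False),    # protect "ss" and stop
    Rule("s", 1, "", True),
    Rule("ing", 3, "", True),
    Rule("ed", 2, "", True),
]


def build_index(rules):
    """Group rules by the last letter of their ending for quick lookup."""
    index = {}
    for rule in rules:
        index.setdefault(rule.ending[-1], []).append(rule)
    return index


def acceptable(stem_candidate):
    """Assumed sanity check so that very short stems are rejected."""
    if not stem_candidate:
        return False
    if stem_candidate[0] in "aeiou":
        return len(stem_candidate) >= 2
    return len(stem_candidate) >= 3 and any(c in "aeiouy" for c in stem_candidate)


def stem(word, index):
    """Apply rules repeatedly until none fires or a rule says stop."""
    current = word.lower()
    while True:
        for rule in index.get(current[-1:], []):
            if not current.endswith(rule.ending):
                continue
            candidate = current[:len(current) - rule.remove] + rule.append
            if not acceptable(candidate):
                continue
            current = candidate
            if rule.continue_:
                break           # look up rules again for the new ending
            return current      # terminating rule
        else:
            return current      # no rule applied: stemming is finished


index = build_index(RULES)
print(stem("meetings", index))  # meet
print(stem("ponies", index))    # pony
```

In this sketch "meetings" passes through two stages ("meetings" to "meeting" to "meet"), which is the piecemeal removal the text refers to, while "ponies" shows a replacement rule doing the work that would otherwise need a separate recoding stage.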

Stemmer Evaluation

Apart from the stemmer itself, Chris Paice developed a method for directly measuring the performance of stemmers. Grouped lists of words are passed through the stemmer, the numbers of overstemming and understemming errors are counted, and the results are compared with what would have been obtained by using a set of truncation stemmers. The final measure is the Error Rate Relative to Truncation (ERRT).[7][8]
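The error-counting step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the formulation from the cited papers: it counts word pairs directly, treating two words from the same group that receive different stems as an understemming error and two words from different groups that receive the same stem as an overstemming error. The function names, the word groups in `GROUPS`, and the `truncate` baseline are hypothetical, and the final step of combining these values into ERRT is not shown.

```python
from itertools import combinations


def error_indices(groups, stem):
    """Return (understemming rate, overstemming rate) for a stemming function.

    groups: list of word lists; words in one list are meant to share a stem.
    stem:   function mapping a word to its stem.
    """
    labelled = [(word, gid) for gid, words in enumerate(groups) for word in words]
    under = over = same_pairs = diff_pairs = 0
    for (w1, g1), (w2, g2) in combinations(labelled, 2):
        merged = stem(w1) == stem(w2)
        if g1 == g2:
            same_pairs += 1
            under += not merged   # should have been merged but was not
        else:
            diff_pairs += 1
            over += merged        # merged but should have stayed apart
    ui = under / same_pairs if same_pairs else 0.0
    oi = over / diff_pairs if diff_pairs else 0.0
    return ui, oi


def truncate(n):
    """Baseline 'stemmer' that keeps only the first n letters of a word."""
    return lambda word: word[:n]


# Hypothetical grouped word list used only for this illustration.
GROUPS = [
    ["divide", "dividing", "divided", "division"],
    ["divine", "divinity"],
    ["run", "runs", "running"],
]

for n in (3, 5):
    print(n, error_indices(GROUPS, truncate(n)))
```

On this toy data, truncating to three letters merges everything that should be merged but also conflates the "divide" and "divine" groups, while truncating to five letters avoids that conflation at the cost of more understemming. A stemmer under evaluation would be scored with the same function and compared against such truncation baselines.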

Personal life

Christopher D Paice was born in 1941. He married Kathleen F Moss in 1965 in the Manchester Registration district. In 2015 he was diagnosed with an aggressive brain tumour, shortly after he and his wife moved away from Cumbria to Stratford. He died on 21 April 2016.

Publications

  • C D Paice (1977). Information Retrieval and the Computer. Macdonald and Jane's, London.
  • C D Paice (1980). Proceedings SIGIR '80, The automatic generation of literature abstracts: an approach based on the identification of self-indicating phrases. Butterworth. ISBN 0-408-10775-8.
  • C D Paice (1984). Information Technology Research Development Applications: Volume 3 Issue 1, Soft evaluation of Boolean search queries in information retrieval systems. Butterworth.
  • C D Paice; V. Aragón-Ramírez (1985). RIAO '85: Recherche d'Informations Assistée par Ordinateur, The calculation of similarities between multi-word strings using a thesaurus. LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE.
  • C D Paice (1986). ASLIB Proceedings: Volume 38 Issue 10, Expert systems for information retrieval?. Aslib, The Association for Information Management.
  • C D Paice (1990). Information Processing and Management: an International Journal, Volume 26 Issue 1, Constructing literature abstracts by computer: techniques and prospects. Pergamon Press, Inc.
  • C D Paice (1990). Information Processing and Management: an International Journal, Volume 27 Issue 5, A thesaural model of information retrieval. Pergamon Press, Inc.
  • C D Paice (1991). ACM SIGIR Forum: Volume 24 Issue 3, Another stemmer. ACM.
  • F. C. Johnson; C. D. Paice; W. J. Black; A. P. Neal (1997). Readings in information retrieval: The application of linguistic processing to automatic abstract generation. Morgan Kaufmann Publishers Inc.
  • Michael B. Twidale; David M. Nichols; Chris D. Paice (1997). Information Processing and Management: an International Journal: Volume 33 Issue 6, Browsing is a collaborative process. Pergamon Press, Inc.
  • Michael P. Oakes; C. D. Paice (1999). IRSG'99: Proceedings of the 21st Annual BCS-IRSG conference on Information Retrieval Research, The automatic generation of templates for automatic abstracting. BCS.
  • C. D. Paice (2009). Lexical Analysis of Textual Data. Encyclopedia of Database Systems. Springer, US. pp. 1606–1610. ISBN 978-0-387-35544-3.
  • C. D. Paice (2009). Stemming. Encyclopedia of Database Systems. Springer, US. pp. 2790–2793. ISBN 978-0-387-35544-3.

References

  1. University Trier, DBLP Computer Science Bibliography
  2. ACM Author page, C D Paice
  3. Lancaster University, In Memory of Chris Paice
  4. Improvements to the Lancaster Stemming Algorithm (Paice-Husk Stemmer), Antonio Zamora
  5. GitHub, Paice-Husk Stemmer in several languages
  6. "Paice/Husk Stemmer". Archived from the original on 2006-08-22. Retrieved 2006-08-22.
  7. Paice, C.D. (1994) An evaluation method for stemming algorithms, in Croft, W.B. & van Rijsbergen, C.J. (eds.), Proceedings of the 17th ACM SIGIR conference held at Dublin, July 3–6, 1994; pp. 42-50.
  8. Paice, C.D. (1996) Method for Evaluation of Stemming Algorithms based on Error Counting, JASIS, 47(8): 632-649.
