
Scott's Pi

Scott's pi (named after William A. Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi. Since automatic text annotation is a common task in natural language processing, and the goal is for the program under development to agree with the human annotators, assessing the extent to which humans agree with each other is important for establishing a reasonable upper limit on computer performance.

Introduction

Scott's pi is similar to Cohen's kappa in that both improve on simple observed agreement by factoring in the extent of agreement that might be expected by chance. However, the two statistics calculate the expected agreement slightly differently. Scott's pi assumes that the annotators have the same distribution of responses, which makes Cohen's kappa, which does not make this assumption, slightly more informative. Scott's pi is extended to more than two annotators by Fleiss' kappa.

The equation for Scott's pi, as in Cohen's kappa, is:

    \pi = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}

However, Pr(e) is calculated using squared "joint proportions", which are the squared arithmetic means of the marginal proportions (whereas Cohen's kappa uses their squared geometric means).
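
As an aside not in the original article, the definition above can be written out as a short Python sketch; it assumes each annotator's ratings are given as an equal-length list of category labels, and the function name scotts_pi is our own:

    from collections import Counter

    def scotts_pi(ratings_a, ratings_b):
        """Scott's pi for two annotators' nominal ratings (equal-length sequences)."""
        if len(ratings_a) != len(ratings_b):
            raise ValueError("both annotators must rate the same items")
        n = len(ratings_a)

        # Observed agreement Pr(a): fraction of items labelled identically.
        pr_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

        # Expected agreement Pr(e): for each category, square the joint proportion,
        # i.e. the arithmetic mean of the two annotators' marginal proportions.
        pooled = Counter(ratings_a) + Counter(ratings_b)
        pr_e = sum((count / (2 * n)) ** 2 for count in pooled.values())

        return (pr_a - pr_e) / (1 - pr_e)

Two identical rating lists give 1.0; values near 0 indicate roughly chance-level agreement, and negative values indicate agreement below chance.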

Worked example

Confusion matrix for two annotators, three categories {Yes, No, Maybe}, and 45 items rated (90 ratings for 2 annotators). Columns are Annotator 1's labels; rows are Annotator 2's:

              Yes    No  Maybe  Marginal Sum
Yes             1     2      3             6
No              4     5      6            15
Maybe           7     8      9            24
Marginal Sum   12    15     18            45
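
If the data start as raw paired annotations rather than a pre-tallied matrix, a table like the one above can be tallied in a few lines of Python; the paired labels below are hypothetical and do not reproduce the counts in this example:

    from collections import Counter

    # Hypothetical (annotator 1, annotator 2) label pairs, one per rated item.
    pairs = [("Yes", "Yes"), ("No", "Maybe"), ("Maybe", "Maybe"), ("No", "No")]

    categories = ["Yes", "No", "Maybe"]
    matrix = Counter(pairs)  # maps (label_ann1, label_ann2) -> count

    # Rows follow Annotator 2's labels and columns Annotator 1's,
    # matching the orientation of the table above.
    for row in categories:
        counts = [matrix[(col, row)] for col in categories]
        print(f"{row:>5}", counts, "marginal:", sum(counts))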

To calculate the expected agreement, sum each category's marginals across the two annotators and divide by the total number of ratings to obtain the joint proportions; then square these and add them up:

Category   Ann1   Ann2   Joint Proportion      JP Squared
Yes          12      6   (12 + 6)/90 = 0.200        0.040
No           15     15   (15 + 15)/90 = 0.333       0.111
Maybe        18     24   (18 + 24)/90 = 0.467       0.218
Total                                               0.369
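
The same arithmetic as a short Python check (the variable names are ours; the counts are the marginal sums from the tables above):

    # Marginal counts per category for each annotator.
    ann1 = {"Yes": 12, "No": 15, "Maybe": 18}
    ann2 = {"Yes": 6, "No": 15, "Maybe": 24}
    total_ratings = 90  # 45 items rated by 2 annotators

    # Joint proportion = pooled count / total ratings; Pr(e) is the sum of its squares.
    pr_e = sum(((ann1[c] + ann2[c]) / total_ratings) ** 2 for c in ann1)
    print(round(pr_e, 3))  # 0.369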

To calculate observed agreement, divide the number of items on which the annotators agreed (the diagonal of the confusion matrix) by the total number of items. In this case,

    \Pr(a) = \frac{1 + 5 + 9}{45} = 0.333

Given that Pr(e) = 0.369, Scott's pi is then

    \pi = \frac{0.333 - 0.369}{1 - 0.369} = -0.057
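
As a quick numerical check (a throwaway snippet using the rounded intermediate values quoted in the text):

    pr_a = round((1 + 5 + 9) / 45, 3)  # observed agreement from the diagonal: 0.333
    pr_e = 0.369                       # expected agreement from the table above
    pi = (pr_a - pr_e) / (1 - pr_e)
    print(round(pi, 3))                # -0.057 (about -0.056 with unrounded values)

The negative value means the two annotators agreed slightly less often than would be expected by chance alone.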

See also

  • Krippendorff's alpha

References

  • Scott, W. (1955). "Reliability of content analysis: The case of nominal scale coding". Public Opinion Quarterly, 19(3), 321–325.
  • Krippendorff, K. (2004). "Reliability in content analysis: Some common misconceptions and recommendations". Human Communication Research, 30, 411–433.
