Wikipedia

Sequential probability ratio test

The sequential probability ratio test (SPRT) is a specific sequential hypothesis test, developed by Abraham Wald[1] and later proven to be optimal by Wald and Jacob Wolfowitz.[2] Neyman and Pearson's 1933 result inspired Wald to reformulate it as a sequential analysis problem. The Neyman-Pearson lemma, by contrast, offers a rule of thumb for when all the data is collected (and its likelihood ratio known).

While originally developed for use in quality control studies in the realm of manufacturing, SPRT has been formulated for use in the computerized testing of human examinees as a termination criterion.[3][4][5]

Theory

As in classical hypothesis testing, SPRT starts with a pair of hypotheses, say $H_0$ and $H_1$ for the null hypothesis and alternative hypothesis respectively. They must be specified as follows:

$$H_0: p = p_0$$
$$H_1: p = p_1$$

The next step is to calculate the cumulative sum of the log-likelihood ratio, $\log \Lambda_i$, as new data arrive: with $S_0 = 0$, then, for $i = 1, 2, \ldots$,

$$S_i = S_{i-1} + \log \Lambda_i$$

The stopping rule is a simple thresholding scheme:

  • $a < S_i < b$: continue monitoring (critical inequality)
  • $S_i \geq b$: Accept $H_1$
  • $S_i \leq a$: Accept $H_0$
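The update and stopping rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `sprt`, its return format, and the choice to return `None` when the data run out before a threshold is crossed are all assumptions made here. The caller supplies the per-sample log-likelihood ratios and the two thresholds.

```python
from typing import Iterable, Optional, Tuple


def sprt(log_lr_increments: Iterable[float], a: float, b: float) -> Tuple[Optional[str], int, float]:
    """Apply the thresholding scheme to a stream of log-likelihood-ratio terms.

    Returns (decision, samples_used, final_sum). The decision is "H1", "H0",
    or None if the stream ended while the sum was still between a and b.
    """
    if not a < 0 < b:
        raise ValueError("expected a < 0 < b")
    s = 0.0  # S_0 = 0
    n = 0
    for n, log_lr in enumerate(log_lr_increments, start=1):
        s += log_lr  # S_i = S_{i-1} + log(Lambda_i)
        if s >= b:
            return "H1", n, s  # upper threshold reached: accept H1
        if s <= a:
            return "H0", n, s  # lower threshold reached: accept H0
    return None, n, s  # still inside the continue-monitoring region
```

Because the sum is checked after every sample, the number of samples used is itself random, which is what distinguishes the procedure from a fixed-sample test.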

where $a$ and $b$ ($a < 0 < b < \infty$) depend on the desired type I and type II errors, $\alpha$ and $\beta$. They may be chosen as follows:

$$a \approx \log \frac{\beta}{1-\alpha}$$ and $$b \approx \log \frac{1-\beta}{\alpha}$$

In other words, $\alpha$ and $\beta$ must be decided beforehand in order to set the thresholds appropriately. Their numerical values will depend on the application. The thresholds are only approximate because, in the discrete case, the signal may cross a threshold between samples. Thus, depending on the penalty of making an error and the sampling frequency, one might set the thresholds more aggressively. The bounds are exact in the continuous case.
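As a small numerical illustration, the approximate thresholds can be computed directly from the two error rates. The function name `wald_thresholds` and the input check (which requires $\alpha + \beta < 1$ so that $a < 0 < b$) are assumptions of this sketch.

```python
import math
from typing import Tuple


def wald_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Return the approximate thresholds (a, b) for type I/II error rates."""
    # Assumed sanity check: alpha + beta < 1 guarantees a < 0 < b.
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, beta and alpha + beta < 1")
    a = math.log(beta / (1 - alpha))   # lower threshold, accept H0 at or below
    b = math.log((1 - beta) / alpha)   # upper threshold, accept H1 at or above
    return a, b


# Example: alpha = 0.05, beta = 0.20
a, b = wald_thresholds(0.05, 0.20)
print(round(a, 4), round(b, 4))  # -1.5581 2.7726
```

Smaller error rates push both thresholds away from zero, so more evidence has to accumulate before the test stops.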

Example

A textbook example is parameter estimation of a probability distribution function. Consider the exponential distribution:

$$f_\theta(x) = \theta^{-1} e^{-\frac{x}{\theta}}, \qquad x, \theta > 0$$

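The hypotheses are

$$\begin{cases} H_0: \theta = \theta_0 \\ H_1: \theta = \theta_1 \end{cases} \qquad \theta_1 > \theta_0$$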

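Then the log-likelihood ratio for one sample is

$$\begin{aligned}
\log \Lambda(x) &= \log\left(\frac{\theta_1^{-1} e^{-\frac{x}{\theta_1}}}{\theta_0^{-1} e^{-\frac{x}{\theta_0}}}\right) \\
&= \log\left(\frac{\theta_0}{\theta_1} e^{\frac{x}{\theta_0} - \frac{x}{\theta_1}}\right) \\
&= \log\left(\frac{\theta_0}{\theta_1}\right) + \log\left(e^{\frac{x}{\theta_0} - \frac{x}{\theta_1}}\right) \\
&= -\log\left(\frac{\theta_1}{\theta_0}\right) + \left(\frac{x}{\theta_0} - \frac{x}{\theta_1}\right) \\
&= -\log\left(\frac{\theta_1}{\theta_0}\right) + \left(\frac{\theta_1 - \theta_0}{\theta_0 \theta_1}\right) x
\end{aligned}$$

The cumulative sum of the log-likelihood ratios over all samples is

$$S_n = \sum_{i=1}^{n} \log \Lambda(x_i) = -n \log\left(\frac{\theta_1}{\theta_0}\right) + \left(\frac{\theta_1 - \theta_0}{\theta_0 \theta_1}\right) \sum_{i=1}^{n} x_i$$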

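Accordingly, the stopping rule is:

$$a < -n \log\left(\frac{\theta_1}{\theta_0}\right) + \left(\frac{\theta_1 - \theta_0}{\theta_0 \theta_1}\right) \sum_{i=1}^{n} x_i < b$$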

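After re-arranging we finally find

$$a + n \log\left(\frac{\theta_1}{\theta_0}\right) < \left(\frac{\theta_1 - \theta_0}{\theta_0 \theta_1}\right) \sum_{i=1}^{n} x_i < b + n \log\left(\frac{\theta_1}{\theta_0}\right)$$

The thresholds are simply two parallel lines with slope $\log(\theta_1/\theta_0)$. Sampling should stop when the sum of the samples makes an excursion outside the continue-sampling region.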
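The rearranged rule translates into code fairly directly: track the running sum of the samples, scale it, and compare it against two lines that rise with the sample count. The sketch below is illustrative only. The function name `exponential_sprt`, the return format, and the simulated data are assumptions, and the thresholds `a` and `b` are passed in rather than derived.

```python
import math
import random
from typing import Iterable, Optional, Tuple


def exponential_sprt(samples: Iterable[float], theta0: float, theta1: float,
                     a: float, b: float) -> Tuple[Optional[str], int]:
    """SPRT for an exponential mean, H0: theta = theta0 vs H1: theta = theta1."""
    if not 0 < theta0 < theta1:
        raise ValueError("expected 0 < theta0 < theta1")
    log_ratio = math.log(theta1 / theta0)            # slope of both lines
    coeff = (theta1 - theta0) / (theta0 * theta1)    # scaling of the sample sum
    total = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        total += x
        scaled = coeff * total
        if scaled >= b + n * log_ratio:   # crossed the upper line
            return "H1", n
        if scaled <= a + n * log_ratio:   # crossed the lower line
            return "H0", n
    return None, n  # data exhausted inside the continue-sampling region


# Simulated usage: data drawn with mean 2.0 (random.expovariate takes a rate).
rng = random.Random(0)
data = [rng.expovariate(1 / 2.0) for _ in range(1000)]
print(exponential_sprt(data, theta0=1.0, theta1=2.0, a=-2.94, b=2.94))
```

Only the running sum and the sample count need to be stored, so the test runs in constant memory regardless of how long sampling continues.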

Applications

Manufacturing

The test is done on the proportion metric, and tests that a variable p is equal to one of two desired points, p1 or p2. The region between these two points is known as the indifference region (IR). For example, suppose you are performing a quality control study on a factory lot of widgets. Management would like the lot to have 3% or fewer defective widgets, but 1% or fewer is the ideal lot that would pass with flying colors. In this example, p1 = 0.01 and p2 = 0.03, and the region between them is the IR because management considers these lots to be marginal and is OK with them being classified either way. Widgets would be sampled one at a time from the lot (sequential analysis) until the test determines, within an acceptable error level, that the lot is ideal or should be rejected.
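The widget example can be sketched as a Bernoulli SPRT in which each inspected item adds one of two fixed increments to the cumulative log-likelihood ratio. This is a hedged illustration: the function name `lot_sprt`, the decision labels, and the error rates of 0.05 are assumptions, as is treating p = p1 as the null hypothesis and p = p2 as the alternative.

```python
import math
from typing import Iterable, Tuple


def lot_sprt(defective_flags: Iterable[bool], p1: float, p2: float,
             alpha: float, beta: float) -> Tuple[str, int]:
    """Classify a lot as "ideal" (p = p1) or "reject" (p = p2), with p1 < p2."""
    a = math.log(beta / (1 - alpha))          # lower threshold
    b = math.log((1 - beta) / alpha)          # upper threshold
    step_defect = math.log(p2 / p1)           # increment for a defective widget
    step_ok = math.log((1 - p2) / (1 - p1))   # increment for a good widget (negative)
    s = 0.0
    n = 0
    for n, defective in enumerate(defective_flags, start=1):
        s += step_defect if defective else step_ok
        if s >= b:
            return "reject", n
        if s <= a:
            return "ideal", n
    return "undecided", n


# Usage with the values from the text and assumed error rates of 5%.
print(lot_sprt([False] * 500, p1=0.01, p2=0.03, alpha=0.05, beta=0.05))
```

With these values a defective widget moves the sum up by about 1.10, while a good widget moves it down by only about 0.02, so an unbroken run of 145 good widgets is needed to declare the lot ideal, whereas three defectives in a row are enough to reject it.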

Testing of human examinees

The SPRT is currently the predominant method of classifying examinees in a variable-length computerized classification test (CCT)[citation needed]. The two parameters p1 and p2 are specified by determining a cutscore (threshold) for examinees on the proportion-correct metric and selecting a point above and below that cutscore. For instance, suppose the cutscore is set at 70% for a test. We could select p1 = 0.65 and p2 = 0.75. The test then evaluates the likelihood that an examinee's true score on that metric is equal to one of those two points. If the examinee is determined to be at 75%, they pass, and they fail if they are determined to be at 65%.
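For the example values, each response moves the cumulative log-likelihood ratio by a fixed amount. The arithmetic below is a sketch that assumes independent items, takes p = p1 as the null hypothesis and p = p2 as the alternative, and introduces $n$ (items administered) and $k$ (items answered correctly) as notation of its own.

```latex
\begin{aligned}
\log \Lambda_{\text{correct}}   &= \log\frac{p_2}{p_1} = \log\frac{0.75}{0.65} \approx 0.143 \\
\log \Lambda_{\text{incorrect}} &= \log\frac{1 - p_2}{1 - p_1} = \log\frac{0.25}{0.35} \approx -0.336 \\
S_n &= k \log\frac{p_2}{p_1} + (n - k) \log\frac{1 - p_2}{1 - p_1}
\end{aligned}
```

An incorrect answer therefore counts against passing more than twice as strongly as a correct answer counts toward it, so the examinee passes once $S_n$ reaches the upper threshold and fails once it reaches the lower one.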

These points are not specified completely arbitrarily. A cutscore should always be set with a legally defensible method, such as a modified Angoff procedure. Again, the indifference region represents the region of scores that the test designer is OK with going either way (pass or fail). The upper parameter p2 is conceptually the highest level that the test designer is willing to accept for a fail (because everyone below it has a good chance of failing), and the lower parameter p1 is the lowest level that the test designer is willing to accept for a pass (because everyone above it has a decent chance of passing). While this definition may seem to be a relatively small burden, consider the high-stakes case of a licensing test for medical doctors: at just what point should we consider somebody to be at one of these two levels?

While the SPRT was first applied to testing in the days of classical test theory, as is applied in the previous paragraph, Reckase (1983) suggested that item response theory be used to determine the p1 and p2 parameters. The cutscore and indifference region are defined on the latent ability (theta) metric, and translated onto the proportion metric for computation. Research on CCT since then has applied this methodology for several reasons:

  1. Large item banks tend to be calibrated with IRT
  2. This allows more accurate specification of the parameters
  3. By using the item response function for each item, the parameters are easily allowed to vary between items.
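The third point can be illustrated with a sketch in which p1 and p2 are recomputed for every item from its item response function. The source does not say which IRT model is used, so the two-parameter logistic function below is an assumption, as are the names `irf_2pl` and `irt_sprt`, the decision labels, and the ability points `theta1` and `theta2` placed below and above the cutscore.

```python
import math
from typing import Iterable, Tuple


def irf_2pl(theta: float, discrimination: float, difficulty: float) -> float:
    """Assumed two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def irt_sprt(responses: Iterable[Tuple[bool, float, float]], theta1: float,
             theta2: float, lower: float, upper: float) -> Tuple[str, int]:
    """responses yields (correct, discrimination, difficulty) per item.

    theta1 < theta2 bound the indifference region on the ability metric.
    """
    s = 0.0
    n = 0
    for n, (correct, disc, diff) in enumerate(responses, start=1):
        p1 = irf_2pl(theta1, disc, diff)  # P(correct) at the lower point
        p2 = irf_2pl(theta2, disc, diff)  # P(correct) at the upper point
        s += math.log(p2 / p1) if correct else math.log((1 - p2) / (1 - p1))
        if s >= upper:
            return "pass", n
        if s <= lower:
            return "fail", n
    return "undecided", n


# Usage: four correct answers on identical items (discrimination 1, difficulty 0).
items = [(True, 1.0, 0.0)] * 4
print(irt_sprt(items, theta1=-0.5, theta2=0.5, lower=-1.9, upper=1.9))
```

The only change from the fixed-parameter version is that the two increments are recomputed per item, which is why item-level variation adds little complexity.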

Detection of anomalous medical outcomes

Spiegelhalter et al.[6] have shown that SPRT can be used to monitor the performance of doctors, surgeons and other medical practitioners in such a way as to give early warning of potentially anomalous results. In their 2003 paper, they showed how it could have helped identify Harold Shipman as a murderer well before he was actually identified.

Extensions

MaxSPRT

More recently, in 2011, an extension of the SPRT method called Maximized Sequential Probability Ratio Test (MaxSPRT)[7] was introduced. The salient feature of MaxSPRT is the allowance of a composite, one-sided alternative hypothesis, and the introduction of an upper stopping boundary. The method has been used in several medical research studies.[8]

See also

  • CUSUM
  • Computerized classification test
  • Wald test
  • Likelihood ratio test

References

  1. Wald, Abraham (June 1945). "Sequential Tests of Statistical Hypotheses". Annals of Mathematical Statistics. 16 (2): 117–186. doi:10.1214/aoms/1177731118. JSTOR 2235829.
  2. Wald, A.; Wolfowitz, J. (1948). "Optimum Character of the Sequential Probability Ratio Test". The Annals of Mathematical Statistics. 19 (3): 326–339. doi:10.1214/aoms/1177730197. JSTOR 2235638.
  3. Ferguson, Richard L. (1969). The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction. Unpublished doctoral dissertation, University of Pittsburgh.
  4. Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237–254). New York: Academic Press.
  5. Eggen, T. J. H. M. (1999). "Item Selection in Adaptive Testing with the Sequential Probability Ratio Test". Applied Psychological Measurement. 23 (3): 249–261. doi:10.1177/01466219922031365. S2CID 120780131.
  6. Spiegelhalter, D.; et al. (2003). "Risk-adjusted sequential probability ratio tests: application to Bristol, Shipman and adult cardiac surgery". Int J Qual Health Care. 15: 7–13.
  7. Kulldorff, Martin; Davis, Robert L.; Kolczak†, Margarette; Lewis, Edwin; Lieu, Tracy; Platt, Richard (2011). "A Maximized Sequential Probability Ratio Test for Drug and Vaccine Safety Surveillance". Sequential Analysis. 30: 58–78. doi:10.1080/07474946.2011.539924.
  8. Kulldorff, M.; et al. (2011). "A Maximized Sequential Probability Ratio Test for Drug and Vaccine Safety Surveillance". Sequential Analysis: Design Methods and Applications. 30 (1). See the second-to-last paragraph of section 1: http://www.tandfonline.com/doi/full/10.1080/07474946.2011.539924

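Further reading

  • Ghosh, Bhaskar Kumar (1970). Sequential Tests of Statistical Hypotheses. Reading: Addison-Wesley.
  • Holger Wilker: Sequential-Statistik in der Praxis, BoD, Norderstedt 2012, ISBN 978-3848232529.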

External links

  • Wald's Sequential Probability Ratio Test for R by Stéphane Bottine
  • Wald's Sequential Probability Ratio Test for Python by Zhenning Yu
