
Ancillary statistic

An ancillary statistic is a measure of a sample whose distribution (or whose pmf or pdf) does not depend on the parameters of the model.[1][2][3] An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. They are also used in connection with Basu's theorem to prove independence between statistics.[4]

This concept was first introduced by Ronald Fisher in the 1920s,[5] but its formal definition was only provided in 1964 by Debabrata Basu.[6][7]

Examples

Suppose X1, ..., Xn are independent and identically distributed, and are normally distributed with unknown expected value μ and known variance 1. Let

  X̄ = (X1 + ⋯ + Xn)/n

be the sample mean.

The following statistical measures of dispersion of the sample

  • Range: max(X1, ..., Xn) − min(X1, ..., Xn)
  • Interquartile range: Q3 − Q1
  • Sample variance: σ̂² = Σ(Xi − X̄)²/n

are all ancillary statistics, because their sampling distributions do not change as μ changes. Computationally, this is because in the formulas, the μ terms cancel – adding a constant number to a distribution (and all samples) changes its sample maximum and minimum by the same amount, so it does not change their difference, and likewise for others: these measures of dispersion do not depend on location.
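The cancellation argument can be checked directly: shifting every observation by the same constant leaves the sample range unchanged, so its sampling distribution cannot depend on μ. A minimal sketch with illustrative values:

```python
# Sketch: adding a constant mu to every observation shifts the sample
# maximum and minimum by the same amount, so the range is unchanged.
import random

random.seed(0)
base = [random.gauss(0, 1) for _ in range(10)]  # draws from N(0, 1)
mu = 3.7                                        # illustrative shift
shifted = [x + mu for x in base]                # same draws under N(mu, 1)

range_base = max(base) - min(base)
range_shifted = max(shifted) - min(shifted)
# the mu terms cancel, so the two ranges agree (up to float rounding)
```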

Conversely, given i.i.d. normal variables with known mean 1 and unknown variance σ², the sample mean X̄ is not an ancillary statistic of the variance, as the sampling distribution of the sample mean is N(1, σ²/n), which does depend on σ² – this measure of location (specifically, its standard error) depends on dispersion.[8]
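The scale dependence can be made concrete in the same way: building observations from the same noise draws with two different values of σ shows that the sample mean's deviation from the true mean 1 scales exactly with σ. The σ values below are illustrative.

```python
# Sketch: the sample mean of observations 1 + sigma*z deviates from 1
# by exactly sigma times the mean of the noise, so its sampling
# distribution N(1, sigma^2/n) depends on the scale parameter.
import random

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(8)]  # shared noise draws

def sample_mean(sigma):
    # observations from N(1, sigma^2) built from the same noise
    xs = [1 + sigma * z for z in noise]
    return sum(xs) / len(xs)

dev_small = sample_mean(1.0) - 1   # deviation under sigma = 1
dev_large = sample_mean(2.0) - 1   # deviation under sigma = 2
# doubling sigma doubles the deviation: dev_large == 2 * dev_small
```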

In location-scale families

In a location family of distributions, (X1 − Xn, X2 − Xn, ..., Xn−1 − Xn) is an ancillary statistic.

In a scale family of distributions, (X1/Xn, X2/Xn, ..., Xn−1/Xn) is an ancillary statistic.

In a location-scale family of distributions, ((X1 − Xn)/S, (X2 − Xn)/S, ..., (Xn−1 − Xn)/S), where S² is the sample variance, is an ancillary statistic.[3][9]
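The location-scale case can be verified numerically: the vector ((X1 − Xn)/S, ..., (Xn−1 − Xn)/S) is invariant under any transformation x → a + b·x with b > 0, so its distribution cannot depend on the location or scale parameters. The values of a and b below are illustrative.

```python
# Sketch: the location-scale ancillary statistic is unchanged under
# x -> a + b*x (b > 0): the differences scale by b and the sample
# standard deviation S scales by b, so the ratio is invariant.
import random
from statistics import stdev

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(6)]

def ancillary(sample):
    s = stdev(sample)                  # sample standard deviation S
    last = sample[-1]                  # Xn
    return [(x - last) / s for x in sample[:-1]]

a, b = 4.2, 3.0                        # illustrative location/scale change
ys = [a + b * x for x in xs]

stat_x = ancillary(xs)
stat_y = ancillary(ys)
# the two statistic vectors agree componentwise up to float rounding
```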

In recovery of information

It turns out that, if T1 is a non-sufficient statistic and T2 is ancillary, one can sometimes recover all the information about the unknown parameter contained in the entire data by reporting T1 while conditioning on the observed value of T2. This is known as conditional inference.[3]

For example, suppose that X1, X2 follow the N(θ, 1) distribution, where θ is unknown. Note that, even though X1 is not sufficient for θ (since its Fisher information is 1, whereas the Fisher information of the complete statistic X̄ is 2), by additionally reporting the ancillary statistic X1 − X2, one obtains a joint distribution with Fisher information 2.[3]
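A quick sketch of why X1 − X2 is ancillary here: writing Xi = θ + Zi with Zi standard normal noise, the difference X1 − X2 = Z1 − Z2 does not involve θ at all, while the sample mean shifts with θ one-for-one. The θ values below are illustrative.

```python
# Sketch of the n = 2 example: theta cancels in X1 - X2 (ancillary),
# while the sample mean tracks theta directly.
import random

random.seed(3)
z1, z2 = random.gauss(0, 1), random.gauss(0, 1)  # shared noise draws

def stats_for(theta):
    x1, x2 = theta + z1, theta + z2              # pair from N(theta, 1)
    return (x1 + x2) / 2, x1 - x2                # (sample mean, ancillary)

mean_a, diff_a = stats_for(0.0)
mean_b, diff_b = stats_for(5.0)
# diff_a == diff_b: the ancillary statistic ignores theta entirely,
# while the sample mean shifts by exactly 5.
```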

Ancillary complement

Given a statistic T that is not sufficient, an ancillary complement is a statistic U that is ancillary and such that (T, U) is sufficient.[2] Intuitively, an ancillary complement "adds the missing information" (without duplicating any).

The statistic is particularly useful when T is a maximum likelihood estimator, which in general is not sufficient; one can then ask for an ancillary complement. In this case, Fisher argues that one must condition on an ancillary complement to determine information content: the Fisher information content of T should be taken not as the marginal distribution of T, but as the conditional distribution of T given U: how much information does T add? This is not always possible, as an ancillary complement need not exist; when one does exist, it need not be unique, nor need a maximum ancillary complement exist.

Example

In baseball, suppose a scout observes a batter in N at-bats. Suppose (unrealistically) that the number N is chosen by some random process that is independent of the batter's ability – say a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number N of at-bats and the number X of hits: the data (X, N) are a sufficient statistic. The observed batting average X/N fails to convey all of the information available in the data because it fails to report the number N of at-bats (e.g., a batting average of 0.400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a 0.400 average based on 100 at-bats). The number N of at-bats is an ancillary statistic because

  • It is a part of the observable data (it is a statistic), and
  • Its probability distribution does not depend on the batter's ability, since it was chosen by a random process independent of the batter's ability.

This ancillary statistic is an ancillary complement to the observed batting average X/N, i.e., the batting average X/N is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but conjoined with N, it becomes sufficient.
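The insufficiency of X/N on its own can be seen in the likelihood: two records with the same batting average but different N give genuinely different evidence about the hitting probability p, because the binomial kernel p^X(1 − p)^(N−X) depends on N (the coin-toss terms that determine N do not involve p and drop out). A sketch with illustrative numbers:

```python
# Sketch: same batting average 0.400, different N, different evidence.
# The likelihood ratio L(p=0.2)/L(p=0.4) shows how strongly each record
# discounts a weaker hitter.

def likelihood(p, hits, at_bats):
    # binomial kernel p^X (1-p)^(N-X); the stopping-rule (coin) terms
    # do not involve p, so they cancel in any likelihood ratio
    return p ** hits * (1 - p) ** (at_bats - hits)

# 2 hits in 5 at-bats vs 40 hits in 100 at-bats: both average 0.400
small = likelihood(0.2, 2, 5) / likelihood(0.4, 2, 5)
large = likelihood(0.2, 40, 100) / likelihood(0.4, 40, 100)
# the 100-at-bat record discounts p = 0.2 far more sharply than the
# 5-at-bat record, so X/N alone cannot be sufficient
```

Reporting N alongside X/N (or equivalently the pair (X, N)) restores sufficiency, which is exactly the ancillary-complement role described above.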

See also

  • Basu's theorem
  • Prediction interval
  • Group family
  • Conditionality principle

Notes

  1. ^ Lehmann, E. L.; Scholz, F. W. (1992). "Ancillarity". Lecture Notes-Monograph Series. Institute of Mathematical Statistics Lecture Notes - Monograph Series. 17: 32–51. doi:10.1214/lnms/1215458837. ISBN 0-940600-24-2. ISSN 0749-2170. JSTOR 4355624.
  2. ^ a b Ghosh, M.; Reid, N.; Fraser, D. A. S. (2010). "Ancillary statistics: A review". Statistica Sinica. 20 (4): 1309–1332. ISSN 1017-0405. JSTOR 24309506.
  3. ^ a b c d Mukhopadhyay, Nitis (2000). Probability and Statistical Inference. United States of America: Marcel Dekker, Inc. pp. 309–318. ISBN 0-8247-0379-0.
  4. ^ Dawid, Philip (2011), DasGupta, Anirban (ed.), "Basu on Ancillarity", Selected Works of Debabrata Basu, New York, NY: Springer, pp. 5–8, doi:10.1007/978-1-4419-5825-9_2, ISBN 978-1-4419-5825-9
  5. ^ Fisher, R. A. (1925). "Theory of Statistical Estimation". Mathematical Proceedings of the Cambridge Philosophical Society. 22 (5): 700–725. Bibcode:1925PCPS...22..700F. doi:10.1017/S0305004100009580. hdl:2440/15186. ISSN 0305-0041.
  6. ^ Basu, D. (1964). "Recovery of Ancillary Information". Sankhyā: The Indian Journal of Statistics, Series A (1961-2002). 26 (1): 3–16. ISSN 0581-572X. JSTOR 25049300.
  7. ^ Stigler, Stephen M. (2001), Ancillary history, Institute of Mathematical Statistics Lecture Notes - Monograph Series, Beachwood, OH: Institute of Mathematical Statistics, pp. 555–567, doi:10.1214/lnms/1215090089, ISBN 978-0-940600-50-8, retrieved 2023-04-24
  8. ^ Buehler, Robert J. (1982). "Some Ancillary Statistics and Their Properties". Journal of the American Statistical Association. 77 (379): 581–589. doi:10.1080/01621459.1982.10477850. hdl:11299/199392. ISSN 0162-1459.
  9. ^ "Ancillary statistics" (PDF).
