
Test statistic

A test statistic is a quantity derived from the sample for statistical hypothesis testing.[1] A hypothesis test is typically specified in terms of a test statistic, considered as a numerical summary of a data set that reduces the data to one value that can be used to perform the hypothesis test. In general, a test statistic is selected or defined in such a way as to quantify, within observed data, behaviours that would distinguish the null from the alternative hypothesis, where such an alternative is prescribed, or that would characterize the null hypothesis if there is no explicitly stated alternative hypothesis.

The table below lists some of the most common test statistics and their corresponding statistical tests or models.

An important property of a test statistic is that its sampling distribution under the null hypothesis must be calculable, either exactly or approximately, which allows p-values to be calculated. A test statistic shares some of the same qualities of a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics. However, a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics since it is difficult to determine their sampling distribution.

Two widely used test statistics are the t-statistic and the F-test.

Example

Suppose the task is to test whether a coin is fair (i.e. has equal probabilities of producing a head or a tail). If the coin is flipped 100 times and the results are recorded, the raw data can be represented as a sequence of 100 heads and tails. If there is interest in the marginal probability of obtaining a tail, only the number T out of the 100 flips that produced a tail needs to be recorded. But T can also be used as a test statistic in one of two ways:

  • the exact sampling distribution of T under the null hypothesis is the binomial distribution with parameters 0.5 and 100.
  • the value of T can be compared with its expected value under the null hypothesis of 50, and since the sample size is large, a normal distribution can be used as an approximation to the sampling distribution either for T or for the revised test statistic T−50.

Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed p-value for the null hypothesis that the coin is fair. The test statistic in this case reduces a set of 100 numbers to a single numerical summary that can be used for testing.
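The exact approach above can be sketched in a few lines of Python: the two-tailed p-value sums the binomial probabilities of every outcome at least as far from the expected value of 50 as the observed count. The observed count of 60 tails is a made-up illustration, not data from the article.

```python
from math import comb

def binom_pmf(t, n=100, p=0.5):
    """P(T = t) under the null hypothesis of a fair coin: C(n, t) p^t (1-p)^(n-t)."""
    return comb(n, t) * p**t * (1 - p)**(n - t)

def two_tailed_p(t_obs, n=100, p=0.5):
    """Exact two-tailed p-value: total probability of all outcomes whose
    distance from the null expectation n*p is at least that of t_obs."""
    dist = abs(t_obs - n * p)
    return sum(binom_pmf(t, n, p) for t in range(n + 1) if abs(t - n * p) >= dist)

p_value = two_tailed_p(60)  # hypothetical: 60 tails observed in 100 flips
```

For 60 tails this gives a p-value of about 0.057, so at the conventional 5% level the fairness hypothesis would (narrowly) not be rejected.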

Common test statistics

One-sample tests are appropriate when a sample is being compared to the population described by a hypothesis. The population characteristics are known from theory or are calculated from the population.

Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment.

Paired tests are appropriate for comparing two samples where it is impossible to control important variables. Rather than comparing two sets, members are paired between samples so the difference between the members becomes the sample. Typically the mean of the differences is then compared to zero. The common example scenario for when a paired difference test is appropriate is when a single set of test subjects has something applied to them and the test is intended to check for an effect.

Z-tests are appropriate for comparing means under stringent conditions regarding normality and a known standard deviation.

A t-test is appropriate for comparing means under relaxed conditions (less is assumed).
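As a minimal sketch of the one-sample case, the t statistic replaces the unknown σ of the z-test with the sample standard deviation s. The six measurements and the hypothesized mean of 5.0 below are invented for illustration.

```python
from statistics import mean, stdev
from math import sqrt

def one_sample_t(sample, mu0):
    """t = (x̄ − μ0) / (s/√n) with df = n − 1; σ is estimated by s."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1

# hypothetical data: do these measurements differ from a claimed mean of 5.0?
t, df = one_sample_t([5.1, 4.9, 5.3, 5.2, 4.8, 5.4], 5.0)
```

The resulting t (about 1.23 on 5 degrees of freedom) would then be compared against the t-distribution to obtain a p-value.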

Tests of proportions are analogous to tests of means (the 50% proportion).
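For example, the one-proportion z statistic can be computed directly; the 60-successes-in-100-trials figures here are made up to echo the coin example.

```python
from math import sqrt

def one_proportion_z(x, n, p0):
    """z = (p̂ − p0) / √(p0(1 − p0)/n), where p̂ = x/n is the sample proportion."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = one_proportion_z(60, 100, 0.5)  # hypothetical: 60 tails in 100 flips vs p0 = 0.5
```

Here z = 2.0, i.e. the observed proportion lies two standard errors above the hypothesized 50%.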

Chi-squared tests use the same calculations and the same probability distribution for different applications:

  • Chi-squared tests for variance are used to determine whether a normal population has a specified variance. The null hypothesis is that it does.
  • Chi-squared tests of independence are used for deciding whether two variables are associated or are independent. The variables are categorical rather than numeric. It can be used to decide whether left-handedness is correlated with height (or not). The null hypothesis is that the variables are independent. The numbers used in the calculation are the observed and expected frequencies of occurrence (from contingency tables).
  • Chi-squared goodness of fit tests are used to determine the adequacy of curves fit to data. The null hypothesis is that the curve fit is adequate. It is common to determine curve shapes to minimize the mean square error, so it is appropriate that the goodness-of-fit calculation sums the squared errors.
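The goodness-of-fit calculation described above is a short sum; the die-roll counts below are invented as a worked illustration.

```python
def chi_squared_stat(observed, expected):
    """χ² = Σ (observed − expected)² / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# hypothetical: are 6 die faces equally likely over 60 rolls?
observed = [8, 9, 10, 11, 12, 10]
expected = [10] * 6           # 60 rolls / 6 faces under the null hypothesis
chi2 = chi_squared_stat(observed, expected)  # df = 6 - 1 = 5
```

The small χ² of 1.0 on 5 degrees of freedom is entirely consistent with a fair die.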

F-tests (analysis of variance, ANOVA) are commonly used when deciding whether groupings of data by category are meaningful. If the variance of test scores of the left-handed in a class is much smaller than the variance of the whole class, then it may be useful to study lefties as a group. The null hypothesis is that two variances are the same – so the proposed grouping is not meaningful.
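Sticking with the left-handers example, a minimal sketch of the two-sample F statistic for equality of variances follows; all scores are invented. By convention the larger sample variance goes in the numerator so that F ≥ 1.

```python
from statistics import variance

def f_stat(sample1, sample2):
    """F = s1²/s2², arranged so the larger sample variance is the numerator."""
    v1, v2 = variance(sample1), variance(sample2)
    if v1 < v2:
        v1, v2 = v2, v1
    return v1 / v2

whole_class = [70, 75, 80, 85, 90]   # hypothetical class test scores
left_handed = [79, 80, 81, 80, 80]   # hypothetical left-handed subgroup
F = f_stat(whole_class, left_handed)
```

A large F (here 125.0) would be compared against the F-distribution's critical value; if it exceeds that value, the hypothesis of equal variances is rejected.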

In the table below, the symbols used are defined at the bottom of the table. Many other tests can be found in other articles. Proofs exist that the test statistics are appropriate.[2]

Each entry below gives the test's name, its formula, and its assumptions or notes.

One-sample z-test
  z = (x̄ − μ0) / (σ/√n)
  (Normal population or n large) and σ known. (z is the distance from the mean in relation to the standard deviation of the mean.) For non-normal distributions it is possible to calculate a minimum proportion of a population that falls within k standard deviations for any k (see: Chebyshev's inequality).

Two-sample z-test
  z = (x̄1 − x̄2 − d0) / √(σ1²/n1 + σ2²/n2)
  Normal population and independent observations, and σ1 and σ2 are known, where d0 is the value of μ1 − μ2 under the null hypothesis.

One-sample t-test
  t = (x̄ − μ0) / (s/√n), df = n − 1
  (Normal population or n large) and σ unknown.

Paired t-test
  t = (d̄ − d0) / (sd/√n), df = n − 1
  (Normal population of differences or n large) and σ unknown.

Two-sample pooled t-test, equal variances
  t = (x̄1 − x̄2 − d0) / (sp√(1/n1 + 1/n2)), where sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) and df = n1 + n2 − 2.[3]
  (Normal populations or n1 + n2 > 40) and independent observations and σ1 = σ2 unknown.

Two-sample unpooled t-test, unequal variances (Welch's t-test)
  t = (x̄1 − x̄2 − d0) / √(s1²/n1 + s2²/n2), with df = (s1²/n1 + s2²/n2)² / ((s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)).[3]
  (Normal populations or n1 + n2 > 40) and independent observations and σ1 ≠ σ2 both unknown.

One-proportion z-test
  z = (p̂ − p0) / √(p0(1 − p0)/n)
  n·p0 > 10 and n(1 − p0) > 10 and it is a SRS (simple random sample); see notes.

Two-proportion z-test, pooled for H0: p1 = p2
  z = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2)), where p̂ = (x1 + x2) / (n1 + n2)
  n1p1 > 5 and n1(1 − p1) > 5 and n2p2 > 5 and n2(1 − p2) > 5 and independent observations; see notes.

Two-proportion z-test, unpooled for |d0| > 0
  z = (p̂1 − p̂2 − d0) / √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
  n1p1 > 5 and n1(1 − p1) > 5 and n2p2 > 5 and n2(1 − p2) > 5 and independent observations; see notes.

Chi-squared test for variance
  χ² = (n − 1)s²/σ0², df = n − 1
  Normal population.

Chi-squared test for goodness of fit
  χ² = Σk (observed − expected)² / expected, df = k − 1 − (number of parameters estimated)
  One of these must hold: all expected counts are at least 5,[4] or all expected counts are > 1 and no more than 20% of expected counts are less than 5.[5]

Two-sample F test for equality of variances
  F = s1²/s2²
  Normal populations. Arrange so s1² ≥ s2² and reject H0 for F > F(α/2, n1 − 1, n2 − 1).[6]

Regression t-test of H0: R² = 0
  t = √(R²(n − k − 1*) / (1 − R²))
  Reject H0 for t > t(α/2, n − k − 1*).[7] *Subtract 1 for intercept; k terms contain independent variables.
In general, the subscript 0 indicates a value taken from the null hypothesis, H0, which should be used as much as possible in constructing its test statistic. Definitions of other symbols:
  • α = the probability of Type I error (rejecting a null hypothesis when it is in fact true)
  • n = sample size
  • n1 = sample 1 size
  • n2 = sample 2 size
  • x̄ = sample mean
  • μ0 = hypothesized population mean
  • μ1 = population 1 mean
  • μ2 = population 2 mean
  • σ = population standard deviation
  • σ² = population variance
  • s = sample standard deviation
  • Σk = sum of k numbers
  • s² = sample variance
  • s1 = sample 1 standard deviation
  • s2 = sample 2 standard deviation
  • t = t statistic
  • df = degrees of freedom
  • d̄ = sample mean of differences
  • d0 = hypothesized population mean difference
  • sd = standard deviation of differences
  • χ² = chi-squared statistic
  • p̂ = x/n = sample proportion, unless specified otherwise
  • p0 = hypothesized population proportion
  • p̂1 = proportion 1
  • p̂2 = proportion 2
  • dp = hypothesized difference in proportion
  • min(n1, n2) = minimum of n1 and n2
  • x1 = n1p1
  • x2 = n2p2
  • F = F statistic

See also

  • Null distribution
  • Likelihood-ratio test
  • Neyman–Pearson lemma
  • Coefficient of determination (R²)
  • Sufficiency (statistics)

References

  1. ^ Berger, R. L.; Casella, G. (2001). Statistical Inference (2nd ed.). Duxbury Press. p. 374.
  2. ^ Loveland, Jennifer L. (2011). Mathematical Justification of Introductory Hypothesis Tests and Development of Reference Materials (M.Sc. (Mathematics)). Utah State University. Retrieved April 30, 2013. Abstract: "The focus was on the Neyman–Pearson approach to hypothesis testing. A brief historical development of the Neyman–Pearson approach is followed by mathematical proofs of each of the hypothesis tests covered in the reference material." The proofs do not reference the concepts introduced by Neyman and Pearson; instead they show that traditional test statistics have the probability distributions ascribed to them, so that significance calculations assuming those distributions are correct. The thesis information is also posted at mathnstats.com as of April 2013.
  3. ^ a b NIST handbook: Two-Sample t-test for Equal Means.
  4. ^ Steel, R. G. D.; Torrie, J. H. (1960). Principles and Procedures of Statistics with Special Reference to the Biological Sciences. McGraw Hill. p. 350.
  5. ^ Weiss, Neil A. (1999). Introductory Statistics (5th ed.). p. 802. ISBN 0-201-59877-9.
  6. ^ NIST handbook: F-Test for Equality of Two Standard Deviations. (Testing standard deviations is the same as testing variances.)
  7. ^ Steel, R. G. D.; Torrie, J. H. (1960). Principles and Procedures of Statistics with Special Reference to the Biological Sciences. McGraw Hill. p. 288.
