fbpx
Wikipedia

Quartile

In statistics, a quartile is a type of quantile which divides the number of data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three main quartiles are as follows:

  • The first quartile (Q1) is defined as the middle number between the smallest number (minimum) and the median of the data set. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point.
  • The second quartile (Q2) is the median of a data set; thus 50% of the data lies below this point.
  • The third quartile (Q3) is the middle value between the median and the highest value (maximum) of the data set. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point.[1]

Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a five-number summary of the data. This summary is important in statistics because it provides information about both the center and the spread of the data. Knowing the lower and upper quartile provides information on how big the spread is and if the dataset is skewed toward one side. Since quartiles divide the number of data points evenly, the range is not the same between quartiles (i.e., Q3-Q2Q2-Q1) and is instead known as the interquartile range (IQR). While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of outliers in the data, and the difference in spread between the middle 50% of the data and the outer data points.[2]

Definitions

 
Boxplot (with quartiles and an interquartile range) and a probability density function (pdf) of a normal N(0,1σ2) population
Symbol Names Definition
Q1
splits off the lowest 25% of data from the highest 75%
Q2
  • second quartile
  • median
  • 50th percentile
cuts data set in half
Q3
  • third quartile
  • upper quartile
  • 75th percentile
splits off the highest 25% of data from the lowest 75%

Computing methods

Discrete distributions

For discrete distributions, there is no universal agreement on selecting the quartile values.[3]

Method 1

  1. Use the median to divide the ordered data set into two-halves.
    • If there is an odd number of data points in the original ordered data set, do not include the median (the central value in the ordered list) in either half.
    • If there is an even number of data points in the original ordered data set, split this data set exactly in half.
  2. The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.

Method 2

  1. Use the median to divide the ordered data set into two-halves.
    • If there are an odd number of data points in the original ordered data set, include the median (the central value in the ordered list) in both halves.
    • If there are an even number of data points in the original ordered data set, split this data set exactly in half.
  2. The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

The values found by this method are also known as "Tukey's hinges";[4] see also midhinge.

Method 3

  1. If there are even numbers of data points, then Method 3 starts off the same as Method 1 or Method 2 above and you can choose to include or not include the median as a datapoint. If you choose to include the median as a new datapoint, proceed to step 2 or 3 of Method 3 because you now have an odd number of datapoints.
  2. If there are (4n+1) data points, then the lower quartile is 25% of the nth data value plus 75% of the (n+1)th data value; the upper quartile is 75% of the (3n+1)th data point plus 25% of the (3n+2)th data point.
  3. If there are (4n+3) data points, then the lower quartile is 75% of the (n+1)th data value plus 25% of the (n+2)th data value; the upper quartile is 25% of the (3n+2)th data point plus 75% of the (3n+3)th data point.

Method 4

If we have an ordered dataset  , we can interpolate between data points to find the  th empirical quantile if   is in the   quantile. If we denote the integer part of a number   by  , then the empirical quantile function is given by,

 ,

where   and  .[1]

To find the first, second, and third quartiles of the dataset we would evaluate  ,  , and   respectively.

Example 1

Ordered Data Set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49

Method 1 Method 2 Method 3 Method 4
Q1 15 25.5 20.25 15
Q2 40 40 40 40
Q3 43 42.5 42.75 43

Example 2

Ordered Data Set: 7, 15, 36, 39, 40, 41

As there are an even number of data points, the first three methods all give the same results.

Method 1 Method 2 Method 3 Method 4
Q1 15 15 15 13
Q2 37.5 37.5 37.5 37.5
Q3 40 40 40 40.25

Continuous probability distributions

 
Quartiles on a cumulative distribution function of a normal distribution

If we define a continuous probability distributions as   where   is a real valued random variable, its cumulative distribution function (CDF) is given by,

 .[1]

The CDF gives the probability that the random variable   is less than the value  . Therefore, the first quartile is the value of   when  , the second quartile is   when  , and the third quartile is   when  .[5] The values of   can be found with the quantile function   where   for the first quartile,   for the second quartile, and   for the third quartile. The quantile function is the inverse of the cumulative distribution function if the cumulative distribution function is monotonically increasing.

Outliers

There are methods by which to check for outliers in the discipline of statistics and statistical analysis. Outliers could be a result from a shift in the location (mean) or in the scale (variability) of the process of interest.[6] Outliers could also be evidence of a sample population that has a non-normal distribution or of a contaminated population data set. Consequently, as is the basic idea of descriptive statistics, when encountering an outlier, we have to explain this value by further analysis of the cause or origin of the outlier. In cases of extreme observations, which are not an infrequent occurrence, the typical values must be analyzed. In the case of quartiles, the Interquartile Range (IQR) may be used to characterize the data when there may be extremities that skew the data; the interquartile range is a relatively robust statistic (also sometimes called "resistance") compared to the range and standard deviation. There is also a mathematical method to check for outliers and determining "fences", upper and lower limits from which to check for outliers.

After determining the first and third quartiles and the interquartile range as outlined above, then fences are calculated using the following formula:

 
 
 
Boxplot Diagram with Outliers

where Q1 and Q3 are the first and third quartiles, respectively. The lower fence is the "lower limit" and the upper fence is the "upper limit" of data, and any data lying outside these defined bounds can be considered an outlier. Anything below the Lower fence or above the Upper fence can be considered such a case. The fences provide a guideline by which to define an outlier, which may be defined in other ways. The fences define a "range" outside which an outlier exists; a way to picture this is a boundary of a fence, outside which are "outsiders" as opposed to outliers. It is common for the lower and upper fences along with the outliers to be represented by a boxplot. For a boxplot, only the vertical heights correspond to the visualized data set while horizontal width of the box is irrelevant. Outliers located outside the fences in a boxplot can be marked as any choice of symbol, such as an "x" or "o". The fences are sometimes also referred to as "whiskers" while the entire plot visual is called a "box-and-whisker" plot.

When spotting an outlier in the data set by calculating the interquartile ranges and boxplot features, it might be simple to mistakenly view it as evidence that the population is non-normal or that the sample is contaminated. However, this method should not take place of a hypothesis test for determining normality of the population. The significance of the outliers vary depending on the sample size. If the sample is small, then it is more probable to get interquartile ranges that are unrepresentatively small, leading to narrower fences. Therefore, it would be more likely to find data that are marked as outliers.[7]

Computer software for quartiles

Environment Function Quartile Method
Microsoft Excel QUARTILE.EXC Method 4
Microsoft Excel QUARTILE.INC Method 3
TI-8X series calculators 1-Var Stats Method 1
R fivenum Method 2
Python numpy.percentile Method 3
Python pandas.DataFrame.describe Method 3

Excel:

The Excel function QUARTILE(array, quart) provides the desired quartile value for a given array of data, using Method 3 from above. In the Quartile function, array is the dataset of numbers that is being analyzed and quart is any of the following 5 values depending on which quartile is being calculated. [8]

Quart Output QUARTILE Value
0 Minimum value
1 Lower Quartile (25th percentile)
2 Median
3 Upper Quartile (75th percentile)
4 Maximum value

MATLAB:

In order to calculate quartiles in Matlab, the function quantile(A,p) can be used. Where A is the vector of data being analyzed and p is the percentage that relates to the quartiles as stated below. [9]

p Output QUARTILE Value
0 Minimum value
0.25 Lower Quartile (25th percentile)
0.5 Median
0.75 Upper Quartile (75th percentile)
1 Maximum value

See also

References

  1. ^ a b c A modern introduction to probability and statistics : understanding why and how. Dekking, Michel, 1946–. London: Springer. 2005. pp. 234–238. ISBN 978-1-85233-896-1. OCLC 262680588.{{cite book}}: CS1 maint: others (link)
  2. ^ Knoch, Jessica (February 23, 2018). "How are Quartiles Used in Statistics?". Magoosh Statistics Blog. Retrieved December 11, 2019.
  3. ^ Hyndman, Rob J; Fan, Yanan (November 1996). "Sample quantiles in statistical packages". American Statistician. 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934.
  4. ^ Tukey, John Wilder (1977). Exploratory Data Analysis. ISBN 978-0-201-07616-5.
  5. ^ "6. Distribution and Quantile Functions" (PDF). math.bme.hu.
  6. ^ Walfish, Steven (November 2006). "A Review of Statistical Outlier Method". Pharmaceutical Technology.
  7. ^ Dawson, Robert (July 1, 2011). "How Significant is a Boxplot Outlier?". Journal of Statistics Education. 19 (2): null. doi:10.1080/10691898.2011.11889610.
  8. ^ "How to use the Excel QUARTILE function | Exceljet". exceljet.net. Retrieved December 11, 2019.
  9. ^ "Quantiles of a data set – MATLAB quantile". www.mathworks.com. Retrieved December 11, 2019.

External links

  • Quartile – from MathWorld Includes references and compares various methods to compute quartiles
  • Quartiles – From MathForum.org
  • Quartiles calculator – simple quartiles calculator
  • Quartiles – An example how to calculate it

quartile, statistics, quartile, type, quantile, which, divides, number, data, points, into, four, parts, quarters, more, less, equal, size, data, must, ordered, from, smallest, largest, compute, quartiles, such, quartiles, form, order, statistic, three, main, . In statistics a quartile is a type of quantile which divides the number of data points into four parts or quarters of more or less equal size The data must be ordered from smallest to largest to compute quartiles as such quartiles are a form of order statistic The three main quartiles are as follows The first quartile Q1 is defined as the middle number between the smallest number minimum and the median of the data set It is also known as the lower or 25th empirical quartile as 25 of the data is below this point The second quartile Q2 is the median of a data set thus 50 of the data lies below this point The third quartile Q3 is the middle value between the median and the highest value maximum of the data set It is known as the upper or 75th empirical quartile as 75 of the data lies below this point 1 Along with the minimum and maximum of the data which are also quartiles the three quartiles described above provide a five number summary of the data This summary is important in statistics because it provides information about both the center and the spread of the data Knowing the lower and upper quartile provides information on how big the spread is and if the dataset is skewed toward one side Since quartiles divide the number of data points evenly the range is not the same between quartiles i e Q3 Q2 Q2 Q1 and is instead known as the interquartile range IQR While the maximum and minimum also show the spread of the data the upper and lower quartiles can provide more detailed information on the location of specific data points the presence of outliers in the data and the difference in spread between the middle 50 of the data and the outer data points 2 Contents 1 Definitions 2 Computing methods 2 1 Discrete distributions 2 1 1 Method 1 2 1 2 Method 2 2 1 3 Method 3 2 1 4 Method 4 2 1 5 Example 1 2 1 6 Example 2 2 2 Continuous probability distributions 3 Outliers 4 Computer software for quartiles 5 See also 6 References 7 External linksDefinitions Edit Boxplot with quartiles and an interquartile range and a probability density function pdf of a normal N 0 1s2 population Symbol Names DefinitionQ1 first quartile lower quartile 25th percentile splits off the lowest 25 of data from the highest 75 Q2 second quartile median 50th percentile cuts data set in halfQ3 third quartile upper quartile 75th percentile splits off the highest 25 of data from the lowest 75 Computing methods EditDiscrete distributions Edit For discrete distributions there is no universal agreement on selecting the quartile values 3 Method 1 Edit Use the median to divide the ordered data set into two halves If there is an odd number of data points in the original ordered data set do not include the median the central value in the ordered list in either half If there is an even number of data points in the original ordered data set split this data set exactly in half The lower quartile value is the median of the lower half of the data The upper quartile value is the median of the upper half of the data This rule is employed by the TI 83 calculator boxplot and 1 Var Stats functions Method 2 Edit Use the median to divide the ordered data set into two halves If there are an odd number of data points in the original ordered data set include the median the central value in the ordered list in both halves If there are an even number of data points in the original ordered data set split this data set exactly in half The lower quartile value is the median of the lower half of the data The upper quartile value is the median of the upper half of the data The values found by this method are also known as Tukey s hinges 4 see also midhinge Method 3 Edit If there are even numbers of data points then Method 3 starts off the same as Method 1 or Method 2 above and you can choose to include or not include the median as a datapoint If you choose to include the median as a new datapoint proceed to step 2 or 3 of Method 3 because you now have an odd number of datapoints If there are 4n 1 data points then the lower quartile is 25 of the nth data value plus 75 of the n 1 th data value the upper quartile is 75 of the 3n 1 th data point plus 25 of the 3n 2 th data point If there are 4n 3 data points then the lower quartile is 75 of the n 1 th data value plus 25 of the n 2 th data value the upper quartile is 25 of the 3n 2 th data point plus 75 of the 3n 3 th data point Method 4 Edit If we have an ordered dataset x 1 x 2 x n displaystyle x 1 x 2 x n we can interpolate between data points to find the p displaystyle p th empirical quantile if x i displaystyle x i is in the i n 1 displaystyle i n 1 quantile If we denote the integer part of a number a displaystyle a by a displaystyle lfloor a rfloor then the empirical quantile function is given by q p 4 x k a x k 1 x k displaystyle q p 4 x k alpha x k 1 x k where k p n 1 4 displaystyle k lfloor p n 1 4 rfloor and a p n 1 4 p n 1 4 displaystyle alpha p n 1 4 lfloor p n 1 4 rfloor 1 To find the first second and third quartiles of the dataset we would evaluate q 0 25 displaystyle q 0 25 q 0 5 displaystyle q 0 5 and q 0 75 displaystyle q 0 75 respectively Example 1 Edit Ordered Data Set 6 7 15 36 39 40 41 42 43 47 49 Method 1 Method 2 Method 3 Method 4Q1 15 25 5 20 25 15Q2 40 40 40 40Q3 43 42 5 42 75 43Example 2 Edit Ordered Data Set 7 15 36 39 40 41As there are an even number of data points the first three methods all give the same results Method 1 Method 2 Method 3 Method 4Q1 15 15 15 13Q2 37 5 37 5 37 5 37 5Q3 40 40 40 40 25Continuous probability distributions Edit Quartiles on a cumulative distribution function of a normal distribution If we define a continuous probability distributions as P X displaystyle P X where X displaystyle X is a real valued random variable its cumulative distribution function CDF is given by F X x P X x displaystyle F X x P X leq x 1 The CDF gives the probability that the random variable X displaystyle X is less than the value x displaystyle x Therefore the first quartile is the value of x displaystyle x when F X x 0 25 displaystyle F X x 0 25 the second quartile is x displaystyle x when F X x 0 5 displaystyle F X x 0 5 and the third quartile is x displaystyle x when F X x 0 75 displaystyle F X x 0 75 5 The values of x displaystyle x can be found with the quantile function Q p displaystyle Q p where p 0 25 displaystyle p 0 25 for the first quartile p 0 5 displaystyle p 0 5 for the second quartile and p 0 75 displaystyle p 0 75 for the third quartile The quantile function is the inverse of the cumulative distribution function if the cumulative distribution function is monotonically increasing Outliers EditThere are methods by which to check for outliers in the discipline of statistics and statistical analysis Outliers could be a result from a shift in the location mean or in the scale variability of the process of interest 6 Outliers could also be evidence of a sample population that has a non normal distribution or of a contaminated population data set Consequently as is the basic idea of descriptive statistics when encountering an outlier we have to explain this value by further analysis of the cause or origin of the outlier In cases of extreme observations which are not an infrequent occurrence the typical values must be analyzed In the case of quartiles the Interquartile Range IQR may be used to characterize the data when there may be extremities that skew the data the interquartile range is a relatively robust statistic also sometimes called resistance compared to the range and standard deviation There is also a mathematical method to check for outliers and determining fences upper and lower limits from which to check for outliers After determining the first and third quartiles and the interquartile range as outlined above then fences are calculated using the following formula Lower fence Q 1 1 5 I Q R displaystyle text Lower fence Q 1 1 5 mathrm IQR Upper fence Q 3 1 5 I Q R displaystyle text Upper fence Q 3 1 5 mathrm IQR Boxplot Diagram with Outlierswhere Q1 and Q3 are the first and third quartiles respectively The lower fence is the lower limit and the upper fence is the upper limit of data and any data lying outside these defined bounds can be considered an outlier Anything below the Lower fence or above the Upper fence can be considered such a case The fences provide a guideline by which to define an outlier which may be defined in other ways The fences define a range outside which an outlier exists a way to picture this is a boundary of a fence outside which are outsiders as opposed to outliers It is common for the lower and upper fences along with the outliers to be represented by a boxplot For a boxplot only the vertical heights correspond to the visualized data set while horizontal width of the box is irrelevant Outliers located outside the fences in a boxplot can be marked as any choice of symbol such as an x or o The fences are sometimes also referred to as whiskers while the entire plot visual is called a box and whisker plot When spotting an outlier in the data set by calculating the interquartile ranges and boxplot features it might be simple to mistakenly view it as evidence that the population is non normal or that the sample is contaminated However this method should not take place of a hypothesis test for determining normality of the population The significance of the outliers vary depending on the sample size If the sample is small then it is more probable to get interquartile ranges that are unrepresentatively small leading to narrower fences Therefore it would be more likely to find data that are marked as outliers 7 Computer software for quartiles EditEnvironment Function Quartile MethodMicrosoft Excel QUARTILE EXC Method 4Microsoft Excel QUARTILE INC Method 3TI 8X series calculators 1 Var Stats Method 1R fivenum Method 2Python numpy percentile Method 3Python pandas DataFrame describe Method 3Excel The Excel function QUARTILE array quart provides the desired quartile value for a given array of data using Method 3 from above In the Quartile function array is the dataset of numbers that is being analyzed and quart is any of the following 5 values depending on which quartile is being calculated 8 Quart Output QUARTILE Value0 Minimum value1 Lower Quartile 25th percentile 2 Median3 Upper Quartile 75th percentile 4 Maximum valueMATLAB In order to calculate quartiles in Matlab the function quantile A p can be used Where A is the vector of data being analyzed and p is the percentage that relates to the quartiles as stated below 9 p Output QUARTILE Value0 Minimum value0 25 Lower Quartile 25th percentile 0 5 Median0 75 Upper Quartile 75th percentile 1 Maximum valueSee also EditFive number summary Range Box plot Interquartile range Summary statistics QuantileReferences Edit a b c A modern introduction to probability and statistics understanding why and how Dekking Michel 1946 London Springer 2005 pp 234 238 ISBN 978 1 85233 896 1 OCLC 262680588 a href Template Cite book html title Template Cite book cite book a CS1 maint others link Knoch Jessica February 23 2018 How are Quartiles Used in Statistics Magoosh Statistics Blog Retrieved December 11 2019 Hyndman Rob J Fan Yanan November 1996 Sample quantiles in statistical packages American Statistician 50 4 361 365 doi 10 2307 2684934 JSTOR 2684934 Tukey John Wilder 1977 Exploratory Data Analysis ISBN 978 0 201 07616 5 6 Distribution and Quantile Functions PDF math bme hu Walfish Steven November 2006 A Review of Statistical Outlier Method Pharmaceutical Technology Dawson Robert July 1 2011 How Significant is a Boxplot Outlier Journal of Statistics Education 19 2 null doi 10 1080 10691898 2011 11889610 How to use the Excel QUARTILE function Exceljet exceljet net Retrieved December 11 2019 Quantiles of a data set MATLAB quantile www mathworks com Retrieved December 11 2019 External links EditQuartile from MathWorld Includes references and compares various methods to compute quartiles Quartiles From MathForum org Quartiles calculator simple quartiles calculator Quartiles An example how to calculate it Retrieved from https en wikipedia org w index php title Quartile amp oldid 1128289583, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.