Wikipedia

Berkson's paradox

Berkson's paradox, also known as Berkson's bias, collider bias, or Berkson's fallacy, is a result in conditional probability and statistics which is often found to be counterintuitive, and hence a veridical paradox. It is a complicating factor arising in statistical tests of proportions. Specifically, it arises when there is an ascertainment bias inherent in a study design. The effect is related to the explaining away phenomenon in Bayesian networks, and conditioning on a collider in graphical models.

An example of Berkson's paradox:
In figure 1, assume that talent and attractiveness are uncorrelated in the population.
In figure 2, someone sampling the population using celebrities may wrongly infer that talent is negatively correlated with attractiveness, as people who are neither talented nor attractive do not typically become celebrities.

It is often described in the fields of medical statistics or biostatistics, as in the original description of the problem by Joseph Berkson.

Examples

Overview

 
An illustration of Berkson's Paradox. The top graph represents the actual distribution, in which a positive correlation between quality of burgers and fries is observed. However, an individual who does not eat at any location where both are bad observes only the distribution on the bottom graph, which appears to show a negative correlation.

The most common example of Berkson's paradox is a false observation of a negative correlation between two desirable traits, i.e., that members of a population which have some desirable trait tend to lack a second. Berkson's paradox occurs when this observation appears true although in reality the two properties are unrelated, or even positively correlated, because members of the population where both are absent are not equally observed. For example, a person may observe from their experience that fast food restaurants in their area which serve good hamburgers tend to serve bad fries and vice versa; but because they would likely not eat anywhere where both were bad, they fail to allow for the large number of restaurants in this category, which would weaken or even flip the correlation.

Original illustration

Berkson's original illustration involves a retrospective study examining a risk factor for a disease in a statistical sample from a hospital in-patient population. Because samples are taken from a hospital in-patient population, rather than from the general public, this can result in a spurious negative association between the disease and the risk factor. For example, if the risk factor is diabetes and the disease is cholecystitis, a hospital patient without diabetes is more likely to have cholecystitis than a member of the general population, since the patient must have had some non-diabetes (possibly cholecystitis-causing) reason to enter the hospital in the first place. That result will be obtained regardless of whether there is any association between diabetes and cholecystitis in the general population.
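The mechanism can be checked with exact arithmetic. The sketch below is illustrative only: the prevalences (5% cholecystitis, 5% for any other admitting condition), their independence from diabetes and from each other, and the rule that a person is hospitalised exactly when at least one condition is present are all assumptions made for this example, not figures from Berkson's paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical prevalences in the general population, independent of
# each other and of diabetes (assumptions for illustration only).
P_CHOLECYSTITIS = Fraction(5, 100)
P_OTHER = Fraction(5, 100)  # any other condition that leads to admission


def weight(has, p):
    """Probability that a condition with prevalence p is present (or absent)."""
    return p if has else 1 - p


def p_cholecystitis_in_hospital(diabetic):
    """P(cholecystitis | admitted, diabetes status), by enumerating all cases."""
    admitted = Fraction(0)
    admitted_with_chole = Fraction(0)
    for chole, other in product([True, False], repeat=2):
        # Assumed admission rule: at least one condition is present.
        if not (diabetic or chole or other):
            continue
        p = weight(chole, P_CHOLECYSTITIS) * weight(other, P_OTHER)
        admitted += p
        if chole:
            admitted_with_chole += p
    return admitted_with_chole / admitted


print(p_cholecystitis_in_hospital(diabetic=True))   # 1/20, the population rate
print(p_cholecystitis_in_hospital(diabetic=False))  # 20/39, about ten times higher
```

Under these assumptions a diabetic in-patient needs no further reason to be in hospital, so cholecystitis is as common among them as in the general population, while a non-diabetic in-patient must have one of the other two conditions. Within the hospital sample, diabetes therefore appears to protect against cholecystitis even though the two were set up as independent.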

Ellenberg example

An example presented by Jordan Ellenberg: Suppose Alex will only date a man if his niceness plus his handsomeness exceeds some threshold. Then nicer men do not have to be as handsome to qualify for Alex's dating pool. So, among the men that Alex dates, Alex may observe that the nicer ones are less handsome on average (and vice versa), even if these traits are uncorrelated in the general population. Note that this does not mean that men in the dating pool compare unfavorably with men in the population. On the contrary, Alex's selection criterion means that Alex has high standards. The average nice man that Alex dates is actually more handsome than the average man in the population (since even among nice men, the ugliest portion of the population is skipped). Berkson's negative correlation is an effect that arises within the dating pool: the rude men that Alex dates must have been even more handsome to qualify.
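A minimal simulation sketch of this selection effect follows. The independent standard normal scores, the population size, and the threshold of 1.0 are assumptions chosen for illustration; Ellenberg's example does not specify any of them.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: niceness and handsomeness drawn independently.
men = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20_000)]

THRESHOLD = 1.0  # assumed cut-off for niceness + handsomeness
dating_pool = [(n, h) for n, h in men if n + h > THRESHOLD]

corr_population = pearson(*zip(*men))
corr_pool = pearson(*zip(*dating_pool))
print(f"population: {corr_population:+.2f}, dating pool: {corr_pool:+.2f}")
```

The population correlation should come out close to zero, while the correlation inside the dating pool should be clearly negative, because the threshold removes every man who is low on both traits.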

Quantitative example

As a quantitative example, suppose a collector has 1000 postage stamps, of which 300 are pretty and 100 are rare, with 30 being both pretty and rare. 10% of all his stamps are rare and 10% of his pretty stamps are rare, so prettiness tells nothing about rarity. He puts the 370 stamps which are pretty or rare on display. Just over 27% of the stamps on display are rare (100/370), but still only 10% (30/300) of the pretty stamps are rare (and 100% of the 70 not-pretty stamps on display are rare). If an observer only considers stamps on display, they will observe a spurious negative relationship between prettiness and rarity as a result of the selection bias (that is, not-prettiness strongly indicates rarity in the display, but not in the total collection).
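The arithmetic of the stamp example can be reproduced directly from the four counts. This is only a sketch; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Counts taken from the stamp example.
total, pretty, rare, both = 1000, 300, 100, 30

# Whole collection: rarity is independent of prettiness.
p_rare = Fraction(rare, total)                               # 1/10
p_rare_given_pretty = Fraction(both, pretty)                 # 1/10
p_rare_given_plain = Fraction(rare - both, total - pretty)   # 70/700 = 1/10

# Display: only stamps that are pretty or rare.
displayed = pretty + rare - both                             # 370
plain_on_display = displayed - pretty                        # 70, all of them rare
p_rare_on_display = Fraction(rare, displayed)                # 10/37
p_rare_given_pretty_on_display = Fraction(both, pretty)      # still 1/10
p_rare_given_plain_on_display = Fraction(rare - both, plain_on_display)  # 1

print(p_rare, p_rare_given_pretty, p_rare_given_plain)
print(p_rare_on_display, p_rare_given_pretty_on_display,
      p_rare_given_plain_on_display)
```

In the full collection all three rates agree at 1/10. On display, the rate among pretty stamps is unchanged, but the rate among not-pretty stamps jumps to 1, which is what produces the spurious negative relationship.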

Statement

Two independent events become conditionally dependent given that at least one of them occurs. Symbolically:

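If $0 < P(A) < 1$, $0 < P(B) < 1$, and $P(A \mid B) = P(A)$, then $P(A \mid B, A \cup B) = P(A)$ and hence $P(A \mid A \cup B) > P(A)$.
  • Event $A$ and event $B$ may or may not occur.
  • $P(A \mid B)$, a conditional probability, is the probability of observing event $A$ given that $B$ is true.
  • Explanation: Events $A$ and $B$ are independent of each other.
  • $P(A \mid B, A \cup B)$ is the probability of observing event $A$ given that $B$ and ($A$ or $B$) occurs. This can also be written as $P(A \mid B \cap (A \cup B))$.
  • Explanation: The probability of $A$ given both $B$ and ($A$ or $B$) is smaller than the probability of $A$ given ($A$ or $B$).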

In other words, given two independent events, if you consider only outcomes where at least one occurs, then they become conditionally dependent, as shown above.

There's a simpler, more general argument:

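Given two events $A$ and $B$ with $0 < P(A) \leq 1$, we have $0 < P(A) \leq P(A \cup B) \leq 1$. Multiplying both sides of the right-hand inequality by $P(A)$, we get $P(A) P(A \cup B) \leq P(A)$. Dividing both sides of this by $P(A \cup B)$ yields

$$P(A) \leq \frac{P(A)}{P(A \cup B)} = \frac{P(A \cap (A \cup B))}{P(A \cup B)} = P(A \mid A \cup B), \quad \text{i.e.,} \quad P(A) \leq P(A \mid A \cup B).$$

When $P(A \cup B) < 1$ (i.e., when $A \cup B$ is a set of less than full probability), the inequality is strict: $P(A) < P(A \mid A \cup B)$, and hence, $A$ and $A \cup B$ are dependent.

Note only two assumptions were used in the argument above: (i) $0 < P(A) \leq 1$, which is sufficient to imply $P(A) \leq P(A \mid A \cup B)$. And (ii) $P(A \cup B) < 1$, which with (i) implies the strict inequality $P(A) < P(A \mid A \cup B)$, and so dependence of $A$ and $A \cup B$. It's not necessary to assume $A$ and $B$ are independent; it's true for any events $A$ and $B$ satisfying (i) and (ii) (including independent events).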
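As a worked instance of the inequality, take the stamp collection from the quantitative example, with $A$ = "rare" and $B$ = "pretty" (this mapping of symbols onto the example is a choice made here for illustration):

```latex
\begin{aligned}
P(A) &= \tfrac{100}{1000} = 0.1, \qquad P(A \cup B) = \tfrac{370}{1000} = 0.37 < 1,\\
P(A)\,P(A \cup B) &= 0.037 \leq 0.1 = P(A),\\
P(A \mid A \cup B) &= \frac{P(A)}{P(A \cup B)} = \frac{0.1}{0.37} = \tfrac{10}{37} \approx 0.27 > 0.1 = P(A).
\end{aligned}
```

Both assumptions (i) and (ii) hold here, so the inequality is strict, matching the "just over 27%" figure for rare stamps on display.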

Explanation

The cause is that the conditional probability of event $A$ occurring, given that it or $B$ occurs, is inflated: it is higher than the unconditional probability, because we have excluded cases where neither occurs.

$$P(A \mid A \cup B) > P(A)$$
(conditional probability inflated relative to unconditional)

One can see this in tabular form as follows: the three cells other than ~A & ~B are the outcomes where at least one event occurs (~A means "not A").

        A         ~A
B       A & B     ~A & B
~B      A & ~B    ~A & ~B

For instance, if one has a sample of $100$, and both $A$ and $B$ occur independently half the time ($P(A) = P(B) = 1/2$), one obtains:

        A     ~A
B       25    25
~B      25    25

So in $75$ outcomes, either $A$ or $B$ occurs, of which $50$ have $A$ occurring. By comparing the conditional probability of $A$ to the unconditional probability of $A$:

$$P(A \mid A \cup B) = 50/75 = 2/3 > P(A) = 50/100 = 1/2$$

We see that the probability of $A$ is higher ($2/3$) in the subset of outcomes where ($A$ or $B$) occurs, than in the overall population ($1/2$). On the other hand, the probability of $A$ given both $B$ and ($A$ or $B$) is simply the unconditional probability of $A$, $P(A)$, since $A$ is independent of $B$. In the numerical example, we have conditioned on being in the top row:

        A     ~A
B       25    25    (row conditioned on)
~B      25    25

Here the probability of $A$ is $25/50 = 1/2$.
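The three probabilities in this numerical example can be recomputed by counting table cells. The helper `prob` and the dictionary layout below are illustrative choices, not part of the original presentation.

```python
from fractions import Fraction

# Counts from the 2x2 table: keys are (A occurs, B occurs).
counts = {(True, True): 25, (True, False): 25,
          (False, True): 25, (False, False): 25}


def prob(event, given=lambda a, b: True):
    """P(event | given), computed by counting table cells."""
    kept = {cell: n for cell, n in counts.items() if given(*cell)}
    hits = sum(n for cell, n in kept.items() if event(*cell))
    return Fraction(hits, sum(kept.values()))


def is_a(a, b):
    """Event A occurs (B is ignored)."""
    return a


p_a = prob(is_a)                                              # 50/100
p_a_given_union = prob(is_a, given=lambda a, b: a or b)       # 50/75
p_a_given_b_and_union = prob(is_a, given=lambda a, b: b and (a or b))  # 25/50

print(p_a, p_a_given_union, p_a_given_b_and_union)
```

Restricting to the three cells where at least one event occurs raises the probability of A from 1/2 to 2/3, and additionally conditioning on B brings it back down to 1/2.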

Berkson's paradox arises because the conditional probability of $A$ given $B$ within the three-cell subset equals the conditional probability in the overall population, but the unconditional probability within the subset is inflated relative to the unconditional probability in the overall population. Hence, within the subset, the presence of $B$ decreases the conditional probability of $A$ (back to its overall unconditional probability):

$$P(A \mid B, A \cup B) = P(A \mid B) = P(A)$$
$$P(A \mid A \cup B) > P(A)$$


Because the effect of conditioning on $(A \cup B)$ derives from the relative size of $P(A \mid A \cup B)$ and $P(A)$, the effect is particularly large when $A$ is rare ($P(A) \ll 1$) but very strongly correlated to $B$ ($P(A \mid B) \approx 1$). For example, consider the case below where $N$ is very large:

        A     ~A
B       1     0
~B      0     N

For the case without conditioning on $(A \cup B)$ we have

$$P(A) = 1/(N+1)$$
$$P(A \mid B) = 1$$

So $A$ occurs rarely, unless $B$ is present, when $A$ occurs always. Thus $B$ is dramatically increasing the likelihood of $A$.

For the case with conditioning on $(A \cup B)$ we have

$$P(A \mid A \cup B) = 1$$
$$P(A \mid B, A \cup B) = P(A \mid B) = 1$$

Now $A$ occurs always, whether $B$ is present or not. So $B$ has no impact on the likelihood of $A$. Thus we see that for highly correlated data a strong positive association between $B$ and $A$ can be effectively removed when one conditions on $(A \cup B)$.
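The disappearance of the association can be checked for several values of N. In the sketch below, the ratio P(A|B) / P(A) is used as a measure of how much B raises the likelihood of A; that measure and the function name are choices made here for illustration.

```python
from fractions import Fraction


def lift_of_b_on_a(ab, a_only, b_only, neither, condition_on_union=False):
    """Return P(A|B) / P(A) from 2x2 counts; optionally drop the 'neither' cell."""
    if condition_on_union:
        neither = 0  # keep only outcomes where A or B occurs
    total = ab + a_only + b_only + neither
    p_a = Fraction(ab + a_only, total)
    p_a_given_b = Fraction(ab, ab + b_only)
    return p_a_given_b / p_a


for n in (10, 1_000, 1_000_000):
    before = lift_of_b_on_a(1, 0, 0, n)
    after = lift_of_b_on_a(1, 0, 0, n, condition_on_union=True)
    print(n, before, after)  # before equals n + 1, after equals 1
```

Without conditioning, the ratio grows without bound as N increases. After conditioning on (A or B) it is exactly 1 for every N, meaning B carries no information about A in the selected subset.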

See also

  • Simpson's paradox
  • Survivorship bias

References

  • Berkson, Joseph (June 1946). "Limitations of the Application of Fourfold Table Analysis to Hospital Data". Biometrics Bulletin. 2 (3): 47–53. doi:10.2307/3002000. JSTOR 3002000. PMID 21001024. (The paper is frequently miscited as Berkson, J. (1949) Biological Bulletin 2, 47–53.)
  • Jordan Ellenberg, "Why are handsome men such jerks?"
