
Binomial regression

In statistics, binomial regression is a regression analysis technique in which the response (often referred to as Y) has a binomial distribution: it is the number of successes in a series of n independent Bernoulli trials, where each trial has probability of success p.[1] In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

Binomial regression is closely related to binary regression: a binary regression can be considered a binomial regression with n = 1, or a regression on ungrouped binary data, while a binomial regression can be considered a regression on grouped binary data (see comparison).[2] Binomial regression models are essentially the same as binary choice models, one type of discrete choice model: the primary difference is in the theoretical motivation (see comparison). In machine learning, binomial regression is considered a special case of probabilistic classification, and thus a generalization of binary classification.

Example application

In one published example of an application of binomial regression,[3] the details were as follows. The observed outcome variable was whether or not a fault occurred in an industrial process. There were two explanatory variables: the first was a simple two-case factor representing whether or not a modified version of the process was used and the second was an ordinary quantitative variable measuring the purity of the material being supplied for the process.

Specification of model

The response variable Y is assumed to be binomially distributed conditional on the explanatory variables X. The number of trials n is known, and the probability of success for each trial p is specified as a function θ(X). This implies that the conditional expectation and conditional variance of the observed fraction of successes, Y/n, are

E(Y/n \mid X) = \theta(X)

\operatorname{Var}(Y/n \mid X) = \frac{\theta(X)(1 - \theta(X))}{n}
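These two conditional moments can be checked with a quick Monte Carlo sketch; the values θ = 0.3 and n = 50 below are arbitrary choices for illustration:

```python
import random

random.seed(1)  # fixed seed so the check is reproducible

theta, n_trials, reps = 0.3, 50, 20_000
fractions = []
for _ in range(reps):
    # One binomial draw: count successes in n independent Bernoulli trials.
    successes = sum(1 for _ in range(n_trials) if random.random() < theta)
    fractions.append(successes / n_trials)

mean = sum(fractions) / reps
var = sum((f - mean) ** 2 for f in fractions) / reps
# mean should be close to theta, and var close to theta*(1 - theta)/n_trials.
```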

The goal of binomial regression is to estimate the function θ(X). Typically the statistician assumes θ(X) = m(β^T X), for a known function m, and estimates β. Common choices for m include the logistic function.[1]
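As a concrete sketch of this parameterisation with the logistic choice of m, the success probability is θ(X) = 1/(1 + e^(−β^T X)); the coefficient and covariate values below are made up for illustration:

```python
import math

def logistic(t):
    """The logistic function m(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def theta(beta, x):
    """Success probability theta(X) = m(beta^T X) for one observation."""
    return logistic(sum(b * v for b, v in zip(beta, x)))

# Hypothetical coefficients and a single observation (intercept term first).
beta = [-1.0, 0.8]
x = [1.0, 2.5]          # [intercept term, one explanatory variable]
p = theta(beta, x)      # lies strictly between 0 and 1
```

Whatever the covariate values, the logistic m keeps the fitted probability inside (0, 1), which is the point of using it as the mean function.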

The data are often fitted as a generalised linear model where the predicted values μ are the probabilities that any individual event will result in a success. The likelihood of the predictions is then given by

L(\boldsymbol{\mu} \mid Y) = \prod_{i=1}^{n} \left( 1_{y_i=1} \mu_i + 1_{y_i=0} (1 - \mu_i) \right)

where 1A is the indicator function which takes on the value one when the event A occurs, and zero otherwise: in this formulation, for any given observation yi, only one of the two terms inside the product contributes, according to whether yi=0 or 1. The likelihood function is more fully specified by defining the formal parameters μi as parameterised functions of the explanatory variables: this defines the likelihood in terms of a much reduced number of parameters. Fitting of the model is usually achieved by employing the method of maximum likelihood to determine these parameters. In practice, the use of a formulation as a generalised linear model allows advantage to be taken of certain algorithmic ideas which are applicable across the whole class of more general models but which do not apply to all maximum likelihood problems.
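A bare-bones illustration of maximum-likelihood fitting on ungrouped (n = 1) data, using the logistic parameterisation and plain gradient ascent rather than the specialised GLM algorithms mentioned above; the toy data and step size are invented for the example:

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(beta, X, y):
    """Log of the product above: log(mu_i) when y_i = 1, log(1 - mu_i) when y_i = 0."""
    ll = 0.0
    for xi, yi in zip(X, y):
        mu = logistic(sum(b * v for b, v in zip(beta, xi)))
        ll += math.log(mu) if yi == 1 else math.log(1.0 - mu)
    return ll

def fit(X, y, steps=2000, lr=0.5):
    """Gradient ascent on the log-likelihood; the score is sum_i (y_i - mu_i) x_i."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            mu = logistic(sum(b * v for b, v in zip(beta, xi)))
            for j, v in enumerate(xi):
                grad[j] += (yi - mu) * v
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta

# Toy ungrouped binary data: an intercept column plus one covariate.
X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
y = [0, 0, 1, 0, 1, 1]
beta_hat = fit(X, y)   # ML estimates of the two coefficients
```

In practice one would use an iteratively reweighted least squares routine from a statistics package, but the fixed point is the same maximum-likelihood estimate.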

Models used in binomial regression can often be extended to multinomial data.

There are many methods of generating the values of μ in systematic ways that allow for interpretation of the model; they are discussed below.

Link functions

There is a requirement that the modelling linking the probabilities μ to the explanatory variables should be of a form which only produces values in the range 0 to 1. Many models can be fitted into the form

\boldsymbol{\mu} = g(\boldsymbol{\eta})

Here η is an intermediate variable representing a linear combination, containing the regression parameters, of the explanatory variables. The function g is the cumulative distribution function (cdf) of some probability distribution. Usually this probability distribution has a support from minus infinity to plus infinity so that any finite value of η is transformed by the function g to a value inside the range 0 to 1.

In the case of logistic regression, the link function is the log of the odds (the logit function), whose inverse is the logistic function. In the case of probit, the link is the cdf of the normal distribution. The linear probability model is not a proper binomial regression specification because predictions need not be in the range of zero to one; it is nevertheless sometimes used for this type of data when interpretation is carried out on the probability scale, or when the analyst is unable to fit, or to calculate approximate linearizations of, probabilities for interpretation.
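A minimal stdlib-only sketch of these two standard choices: the logit maps a probability onto the whole real line, its inverse (the logistic cdf) plays the role of g above, and the probit case uses the standard normal cdf, computable via the error function:

```python
import math

def logit(p):
    """Log-odds: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Logistic cdf, the inverse of logit: maps eta back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Standard normal cdf, the g of the probit model, via math.erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
```

Both inverse links are monotone and carry any finite η strictly inside (0, 1), which is exactly the requirement stated above.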

Comparison with binary regression

Binomial regression is closely connected with binary regression. If the response is a binary variable (two possible outcomes), then these alternatives can be coded as 0 or 1 by considering one of the outcomes as "success" and the other as "failure" and considering these as count data: "success" is 1 success out of 1 trial, while "failure" is 0 successes out of 1 trial. This can now be considered a binomial distribution with n = 1 trial, so a binary regression is a special case of a binomial regression. If these data are grouped (by adding counts), they are no longer binary data, but are count data for each group, and can still be modeled by a binomial regression; the individual binary outcomes are then referred to as "ungrouped data". An advantage of working with grouped data is that one can test the goodness of fit of the model;[2] for example, grouped data may exhibit overdispersion relative to the variance estimated from the ungrouped data.
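The grouping step can be sketched directly; the covariate patterns below are hypothetical, loosely echoing the industrial-process example earlier:

```python
from collections import defaultdict

def group(records):
    """Collapse ungrouped binary outcomes into (successes, trials) counts,
    one pair per distinct covariate pattern."""
    counts = defaultdict(lambda: [0, 0])
    for x, y in records:        # x: hashable covariate pattern, y: 0 or 1
        counts[x][0] += y       # add to the success count
        counts[x][1] += 1       # add to the trial count
    return dict(counts)

# Hypothetical ungrouped data: (process version, fault observed or not).
ungrouped = [("new", 1), ("new", 0), ("new", 1),
             ("old", 0), ("old", 0), ("old", 1)]
grouped = group(ungrouped)      # e.g. {"new": [2, 3], "old": [1, 3]}
```

Either form carries the same information for fitting, but the grouped counts admit goodness-of-fit and overdispersion checks that the ungrouped 0/1 records do not.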

Comparison with binary choice models

A binary choice model assumes a latent variable Un, the utility (or net benefit) that person n obtains from taking an action (as opposed to not taking the action). The utility the person obtains from taking the action depends on the characteristics of the person, some of which are observed by the researcher and some are not:

U_n = \boldsymbol{\beta} \cdot \mathbf{s}_n + \varepsilon_n

where β is a set of regression coefficients and s_n is a set of independent variables (also known as "features") describing person n, which may be either discrete "dummy variables" or regular continuous variables. ε_n is a random variable specifying "noise" or "error" in the prediction, assumed to be distributed according to some distribution. Normally, if there is a mean or variance parameter in the distribution, it cannot be identified, so the parameters are set to convenient values, by convention usually mean 0, variance 1.

The person takes the action, yn = 1, if Un > 0. The unobserved term, εn, is assumed to have a logistic distribution.

The specification is written succinctly as:

    • U_n = β·s_n + ε_n
    • Y_n = 1 if U_n > 0; Y_n = 0 if U_n ≤ 0
    • ε_n ∼ logistic, standard normal, etc.

Let us write it slightly differently:

    • U_n = β·s_n − e_n
    • Y_n = 1 if U_n > 0; Y_n = 0 if U_n ≤ 0
    • e_n ∼ logistic, standard normal, etc.

Here we have made the substitution en = −εn. This changes a random variable into a slightly different one, defined over a negated domain. As it happens, the error distributions we usually consider (e.g. logistic distribution, standard normal distribution, standard Student's t-distribution, etc.) are symmetric about 0, and hence the distribution over en is identical to the distribution over εn.

Denote the cumulative distribution function (CDF) of e_n as F_e, and the quantile function (inverse CDF) of e_n as F_e^{-1}.

Note that

\begin{aligned}
\Pr(Y_n = 1) &= \Pr(U_n > 0) \\
&= \Pr(\boldsymbol{\beta} \cdot \mathbf{s}_n - e_n > 0) \\
&= \Pr(-e_n > -\boldsymbol{\beta} \cdot \mathbf{s}_n) \\
&= \Pr(e_n \leq \boldsymbol{\beta} \cdot \mathbf{s}_n) \\
&= F_e(\boldsymbol{\beta} \cdot \mathbf{s}_n)
\end{aligned}

Since Y_n is a Bernoulli trial, where E[Y_n] = Pr(Y_n = 1), we have

\mathbb{E}[Y_n] = F_e(\boldsymbol{\beta} \cdot \mathbf{s}_n)

or equivalently

F_e^{-1}(\mathbb{E}[Y_n]) = \boldsymbol{\beta} \cdot \mathbf{s}_n

Note that this is exactly equivalent to the binomial regression model expressed in the formalism of the generalized linear model.

If e_n ∼ N(0, 1), i.e. distributed as a standard normal distribution, then

\Phi^{-1}(\mathbb{E}[Y_n]) = \boldsymbol{\beta} \cdot \mathbf{s}_n

which is exactly a probit model.

If e_n ∼ Logistic(0, 1), i.e. distributed as a standard logistic distribution with mean 0 and scale parameter 1, then the corresponding quantile function is the logit function, and

\operatorname{logit}(\mathbb{E}[Y_n]) = \boldsymbol{\beta} \cdot \mathbf{s}_n

which is exactly a logit model.
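The logistic case of this equivalence can be verified numerically: the quantile function of the standard logistic distribution is exactly the logit, so applying it to E[Y_n] recovers the linear predictor (the value of β·s_n below is arbitrary):

```python
import math

def logistic_cdf(x):
    """F_e for a standard logistic error: Pr(e_n <= x)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Quantile function (inverse cdf) of the standard logistic."""
    return math.log(p / (1.0 - p))

eta = 0.7                   # an arbitrary linear predictor beta . s_n
mean_y = logistic_cdf(eta)  # E[Y_n] = Pr(Y_n = 1) under the choice model
recovered = logit(mean_y)   # GLM form: logit(E[Y_n]) = beta . s_n
```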

Note that the two different formalisms, generalized linear models (GLMs) and discrete choice models, are equivalent in the case of simple binary choice models, but can be extended in differing ways:

    • GLMs can easily handle arbitrarily distributed response variables (dependent variables), not just categorical or ordinal variables, to which discrete choice models are limited by their nature. GLMs are also not limited to link functions that are quantile functions of some distribution, unlike the use of an error variable, which must by assumption have a probability distribution.
    • On the other hand, because discrete choice models are described as types of generative models, it is conceptually easier to extend them to complicated situations with multiple, possibly correlated, choices for each person, or other variations.

Latent variable interpretation / derivation

A latent variable model involving a binomial observed variable Y can be constructed such that Y is related to the latent variable Y* via

Y = \begin{cases} 0 & \text{if } Y^* > 0 \\ 1 & \text{if } Y^* < 0 \end{cases}

The latent variable Y* is then related to a set of regression variables X by the model

Y^* = X\beta + \epsilon

This results in a binomial regression model.

The variance of ϵ cannot be identified, and when it is not of interest it is often assumed to be equal to one. If ϵ is normally distributed, then a probit is the appropriate model; if ϵ is log-Weibull distributed, then a logit is appropriate; and if ϵ is uniformly distributed, then a linear probability model is appropriate.
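A small simulation of the latent-variable construction under a normally distributed ϵ confirms the probit relationship; here the common sign convention Y = 1 when Y* > 0 is used, and β and the covariate value are made up:

```python
import math
import random

random.seed(0)  # reproducible draws

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta, x = 0.6, 1.5          # hypothetical coefficient and covariate value
reps = 200_000
# Y* = x*beta + eps with eps ~ N(0, 1); record Y = 1 when Y* > 0.
hits = sum(1 for _ in range(reps) if x * beta + random.gauss(0.0, 1.0) > 0)
empirical = hits / reps
predicted = normal_cdf(x * beta)   # probit model: Pr(Y = 1) = Phi(x*beta)
```

The empirical frequency of Y = 1 should agree with Φ(xβ) up to Monte Carlo error.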

See also

    • Linear probability model
    • Poisson regression
    • Predictive modelling

Notes

  1. ^ a b Sanford Weisberg (2005). "Binomial Regression". Applied Linear Regression. Wiley-IEEE. pp. 253–254. ISBN 0-471-66379-4.
  2. ^ a b Rodríguez 2007, Chapter 3, p. 5.
  3. ^ Cox & Snell (1981), Example H, p. 91

References

  • Cox, D. R.; Snell, E. J. (1981). Applied Statistics: Principles and Examples. Chapman and Hall. ISBN 0-412-16570-8.
  • Rodríguez, Germán (2007). Lecture Notes on Generalized Linear Models.

Further reading

  • Dean, C. B. (1992). "Testing for Overdispersion in Poisson and Binomial Regression Models". Journal of the American Statistical Association. Informa UK Limited. 87 (418): 451–457. doi:10.1080/01621459.1992.10475225. ISSN 0162-1459. JSTOR 2290276.

