Wikipedia

Dummy variable (statistics)

In regression analysis, a dummy variable (also known as an indicator variable or just a dummy) is one that takes a binary value (0 or 1) to indicate the absence or presence of some categorical effect that may be expected to shift the outcome.[1] For example, if we were studying the relationship between biological sex and income, we could use a dummy variable to represent the sex of each individual in the study. The variable could take on a value of 1 for males and 0 for females (or vice versa). In machine learning, the closely related technique is known as one-hot encoding.
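
As a minimal sketch of the sex-and-income example, the following Python snippet codes the dummy as 1 for males and 0 for females and fits income = a + b·D by ordinary least squares. The income figures are invented for illustration, and the function name fit_single_dummy is hypothetical. With a single dummy regressor, the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means, which is the sense in which the dummy "shifts" the outcome.

```python
from statistics import mean

def fit_single_dummy(dummy, outcome):
    """OLS fit of outcome = a + b * dummy for a single 0/1 regressor."""
    d_bar = mean(dummy)
    y_bar = mean(outcome)
    # Slope = covariance / variance; the common 1/(n-1) factor cancels.
    cov = sum((d - d_bar) * (y - y_bar) for d, y in zip(dummy, outcome))
    var = sum((d - d_bar) ** 2 for d in dummy)
    slope = cov / var
    intercept = y_bar - slope * d_bar
    return intercept, slope

# Hypothetical data: income in arbitrary units.
sex = ["M", "F", "F", "M", "F", "M"]
income = [52.0, 48.0, 50.0, 58.0, 46.0, 55.0]

# The dummy variable: 1 for males, 0 for females.
male = [1 if s == "M" else 0 for s in sex]

intercept, slope = fit_single_dummy(male, income)
mean_female = mean(y for y, d in zip(income, male) if d == 0)
mean_male = mean(y for y, d in zip(income, male) if d == 1)

print(intercept, slope)  # 48.0 7.0: female mean, and male mean minus female mean
```

Reversing the coding (1 for females, 0 for males) would flip the sign of the slope and make the male mean the intercept; the fitted values stay the same.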

Dummy variables are commonly used in regression analysis to represent categorical variables that have more than two levels, such as education level or occupation. In this case, one dummy variable would be created for each level of the variable, and exactly one of them would take on a value of 1 for each observation. Dummy variables are useful because they allow us to include categorical variables in our analysis, which would otherwise be difficult to include due to their non-numeric nature. They can also help us to control for confounding factors and improve the validity of our results.
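
The construction for a variable with more than two levels can be sketched in a few lines of Python. The helper name make_dummies and the education data are hypothetical, chosen only to illustrate the idea; statistical packages provide their own routines for this.

```python
def make_dummies(values):
    """Return (levels, rows): one 0/1 column per level of a categorical variable."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical education levels for five observations.
education = ["secondary", "primary", "tertiary", "secondary", "primary"]

levels, rows = make_dummies(education)
print(levels)  # ['primary', 'secondary', 'tertiary']
for value, row in zip(education, rows):
    print(value, row)  # e.g. secondary [0, 1, 0]
```

Each row contains exactly one 1, in the column of the level observed for that row.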

As with any addition of variables to a model, the addition of dummy variables will increase the within-sample model fit (coefficient of determination), but at a cost of fewer degrees of freedom and loss of generality of the model (out-of-sample model fit). Too many dummy variables result in a model that does not provide any general conclusions.

Dummy variables are useful in various cases. For example, in econometric time series analysis, dummy variables may be used to indicate the occurrence of wars or major strikes. A dummy variable can thus be thought of as a Boolean, i.e., a truth value represented as the numerical value 0 or 1 (as is sometimes done in computer programming).

Dummy variables may be extended to more complex cases. For example, seasonal effects may be captured by creating dummy variables for each of the seasons: D1=1 if the observation is for summer, and equals zero otherwise; D2=1 if and only if autumn, otherwise equals zero; D3=1 if and only if winter, otherwise equals zero; and D4=1 if and only if spring, otherwise equals zero. In the panel data fixed effects estimator, dummies are created for each of the units in cross-sectional data (e.g. firms or countries) or periods in a pooled time series. However, in such regressions, either the constant term has to be removed or one of the dummies has to be removed, making it the base category against which the others are assessed, for the following reason:

If dummy variables for all categories were included, their sum would equal 1 for all observations, which is identical to and hence perfectly correlated with the vector-of-ones variable whose coefficient is the constant term; if the vector-of-ones variable were also present, this would result in perfect multicollinearity,[2] so that the matrix inversion in the estimation algorithm would be impossible. This is referred to as the dummy variable trap.
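
The trap can be checked numerically. The sketch below builds design matrices for the seasonal dummies D1 to D4 on a hypothetical sample of two years of quarterly observations and computes their rank by Gaussian elimination. The helper names rank and design_matrix are assumptions made for this illustration. When the rank is smaller than the number of columns, the columns are linearly dependent and the matrix X'X used in least squares cannot be inverted.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row at or below the pivot position with a nonzero entry.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on the earlier ones
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

SEASONS = ["summer", "autumn", "winter", "spring"]  # D1..D4 in the text

def design_matrix(observed, dummy_levels, intercept=True):
    """One row per observation: optional constant 1, then one dummy per level."""
    return [
        ([1] if intercept else []) + [1 if s == level else 0 for level in dummy_levels]
        for s in observed
    ]

# Hypothetical sample: two years of quarterly observations.
observed = SEASONS * 2

trap = design_matrix(observed, SEASONS)                          # constant + D1..D4
drop_dummy = design_matrix(observed, SEASONS[:-1])               # constant + D1..D3, spring is the base
no_constant = design_matrix(observed, SEASONS, intercept=False)  # D1..D4 only

print(rank(trap), "of", len(trap[0]))                # 4 of 5: rank deficient
print(rank(drop_dummy), "of", len(drop_dummy[0]))    # 4 of 4: full column rank
print(rank(no_constant), "of", len(no_constant[0]))  # 4 of 4: full column rank
```

Both remedies named in the text, dropping one dummy or dropping the constant, restore full column rank. They differ only in interpretation: with a base category, each remaining coefficient measures the difference from that category.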

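See also

  • Binary regression
  • Chow test
  • Hypothesis testing
  • Indicator function
  • Linear discriminant function
  • Multicollinearity
  • One-hot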

References

  1. Draper, N. R.; Smith, H. (1998). Applied Regression Analysis. Wiley. ISBN 0-471-17082-8. (Chapter 14)
  2. Suits, Daniel B. (1957). "Use of Dummy Variables in Regression Equations". Journal of the American Statistical Association. 52 (280): 548–551. JSTOR 2281705.

Further reading

  • Asteriou, Dimitrios; Hall, S. G. (2015). "Dummy Variables". Applied Econometrics (3rd ed.). London: Palgrave Macmillan. pp. 209–230. ISBN 978-1-137-41546-2.
  • Kooyman, Marius A. (1976). Dummy Variables in Econometrics. Tilburg: Tilburg University Press. ISBN 90-237-2919-6.

External links

  • Maathuis, Marloes (2007). "Chapter 7: Dummy variable regression" (PDF). Stat 423: Applied Regression and Analysis of Variance. Archived from the original (PDF) on December 16, 2011.
  • Fox, John (2010). "Dummy-Variable Regression" (PDF).
  • Baker, Samuel L. (2006). "Dummy Variables" (PDF). Archived from the original (PDF) on March 1, 2006.
