
Chi-square automatic interaction detection

Chi-square automatic interaction detection (CHAID)[1][2][3] is a decision tree technique based on adjusted significance testing (Bonferroni correction, Holm-Bonferroni testing). The technique was developed in South Africa and published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic. CHAID can be used for prediction (in a similar fashion to regression analysis; this version of CHAID was originally known as XAID), for classification, and for detecting interactions between variables. CHAID is based on a formal extension of the AID (Automatic Interaction Detection)[4] and THAID (THeta Automatic Interaction Detection)[5][6] procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed by Belson in the UK in the 1950s.[7] Ritschard[3] gives a history of earlier supervised tree methods, together with a detailed description of the original CHAID algorithm and of the exhaustive CHAID extension by Biggs, De Ville, and Suen.[2]
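The idea of choosing a split by adjusted significance testing can be illustrated with a minimal sketch. It is not the algorithm from Kass's paper: it skips category merging entirely, and it uses the number of candidate predictors as the Bonferroni multiplier, which is a simplifying assumption made here for illustration. The function names (`chi2_sf`, `pearson_chi2`, `bonferroni`, `best_predictor`) and the example tables are hypothetical. Each table cross-tabulates the categories of one predictor (rows) against the categories of the response (columns), and every row and column is assumed to be non-empty.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed forms for the smallest cases: df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m tests, capped at 1."""
    return min(1.0, p * m)


def best_predictor(tables):
    """Return (name, adjusted p-value) of the predictor with the smallest adjusted p-value.

    Assumption: the multiplier is simply the number of candidate predictors.
    """
    m = len(tables)
    adjusted = {}
    for name, table in tables.items():
        stat, df = pearson_chi2(table)
        adjusted[name] = bonferroni(chi2_sf(stat, df), m)
    return min(adjusted.items(), key=lambda item: item[1])


# Hypothetical data: rows are predictor categories, columns are response categories.
tables = {
    "region": [[10, 20], [20, 10]],
    "gender": [[15, 15], [15, 15]],
}
name, p_adj = best_predictor(tables)
print(name, round(p_adj, 4))
```

In this toy data the "gender" table shows no association with the response at all, so "region" is the predictor selected for the split. A tree-growing procedure would typically stop splitting a node once no predictor's adjusted p-value falls below a chosen threshold.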

In practice, CHAID is often used in direct marketing to identify groups of consumers and to predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.

Like other decision tree methods, CHAID produces output that is highly visual and easy to interpret. Because it uses multiway splits by default, it needs fairly large samples to work effectively: with small samples, the respondent groups quickly become too small for reliable analysis. For example, three successive five-way splits of 1,000 respondents leave an average of only 1,000 / 125 = 8 respondents per terminal node.

One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric.

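See also

  • Chi-squared distribution
  • Bonferroni correction
  • Latent class model
  • Structural equation modeling
  • Market segment
  • Decision tree learning
  • Multiple comparisons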

References

  1. ^ Kass, G. V. (1980). "An Exploratory Technique for Investigating Large Quantities of Categorical Data". Applied Statistics. 29 (2): 119–127. doi:10.2307/2986296. JSTOR 2986296.
  2. ^ a b Biggs, David; De Ville, Barry; Suen, Ed (1991). "A method of choosing multiway partitions for classification and decision trees". Journal of Applied Statistics. 18 (1): 49–62. doi:10.1080/02664769100000005. ISSN 0266-4763.
  3. ^ a b Ritschard, Gilbert (2013). "CHAID and Earlier Supervised Tree Methods". In McArdle, J. J.; Ritschard, G. (eds.). Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences. New York: Routledge. pp. 48–74.
  4. ^ Morgan, James N.; Sonquist, John A. (1963). "Problems in the Analysis of Survey Data, and a Proposal". Journal of the American Statistical Association. 58 (302): 415–434. doi:10.1080/01621459.1963.10500855. ISSN 0162-1459.
  5. ^ Messenger, Robert; Mandell, Lewis (1972). "A Modal Search Technique for Predictive Nominal Scale Multivariate Analysis". Journal of the American Statistical Association. 67 (340): 768–772. doi:10.1080/01621459.1972.10481290. ISSN 0162-1459.
  6. ^ Morgan, James N.; Messenger, Robert C. (1973). THAID, a sequential analysis program for the analysis of nominal scale dependent variables. Ann Arbor, Mich. ISBN 0-87944-137-2. OCLC 666930.
  7. ^ Belson, William A. (1959). "Matching and Prediction on the Principle of Biological Classification". Applied Statistics. 8 (2): 65–75. doi:10.2307/2985543. JSTOR 2985543.

Further reading

  • Press, Laurence I.; Rogers, Miles S.; & Shure, Gerald H.; An interactive technique for the analysis of multivariate data, Behavioral Science, Vol. 14 (1969), pp. 364–370
  • Hawkins, Douglas M.; and Kass, Gordon V.; Automatic Interaction Detection, in Hawkins, Douglas M. (ed), Topics in Applied Multivariate Analysis, Cambridge University Press, Cambridge, 1982, pp. 269–302
  • Hooton, Thomas M.; Haley, Robert W.; Culver, David H.; White, John W.; Morgan, W. Meade; & Carroll, Raymond J.; The Joint Associations of Multiple Risk Factors with the Occurrence of Nosocomial Infections, American Journal of Medicine, Vol. 70, (1981), pp. 960–970
  • Brink, Susanne; & Van Schalkwyk, Dirk J.; Serum ferritin and mean corpuscular volume as predictors of bone marrow iron stores, South African Medical Journal, Vol. 61, (1982), pp. 432–434
  • McKenzie, Dean P.; McGorry, Patrick D.; Wallace, Chris S.; Low, Lee H.; Copolov, David L.; & Singh, Bruce S.; Constructing a Minimal Diagnostic Decision Tree, Methods of Information in Medicine, Vol. 32 (1993), pp. 161–166
  • Magidson, Jay; The CHAID approach to segmentation modeling: chi-squared automatic interaction detection, in Bagozzi, Richard P. (ed); Advanced Methods of Marketing Research, Blackwell, Oxford, GB, 1994, pp. 118–159
  • Hawkins, Douglas M.; Young, S. S.; & Rosinko, A.; Analysis of a large structure-activity dataset using recursive partitioning, Quantitative Structure-Activity Relationships, Vol. 16, (1997), pp. 296–302

Software

  • Luchman, J.N.; CHAID: Stata module to conduct chi-square automated interaction detection. Available for free download, or install from within Stata by typing: ssc install chaid
  • Luchman, J.N.; CHAIDFOREST: Stata module to conduct random forest ensemble classification with chi-square automated interaction detection (CHAID) as the base learner. Available for free download, or install from within Stata by typing: ssc install chaidforest
  • IBM SPSS Decision Trees grows exhaustive CHAID trees as well as a few other types of trees such as CART.
  • An R package CHAID is available on R-Forge.
