Wikipedia

Multivariate statistics

Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied.

In addition, multivariate statistics is concerned with multivariate probability distributions, in terms of both

  • how these can be used to represent the distributions of observed data;
  • how they can be used as part of statistical inference, particularly where several different quantities are of interest to the same analysis.

Certain types of problems involving multivariate data, for example simple linear regression and multiple regression, are not usually considered to be special cases of multivariate statistics because the analysis is dealt with by considering the (univariate) conditional distribution of a single outcome variable given the other variables.

Multivariate analysis

Multivariate analysis (MVA) is based on the principles of multivariate statistics. Typically, MVA is used to address situations in which multiple measurements are made on each experimental unit and the relations among these measurements, and their structure, are important.[1] A modern, overlapping categorization of MVA includes:[1]

  • Normal and general multivariate models and distribution theory
  • The study and measurement of relationships
  • Probability computations of multidimensional regions
  • The exploration of data structures and patterns

Multivariate analysis can be complicated by the desire to include physics-based analysis to calculate the effects of variables for a hierarchical "system-of-systems". Studies that wish to use multivariate analysis are often stalled by the dimensionality of the problem. These concerns are often eased through the use of surrogate models, which are highly accurate approximations of the physics-based code. Because surrogate models take the form of an equation, they can be evaluated very quickly. This enables large-scale MVA studies: a Monte Carlo simulation across the design space is difficult with physics-based codes, but it becomes trivial with surrogate models, which often take the form of response-surface equations.
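The surrogate-model idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: expensive_model is a hypothetical stand-in for a slow physics-based code, the surrogate is a one-variable quadratic response surface fitted by ordinary least squares, and the design points and sample size are arbitrary.

```python
import math
import random

def expensive_model(x):
    """Hypothetical stand-in for a slow physics-based code."""
    return math.sin(x) + 0.5 * x * x

def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - partial) / m[r][r]
    return coef

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ b0 + b1*x + b2*x^2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    return solve_linear(ata, aty)

def surrogate(coef, x):
    """Evaluate the fitted response-surface equation."""
    return coef[0] + coef[1] * x + coef[2] * x * x

# A handful of "expensive" runs at design points on [-1, 1].
design = [-1.0 + 0.25 * i for i in range(9)]
coef = fit_quadratic(design, [expensive_model(x) for x in design])

# Monte Carlo over the design space using only the cheap surrogate.
rng = random.Random(0)
samples = [rng.uniform(-1.0, 1.0) for _ in range(20_000)]
mc_mean = sum(surrogate(coef, x) for x in samples) / len(samples)
print("surrogate coefficients:", coef)
print("Monte Carlo mean of the response:", mc_mean)
```

Only nine calls to the slow function are needed here; the 20,000 Monte Carlo evaluations all go through the fitted equation. A real study would involve many input variables and would check the surrogate's accuracy against held-out runs before relying on it.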

Types of analysis

There are many different models, each with its own type of analysis:

  1. Multivariate analysis of variance (MANOVA) extends the analysis of variance to cover cases where there is more than one dependent variable to be analyzed simultaneously; see also Multivariate analysis of covariance (MANCOVA).
  2. Multivariate regression attempts to determine a formula that can describe how elements in a vector of variables respond simultaneously to changes in others. For linear relations, regression analyses here are based on forms of the general linear model. Some suggest that multivariate regression is distinct from multivariable regression; however, that distinction is debated and is not applied consistently across scientific fields.[2]
  3. Principal components analysis (PCA) creates a new set of orthogonal variables that contain the same information as the original set. It rotates the axes of variation to give a new set of orthogonal axes, ordered so that they summarize decreasing proportions of the variation.
  4. Factor analysis is similar to PCA but allows the user to extract a specified number of synthetic variables, fewer than the original set, leaving the remaining unexplained variation as error. The extracted variables are known as latent variables or factors; each one may be supposed to account for covariation in a group of observed variables.
  5. Canonical correlation analysis finds linear relationships between two sets of variables; it is the generalised (i.e. canonical) version of bivariate[3] correlation.
  6. Redundancy analysis (RDA) is similar to canonical correlation analysis but allows the user to derive a specified number of synthetic variables from one set of (independent) variables that explain as much variance as possible in another (dependent) set. It is a multivariate analogue of regression.
  7. Correspondence analysis (CA), or reciprocal averaging, finds (like PCA) a set of synthetic variables that summarise the original set. The underlying model assumes chi-squared dissimilarities among records (cases).
  8. Canonical (or "constrained") correspondence analysis (CCA) summarises the joint variation in two sets of variables (like redundancy analysis); it combines correspondence analysis and multivariate regression analysis. The underlying model assumes chi-squared dissimilarities among records (cases).
  9. Multidimensional scaling comprises various algorithms to determine a set of synthetic variables that best represent the pairwise distances between records. The original method is principal coordinates analysis (PCoA; based on PCA).
  10. Discriminant analysis, or canonical variate analysis, attempts to establish whether a set of variables can be used to distinguish between two or more groups of cases.
  11. Linear discriminant analysis (LDA) computes a linear predictor from two sets of normally distributed data to allow for classification of new observations.
  12. Clustering systems assign objects into groups (called clusters) so that objects (cases) from the same cluster are more similar to each other than objects from different clusters.
  13. Recursive partitioning creates a decision tree that attempts to correctly classify members of the population based on a dichotomous dependent variable.
  14. Artificial neural networks extend regression and clustering methods to non-linear multivariate models.
  15. Statistical graphics such as tours, parallel coordinate plots and scatterplot matrices can be used to explore multivariate data.
  16. Simultaneous equations models involve more than one regression equation, with different dependent variables, estimated together.
  17. Vector autoregression involves simultaneous regressions of various time series variables on their own and each other's lagged values.
  18. Principal response curves analysis (PRC) is a method based on RDA that allows the user to focus on treatment effects over time by correcting for changes in control treatments over time.[4]
  19. Iconography of correlations replaces a correlation matrix with a diagram in which the “remarkable” correlations are represented by a solid line (positive correlation) or a dotted line (negative correlation).
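As a concrete illustration of one item in the list, the rotation performed by principal components analysis can be sketched for the two-variable case, where the eigen-decomposition of the 2x2 sample covariance matrix has a closed form. This is a minimal sketch rather than a general implementation: the function name pca_2d and the data points are made up for the example, and real analyses with more variables would use a linear-algebra library.

```python
import math

def pca_2d(points):
    """PCA for two variables using the closed-form eigen-decomposition
    of the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)

    # Eigenvalues of [[sxx, sxy], [sxy, syy]], largest first.
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Angle of the first principal axis (direction of maximum variance).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    axis1 = (math.cos(theta), math.sin(theta))
    axis2 = (-math.sin(theta), math.cos(theta))

    # Scores: centred data expressed in the rotated, orthogonal axes.
    scores = [
        ((p[0] - mx) * axis1[0] + (p[1] - my) * axis1[1],
         (p[0] - mx) * axis2[0] + (p[1] - my) * axis2[1])
        for p in points
    ]
    return eigenvalues, (axis1, axis2), scores

# Illustrative data only.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigenvalues, axes, scores = pca_2d(points)
total = sum(eigenvalues)
print("proportion of variance per component:",
      [ev / total for ev in eigenvalues])
```

The two score columns are uncorrelated and their variances equal the eigenvalues, so the components are ordered by the share of total variation they summarise.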

Important probability distributions

There is a set of probability distributions used in multivariate analyses that play a similar role to the corresponding set of distributions that are used in univariate analysis when the normal distribution is appropriate to a dataset. These multivariate distributions are:

  • Multivariate normal distribution
  • Wishart distribution
  • Multivariate Student-t distribution

The Inverse-Wishart distribution is important in Bayesian inference, for example in Bayesian multivariate linear regression. Additionally, Hotelling's T-squared distribution is a multivariate distribution, generalising Student's t-distribution, that is used in multivariate hypothesis testing.
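To make the role of Hotelling's T-squared concrete, the following minimal sketch computes the one-sample statistic T² = n (x̄ − μ0)ᵀ S⁻¹ (x̄ − μ0) for two variables, together with the standard conversion F = (n − p) / (p (n − 1)) · T², which follows an F distribution with (p, n − p) degrees of freedom under the null hypothesis when the data are multivariate normal. The function name hotelling_t2_2d, the sample and the hypothesised mean are assumptions made for illustration.

```python
def hotelling_t2_2d(sample, mu0):
    """One-sample Hotelling T-squared statistic for two variables.

    Returns (t2, f_stat), where f_stat is the rescaled statistic to be
    compared with an F distribution on (2, n - 2) degrees of freedom.
    """
    n = len(sample)
    p = 2
    mx = sum(row[0] for row in sample) / n
    my = sum(row[1] for row in sample) / n

    # Unbiased sample covariance matrix S = [[sxx, sxy], [sxy, syy]].
    sxx = sum((row[0] - mx) ** 2 for row in sample) / (n - 1)
    syy = sum((row[1] - my) ** 2 for row in sample) / (n - 1)
    sxy = sum((row[0] - mx) * (row[1] - my) for row in sample) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("sample covariance matrix is singular")

    # Quadratic form d' S^{-1} d using the explicit 2x2 inverse.
    d0, d1 = mx - mu0[0], my - mu0[1]
    quad = (syy * d0 * d0 - 2.0 * sxy * d0 * d1 + sxx * d1 * d1) / det

    t2 = n * quad
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat

# Illustrative data and hypothesised mean vector.
sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
t2, f_stat = hotelling_t2_2d(sample, (1.5, 2.5))
print("T-squared:", t2, "F statistic:", f_stat)
```

With a single variable the statistic reduces to the square of the usual one-sample t statistic, which is the sense in which the distribution generalises Student's t.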

History

Anderson's 1958 textbook, An Introduction to Multivariate Statistical Analysis,[5] educated a generation of theorists and applied statisticians; Anderson's book emphasizes hypothesis testing via likelihood ratio tests and the properties of power functions: admissibility, unbiasedness and monotonicity.[6][7]

MVA was once confined largely to statistical theory because of the size and complexity of the underlying data sets and the high computational cost. With the dramatic growth of computational power, MVA now plays an increasingly important role in data analysis and is widely applied in omics fields.

Applications

  • Multivariate hypothesis testing
  • Dimensionality reduction
  • Latent structure discovery
  • Clustering
  • Multivariate regression analysis
  • Classification and discrimination analysis
  • Variable selection
  • Multidimensional analysis
  • Multidimensional scaling
  • Data mining

Software and tools

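Many software packages and other tools support multivariate analysis, including:

  • JMP (statistical software)
  • MiniTab
  • Calc
  • PSPP
  • R[8]
  • SAS (software)
  • SciPy for Python
  • SPSS
  • Stata
  • STATISTICA
  • The Unscrambler
  • WarpPLS
  • SmartPLS
  • MATLAB
  • Eviews
  • NCSS (statistical software) includes multivariate analysis
  • The Unscrambler X is a multivariate analysis tool
  • SIMCA
  • DataPandit (free SaaS applications by Let's Excel Analytics Solutions)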

See also

  • Estimation of covariance matrices
  • Important publications in multivariate analysis
  • Multivariate testing in marketing
  • Structured data analysis (statistics)
  • Structural equation modeling
  • RV coefficient
  • Bivariate analysis
  • Design of experiments (DoE)
  • Dimensional analysis
  • Exploratory data analysis
  • OLS
  • Partial least squares regression
  • Pattern recognition
  • Principal component analysis (PCA)
  • Regression analysis
  • Soft independent modelling of class analogies (SIMCA)
  • Statistical interference
  • Univariate analysis

References

  1. ^ a b Olkin, I.; Sampson, A. R. (2001-01-01), "Multivariate Analysis: Overview", in Smelser, Neil J.; Baltes, Paul B. (eds.), International Encyclopedia of the Social & Behavioral Sciences, Pergamon, pp. 10240–10247, ISBN 9780080430768, retrieved 2019-09-02
  2. ^ Hidalgo, B; Goodman, M (2013). "Multivariate or multivariable regression?". Am J Public Health. 103: 39–40. doi:10.2105/AJPH.2012.300897. PMC 3518362. PMID 23153131.
  3. ^ Unsophisticated analysts of bivariate Gaussian problems may find useful a crude but accurate method of gauging probability: take the sum S of the N residuals' squares, subtract the sum Sm at minimum, divide this difference by Sm, multiply the result by (N - 2) and take the inverse anti-ln of half that product.
  4. ^ ter Braak, Cajo J.F. & Šmilauer, Petr (2012). Canoco reference manual and user's guide: software for ordination (version 5.0), p292. Microcomputer Power, Ithaca, NY.
  5. ^ T.W. Anderson (1958) An Introduction to Multivariate Statistical Analysis, New York: Wiley ISBN 0471026409; 2e (1984) ISBN 0471889873; 3e (2003) ISBN 0471360910
  6. ^ Sen, Pranab Kumar; Anderson, T. W.; Arnold, S. F.; Eaton, M. L.; Giri, N. C.; Gnanadesikan, R.; Kendall, M. G.; Kshirsagar, A. M.; et al. (June 1986). "Review: Contemporary Textbooks on Multivariate Statistical Analysis: A Panoramic Appraisal and Critique". Journal of the American Statistical Association. 81 (394): 560–564. doi:10.2307/2289251. ISSN 0162-1459. JSTOR 2289251.(Pages 560–561)
  7. ^ Schervish, Mark J. (November 1987). "A Review of Multivariate Analysis". Statistical Science. 2 (4): 396–413. doi:10.1214/ss/1177013111. ISSN 0883-4237. JSTOR 2245530.
  8. ^ CRAN has details on the packages available for multivariate data analysis

Further reading

  • Johnson, Richard A.; Wichern, Dean W. (2007). Applied Multivariate Statistical Analysis (Sixth ed.). Prentice Hall. ISBN 978-0-13-187715-3.
  • KV Mardia; JT Kent; JM Bibby (1979). Multivariate Analysis. Academic Press. ISBN 0-12-471252-5.
  • A. Sen, M. Srivastava, Regression Analysis — Theory, Methods, and Applications, Springer-Verlag, Berlin, 2011 (4th printing).
  • Cook, Swayne (2007). Interactive Graphics for Data Analysis.
  • Malakooti, B. (2013). Operations and Production Systems with Multiple Objectives. John Wiley & Sons.
  • T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, New York, 1958.
  • KV Mardia; JT Kent & JM Bibby (1979). Multivariate Analysis. Academic Press. ISBN 978-0124712522. (M.A. level "likelihood" approach)
  • Feinstein, A. R. (1996) Multivariable Analysis. New Haven, CT: Yale University Press.
  • Hair, J. F. Jr. (1995) Multivariate Data Analysis with Readings, 4th ed. Prentice-Hall.
  • Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data. CRC Press. (Advanced)
  • Sharma, S. (1996) Applied Multivariate Techniques. Wiley. (Informal, applied)
  • Izenman, Alan J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer Texts in Statistics. New York: Springer-Verlag. ISBN 9780387781884.
  • "Handbook of Applied Multivariate Statistics and Mathematical Modeling | ScienceDirect". Retrieved 2019-09-03.

External links

  • Statnotes: Topics in Multivariate Analysis, by G. David Garson
  • Mike Palmer: The Ordination Web Page
  • InsightsNow: Makers of ReportsNow, ProfilesNow, and KnowledgeNow
