Wikipedia

Computational statistics

Computational statistics, or statistical computing, is the interface between statistics and computer science. It refers to statistical methods that are made possible by computational methods. It is the area of computational science (or scientific computing) specific to the mathematical science of statistics. The area is developing rapidly, leading to calls for a broader concept of computing to be taught as part of general statistical education.[1]

Students working in the Statistics Machine Room of the London School of Economics in 1964

As in traditional statistics, the goal is to transform raw data into knowledge,[2] but the focus lies on computer-intensive statistical methods, such as those for very large sample sizes and non-homogeneous data sets.[2]

The terms 'computational statistics' and 'statistical computing' are often used interchangeably, although Carlo Lauro (a former president of the International Association for Statistical Computing) proposed making a distinction, defining 'statistical computing' as "the application of computer science to statistics", and 'computational statistics' as "aiming at the design of algorithm for implementing statistical methods on computers, including the ones unthinkable before the computer age (e.g. bootstrap, simulation), as well as to cope with analytically intractable problems" [sic].[3]

The term 'computational statistics' may also be used to refer to computationally intensive statistical methods, including resampling methods, Markov chain Monte Carlo methods, local regression, kernel density estimation, artificial neural networks and generalized additive models.

History

Though computational statistics is widely used today, it has a relatively short history of acceptance in the statistics community. For the most part, the founders of the field of statistics relied on mathematics and asymptotic approximations in the development of computational statistical methodology.[4]

In the statistical field, the first use of the term “computer” comes in an 1891 article by Robert P. Porter in the Journal of the American Statistical Association archives. The article discusses the use of Herman Hollerith’s machine in the 11th Census of the United States.[5] Hollerith’s machine, also called the tabulating machine, was an electromechanical machine designed to assist in summarizing information stored on punched cards. It was invented by Herman Hollerith (February 29, 1860 – November 17, 1929), an American businessman, inventor, and statistician. He filed a patent application for the punched card tabulating machine in 1884, and the machine was later used in the 1890 Census of the United States. The advantages of the technology were immediately apparent: the 1880 Census, with about 50 million people, took over 7 years to tabulate, while the 1890 Census, with over 62 million people, took less than a year. This marks the beginning of the era of mechanized computational statistics and semiautomatic data processing systems.

In 1908, William Sealy Gosset performed his now well-known Monte Carlo simulation, which led to the discovery of the Student’s t-distribution.[6] With the help of computational methods, he also plotted the empirical distributions overlaid on the corresponding theoretical distributions. The computer has revolutionized simulation and has made the replication of Gosset’s experiment little more than an exercise.[7][8]

Later, scientists put forward computational ways of generating pseudo-random deviates, devised methods to convert uniform deviates into other distributional forms using the inverse cumulative distribution function or acceptance-rejection methods, and developed state-space methodology for Markov chain Monte Carlo.[9] One of the first efforts to generate random digits in a fully automated way was undertaken by the RAND Corporation in 1947. The tables produced were published as a book in 1955, and also as a series of punch cards.
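The inverse cumulative distribution function approach mentioned above can be illustrated with a minimal sketch. It assumes an exponential target distribution, chosen here only because its CDF inverts in closed form; the function name `sample_exponential`, the rate of 2.0 and the seed are illustrative choices, not taken from the sources cited.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one Exponential(rate) deviate by inverting its CDF."""
    u = rng.random()                    # uniform deviate on [0, 1)
    # For F(x) = 1 - exp(-rate * x), the inverse is F^{-1}(u) = -ln(1 - u) / rate.
    return -math.log(1.0 - u) / rate


rng = random.Random(0)                  # fixed seed so the run is repeatable
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)   # should be close to 1 / rate = 0.5
```

Acceptance-rejection methods serve the same purpose when the inverse CDF has no convenient closed form.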

By the mid-1950s, several articles and patents had been published proposing devices for random number generation.[10] The development of these devices was motivated by the need for random digits to perform simulations and other fundamental components of statistical analysis. One of the best known of such devices is ERNIE, which produces random numbers that determine the winners of the Premium Bond, a lottery bond issued in the United Kingdom. In 1958, John Tukey’s jackknife was developed. It is a method to reduce the bias of parameter estimates in samples under nonstandard conditions.[11] It requires computers for practical implementation. To this point, computers have made many tedious statistical studies feasible.[12]
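As a rough illustration of how the jackknife reduces bias, the sketch below applies the standard leave-one-out bias correction to the plug-in variance estimator (the one that divides by n). The function names and the small data set are hypothetical, chosen only for the example.

```python
import statistics


def plug_in_variance(xs):
    """Biased variance estimator that divides by n rather than n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def jackknife_bias_corrected(xs, estimator):
    """Return n * theta_hat - (n - 1) * mean of the leave-one-out estimates."""
    n = len(xs)
    full = estimator(xs)
    # Recompute the estimator n times, each time omitting one observation.
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * sum(leave_one_out) / n


data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7]   # made-up observations
corrected = jackknife_bias_corrected(data, plug_in_variance)
unbiased = statistics.variance(data)          # usual n - 1 sample variance
```

For this particular estimator the correction reproduces the usual unbiased sample variance, so `corrected` and `unbiased` agree up to rounding. The n recomputations of the estimator are the reason the method needs a computer in practice.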


Methods

Maximum likelihood estimation

Maximum likelihood estimation is used to estimate the parameters of an assumed probability distribution, given some observed data. It is achieved by maximizing a likelihood function so that the observed data is most probable under the assumed statistical model.
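A minimal sketch of the idea, assuming an exponential model with unknown rate and a made-up data set: the log-likelihood is maximized numerically with a golden-section search, one simple optimizer among many. The function names, the search interval and the tolerance are illustrative assumptions.

```python
import math


def exp_log_likelihood(rate, xs):
    """Log-likelihood of i.i.d. Exponential(rate) observations."""
    return len(xs) * math.log(rate) - rate * sum(xs)


def golden_section_maximize(f, lo, hi, tol=1e-7):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)   # left interior point
        d = a + inv_phi * (b - a)   # right interior point
        if f(c) > f(d):
            b = d                   # the maximum lies in [a, d]
        else:
            a = c                   # the maximum lies in [c, b]
    return (a + b) / 2.0


data = [0.8, 1.3, 0.2, 2.1, 0.5, 0.9, 1.7]   # made-up observations
rate_hat = golden_section_maximize(lambda r: exp_log_likelihood(r, data), 1e-6, 10.0)
closed_form = len(data) / sum(data)           # known analytic MLE for this model
```

The exponential model has a closed-form maximum likelihood estimate, which makes it a convenient check on the numerical result. Numerical maximization matters for models where no such formula exists.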

Monte Carlo method

The Monte Carlo method is a statistical method that relies on repeated random sampling to obtain numerical results. The concept is to use randomness to solve problems that might be deterministic in principle. Monte Carlo methods are often used in physical and mathematical problems and are most useful when it is difficult to use other approaches. They are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
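Numerical integration is perhaps the easiest of the three problem classes to sketch. The example below estimates the integral of exp(-x^2) over [0, 1] by averaging the integrand at uniformly drawn points; the integrand, sample size, seed and function name are illustrative assumptions.

```python
import math
import random


def monte_carlo_integral(f, a, b, n_samples, rng):
    """Estimate the integral of f over [a, b] by averaging f at uniform random points."""
    total = 0.0
    for _ in range(n_samples):
        x = a + (b - a) * rng.random()   # uniform draw on [a, b)
        total += f(x)
    return (b - a) * total / n_samples


rng = random.Random(42)                  # fixed seed so the run is repeatable
estimate = monte_carlo_integral(lambda x: math.exp(-x * x), 0.0, 1.0, 200_000, rng)
# Reference value from the error function, used only to check the estimate.
reference = 0.5 * math.sqrt(math.pi) * math.erf(1.0)
```

The problem is deterministic, yet the estimate comes from random draws. Its error shrinks roughly in proportion to one over the square root of the number of samples.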

Markov chain Monte Carlo

The Markov chain Monte Carlo method creates samples from a continuous random variable, with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance. The more steps are included, the more closely the distribution of the sample matches the actual desired distribution.
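One common member of this family is the random-walk Metropolis algorithm, sketched below under stated assumptions. The target density is taken to be proportional to exp(-x^2 / 2), a standard normal with its normalizing constant deliberately left out; the step size, chain length, burn-in and seed are arbitrary illustrative choices.

```python
import math
import random


def unnormalized_density(x):
    """Known function proportional to the target density (standard normal here)."""
    return math.exp(-0.5 * x * x)


def metropolis(density, n_steps, step_size, x0, rng):
    """Random-walk Metropolis sampler; returns the full chain of states."""
    x, fx = x0, density(x0)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)   # symmetric proposal
        fp = density(proposal)
        # Accept with probability min(1, fp / fx); only the ratio matters,
        # so the normalizing constant is never needed.
        if rng.random() * fx < fp:
            x, fx = proposal, fp
        chain.append(x)                            # a rejected move repeats the state
    return chain


rng = random.Random(1)                             # fixed seed so the run is repeatable
chain = metropolis(unnormalized_density, 50_000, 2.0, 0.0, rng)
kept = chain[1_000:]                               # discard an initial burn-in stretch
mean = sum(kept) / len(kept)                       # should be near 0
variance = sum((x - mean) ** 2 for x in kept) / len(kept)   # should be near 1
```

The sample mean and variance are the integrals the paragraph refers to, evaluated as averages over the chain. Successive states are correlated, so the chain carries less information than the same number of independent draws would.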

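Applications

  • Computational biology
  • Computational linguistics
  • Computational physics
  • Computational mathematics
  • Computational materials science

Computational statistics journals

  • Communications in Statistics - Simulation and Computation
  • Computational Statistics
  • Computational Statistics & Data Analysis
  • Journal of Computational and Graphical Statistics
  • Journal of Statistical Computation and Simulation
  • Journal of Statistical Software
  • The R Journal
  • The Stata Journal
  • Statistics and Computing
  • Wiley Interdisciplinary Reviews: Computational Statistics

Associations

  • International Association for Statistical Computing

See also

  • Algorithms for statistical classification
  • Data science
  • Statistical methods in artificial intelligence
  • Free statistical software
  • List of statistical algorithms
  • List of statistical packages
  • Machine learning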

References

  1. Nolan, D.; Temple Lang, D. (2010). "Computing in the Statistics Curricula". The American Statistician. 64 (2): 97–107.
  2. Wegman, Edward J. (1988). "Computational Statistics: A New Agenda for Statistical Theory and Practice". Journal of the Washington Academy of Sciences. 78 (4): 310–322.
  3. Lauro, Carlo (1996). "Computational statistics or statistical computing, is that the question?". Computational Statistics & Data Analysis. 23 (1): 191–193. doi:10.1016/0167-9473(96)88920-1.
  4. Watnik, Mitchell (2011). "Early Computational Statistics". Journal of Computational and Graphical Statistics. 20 (4): 811–817. doi:10.1198/jcgs.2011.204b. ISSN 1061-8600. S2CID 120111510.
  5. Hendrickson, W. A.; Ward, K. B. (1975-10-27). "Atomic models for the polypeptide backbones of myohemerythrin and hemerythrin". Biochemical and Biophysical Research Communications. 66 (4): 1349–1356. doi:10.1016/0006-291x(75)90508-2. ISSN 1090-2104. PMID 5.
  6. "Los Alamos science, Number 14". 1986-01-01. doi:10.2172/6935980.
  7. Trahan, Travis John (2019-10-03). "Recent Advances in Monte Carlo Methods at Los Alamos National Laboratory". doi:10.2172/1569710. OSTI 1569710.
  8. Metropolis, Nicholas; Ulam, S. (1949). "The Monte Carlo Method". Journal of the American Statistical Association. 44 (247): 335–341. doi:10.1080/01621459.1949.10483310. ISSN 0162-1459. PMID 18139350.
  9. Robert, Christian; Casella, George (2011-02-01). "A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data". Statistical Science. 26 (1). doi:10.1214/10-sts351. ISSN 0883-4237. S2CID 2806098.
  10. https://hal.inria.fr/hal-01561551/document
  11. Quenouille, M. H. (1956). "Notes on Bias in Estimation". Biometrika. 43 (3–4): 353–360. doi:10.1093/biomet/43.3-4.353. ISSN 0006-3444.
  12. Teichroew, Daniel (1965). "A History of Distribution Sampling Prior to the Era of the Computer and its Relevance to Simulation". Journal of the American Statistical Association. 60 (309): 27–49. doi:10.1080/01621459.1965.10480773. ISSN 0162-1459.

Further reading

Articles

  • Albert, J.H.; Gentle, J.E. (2004), Albert, James H; Gentle, James E (eds.), "Special Section: Teaching Computational Statistics", The American Statistician, 58: 1, doi:10.1198/0003130042872, S2CID 219596225
  • Wilkinson, Leland (2008), "The Future of Statistical Computing (with discussion)", Technometrics, 50 (4): 418–435, doi:10.1198/004017008000000460, S2CID 3521989

Books

  • Drew, John H.; Evans, Diane L.; Glen, Andrew G.; Lemis, Lawrence M. (2007), Computational Probability: Algorithms and Applications in the Mathematical Sciences, Springer International Series in Operations Research & Management Science, Springer, ISBN 978-0-387-74675-3
  • Gentle, James E. (2002), Elements of Computational Statistics, Springer, ISBN 0-387-95489-9
  • Gentle, James E.; Härdle, Wolfgang; Mori, Yuichi, eds. (2004), Handbook of Computational Statistics: Concepts and Methods, Springer, ISBN 3-540-40464-3
  • Givens, Geof H.; Hoeting, Jennifer A. (2005), Computational Statistics, Wiley Series in Probability and Statistics, Wiley-Interscience, ISBN 978-0-471-46124-1
  • Klemens, Ben (2008), Modeling with Data: Tools and Techniques for Statistical Computing, Princeton University Press, ISBN 978-0-691-13314-0
  • Monahan, John (2001), Numerical Methods of Statistics, Cambridge University Press, ISBN 978-0-521-79168-7
  • Rose, Colin; Smith, Murray D. (2002), Mathematical Statistics with Mathematica, Springer Texts in Statistics, Springer, ISBN 0-387-95234-9
  • Thisted, Ronald Aaron (1988), Elements of Statistical Computing: Numerical Computation, CRC Press, ISBN 0-412-01371-1
  • Gharieb, Reda. R. (2017), Data Science: Scientific and Statistical Computing, Noor Publishing, ISBN 978-3-330-97256-8

External links

Associations

  • International Association for Statistical Computing
  • Statistical Computing section of the American Statistical Association

Journals

  • Computational Statistics & Data Analysis
  • Statistics and Computing
