
Learning curve (machine learning)

In machine learning, a learning curve (or training curve) plots the optimal value of a model's loss function for a training set against this loss function evaluated on a validation data set with the same parameters that produced the optimal function.[1] Synonyms include error curve, experience curve, improvement curve and generalization curve.[2]

Learning curve showing training score and cross-validation score

More abstractly, the learning curve plots predictive performance against learning effort, where learning effort usually means the number of training samples and predictive performance means accuracy on testing samples.[3]

The learning curve is useful for many purposes, including comparing different algorithms,[4] choosing model parameters during design,[5] adjusting optimization to improve convergence, and determining the amount of data needed for training.[6]
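The following is a minimal sketch of such a curve, plotting the training score and cross-validation score against the number of training samples, as in the figure above. It assumes a synthetic classification task and scikit-learn's learning_curve helper; the dataset, estimator and parameter choices are illustrative and not taken from this article.

    # Minimal sketch: learning curve over the number of training samples.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    # Illustrative synthetic data and estimator.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    train_sizes, train_scores, val_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 8), cv=5, scoring="accuracy",
    )

    plt.plot(train_sizes, train_scores.mean(axis=1), label="training score")
    plt.plot(train_sizes, val_scores.mean(axis=1), label="cross-validation score")
    plt.xlabel("number of training samples")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()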

Formal definition

One model of machine learning involves producing a function, $f(x)$, which given some information, $x$, predicts some variable, $y$, from training data $X_\text{train}$ and $Y_\text{train}$. It is distinct from mathematical optimization because $f$ should predict well for $x$ outside of $X_\text{train}$.

We often constrain the possible functions to a parameterized family of functions, $\{f_\theta(x) : \theta \in \Theta\}$, so that our function is more generalizable,[7] so that the function has certain properties such as those that make finding a good $f$ easier, or because we have some a priori reason to think that these properties are true.[7]: 172

Given that it is not possible to produce a function that perfectly fits our data, it is then necessary to produce a loss function $L(f_\theta(X), Y)$ to measure how good our prediction is. We then define an optimization process which finds a $\theta$ which minimizes $L(f_\theta(X), Y)$, referred to as $\theta^*(X, Y)$.
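A minimal sketch of this setup, under the illustrative assumption of a linear family $f_\theta(x) = \theta \cdot x$ and a squared-error loss; the data and variable names are invented for the example and are not part of the definition above.

    # Minimal sketch: linear family f_theta(x) = theta . x with squared-error loss.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                                    # training inputs X_train
    Y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)  # training targets Y_train

    def loss(theta, X, Y):
        """Mean squared error L(f_theta(X), Y)."""
        return np.mean((X @ theta - Y) ** 2)

    # theta*(X, Y): the parameter vector minimizing the loss over the family,
    # found here in closed form by least squares.
    theta_star, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(loss(theta_star, X, Y))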

Training curve for amount of data

Then if our training data is $x_1, x_2, \dots, x_n$, $y_1, y_2, \dots, y_n$ and our validation data is $x_1', x_2', \dots, x_m'$, $y_1', y_2', \dots, y_m'$, a learning curve is the plot of the two curves

  1. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i), Y_i)$
  2. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i'), Y_i')$

where $X_i = x_1, x_2, \dots, x_i$ (with $Y_i$, $X_i'$ and $Y_i'$ defined analogously)
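A minimal sketch of these two curves, continuing the illustrative linear-model and squared-error assumptions from the previous example; the data sizes and subset step are arbitrary choices, and for simplicity the validation loss is computed on the full validation set at every step.

    # Minimal sketch: learning curve over the amount of training data.
    import numpy as np

    rng = np.random.default_rng(1)
    w_true = np.array([1.5, -2.0, 0.5])
    X = rng.normal(size=(200, 3))
    Y = X @ w_true + 0.1 * rng.normal(size=200)          # training data
    X_val = rng.normal(size=(50, 3))
    Y_val = X_val @ w_true + 0.1 * rng.normal(size=50)   # validation data

    def fit(X, Y):
        """theta*(X, Y): least-squares minimizer of the squared-error loss."""
        theta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return theta

    def loss(theta, X, Y):
        return np.mean((X @ theta - Y) ** 2)

    train_curve, val_curve = [], []
    for i in range(5, len(X) + 1, 5):
        theta_i = fit(X[:i], Y[:i])                      # fit on the first i samples
        train_curve.append(loss(theta_i, X[:i], Y[:i]))  # curve 1: loss on the training subset
        val_curve.append(loss(theta_i, X_val, Y_val))    # curve 2: loss on validation data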

Training curve for number of iterations

Many optimization processes are iterative, repeating the same step until the process converges to an optimal value. Gradient descent is one such algorithm. If $\theta_i^*$ is defined as the approximation of the optimal $\theta$ after $i$ steps, a learning curve is the plot of

  1. $i \mapsto L(f_{\theta_i^*(X, Y)}(X), Y)$
  2. $i \mapsto L(f_{\theta_i^*(X, Y)}(X'), Y')$
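A minimal sketch of an iteration-based learning curve, again under the illustrative linear-model and squared-error assumptions; the learning rate and iteration count are arbitrary.

    # Minimal sketch: learning curve over gradient-descent iterations.
    import numpy as np

    rng = np.random.default_rng(2)
    w_true = np.array([1.5, -2.0, 0.5])
    X = rng.normal(size=(200, 3))
    Y = X @ w_true + 0.1 * rng.normal(size=200)          # training data
    X_val = rng.normal(size=(50, 3))
    Y_val = X_val @ w_true + 0.1 * rng.normal(size=50)   # validation data

    def loss(theta, X, Y):
        return np.mean((X @ theta - Y) ** 2)

    theta = np.zeros(3)                                  # initial parameters
    learning_rate = 0.05
    train_curve, val_curve = [], []
    for i in range(100):
        grad = 2 * X.T @ (X @ theta - Y) / len(X)        # gradient of the training loss
        theta = theta - learning_rate * grad             # theta_i*: parameters after i+1 steps
        train_curve.append(loss(theta, X, Y))            # curve 1: training loss
        val_curve.append(loss(theta, X_val, Y_val))      # curve 2: validation loss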

Choosing the size of the training dataset

The learning curve is a tool to find out how much a machine learning model benefits from adding more training data and whether the estimator suffers more from a variance error or a bias error. If both the validation score and the training score converge to a value that is too low as the size of the training set increases, the model will not benefit much from more training data.[8]

In the machine learning domain, there are two common forms of learning curve, differing in the x-axis: the experience of the model is graphed either as the number of training examples used for learning or as the number of iterations used in training the model.[9]
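A minimal sketch of the diagnostic reading described above; the threshold values are hypothetical and would need tuning for a real task.

    # Minimal sketch: reading a learning curve's final scores to guess whether
    # the model suffers more from variance (large train/validation gap) or
    # bias (both scores converge to a low value).
    def diagnose(train_score, val_score, gap_tol=0.05, low_score=0.7):
        gap = train_score - val_score
        if gap > gap_tol:
            return "high variance: more training data or regularization may help"
        if val_score < low_score:
            return "high bias: more data alone is unlikely to help"
        return "scores have converged to an acceptable value"

    print(diagnose(train_score=0.95, val_score=0.80))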

See also

  Overfitting
  Bias–variance tradeoff
  Model selection
  Cross-validation (statistics)
  Validity (statistics)
  Verification and validation
  Double descent

References

  1. ^ "Mohr, Felix and van Rijn, Jan N. "Learning Curves for Decision Making in Supervised Machine Learning - A Survey." arXiv preprint arXiv:2201.12150 (2022)". arXiv:2201.12150.
  2. ^ Viering, Tom; Loog, Marco (2023-06-01). "The Shape of Learning Curves: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (6): 7799–7819. arXiv:2103.10948. doi:10.1109/TPAMI.2022.3220744. ISSN 0162-8828.
  3. ^ Perlich, Claudia (2010), Sammut, Claude; Webb, Geoffrey I. (eds.), "Learning Curves in Machine Learning", Encyclopedia of Machine Learning, Boston, MA: Springer US, pp. 577–580, doi:10.1007/978-0-387-30164-8_452, ISBN 978-0-387-30164-8, retrieved 2023-07-06
  4. ^ Madhavan, P.G. (1997). "A New Recurrent Neural Network Learning Algorithm for Time Series Prediction" (PDF). Journal of Intelligent Systems. p. 113 Fig. 3.
  5. ^ "Machine Learning 102: Practical Advice". Tutorial: Machine Learning for Astronomy with Scikit-learn.
  6. ^ Meek, Christopher; Thiesson, Bo; Heckerman, David (Summer 2002). "The Learning Curve Sampling Method Applied to Model-Based Clustering". Journal of Machine Learning Research. 2 (3): 397. Archived from the original on 2013-07-15.
  7. ^ a b Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016-11-18). Deep Learning. MIT Press. p. 108. ISBN 978-0-262-03561-3.
  8. ^ scikit-learn developers. "Validation curves: plotting scores to evaluate models — scikit-learn 0.20.2 documentation". Retrieved February 15, 2019.
  9. ^ Sammut, Claude; Webb, Geoffrey I. (Eds.) (28 March 2011). Encyclopedia of Machine Learning (1st ed.). Springer. p. 578. ISBN 978-0-387-30768-8.
