
Large width limits of neural networks

Artificial neural networks are a class of models used in machine learning, and inspired by biological neural networks. They are the core component of modern deep learning algorithms. Computation in artificial neural networks is usually organized into sequential layers of artificial neurons. The number of neurons in a layer is called the layer width. Theoretical analysis of artificial neural networks sometimes considers the limiting case that layer width becomes large or infinite. This limit enables simple analytic statements to be made about neural network predictions, training dynamics, generalization, and loss surfaces. This wide layer limit is also of practical interest, since finite width neural networks often perform strictly better as layer width is increased.[1][2][3][4][5][6]

Behavior of a neural network simplifies as it becomes infinitely wide. Left: a Bayesian neural network with two hidden layers, transforming a 3-dimensional input (bottom) into a two-dimensional output $(y_1, y_2)$ (top). Right: output probability density function $p(y_1, y_2)$ induced by the random weights of the network. Video: as the width of the network increases, the output distribution simplifies, ultimately converging to a Neural network Gaussian process in the infinite width limit.
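
The convergence described above can be checked numerically. The following is a minimal sketch (illustrative code, not taken from the article or its references; all function names and hyper-parameters are choices made here): it samples many random one-hidden-layer ReLU networks, with every layer scaled by 1/sqrt(fan-in), and tracks the excess kurtosis of the output at a fixed input, which approaches zero as the output distribution becomes Gaussian.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(3)   # a fixed 3-dimensional input, as in the figure

    def random_network_output(width, n_samples=20000):
        """Scalar outputs of random one-hidden-layer ReLU networks at the input x.

        Weights are i.i.d. standard normal and every layer is scaled by
        1/sqrt(fan-in), the usual parameterization behind the NNGP limit.
        """
        W = rng.standard_normal((n_samples, width, x.size))    # hidden weights
        a = rng.standard_normal((n_samples, width))            # readout weights
        hidden = np.maximum(W @ x / np.sqrt(x.size), 0.0)      # ReLU activations
        return (a * hidden).sum(axis=1) / np.sqrt(width)

    for width in (1, 10, 1000):
        samples = random_network_output(width)
        # Excess kurtosis tends to 0 as the distribution approaches a Gaussian.
        kurt = ((samples - samples.mean()) ** 4).mean() / samples.var() ** 2 - 3
        print(f"width={width:5d}  var={samples.var():.3f}  excess kurtosis={kurt:+.3f}")

The variance stays roughly constant across widths (it is pinned by the 1/sqrt(width) readout scaling), while the higher moments converge to their Gaussian values, mirroring the convergence to the NNGP shown in the video.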

Theoretical approaches based on a large width limit

  • The Neural Network Gaussian Process (NNGP) corresponds to the infinite width limit of Bayesian neural networks, and to the distribution over functions realized by non-Bayesian neural networks after random initialization; a kernel-recursion sketch is given after this list.[7][8][9][10]
  • The same underlying computations that are used to derive the NNGP kernel are also used in deep information propagation to characterize the propagation of information about gradients and inputs through a deep network.[11] This characterization is used to predict how model trainability depends on architecture and initialization hyperparameters.
  • The Neural Tangent Kernel (NTK) describes the evolution of neural network predictions during gradient descent training. In the infinite width limit the NTK usually becomes constant, often allowing closed form expressions for the function computed by a wide neural network throughout gradient descent training.[12] The training dynamics essentially become linearized (see the empirical-NTK sketch after this list).[13]
  • Mean-field limit analysis, when applied to neural networks with weight scaling of $\sim 1/h$ instead of $\sim 1/\sqrt{h}$ (where $h$ is the layer width) and with large enough learning rates, predicts qualitatively distinct nonlinear training dynamics compared to the static linear behavior described by the fixed neural tangent kernel, suggesting alternative pathways for understanding infinite-width networks; the two scalings are written out after this list.[14][15]
  • Catapult dynamics describe neural network training dynamics in the case that logits diverge to infinity as the layer width is taken to infinity, and describe qualitative properties of early training dynamics.[16]
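
As referenced in the NNGP item above, the infinite-width covariance can be computed in closed form by iterating a layer-to-layer recursion. The sketch below (illustrative code under assumed conventions, not taken from the cited papers) implements that recursion for fully connected ReLU layers, using the arc-cosine expectation E[relu(u) relu(v)] for jointly Gaussian (u, v).

    import numpy as np

    def relu_nngp_kernel(X, depth=3, sigma_w=1.0, sigma_b=0.0):
        """Analytic NNGP covariance for a deep fully connected ReLU network.

        X: (n, d) array of inputs. Returns the (n, n) covariance of the scalar
        output of a network with `depth` hidden ReLU layers, in the
        infinite-width limit, by iterating the pre-activation covariance
        recursion layer by layer.
        """
        # Covariance of the first-layer pre-activations.
        K = sigma_b ** 2 + sigma_w ** 2 * (X @ X.T) / X.shape[1]
        for _ in range(depth):
            diag = np.sqrt(np.diag(K))
            norm = np.outer(diag, diag)
            cos_theta = np.clip(K / norm, -1.0, 1.0)
            theta = np.arccos(cos_theta)
            # E[relu(u) relu(v)] for (u, v) Gaussian with the current covariance.
            expectation = norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2 * np.pi)
            K = sigma_b ** 2 + sigma_w ** 2 * expectation
        return K

The returned entries are the covariances of the limiting Gaussian process; with depth=1, sigma_w=1 and sigma_b=0 they match the limit approached by the Monte Carlo sketch earlier in the article.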
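
The NTK item above can be illustrated directly as well. The sketch below (again illustrative, with conventions chosen here) evaluates the empirical neural tangent kernel Theta(x, x') = grad_theta f(x) . grad_theta f(x') of a one-hidden-layer ReLU network in the 1/sqrt(h) parameterization, using its analytic parameter gradients; repeating the evaluation over independent initializations shows the kernel concentrating around a deterministic value as the width grows, which is the constancy exploited in the infinite-width analysis.

    import numpy as np

    def empirical_ntk(x1, x2, W, a):
        """Empirical NTK of f(x) = a . relu(W x) / sqrt(h) at a pair of inputs.

        Theta(x1, x2) = sum over all parameters of df/dtheta(x1) * df/dtheta(x2),
        written out analytically for the single hidden layer.
        """
        h = a.size
        z1, z2 = W @ x1, W @ x2
        phi1, phi2 = np.maximum(z1, 0.0), np.maximum(z2, 0.0)      # ReLU
        d1, d2 = (z1 > 0).astype(float), (z2 > 0).astype(float)    # ReLU derivative
        term_a = phi1 @ phi2 / h                             # readout-weight gradients
        term_W = (a ** 2 * d1 * d2).sum() * (x1 @ x2) / h    # hidden-weight gradients
        return term_a + term_W

    rng = np.random.default_rng(1)
    x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
    for width in (10, 100, 10000):
        # Several independent initializations: the spread shrinks with width.
        draws = [empirical_ntk(x1, x2,
                               rng.standard_normal((width, 3)),
                               rng.standard_normal(width))
                 for _ in range(5)]
        print(f"width={width:6d}  NTK across initializations: {np.round(draws, 3)}")

In the infinite width limit the spread across initializations vanishes and, in this parameterization, the kernel also remains approximately fixed along the gradient descent trajectory, which is what makes the linearized description of training possible.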
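
Finally, the two weight scalings contrasted in the mean-field item can be written out explicitly. The display below is an illustrative one-hidden-layer formulation in notation chosen here (the cited papers use their own conventions):

    % NTK parameterization: 1/sqrt(h) output scaling
    f_{\mathrm{NTK}}(x) = \frac{1}{\sqrt{h}} \sum_{i=1}^{h} a_i \, \phi(w_i^{\top} x)

    % Mean-field parameterization: 1/h output scaling
    f_{\mathrm{MF}}(x) = \frac{1}{h} \sum_{i=1}^{h} a_i \, \phi(w_i^{\top} x)

Under the 1/h scaling, combined with correspondingly larger learning rates, individual neurons move by an amount that does not shrink with width, and the infinite-width limit is a nonlinear evolution of the empirical distribution of the neuron parameters (a_i, w_i) rather than the fixed-kernel, linearized dynamics of the NTK regime.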

References

  1. ^ Novak, Roman; Bahri, Yasaman; Abolafia, Daniel A.; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2018-02-15). "Sensitivity and Generalization in Neural Networks: an Empirical Study". International Conference on Learning Representations. arXiv:1802.08760. Bibcode:2018arXiv180208760N.
  2. ^ Canziani, Alfredo; Paszke, Adam; Culurciello, Eugenio (2016-11-04). "An Analysis of Deep Neural Network Models for Practical Applications". arXiv:1605.07678. Bibcode:2016arXiv160507678C.
  3. ^ Novak, Roman; Xiao, Lechao; Lee, Jaehoon; Bahri, Yasaman; Yang, Greg; Abolafia, Dan; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2018). "Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes". International Conference on Learning Representations. arXiv:1810.05148. Bibcode:2018arXiv181005148N.
  4. ^ Neyshabur, Behnam; Li, Zhiyuan; Bhojanapalli, Srinadh; LeCun, Yann; Srebro, Nathan (2019). "Towards understanding the role of over-parametrization in generalization of neural networks". International Conference on Learning Representations. arXiv:1805.12076. Bibcode:2018arXiv180512076N.
  5. ^ Lawrence, Steve; Giles, C. Lee; Tsoi, Ah Chung (1996). "What size neural network gives optimal generalization? convergence properties of backpropagation". CiteSeerX 10.1.1.125.6019.
  6. ^ Bartlett, P.L. (1998). "The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network". IEEE Transactions on Information Theory. 44 (2): 525–536. doi:10.1109/18.661502. ISSN 1557-9654.
  7. ^ Neal, Radford M. (1996), "Priors for Infinite Networks", Bayesian Learning for Neural Networks, Lecture Notes in Statistics, vol. 118, Springer New York, pp. 29–53, doi:10.1007/978-1-4612-0745-0_2, ISBN 978-0-387-94724-2
  8. ^ Lee, Jaehoon; Bahri, Yasaman; Novak, Roman; Schoenholz, Samuel S.; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2017). "Deep Neural Networks as Gaussian Processes". International Conference on Learning Representations. arXiv:1711.00165. Bibcode:2017arXiv171100165L.
  9. ^ G. de G. Matthews, Alexander; Rowland, Mark; Hron, Jiri; Turner, Richard E.; Ghahramani, Zoubin (2017). "Gaussian Process Behaviour in Wide Deep Neural Networks". International Conference on Learning Representations. arXiv:1804.11271. Bibcode:2018arXiv180411271M.
  10. ^ Hron, Jiri; Bahri, Yasaman; Novak, Roman; Pennington, Jeffrey; Sohl-Dickstein, Jascha (2020). "Exact posterior distributions of wide Bayesian neural networks". ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning. arXiv:2006.10541.
  11. ^ Schoenholz, Samuel S.; Gilmer, Justin; Ganguli, Surya; Sohl-Dickstein, Jascha (2016). "Deep information propagation". International Conference on Learning Representations. arXiv:1611.01232.
  12. ^ Jacot, Arthur; Gabriel, Franck; Hongler, Clement (2018). "Neural tangent kernel: Convergence and generalization in neural networks". Advances in Neural Information Processing Systems. arXiv:1806.07572.
  13. ^ Lee, Jaehoon; Xiao, Lechao; Schoenholz, Samuel S.; Bahri, Yasaman; Novak, Roman; Sohl-Dickstein, Jascha; Pennington, Jeffrey (2020). "Wide neural networks of any depth evolve as linear models under gradient descent". Journal of Statistical Mechanics: Theory and Experiment. 2020 (12): 124002. arXiv:1902.06720. Bibcode:2020JSMTE2020l4002L. doi:10.1088/1742-5468/abc62b. S2CID 62841516.
  14. ^ Mei, Song; Montanari, Andrea; Nguyen, Phan-Minh (2018-04-18). A Mean Field View of the Landscape of Two-Layers Neural Networks. OCLC 1106295873.
  15. ^ Nguyen, Phan-Minh; Pham, Huy Tuan (2020). "A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks". arXiv:2001.11443 [cs.LG].
  16. ^ Lewkowycz, Aitor; Bahri, Yasaman; Dyer, Ethan; Sohl-Dickstein, Jascha; Gur-Ari, Guy (2020). "The large learning rate phase of deep learning: the catapult mechanism". arXiv:2003.02218 [stat.ML].
