This article may be in need of reorganization to comply with Wikipedia's layout guidelines. The reason given is: Inconsistent use of variable names and terminology without images to match. (August 2022) |
Part of a series on |
Machine learning and data mining |
---|
In machine learning, backpropagation is a gradient estimation method commonly used for training neural networks to compute the network parameter updates.
It is an efficient application of the chain rule to neural networks. Backpropagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through dynamic programming.[1][2][3]
Strictly speaking, the term backpropagation refers only to an algorithm for efficiently computing the gradient, not how the gradient is used; but the term is often used loosely to refer to the entire learning algorithm – including how the gradient is used, such as by stochastic gradient descent, or as an intermediate step in a more complicated optimizer, such as Adam.[4]
Backpropagation had multiple discoveries and partial discoveries, with a tangled history and terminology. See the history section for details. Some other names for the technique include "reverse mode of automatic differentiation" or "reverse accumulation".[5]