The RmsPropSolver – this solver uses an adaptive learning rate. According to “Understanding RMSProp – faster neural network learning” by Bushaev, the “central idea of RMSProp is [to] keep the moving average of the squared gradients for each weight, and then divide the gradient by the square root [of] the mean square.”
The SolverParameter specifies all parameters used to configure the solver. This section describes the RMSProp-specific parameters.
delta: a small constant used by the solver for numerical stability (it prevents division by zero when the mean square is very small); defaults to 1e-8.
rms_decay: specifies the decay rate applied when calculating the moving average of the squared gradients:
MeanSquare(t) = rms_decay * MeanSquare(t-1) + (1 - rms_decay) * SquaredGradient(t).
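The update described above can be sketched in a few lines. This is a minimal illustration, not the solver's actual implementation; the function name `rmsprop_step` and the default learning rate are assumptions made for the example, while `rms_decay` and `delta` correspond to the parameters described in this section.

```python
import math

def rmsprop_step(w, grad, mean_square, lr=0.01, rms_decay=0.98, delta=1e-8):
    """One RMSProp update for a single weight (illustrative sketch).

    Maintains a moving average of squared gradients, then divides the
    gradient by the square root of that mean square.
    """
    # MeanSquare(t) = rms_decay * MeanSquare(t-1) + (1 - rms_decay) * SquaredGradient(t)
    mean_square = rms_decay * mean_square + (1.0 - rms_decay) * grad ** 2
    # delta keeps the division numerically stable when mean_square is tiny
    w = w - lr * grad / (math.sqrt(mean_square) + delta)
    return w, mean_square

# Example: minimize f(w) = w^2, whose gradient is 2w
w, ms = 5.0, 0.0
for _ in range(200):
    w, ms = rmsprop_step(w, 2.0 * w, ms, lr=0.1)
```

Because the gradient is normalized by its recent root-mean-square magnitude, each step has roughly the size of the learning rate regardless of the raw gradient scale, which is what permits the larger learning rates mentioned below.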
When to Use
According to “A Look at Gradient Descent and RMSProp Optimizers” by Gandhi, RMSProp is “similar to the gradient descent algorithm” but “restricts the oscillations in the vertical direction,” which permits an increased learning rate and therefore faster learning.