The AdaDeltaSolver – this adaptive gradient solver extends the AdaGradSolver by, as noted in the blog post “Deep Learning Optimizers” by Mayanglambam, using a ‘restricted window size’ of exponentially weighted averages over the past ‘t’ gradients. As Ruder notes in “An overview of gradient descent optimization algorithms,” this windowing seeks to “reduce [the AdaGrad] aggressive monotonically decreasing learning rate” by “restricting the window of accumulated past gradients to some fixed [window] size ‘w’.”
The SolverParameter specifies all parameters used to configure the solver. This section describes the AdaDelta-specific parameters.
momentum: this parameter is not used by this solver and should be set to 0.
delta: this parameter is used for numerical stability by the solver and defaults to 1e-8.
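To show where the delta parameter enters the computation, here is a minimal sketch of one AdaDelta update in the style of Zeiler’s original formulation. This is an illustrative stand-alone function, not the solver’s actual implementation; the names `acc_g2` and `acc_dx2` are assumptions for the two running averages the algorithm maintains.

```python
import math

def adadelta_step(x, grad, acc_g2, acc_dx2, rho=0.95, delta=1e-8):
    """One AdaDelta update (illustrative sketch, not the solver's code).

    acc_g2  - running average of squared gradients, E[g^2]
    acc_dx2 - running average of squared updates,   E[dx^2]
    delta   - small constant for numerical stability (the 'delta' parameter)
    """
    # Exponentially weighted window over past squared gradients
    acc_g2 = rho * acc_g2 + (1.0 - rho) * grad * grad
    # delta keeps both square roots well-defined and the ratio finite
    dx = -math.sqrt(acc_dx2 + delta) / math.sqrt(acc_g2 + delta) * grad
    # Accumulate the squared update for the next step
    acc_dx2 = rho * acc_dx2 + (1.0 - rho) * dx * dx
    return x + dx, acc_g2, acc_dx2

# Usage sketch: minimize f(x) = x^2, whose gradient is 2x
x, g2, d2 = 5.0, 0.0, 0.0
for _ in range(1000):
    x, g2, d2 = adadelta_step(x, 2.0 * x, g2, d2)
```

Note that no global learning rate appears in the update: the ratio of the two running RMS values provides the step scale, with delta preventing division by zero early on when both accumulators are near zero.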
When To Use
Like AdaGrad, AdaDelta may be helpful in finding predictive but infrequently seen parameters. However, as described in “Optimization techniques comparison in Julia: SGD, Momentum, Adagrad, AdaDelta, Adam” by Int8, the AdaDelta algorithm was designed to “improve [the] AdaGrad weakness of learning rate converging to zero with increase of time.”
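The weakness noted above can be made concrete with a small side-by-side sketch. The functions below are hypothetical illustrations (not part of any solver API) that feed a constant gradient to each algorithm’s accumulator and record the effective step size: AdaGrad divides by the sum of all past squared gradients, so its steps shrink without bound, while AdaDelta’s windowed averages keep its steps from vanishing.

```python
import math

def adagrad_steps(n, lr=0.1, eps=1e-8):
    """Record AdaGrad's effective step for a constant gradient g = 1.0."""
    acc, steps = 0.0, []
    for _ in range(n):
        g = 1.0
        acc += g * g  # sum over ALL past squared gradients - grows forever
        steps.append(lr * g / math.sqrt(acc + eps))
    return steps

def adadelta_steps(n, rho=0.95, delta=1e-8):
    """Record AdaDelta's effective step for a constant gradient g = 1.0."""
    g2, d2, steps = 0.0, 0.0, []
    for _ in range(n):
        g = 1.0
        g2 = rho * g2 + (1.0 - rho) * g * g  # windowed average, stays bounded
        dx = math.sqrt(d2 + delta) / math.sqrt(g2 + delta) * g
        d2 = rho * d2 + (1.0 - rho) * dx * dx
        steps.append(dx)
    return steps
```

Running both for a few thousand iterations shows AdaGrad’s step decaying toward zero while AdaDelta’s step size does not collapse, which matches the motivation quoted above.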