mrpro.algorithms.optimizers.adam

mrpro.algorithms.optimizers.adam(f: Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor]], initial_parameters: Sequence[Tensor], max_iter: int, lr: float = 0.001, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, decoupled_weight_decay: bool = False, callback: Callable[[OptimizerStatus], None] | None = None) → tuple[Tensor, ...]

Adam for non-linear minimization problems.

Parameters:
  • f – scalar-valued function to be minimized

  • initial_parameters – Sequence (for example, a list) of initial parameter tensors to be optimized. These tensors are not modified: the optimizer works on copies and leaves the initial values untouched.

  • max_iter – maximum number of iterations

  • lr – learning rate

  • betas – coefficients (β₁, β₂) used for computing running averages of the gradient and its square; see the update sketch after this parameter list

  • eps – term added to the denominator to improve numerical stability

  • weight_decay – weight decay coefficient (applied as an L2 penalty if decoupled_weight_decay is False, as decoupled weight decay otherwise)

  • amsgrad – whether to use the AMSGrad variant of this algorithm, from the paper "On the Convergence of Adam and Beyond"

  • decoupled_weight_decay – whether to use Adam (default) or its decoupled-weight-decay variant AdamW (if set to True) [LOS2019]

  • callback – function to be called after each iteration
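
The lr, betas, and eps parameters map onto the standard Adam update of Kingma and Ba, sketched below for a single parameter θ with gradient g_t at iteration t (the notation is generic and not taken from the mrpro source):

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t = \theta_{t-1} - \mathrm{lr} \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)

With amsgrad=True, \hat{v}_t is replaced by the running maximum of all previous \hat{v}_t, following the AMSGrad variant.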

Returns:

tuple of optimized parameters
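
Example

A minimal usage sketch, assuming that mrpro.operators.Operator can be subclassed with a forward method returning a one-element tuple, as the signature above suggests. The Rosenbrock operator and all variable names below are illustrative and not part of mrpro.

    import torch

    from mrpro.algorithms.optimizers import adam
    from mrpro.operators import Operator


    class Rosenbrock(Operator):
        """Scalar-valued 2D Rosenbrock function, returned as a one-element tuple."""

        def forward(self, x: torch.Tensor, y: torch.Tensor) -> tuple[torch.Tensor]:
            return ((1 - x) ** 2 + 100 * (y - x**2) ** 2,)


    # Initial values; adam optimizes copies, so x0 and y0 stay unchanged.
    x0 = torch.tensor(-1.0)
    y0 = torch.tensor(2.0)

    # Optional callback, called once per iteration. No OptimizerStatus fields
    # are assumed here; the statuses are only collected for later inspection.
    statuses = []

    x_opt, y_opt = adam(
        Rosenbrock(),
        initial_parameters=[x0, y0],
        max_iter=2000,
        lr=0.02,
        callback=statuses.append,
    )
    # x_opt and y_opt should have moved toward the minimum of the Rosenbrock
    # function at (1, 1).

Because initial_parameters is copied internally, the same x0 and y0 can be reused for further runs with different settings, for example decoupled_weight_decay=True for AdamW-style weight decay.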

References

[LOS2019]

Loshchilov I, Hutter F (2019) Decoupled Weight Decay Regularization. ICLR. https://doi.org/10.48550/arXiv.1711.05101