mrpro.algorithms.optimizers.adam
- mrpro.algorithms.optimizers.adam(f: Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor]], initial_parameters: Sequence[Tensor], max_iter: int, lr: float = 0.001, betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, decoupled_weight_decay: bool = False, callback: Callable[[OptimizerStatus], None] | None = None) → tuple[Tensor, ...]
Adam for non-linear minimization problems.
- Parameters:
f – scalar-valued function to be optimized
initial_parameters – Sequence (for example a list) of parameters to be optimized. Note that these parameters are not modified in place; the optimizer works on a copy and leaves the initial values untouched.
max_iter – maximum number of iterations
lr – learning rate
betas – coefficients used for computing running averages of the gradient and its square
eps – term added to the denominator to improve numerical stability
weight_decay – weight decay (L2 penalty if decoupled_weight_decay is False)
amsgrad – whether to use the AMSGrad variant of this algorithm, as proposed in the paper "On the Convergence of Adam and Beyond"
decoupled_weight_decay – whether to use Adam (default, False) or AdamW (True) [LOS2019]; see the note after the Returns section
callback – function to be called after each iteration
- Returns:
tuple of optimized parameters
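To make the effect of decoupled_weight_decay concrete, the following sketch contrasts the two update rules, with weight decay \lambda (the weight_decay argument) and learning rate \mathrm{lr}, following [LOS2019]; \hat{m}_t and \hat{v}_t denote the bias-corrected moment estimates of standard Adam, and the exact placement of the decay step may differ slightly in this implementation.

Adam with L2 penalty (decoupled_weight_decay=False), where the decay enters the gradient:
g_t = \nabla f(\theta_{t-1}) + \lambda\,\theta_{t-1}, \qquad \theta_t = \theta_{t-1} - \mathrm{lr}\,\hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)

AdamW (decoupled_weight_decay=True), where the decay is applied directly to the parameters:
g_t = \nabla f(\theta_{t-1}), \qquad \theta_t = \theta_{t-1} - \mathrm{lr}\,\bigl(\hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) + \lambda\,\theta_{t-1}\bigr)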
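Example
The following is a minimal usage sketch that minimizes a simple least-squares objective. It assumes that any mrpro.operators.Operator whose forward returns a one-element tuple containing a scalar tensor can serve as f; the SquaredDistance class and the report callback are purely illustrative and not part of mrpro.

import torch

from mrpro.algorithms.optimizers import adam
from mrpro.operators import Operator


class SquaredDistance(Operator[torch.Tensor, tuple[torch.Tensor]]):
    """Illustrative scalar objective f(x) = ||x - target||^2."""

    def __init__(self, target: torch.Tensor) -> None:
        super().__init__()
        self.target = target

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor]:
        # Return a one-element tuple holding the scalar loss value.
        return ((x - self.target).square().sum(),)


def report(status) -> None:
    # Called after every iteration with an OptimizerStatus; consult the
    # mrpro documentation for the exact fields it carries.
    pass


target = torch.tensor([1.0, 2.0, 3.0])
f = SquaredDistance(target)
initial_parameters = [torch.zeros(3)]  # left untouched, adam optimizes a copy

(x_opt,) = adam(f, initial_parameters, max_iter=200, lr=0.1, callback=report)
print(x_opt)  # expected to be close to target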
References
[LOS2019] Loshchilov I, Hutter F (2019). Decoupled Weight Decay Regularization. ICLR. https://doi.org/10.48550/arXiv.1711.05101