mrpro.algorithms.optimizers.lbfgs
- mrpro.algorithms.optimizers.lbfgs(f: Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]], initial_parameters: Sequence[Tensor], lr: float = 1.0, max_iter: int = 100, max_eval: int | None = 100, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 10, line_search_fn: None | Literal['strong_wolfe'] = 'strong_wolfe', callback: Callable[[OptimizerStatus], None] | None = None) → tuple[Tensor, ...]
LBFGS for (non-linear) minimization problems.
The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a quasi-Newton optimization method that approximates the inverse Hessian matrix using a limited memory of past gradients and updates. It is well suited for high-dimensional problems and leverages curvature information for faster convergence compared to first-order methods such as mrpro.algorithms.optimizers.adam.

The parameter update rule is:
\[\theta_{k+1} = \theta_k - \alpha_k H_k \nabla f(\theta_k),\]

where \(H_k\) is a limited-memory approximation of the inverse Hessian, and \(\alpha_k\) is the step size determined via line search (e.g., satisfying the strong Wolfe conditions).
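For reference, the strong Wolfe conditions are stated here in their standard textbook form (not taken from the mrpro source): the step size \(\alpha_k\) along the search direction \(p_k = -H_k \nabla f(\theta_k)\) must satisfy, for constants \(0 < c_1 < c_2 < 1\),

\[f(\theta_k + \alpha_k p_k) \le f(\theta_k) + c_1 \alpha_k \nabla f(\theta_k)^T p_k \quad \text{(sufficient decrease)},\]

\[\lvert \nabla f(\theta_k + \alpha_k p_k)^T p_k \rvert \le c_2 \lvert \nabla f(\theta_k)^T p_k \rvert \quad \text{(curvature)}.\]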
The algorithm performs the following steps:

1. Compute the gradient of the objective function.
2. Approximate the inverse Hessian matrix \(H_k\) using stored gradients and updates (a sketch of this step follows the list).
3. Perform a line search to compute the step size \(\alpha_k\).
4. Update the parameters.
5. Store the latest gradient and update information.
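As an illustration of step 2, the product \(H_k \nabla f(\theta_k)\) is typically computed with the two-loop recursion over the stored pairs \((s_i, y_i) = (\theta_{i+1} - \theta_i, \nabla f(\theta_{i+1}) - \nabla f(\theta_i))\). The sketch below uses plain torch tensors and is not part of mrpro's API:

```python
import torch


def lbfgs_direction(grad, s_list, y_list):
    """Sketch: approximate H_k @ grad via the two-loop recursion, using the
    stored parameter updates s_i and gradient differences y_i (1-D tensors)."""
    q = grad.clone()
    rhos = [1.0 / torch.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest to oldest stored pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * torch.dot(s, q)
        q = q - alpha * y
        alphas.append(alpha)
    # Scale of the initial Hessian approximation (a common heuristic choice).
    if s_list:
        gamma = torch.dot(s_list[-1], y_list[-1]) / torch.dot(y_list[-1], y_list[-1])
    else:
        gamma = torch.tensor(1.0)
    r = gamma * q
    # Second loop: oldest to newest stored pair.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * torch.dot(y, r)
        r = r + (alpha - beta) * s
    return r  # approximates H_k @ grad; the update is theta_k - alpha_k * r
```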
This implementation wraps PyTorch’s torch.optim.LBFGS class. For more information, see [WIKI], [NOC1980], and [LIU1989].
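A minimal usage sketch is shown below. The Quadratic operator is a hypothetical toy objective defined only for this example; it assumes the mrpro.operators.Operator base class and the convention that operators return their result as a tuple, with the scalar objective as the first element:

```python
import torch

from mrpro.algorithms.optimizers import lbfgs
from mrpro.operators import Operator


class Quadratic(Operator):
    """Toy objective f(x) = ||x - target||^2, returned as a one-element tuple."""

    def __init__(self, target: torch.Tensor) -> None:
        super().__init__()
        self.target = target

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor]:
        return ((x - self.target).square().sum(),)


target = torch.tensor([1.0, 2.0, 3.0])
x0 = torch.zeros(3)

# The initial parameters are left untouched; an optimized copy is returned.
(x_opt,) = lbfgs(Quadratic(target), initial_parameters=[x0], max_iter=50)
```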
References

[NOC1980] Nocedal, J. (1980). “Updating quasi-Newton matrices with limited storage.” Mathematics of Computation, 35(151), 773-782. https://doi.org/10.1090/S0025-5718-1980-0572855-7
[LIU1989] Liu, D. C., & Nocedal, J. (1989). “On the limited memory BFGS method for large scale optimization.” Mathematical Programming, 45(1-3), 503-528. https://doi.org/10.1007/BF01589116
[WIKI] Wikipedia: Limited-memory BFGS. https://en.wikipedia.org/wiki/Limited-memory_BFGS
- Parameters:
  - f (Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]]) – scalar function to be minimized
  - initial_parameters (Sequence[Tensor]) – sequence of parameters to be optimized. Note that these parameters will not be changed. Instead, we create a copy and leave the initial values untouched.
  - lr (float, default: 1.0) – learning rate. This should usually be left as 1.0 if a line search is used.
  - max_iter (int, default: 100) – maximal number of iterations
  - max_eval (int | None, default: 100) – maximal number of evaluations of f per optimization step
  - tolerance_grad (float, default: 1e-07) – termination tolerance on first-order optimality
  - tolerance_change (float, default: 1e-09) – termination tolerance on function value/parameter changes
  - history_size (int, default: 10) – update history size
  - line_search_fn (Optional[Literal['strong_wolfe']], default: 'strong_wolfe') – line search algorithm, either 'strong_wolfe' or None (meaning constant step size)
  - callback (Callable[[OptimizerStatus], None] | None, default: None) – function to be called after each iteration. N.B. the callback is not called within the line search of LBFGS. You can use the information from the OptimizerStatus to display a progress bar (a minimal callback sketch follows the Returns section).
- Returns:
List of optimized parameters.
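As referenced in the callback parameter description, a minimal progress-reporting sketch is shown below. It assumes only that OptimizerStatus can be imported from mrpro.algorithms.optimizers (as referenced in the signature above) and printed; consult the OptimizerStatus documentation for its exact fields before relying on any of them.

```python
from mrpro.algorithms.optimizers import OptimizerStatus


def report_progress(status: OptimizerStatus) -> None:
    # Called by lbfgs once per outer iteration (not within the line search).
    # Printing the whole status shows what is available; check the
    # OptimizerStatus documentation for its exact fields before using them.
    print(status)


# Usage with the toy objective from the example above (hypothetical):
# (x_opt,) = lbfgs(Quadratic(target), initial_parameters=[x0], callback=report_progress)
```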