mrpro.algorithms.optimizers.lbfgs
- mrpro.algorithms.optimizers.lbfgs(f: Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]], initial_parameters: Sequence[Tensor], lr: float = 1.0, max_iter: int = 100, max_eval: int | None = 100, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 10, line_search_fn: None | Literal['strong_wolfe'] = 'strong_wolfe', callback: Callable[[OptimizerStatus], None] | None = None) → tuple[Tensor, ...]
LBFGS for (non-linear) minimization problems.
The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a quasi-Newton optimization method that approximates the inverse Hessian matrix using a limited memory of past gradients and updates. It is well suited for high-dimensional problems and leverages curvature information for faster convergence compared to first-order methods such as mrpro.algorithms.optimizers.adam.

The parameter update rule is:
\[\theta_{k+1} = \theta_k - \alpha_k H_k \nabla f(\theta_k),\]

where \(H_k\) is a limited-memory approximation of the inverse Hessian, and \(\alpha_k\) is the step size determined via line search (e.g., satisfying the strong Wolfe conditions).
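For reference, the strong Wolfe conditions are stated here in their standard textbook form (not taken from the mrpro source): the step size \(\alpha_k\) along the search direction \(p_k = -H_k \nabla f(\theta_k)\) must satisfy, for constants \(0 < c_1 < c_2 < 1\),

\[f(\theta_k + \alpha_k p_k) \le f(\theta_k) + c_1 \alpha_k \nabla f(\theta_k)^T p_k \quad \text{(sufficient decrease)},\]

\[\lvert \nabla f(\theta_k + \alpha_k p_k)^T p_k \rvert \le c_2 \lvert \nabla f(\theta_k)^T p_k \rvert \quad \text{(curvature)}.\]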
The algorithm performs the following steps:

1. Compute the gradient of the objective function.
2. Approximate the inverse Hessian matrix \(H_k\) using stored gradients and updates (a sketch of this step follows the list).
3. Perform a line search to compute the step size \(\alpha_k\).
4. Update the parameters.
5. Store the latest gradient and update information.
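As an illustration of step 2, the product \(H_k \nabla f(\theta_k)\) is typically computed with the two-loop recursion over the stored pairs \((s_i, y_i) = (\theta_{i+1} - \theta_i, \nabla f(\theta_{i+1}) - \nabla f(\theta_i))\). The sketch below uses plain torch tensors and is not part of mrpro's API:

```python
import torch


def lbfgs_direction(grad, s_list, y_list):
    """Sketch: approximate H_k @ grad via the two-loop recursion, using the
    stored parameter updates s_i and gradient differences y_i (1-D tensors)."""
    q = grad.clone()
    rhos = [1.0 / torch.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest to oldest stored pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * torch.dot(s, q)
        q = q - alpha * y
        alphas.append(alpha)
    # Scale of the initial Hessian approximation (a common heuristic choice).
    if s_list:
        gamma = torch.dot(s_list[-1], y_list[-1]) / torch.dot(y_list[-1], y_list[-1])
    else:
        gamma = torch.tensor(1.0)
    r = gamma * q
    # Second loop: oldest to newest stored pair.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * torch.dot(y, r)
        r = r + (alpha - beta) * s
    return r  # approximates H_k @ grad; the update is theta_k - alpha_k * r
```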
This implementation wraps PyTorch’s torch.optim.LBFGS class. For more information, see [WIKI], [NOC1980], and [LIU1989].
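A minimal usage sketch is shown below. The Quadratic operator is a hypothetical toy objective defined only for this example; it assumes the mrpro.operators.Operator base class and the convention that operators return their result as a tuple, with the scalar objective as the first element:

```python
import torch

from mrpro.algorithms.optimizers import lbfgs
from mrpro.operators import Operator


class Quadratic(Operator):
    """Toy objective f(x) = ||x - target||^2, returned as a one-element tuple."""

    def __init__(self, target: torch.Tensor) -> None:
        super().__init__()
        self.target = target

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor]:
        return ((x - self.target).square().sum(),)


target = torch.tensor([1.0, 2.0, 3.0])
x0 = torch.zeros(3)

# The initial parameters are left untouched; an optimized copy is returned.
(x_opt,) = lbfgs(Quadratic(target), initial_parameters=[x0], max_iter=50)
```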
References

[NOC1980] Nocedal, J. (1980). “Updating quasi-Newton matrices with limited storage.” Mathematics of Computation, 35(151), 773-782. https://doi.org/10.1090/S0025-5718-1980-0572855-7
[LIU1989] Liu, D. C., & Nocedal, J. (1989). “On the limited memory BFGS method for large scale optimization.” Mathematical Programming, 45(1-3), 503-528. https://doi.org/10.1007/BF01589116
[WIKI] Wikipedia: Limited-memory BFGS. https://en.wikipedia.org/wiki/Limited-memory_BFGS
- Parameters:
  - f (Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]]) – scalar function to be minimized
  - initial_parameters (Sequence[Tensor]) – sequence of parameters to be optimized. Note that these parameters will not be changed. Instead, we create a copy and leave the initial values untouched.
  - lr (float, default: 1.0) – learning rate. This should usually be left as 1.0 if a line search is used.
  - max_iter (int, default: 100) – maximal number of iterations
  - max_eval (int | None, default: 100) – maximal number of evaluations of f per optimization step
  - tolerance_grad (float, default: 1e-07) – termination tolerance on first-order optimality
  - tolerance_change (float, default: 1e-09) – termination tolerance on function value/parameter changes
  - history_size (int, default: 10) – update history size
  - line_search_fn (Optional[Literal['strong_wolfe']], default: 'strong_wolfe') – line search algorithm, either 'strong_wolfe' or None (meaning constant step size)
  - callback (Callable[[OptimizerStatus], None] | None, default: None) – function to be called after each iteration. N.B. the callback is not called within the line search of LBFGS. You can use the information from the OptimizerStatus to display a progress bar (a minimal callback sketch follows the Returns section).
- Returns:
List of optimized parameters.
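As referenced in the callback parameter description, a minimal progress-reporting sketch is shown below. It assumes only that OptimizerStatus can be imported from mrpro.algorithms.optimizers (as referenced in the signature above) and printed; consult the OptimizerStatus documentation for its exact fields before relying on any of them.

```python
from mrpro.algorithms.optimizers import OptimizerStatus


def report_progress(status: OptimizerStatus) -> None:
    # Called by lbfgs once per outer iteration (not within the line search).
    # Printing the whole status shows what is available; check the
    # OptimizerStatus documentation for its exact fields before using them.
    print(status)


# Usage with the toy objective from the example above (hypothetical):
# (x_opt,) = lbfgs(Quadratic(target), initial_parameters=[x0], callback=report_progress)
```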