mrpro.algorithms.optimizers.lbfgs

mrpro.algorithms.optimizers.lbfgs(f: Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]], initial_parameters: Sequence[Tensor], lr: float = 1.0, max_iter: int = 100, max_eval: int | None = 100, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 10, line_search_fn: None | Literal['strong_wolfe'] = 'strong_wolfe', callback: Callable[[OptimizerStatus], None] | None = None) → tuple[Tensor, ...]

LBFGS for (non-linear) minimization problems.

The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a quasi-Newton optimization method that approximates the inverse Hessian matrix using a limited memory of past gradients and updates. It is well-suited for high-dimensional problems and leverages curvature information for faster convergence compared to first-order methods such as mrpro.algorithms.optimizers.adam.

The parameter update rule is:

\[\theta_{k+1} = \theta_k - \alpha_k H_k \nabla f(\theta_k),\]

where \(H_k\) is a limited-memory approximation of the inverse Hessian, and \(\alpha_k\) is the step size determined via line search (e.g., strong Wolfe conditions).
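For reference, with the search direction \(p_k = -H_k \nabla f(\theta_k)\), the strong Wolfe conditions require the step size \(\alpha_k\) to satisfy a sufficient-decrease and a curvature condition,

\[f(\theta_k + \alpha_k p_k) \le f(\theta_k) + c_1 \alpha_k \nabla f(\theta_k)^T p_k, \qquad |\nabla f(\theta_k + \alpha_k p_k)^T p_k| \le c_2 |\nabla f(\theta_k)^T p_k|,\]

with constants \(0 < c_1 < c_2 < 1\).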

The algorithm performs the following steps:

  1. Compute the gradient of the objective function.

  2. Approximate the inverse Hessian matrix \(H_k\) using stored gradients and updates (see the sketch after this list).

  3. Perform a line search to compute the step size \(\alpha_k\).

  4. Update the parameters.

  5. Store the latest gradient and update information.
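mrpro delegates steps 2-5 to torch.optim.LBFGS. The following standalone sketch only illustrates step 2: how the product \(H_k \nabla f(\theta_k)\) is typically formed from the stored pairs \(s_i = \theta_{i+1} - \theta_i\) and \(y_i = \nabla f(\theta_{i+1}) - \nabla f(\theta_i)\) using the two-loop recursion of [NOC1980]. The function and its signature are illustrative, not part of mrpro or PyTorch.

```python
import torch


def lbfgs_direction(
    grad: torch.Tensor, s_history: list[torch.Tensor], y_history: list[torch.Tensor]
) -> torch.Tensor:
    """Return -H_k @ grad via the two-loop recursion (illustration of step 2).

    s_history holds the parameter updates s_i and y_history the gradient
    differences y_i, both ordered from oldest to most recent.
    """
    if not s_history:
        return -grad  # no curvature information yet: steepest-descent direction
    rhos = [1.0 / torch.dot(y, s) for s, y in zip(s_history, y_history)]
    q = grad.clone()
    alphas = []
    # first loop: most recent pair to oldest
    for s, y, rho in zip(reversed(s_history), reversed(y_history), reversed(rhos)):
        alpha = rho * torch.dot(s, q)
        q = q - alpha * y
        alphas.append(alpha)
    # initial inverse-Hessian scaling from the most recent pair
    gamma = torch.dot(s_history[-1], y_history[-1]) / torch.dot(y_history[-1], y_history[-1])
    r = gamma * q
    # second loop: oldest pair to most recent
    for (s, y, rho), alpha in zip(zip(s_history, y_history, rhos), reversed(alphas)):
        beta = rho * torch.dot(y, r)
        r = r + (alpha - beta) * s
    return -r
```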

This implementation wraps PyTorch’s torch.optim.LBFGS class. For more information, see [WIKI], [NOC1980], and [LIU1989].
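A minimal usage sketch is given below. The quadratic objective and the plain callable standing in for an mrpro Operator are illustrative only; any differentiable scalar objective returned as a one-element tuple can be used.

```python
import torch

from mrpro.algorithms.optimizers import lbfgs

# Illustrative least-squares problem: minimize f(x) = ||A x - b||^2 over x.
A = torch.randn(20, 5)
b = torch.randn(20)


def f(x: torch.Tensor) -> tuple[torch.Tensor]:
    # scalar objective returned as a one-element tuple, mimicking an Operator
    return (((A @ x - b) ** 2).sum(),)


x0 = torch.zeros(5)  # initial guess; lbfgs optimizes a copy and leaves x0 untouched
(x_solution,) = lbfgs(f, initial_parameters=[x0], max_iter=100)
```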

References

[NOC1980]

Nocedal, J. (1980). “Updating quasi-Newton matrices with limited storage.” Mathematics of Computation, 35(151), 773-782. https://doi.org/10.1090/S0025-5718-1980-0572855-7

[LIU1989]

Liu, D. C., & Nocedal, J. (1989). “On the limited memory BFGS method for large scale optimization.” Mathematical Programming, 45(1-3), 503-528. https://doi.org/10.1007/BF01589116

[WIKI]

Wikipedia: Limited-memory_BFGS https://en.wikipedia.org/wiki/Limited-memory_BFGS

Parameters:
  • f (Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]]) – scalar function to be minimized

  • initial_parameters (Sequence[Tensor]) – Sequence of parameters to be optimized. Note that these parameters will not be changed. Instead, we create a copy and leave the initial values untouched.

  • lr (float, default: 1.0) – learning rate. This should usually be left as 1.0 if a line search is used.

  • max_iter (int, default: 100) – maximal number of iterations

  • max_eval (int | None, default: 100) – maximal number of evaluations of f per optimization step

  • tolerance_grad (float, default: 1e-07) – termination tolerance on first order optimality

  • tolerance_change (float, default: 1e-09) – termination tolerance on function value/parameter changes

  • history_size (int, default: 10) – update history size

  • line_search_fn (Optional[Literal['strong_wolfe']], default: 'strong_wolfe') – line search algorithm, either strong_wolfe or None (meaning constant step size)

  • callback (Callable[[OptimizerStatus], None] | None, default: None) – Function to be called after each iteration. Note that the callback is not called within the line search of LBFGS. You can use the information from the OptimizerStatus to display a progress bar (see the sketch at the end of this page).

Returns:

Tuple of optimized parameters.
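A short callback sketch follows. The OptimizerStatus field name 'iteration_number' used here is an assumption for illustration; consult the OptimizerStatus documentation for the actual fields.

```python
import torch

from mrpro.algorithms.optimizers import lbfgs

# Same illustrative least-squares objective as in the example above.
A = torch.randn(20, 5)
b = torch.randn(20)


def f(x: torch.Tensor) -> tuple[torch.Tensor]:
    return (((A @ x - b) ** 2).sum(),)


def print_progress(status) -> None:
    # Invoked once per outer LBFGS iteration, not within the line search.
    # Assumption: the current iteration is exposed under 'iteration_number'.
    print(f"iteration {status['iteration_number']}")


(x_solution,) = lbfgs(f, initial_parameters=[torch.zeros(5)], callback=print_progress)
```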