mrpro.algorithms.optimizers.lbfgs

mrpro.algorithms.optimizers.lbfgs(f: Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]], initial_parameters: Sequence[Tensor], learning_rate: float = 1.0, max_iterations: int = 100, max_evaluations: int | None = 100, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 10, line_search_fn: None | Literal['strong_wolfe'] = 'strong_wolfe', callback: Callable[[OptimizerStatus], bool | None] | None = None) → tuple[Tensor, ...]

LBFGS for (non-linear) minimization problems.

The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a quasi-Newton optimization method that approximates the inverse Hessian matrix using a limited memory of past gradients and updates. It is well-suited for high-dimensional problems and leverages curvature information for faster convergence compared to first-order methods such as mrpro.algorithms.optimizers.adam.

The parameter update rule is:

\[\theta_{k+1} = \theta_k - \alpha_k H_k \nabla f(\theta_k),\]

where \(H_k\) is a limited-memory approximation of the inverse Hessian, and \(\alpha_k\) is the step size determined via line search (e.g., strong Wolfe conditions).
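When line_search_fn='strong_wolfe' (the default), \(\alpha_k\) is chosen to satisfy the strong Wolfe conditions. As a reminder of the standard definition (the constants below are generic, not specific to this implementation), with search direction \(p_k = -H_k \nabla f(\theta_k)\) and \(0 < c_1 < c_2 < 1\):

\[f(\theta_k + \alpha_k p_k) \le f(\theta_k) + c_1 \alpha_k \nabla f(\theta_k)^T p_k, \qquad |\nabla f(\theta_k + \alpha_k p_k)^T p_k| \le c_2 |\nabla f(\theta_k)^T p_k|.\]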

The algorithm performs the following steps:

  1. Compute the gradient of the objective function.

  2. Approximate the inverse Hessian matrix \(H_k\) using stored gradients and updates (see the sketch below).

  3. Perform a line search to compute the step size \(\alpha_k\).

  4. Update the parameters.

  5. Store the latest gradient and update information.

This implementation wraps PyTorch’s torch.optim.LBFGS class. For more information, see [WIKI], [NOC1980], and [LIU1989].
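
Step 2 does not form \(H_k\) explicitly; its action on the gradient is computed with the two-loop recursion over the stored curvature pairs [NOC1980]. The following Python sketch illustrates that recursion. It is for illustration only: in this function the recursion (and the line search) are performed internally by torch.optim.LBFGS, and the helper name lbfgs_direction is hypothetical.

    import torch

    def lbfgs_direction(
        grad: torch.Tensor,
        s_history: list[torch.Tensor],  # parameter updates s_i = theta_{i+1} - theta_i, oldest first
        y_history: list[torch.Tensor],  # gradient differences y_i = grad f(theta_{i+1}) - grad f(theta_i)
    ) -> torch.Tensor:
        """Apply the limited-memory inverse-Hessian approximation H_k to the gradient."""
        rhos = [1 / torch.dot(y, s) for s, y in zip(s_history, y_history)]
        q = grad.clone()
        alphas = []
        # first loop: run from the newest to the oldest curvature pair
        for s, y, rho in reversed(list(zip(s_history, y_history, rhos))):
            alpha = rho * torch.dot(s, q)
            q -= alpha * y
            alphas.append(alpha)
        # initial approximation H_k^0 = gamma_k * I, scaled by the most recent pair
        gamma = torch.dot(s_history[-1], y_history[-1]) / torch.dot(y_history[-1], y_history[-1])
        r = gamma * q
        # second loop: run from the oldest to the newest curvature pair
        for (s, y, rho), alpha in zip(zip(s_history, y_history, rhos), reversed(alphas)):
            beta = rho * torch.dot(y, r)
            r = r + (alpha - beta) * s
        return r  # approximately H_k grad; the search direction is -r

history_size controls how many curvature pairs are kept, trading memory for a better curvature model.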

References

[NOC1980]

Nocedal, J. (1980). “Updating quasi-Newton matrices with limited storage.” Mathematics of Computation, 35(151), 773-782. https://doi.org/10.1090/S0025-5718-1980-0572855-7

[LIU1989]

Liu, D. C., & Nocedal, J. (1989). “On the limited memory BFGS method for large scale optimization.” Mathematical Programming, 45(1-3), 503-528. https://doi.org/10.1007/BF01589116

[WIKI]

Wikipedia: Limited-memory BFGS. https://en.wikipedia.org/wiki/Limited-memory_BFGS

Parameters:
  • f (Operator[Unpack[tuple[Tensor, ...]], tuple[Tensor, ...]]) – scalar function to be minimized

  • initial_parameters (Sequence[Tensor]) – Sequence of parameters to be optimized. These parameters are not modified; optimization is performed on a copy, and the initial values are left untouched.

  • learning_rate (float, default: 1.0) – learning rate. This should usually be left as 1.0 if a line search is used.

  • max_iterations (int, default: 100) – maximum number of iterations

  • max_evaluations (int | None, default: 100) – maximum number of evaluations of f per optimization step

  • tolerance_grad (float, default: 1e-07) – termination tolerance on first order optimality

  • tolerance_change (float, default: 1e-09) – termination tolerance on function value/parameter changes

  • history_size (int, default: 10) – update history size

  • line_search_fn (Optional[Literal['strong_wolfe']], default: 'strong_wolfe') – line search algorithm, either strong_wolfe or None (meaning constant step size)

  • callback (Callable[[OptimizerStatus], bool | None] | None, default: None) – Function to be called after each iteration. This can be used to monitor the progress of the algorithm; if it returns False, the algorithm stops at that iteration. Note that the callback is not called within the line search of LBFGS. The information in the OptimizerStatus can be used, for example, to display a progress bar (see the usage sketch in the example below).

Returns:

Tuple of optimized parameters.
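
Example

A minimal usage sketch. The objective here is a hypothetical quadratic \(f(x) = \|Ax - b\|_2^2\) wrapped as an operator that returns a one-element tuple containing a scalar; the import path mrpro.operators.Operator and the Quadratic class are assumptions for illustration, not part of this function's API.

    import torch

    from mrpro.algorithms.optimizers import lbfgs
    from mrpro.operators import Operator  # import path assumed

    class Quadratic(Operator[torch.Tensor, tuple[torch.Tensor]]):
        """Hypothetical objective f(x) = ||A x - b||^2 as a scalar-valued operator."""

        def __init__(self, a: torch.Tensor, b: torch.Tensor) -> None:
            super().__init__()
            self.a = a
            self.b = b

        def forward(self, x: torch.Tensor) -> tuple[torch.Tensor]:
            return ((self.a @ x - self.b).square().sum(),)

    def show_progress(status) -> None:
        # Called after each iteration (but not within the line search);
        # the exact OptimizerStatus fields depend on the mrpro version.
        print(status)

    a = torch.randn(8, 4)
    b = torch.randn(8)
    x0 = torch.zeros(4)  # left untouched; optimization works on a copy

    (x_opt,) = lbfgs(Quadratic(a, b), initial_parameters=[x0], max_iterations=50, callback=show_progress)

Returning False from the callback stops the optimization early; returning None (or nothing) lets it continue.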