The Hager-Zhang line search algorithm.
```python
tfp.substrates.jax.optimizer.linesearch.hager_zhang(
    value_and_gradients_function,
    initial_step_size=None,
    value_at_initial_step=None,
    value_at_zero=None,
    converged=None,
    threshold_use_approximate_wolfe_condition=1e-06,
    shrinkage_param=0.66,
    expansion_param=5.0,
    sufficient_decrease_param=0.1,
    curvature_param=0.9,
    max_iterations=50,
    name=None
)
```
Performs an inexact line search based on the algorithm of
[Hager and Zhang (2006)][2].

The univariate objective function value_and_gradients_function is typically
generated by projecting a multivariate objective function along a search
direction. Suppose the multivariate function to be minimized is
g(x1, x2, ..., xn). Let (d1, d2, ..., dn) be the direction along which we
wish to perform a line search. Then the projected univariate function to be
used for the line search is

  f(a) = g(x1 + d1 * a, x2 + d2 * a, ..., xn + dn * a)

The directional derivative along (d1, d2, ..., dn) is needed for this
procedure. It also corresponds to the derivative of the projected function
f(a) with respect to a. Note that this derivative must be negative at
a = 0 if the direction is a descent direction.
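As an illustration, a projected function of this form can be built from a
multivariate objective with automatic differentiation. The following is a
minimal sketch, assuming JAX; make_projected_fn is an illustrative helper,
not part of the API, and the namedtuple matches the form this function
expects (see below):

```python
from collections import namedtuple

import jax
import jax.numpy as jnp

ValueAndGradient = namedtuple('ValueAndGradient', ['x', 'f', 'df'])

def make_projected_fn(g, x, d):
  """Builds f(a) = g(x + a * d) together with its derivative along d."""
  def value_and_gradients_function(a):
    f, grad_g = jax.value_and_grad(g)(x + a * d)
    # The derivative of f at a is the directional derivative of g along d,
    # i.e. the dot product of the gradient of g with d.
    return ValueAndGradient(x=a, f=f, df=jnp.dot(grad_g, d))
  return value_and_gradients_function

# Example: g(x) = sum(x**2), searched along the steepest descent direction
# at x0, so the derivative of the projected function is negative at a = 0.
g = lambda x: jnp.sum(x ** 2)
x0 = jnp.array([1.0, 2.0])
d = -jax.grad(g)(x0)
line_fn = make_projected_fn(g, x0, d)
```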
The usual stopping criterion for the line search is the satisfaction of the
(weak) Wolfe conditions. For details of the Wolfe conditions, see
ref. [3]. On a finite precision machine, the exact Wolfe conditions can
be difficult to satisfy when one is very close to the minimum, and, as argued
by [Hager and Zhang (2005)][1], one can only expect the minimum to be
determined to within the square root of machine precision. To improve the
situation, they propose to replace the Wolfe conditions with an approximate
version that depends only on the derivative of the function and is applied
only when one is very close to the minimum. The following algorithm
implements this enhanced scheme.
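For reference, with f(a) the projected function defined above, the weak Wolfe
conditions at a candidate step a are f(a) <= f(0) + delta * a * f'(0)
(sufficient decrease) and f'(a) >= sigma * f'(0) (curvature); the approximate
variant replaces the first test by (2 * delta - 1) * f'(0) >= f'(a) and is
applied only when f(a) <= f(0) + epsilon * |f(0)|. A minimal scalar sketch of
the two tests follows; the function names are illustrative, not part of the
API, and delta, sigma and epsilon correspond to sufficient_decrease_param,
curvature_param and threshold_use_approximate_wolfe_condition:

```python
def weak_wolfe(a, f0, df0, fa, dfa, delta=0.1, sigma=0.9):
  # Sufficient decrease: the value dropped enough relative to the slope
  # at zero. Curvature: the slope has increased enough towards zero.
  return (fa <= f0 + delta * a * df0) and (dfa >= sigma * df0)

def approximate_wolfe(f0, df0, fa, dfa, delta=0.1, sigma=0.9,
                      epsilon=1e-6):
  # Only trusted very close to the minimum, where fa is within
  # epsilon * |f0| of f0 and the exact sufficient decrease test is
  # numerically unreliable.
  near_minimum = fa <= f0 + epsilon * abs(f0)
  return near_minimum and (sigma * df0 <= dfa <= (2 * delta - 1) * df0)
```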
Usage:

The primary use of line search methods is as an internal component of a class
of optimization algorithms (called line search based methods, as opposed to
trust region methods). Hence, the end user will typically not want to access
the line search directly. In particular, an inexact line search should not be
confused with a univariate minimization method: the stopping criterion of the
line search is the satisfaction of the Wolfe conditions, not the discovery of
the minimum of the function.
With this caveat in mind, the following example illustrates the standalone
usage of the line search.
```python
from collections import namedtuple

import jax.numpy as jnp
from tensorflow_probability.substrates import jax as tfp

# Define value and gradient namedtuple.
ValueAndGradient = namedtuple('ValueAndGradient', ['x', 'f', 'df'])

# Define a quadratic target with minimum at 1.3.
def value_and_gradients_function(x):
  return ValueAndGradient(x=x, f=(x - 1.3) ** 2, df=2 * (x - 1.3))

# Set initial step size.
step_size = jnp.asarray(0.1)

ls_result = tfp.optimizer.linesearch.hager_zhang(
    value_and_gradients_function, initial_step_size=step_size)

# Ensure convergence.
assert ls_result.converged
# If the line search converged, the left and the right ends of the
# bracketing interval are identical.
assert ls_result.left.x == ls_result.right.x
# Print the number of evaluations and the final step size.
print('Final Step Size: %f, Evaluations: %d' % (ls_result.left.x,
                                                ls_result.func_evals))
```
References:

[1]: William Hager, Hongchao Zhang. A new conjugate gradient method with
guaranteed descent and an efficient line search. SIAM J. Optim., Vol. 16,
No. 1, pp. 170-172, 2005.
https://www.math.lsu.edu/~hozhang/papers/cg_descent.pdf

[2]: William Hager, Hongchao Zhang. Algorithm 851: CG_DESCENT, a conjugate
gradient method with guaranteed descent. ACM Transactions on Mathematical
Software, Vol. 32, No. 1, pp. 113-137, 2006.
http://users.clas.ufl.edu/hager/papers/CG/cg_compare.pdf

[3]: Jorge Nocedal, Stephen Wright. Numerical Optimization. Springer Series
in Operations Research, pp. 33-36, 2006.
Args:

value_and_gradients_function: A Python callable that accepts a real scalar
tensor and returns a namedtuple with the fields 'x', 'f' and 'df' that
correspond to scalar tensors of real dtype containing the point at which
the function was evaluated, the value of the function, and its derivative
at that point. The other namedtuple fields, if present, should be tensors
or sequences (possibly nested) of tensors.
In a usual optimization application, this function would be generated by
projecting the multivariate objective function along some specific
direction. The direction is determined by some other procedure, but should
be a descent direction (i.e. the derivative of the projected univariate
function must be negative at 0).
Alternatively, the function may represent the batching of n such line
functions (e.g. projecting a single multivariate objective function along
n distinct directions at once), accepting n points as input, i.e. a tensor
of shape [n]; the fields 'x', 'f' and 'df' in the returned namedtuple
should then each be a tensor of shape [n], with the corresponding input
points, function values, and derivatives at those input points. A sketch
of such a batched function is given after this argument list.
initial_step_size: (Optional) Scalar positive Tensor of real dtype, or a
tensor of shape [n] in batching mode. The initial value (or values) to try
to bracket the minimum. Default is 1. as a float32. Note that this point
need not necessarily bracket the minimum for the line search to work
correctly, but the supplied value must be greater than 0. A good initial
value will make the search converge faster.
value_at_initial_step: (Optional) The full return value of evaluating
value_and_gradients_function at initial_step_size, i.e. a namedtuple with
'x', 'f', 'df', if already known by the caller. If supplied, the value of
initial_step_size will be ignored; otherwise the tuple will be computed by
evaluating value_and_gradients_function.
value_at_zero: (Optional) The full return value of
value_and_gradients_function at 0., i.e. a namedtuple with 'x', 'f', 'df',
if already known by the caller. If not supplied, the tuple will be computed
by evaluating value_and_gradients_function. A sketch of supplying this
value is given at the end of this document.
converged: (Optional) In batching mode, a tensor of shape [n] indicating
batch members which have already converged and for which no further search
should be performed. These batch members are also reported as converged in
the output, and both their left and right are set to the
value_at_initial_step.
threshold_use_approximate_wolfe_condition: Scalar positive Tensor of real
dtype. Corresponds to the parameter 'epsilon' in
[Hager and Zhang (2006)][2]. Used to estimate the threshold at which the
line search switches to the approximate Wolfe conditions.
shrinkage_param: Scalar positive Tensor of real dtype. Must be less than 1.
Corresponds to the parameter gamma in [Hager and Zhang (2006)][2]. If the
secant**2 step does not shrink the bracketing interval by this proportion,
a bisection step is performed to reduce the interval width.
expansion_param: Scalar positive Tensor of real dtype. Must be greater
than 1. Used to expand the initial interval in case it does not bracket a
minimum. Corresponds to rho in [Hager and Zhang (2006)][2].
sufficient_decrease_param: Positive scalar Tensor of real dtype. Bounded
above by the curvature param. Corresponds to delta in the terminology of
[Hager and Zhang (2006)][2].
curvature_param: Positive scalar Tensor of real dtype. Bounded above by 1.
Corresponds to 'sigma' in the terminology of [Hager and Zhang (2006)][2].
max_iterations: Positive scalar Tensor of integral dtype or None. The
maximum number of iterations to perform in the line search. The number of
iterations used to bracket the minimum is also counted against this
parameter.
name: (Optional) Python str. The name prefixed to the ops created by this
function. If not supplied, the default name 'hager_zhang' is used.
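As referenced in the description of value_and_gradients_function above, the
batched mode can be sketched as follows. This is illustrative only:
batched_fn and minima are made-up names, and the imports assume the JAX
substrate as in the usage example above.

```python
from collections import namedtuple

import jax.numpy as jnp
from tensorflow_probability.substrates import jax as tfp

ValueAndGradient = namedtuple('ValueAndGradient', ['x', 'f', 'df'])

# Three independent quadratics with different minima: one line search is
# run per batch member.
minima = jnp.array([1.3, -0.5, 2.0])

def batched_fn(x):
  # x has shape [3]; every field of the result also has shape [3].
  return ValueAndGradient(x=x, f=(x - minima) ** 2, df=2 * (x - minima))

results = tfp.optimizer.linesearch.hager_zhang(
    batched_fn, initial_step_size=jnp.ones([3]))
# results.converged, results.left.x, etc. each have shape [3].
```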
Returns:

results: A namedtuple containing the following attributes.
  converged: Boolean Tensor of shape [n]. Whether a point satisfying the
    Wolfe/approximate Wolfe conditions was found.
  failed: Boolean Tensor of shape [n]. Whether the line search failed, e.g.
    because either the objective function or the gradient was not finite at
    an evaluation point.
  iterations: int32 Tensor of shape [n]. Number of line search iterations.
  func_evals: Scalar int32 Tensor. Number of function evaluations made.
  left: A namedtuple, as returned by value_and_gradients_function, of the
    left end point of the final bracketing interval. Values are equal to
    those of right on batch members where converged is True; otherwise, it
    corresponds to the last interval computed.
  right: A namedtuple, as returned by value_and_gradients_function, of the
    right end point of the final bracketing interval. Values are equal to
    those of left on batch members where converged is True; otherwise, it
    corresponds to the last interval computed.
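As noted under value_at_zero above, the value and derivative at 0 are often
already available from the surrounding optimizer step. A minimal sketch of
passing them in, reusing value_and_gradients_function from the usage example
above; this merely saves one function evaluation:

```python
import jax.numpy as jnp

# Evaluate (or reuse) the objective and its derivative at a = 0.
val_0 = value_and_gradients_function(jnp.asarray(0.0))

ls_result = tfp.optimizer.linesearch.hager_zhang(
    value_and_gradients_function,
    initial_step_size=jnp.asarray(0.1),
    value_at_zero=val_0)
```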