Computes `f(*args, **kwargs)` and its gradients with respect to `args` and `kwargs`.

The function `f` is invoked according to one of the following rules:

1. If `f` is a function of no arguments then it is called as `f()`.

2. If `len(args) == 1`, `len(kwargs) == 0`, `auto_unpack_single_arg == True` and `isinstance(args[0], (list, tuple))` then `args` is presumed to be a packed sequence of args, i.e., the function is called as `f(*args[0])`.

3. Otherwise, the function is called as `f(*args, **kwargs)`.

Regardless of how `f` is called, gradients are computed with respect to `args` and `kwargs`.
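The three invocation rules above can be sketched in plain Python. This is only an illustration of the dispatch order: `call_f` is a hypothetical helper, and using `inspect.signature` to detect a function of no arguments is an assumption, not necessarily how the library checks it.

```python
import inspect

def call_f(f, args, kwargs, auto_unpack_single_arg=True):
    """Hypothetical helper mirroring the three documented invocation rules."""
    # Rule 1: a function of no arguments is called as f().
    if not inspect.signature(f).parameters:
        return f()
    # Rule 2: a single list/tuple positional argument is treated as packed args.
    if (len(args) == 1 and not kwargs and auto_unpack_single_arg
            and isinstance(args[0], (list, tuple))):
        return f(*args[0])
    # Rule 3: everything else is forwarded unchanged.
    return f(*args, **kwargs)

rule1 = call_f(lambda: 7, (1.0,), {})
rule2 = call_f(lambda x, y: x * y, ([2, 3],), {})
rule3 = call_f(lambda v: len(v), ([1, 2, 3],), {}, auto_unpack_single_arg=False)
```

Note that in the rule 1 case the positional argument is never passed to `f`; it only identifies what the gradient is taken with respect to.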

#### Examples

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfm = tfp.math

# Case 1: argless `f`.
x = tf.constant(2.)
tfm.value_and_gradient(lambda: tf.math.log(x), x)
# ==> [log(2.), 0.5]

# Case 2: packed arguments.
tfm.value_and_gradient(lambda x, y: x * tf.math.log(y), [2., 3.])
# ==> [2. * np.log(3.), (np.log(3.), 2. / 3)]

# Case 3: default.
tfm.value_and_gradient(tf.math.log, [1., 2., 3.],
                       auto_unpack_single_arg=False)
# ==> [(log(1.), log(2.), log(3.)), (1., 0.5, 0.333)]
```

#### Args

| Argument | Description |
| --- | --- |
| `f` | Python `callable` to be differentiated. If `f` returns a scalar, this scalar will be differentiated. If `f` returns a tensor or list of tensors, the gradient will be the sum of the gradients of each part. If desired, the sum can be weighted by `output_gradients` (see below). |
| `*args` | Arguments as in `f(*args, **kwargs)` and basis for gradient. |
| `output_gradients` | A `Tensor` or structure of `Tensor`s the same size as the result `ys = f(*args, **kwargs)` and holding the gradients computed for each `y` in `ys`. This argument is forwarded to the underlying gradient implementation (i.e., either the `grad_ys` argument of `tf.gradients` or the `output_gradients` argument of `tf.GradientTape.gradient`). Default value: `None`. |
| `use_gradient_tape` | Python `bool` indicating that `tf.GradientTape` should be used rather than `tf.gradients`, regardless of `tf.executing_eagerly()`. (It is only possible to use `tf.gradients` when `not use_gradient_tape and not tf.executing_eagerly()`.) Default value: `False`. |
| `auto_unpack_single_arg` | Python `bool` which, when `False`, means the single-arg case will not be interpreted as a list of arguments. (See case 2.) Default value: `True`. |
| `has_aux` | Whether `f(*args, **kwargs)` actually returns two outputs, the first being `y` and the second being an auxiliary output that does not get gradients computed. |
| `name` | Python `str` name prefixed to ops created by this function. Default value: `None` (i.e., `'value_and_gradient'`). |
| `**kwargs` | Named arguments as in `f(*args, **kwargs)` and basis for gradient. |
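The descriptions of `f` and `output_gradients` say that a non-scalar output is differentiated as a sum over its parts, optionally weighted. A minimal pure-Python sketch of that semantics follows, using central finite differences on an elementwise log; `weighted_sum_grad` is a hypothetical helper for illustration, not part of the library, and it does not use automatic differentiation.

```python
import math

def f(xs):
    # Elementwise log: one output per input.
    return [math.log(x) for x in xs]

def weighted_sum_grad(f, xs, weights=None, eps=1e-6):
    """Hypothetical helper: d/dx_i of sum_j w_j * f(xs)_j via central differences."""
    n_out = len(f(xs))
    weights = [1.0] * n_out if weights is None else weights

    def total(v):
        # Weighted sum of all output components.
        return sum(w * y for w, y in zip(weights, f(v)))

    grads = []
    for i in range(len(xs)):
        hi = list(xs)
        hi[i] += eps
        lo = list(xs)
        lo[i] -= eps
        grads.append((total(hi) - total(lo)) / (2 * eps))
    return grads

xs = [1.0, 2.0, 4.0]
unweighted = weighted_sum_grad(f, xs)                 # approx. [1, 0.5, 0.25]
weighted = weighted_sum_grad(f, xs, [1.0, 2.0, 4.0])  # approx. [1, 1, 1]
```

With unit weights the result is `1 / x` per element; the weights in the second call play the role that `output_gradients` is described as playing, scaling each output's contribution before the sum.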

#### Returns

- If `has_aux` is `False`: a tuple `(y, dydx)`, where `y = f(*args, **kwargs)` and `dydx` holds the gradients of `y` with respect to each of `args` and `kwargs`.
- Otherwise: a tuple `((y, aux), dydx)`, where `y, aux = f(*args, **kwargs)` and `dydx` holds the gradients of `y` with respect to each of `args` and `kwargs`.
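To make the two return shapes concrete, here is a toy scalar stand-in that mimics only the documented return structure. `toy_value_and_gradient` and `loss_with_aux` are hypothetical names; the gradient is a finite-difference approximation, so this is a sketch of the contract rather than of the library's implementation.

```python
def toy_value_and_gradient(f, x, has_aux=False, eps=1e-6):
    """Hypothetical scalar stand-in that mimics the documented return structure."""
    def value(v):
        out = f(v)
        # With has_aux, only the first output is differentiated.
        return out[0] if has_aux else out

    dydx = (value(x + eps) - value(x - eps)) / (2 * eps)
    # f(x) is either y or the pair (y, aux), matching the two documented cases.
    return f(x), dydx

def loss_with_aux(x):
    y = x ** 2
    return y, {"input": x}  # the auxiliary output receives no gradient

y, dydx = toy_value_and_gradient(lambda x: x ** 2, 3.0)
(y_aux, aux), dydx_aux = toy_value_and_gradient(loss_with_aux, 3.0, has_aux=True)
```

In both calls the gradient is close to `6.0`; the only difference is that the second call also hands back `aux` alongside `y`.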
