Perform a quantized matrix multiplication of a
by the matrix b
with bias
View aliases
Compat aliases for migration
See Migration guide for more details.
tf.compat.v1.raw_ops.QuantizedMatMulWithBiasAndReluAndRequantize
tf.raw_ops.QuantizedMatMulWithBiasAndReluAndRequantize(
a,
b,
bias,
min_a,
max_a,
min_b,
max_b,
min_freezed_output,
max_freezed_output,
Toutput=tf.dtypes.quint8
,
transpose_a=False,
transpose_b=False,
input_quant_mode='MIN_FIRST',
name=None
)
add and relu and requantize fusion.
The inputs must be two-dimensional matrices and 1D bias vector. And the inner
dimension of a
(after being transposed if transpose_a
is non-zero) must
match the outer dimension of b
(after being transposed if transposed_b
is
non-zero). Then do broadcast add operation with bias values on the matrix
multiplication result. The bias size must match inner dimension of b
. Then do
relu activation to get non-negative result. Then do requantize operation to get
final uint8 result.
Args:
a: A Tensor
. Must be one of the following types: qint8
, quint8
, qint32
, qint16
, quint16
.
A matrix to be multiplied. Must be a two-dimensional tensor of type quint8
.
b: A Tensor
. Must be one of the following types: qint8
, quint8
, qint32
, qint16
, quint16
.
A matrix to be multiplied and must be a two-dimensional tensor of type qint8
.
bias: A Tensor
. Must be one of the following types: float32
, qint32
.
A 1D bias tensor with size matching with inner dimension of b
(after being
transposed if transposed_b
is non-zero).
min_a: A Tensor
of type float32
.
The float value that the lowest quantized a
value represents.
max_a: A Tensor
of type float32
.
The float value that the highest quantized a
value represents.
min_b: A Tensor
of type float32
.
The float value that the lowest quantized b
value represents.
max_b: A Tensor
of type float32
.
The float value that the highest quantized b
value represents.
min_freezed_output: A Tensor
of type float32
.
The float value that the highest quantized output value after requantize.
max_freezed_output: A Tensor
of type float32
.
Toutput: An optional tf.DType
from: tf.qint8, tf.quint8, tf.qint32, tf.qint16, tf.quint16
. Defaults to tf.quint8
.
transpose_a: An optional bool
. Defaults to False
.
If true, a
is transposed before multiplication.
transpose_b: An optional bool
. Defaults to False
.
If true, b
is transposed before multiplication.
input_quant_mode: An optional string
from: "MIN_FIRST", "SCALED"
. Defaults to "MIN_FIRST"
.
Input data quantization mode. Either MIN_FIRST(default) or SCALED.
name: A name for the operation (optional).
Returns:
A tuple of Tensor
objects (out, min_out, max_out).
out: A `Tensor` of type `Toutput`.
min_out: A `Tensor` of type `float32`.
max_out: A `Tensor` of type `float32`.