Transformer model with Keras.

```python
tfm.nlp.models.Seq2SeqTransformer(
    vocab_size=33708,
    embedding_width=512,
    dropout_rate=0.0,
    padded_decode=False,
    decode_max_length=None,
    extra_decode_length=0,
    beam_size=4,
    alpha=0.6,
    encoder_layer=None,
    decoder_layer=None,
    eos_id=EOS_ID,
    **kwargs
)
```
Implemented as described in: https://arxiv.org/pdf/1706.03762.pdf
The Transformer model consists of an encoder and decoder. The input is an int sequence (or a batch of sequences). The encoder produces a continuous representation, and the decoder uses the encoder output to generate probabilities for the output sequence.
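
A minimal construction sketch follows. It is not part of the API reference: the companion `tfm.nlp.models.TransformerEncoder` / `tfm.nlp.models.TransformerDecoder` stacks and the layer hyperparameters shown are assumptions chosen for illustration only.

```python
import tensorflow_models as tfm

# Illustrative encoder/decoder stacks (hyperparameters are assumed, not
# prescribed); any compatible Keras layers can be passed via
# `encoder_layer` / `decoder_layer`.
encoder = tfm.nlp.models.TransformerEncoder(
    num_layers=6, num_attention_heads=8, intermediate_size=2048)
decoder = tfm.nlp.models.TransformerDecoder(
    num_layers=6, num_attention_heads=8, intermediate_size=2048)

model = tfm.nlp.models.Seq2SeqTransformer(
    vocab_size=33708,
    embedding_width=512,
    dropout_rate=0.1,
    beam_size=4,
    alpha=0.6,
    encoder_layer=encoder,
    decoder_layer=decoder,
    eos_id=1)  # the module-level EOS_ID constant defaults to 1
```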
Methods
call

```python
call(
    inputs
)
```
Calculate target logits or inferred target sequences.
| Args | |
|---|---|
| `inputs` | A dictionary of tensors. Feature `inputs` (optional): int tensor with shape `[batch_size, input_length]`. Feature `embedded_inputs` (optional): float tensor with shape `[batch_size, input_length, embedding_width]`. Feature `targets` (optional): `None` or int tensor with shape `[batch_size, target_length]`. Feature `input_masks` (optional): when providing `embedded_inputs`, the dictionary must also provide a boolean mask marking the filled time steps, with shape `[batch_size, input_length]`. Either `inputs`, or both `embedded_inputs` and `input_masks`, must be present in the input dictionary; in the second case the projection of the integer tokens to the transformer embedding space is skipped and `input_masks` is expected to be present. |
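
The table above allows two input forms; the sketch below (illustrative dummy tensors, not taken from the reference) builds one dictionary of each kind:

```python
import tensorflow as tf

batch_size, input_length, embedding_width = 2, 16, 512

# Form 1: raw token ids; the model performs the embedding lookup itself.
token_inputs = {
    "inputs": tf.ones((batch_size, input_length), dtype=tf.int32),
}

# Form 2: pre-embedded inputs; the token-to-embedding projection is skipped,
# so a boolean mask marking the filled (non-padding) time steps is required.
embedded_inputs = {
    "embedded_inputs": tf.random.uniform(
        (batch_size, input_length, embedding_width)),
    "input_masks": tf.ones((batch_size, input_length), dtype=tf.bool),
}
```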

| Returns | |
|---|---|
| If `targets` is defined, returns the logits for each word in the target sequence: a float tensor with shape `(batch_size, target_length, vocab_size)`. If `targets` is `None`, generates the output sequence one token at a time and returns a dictionary `{'outputs': (batch_size, decoded_length), 'scores': (batch_size, 1)}`. Even when `float16` is used, the output tensor(s) are always `float32`. |

| Raises | |
|---|---|
| `NotImplementedError` | If the padded decode method is used on CPUs/GPUs. |
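
A usage sketch of both return modes follows; it assumes the `model` instance from the construction sketch above and reuses illustrative dummy tensors:

```python
batch_size, input_length, target_length = 2, 16, 10

# Training / scoring: providing `targets` yields per-position logits.
logits = model({
    "inputs": tf.ones((batch_size, input_length), dtype=tf.int32),
    "targets": tf.ones((batch_size, target_length), dtype=tf.int32),
})
print(logits.shape)  # (batch_size, target_length, vocab_size)

# Inference: omitting `targets` runs beam-search decoding over the inputs.
decoded = model({"inputs": tf.ones((batch_size, input_length), dtype=tf.int32)})
print(decoded["outputs"].shape)  # (batch_size, decoded_length)
print(decoded["scores"].shape)   # (batch_size, 1)
```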