tfm.nlp.layers.BertPackInputs

Packs tokens into model inputs for BERT.

Args
seq_length The desired output length. Must not exceed the max_seq_length that was fixed at training time for the BERT model receiving the inputs.
start_of_sequence_id The numeric id of the token that is to be placed at the start of each sequence (called "[CLS]" for BERT).
end_of_segment_id The numeric id of the token that is to be placed at the end of each input segment (called "[SEP]" for BERT).
padding_id The numeric id of the token that is to be placed into the unused positions after the last segment in the sequence (called "[PAD]" for BERT).
special_tokens_dict Optionally, a dict from Python strings to Python integers that contains values for start_of_sequence_id, end_of_segment_id and padding_id. (Further values in the dict are silently ignored.) If this is passed, the separate *_id arguments must be omitted.
truncator The algorithm used to truncate a list of batched segments to fit a per-example length limit. The value can be either "round_robin" or "waterfall": (1) With "round_robin", available space is assigned one token at a time, in round-robin fashion, to the segments that still need some, until the limit is reached. It currently supports only one or two segments. (2) With "waterfall", the budget is allocated left to right, filling each segment in turn until the budget runs out. It supports an arbitrary number of segments.
**kwargs Standard arguments to Layer().
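
The difference between the two truncator settings can be illustrated with a minimal pure-Python sketch. This is not the layer's or tensorflow_text's implementation: the helper names round_robin_quota and waterfall_quota are hypothetical, and the budget is simply a number of token slots to share among the segments of one example (how the layer derives that budget from seq_length is not covered here).

```python
def round_robin_quota(lengths, budget):
    """Hand out `budget` slots one at a time, cycling over the segments
    that still have unassigned tokens (hypothetical helper)."""
    quota = [0] * len(lengths)
    remaining = budget
    while remaining > 0:
        progressed = False
        for j, length in enumerate(lengths):
            if remaining == 0:
                break
            if quota[j] < length:
                quota[j] += 1
                remaining -= 1
                progressed = True
        if not progressed:  # every segment already fits completely
            break
    return quota


def waterfall_quota(lengths, budget):
    """Fill segments left to right until the budget runs out
    (hypothetical helper)."""
    quota = []
    remaining = budget
    for length in lengths:
        take = min(length, remaining)
        quota.append(take)
        remaining -= take
    return quota


# Two segments of 5 and 2 tokens competing for 5 slots.
rr = round_robin_quota([5, 2], 5)   # [3, 2]
wf = waterfall_quota([5, 2], 5)     # [5, 0]
print(rr, wf)
```

In this example round-robin keeps part of both segments, while waterfall spends the whole budget on the first segment and drops the second entirely, which is worth keeping in mind when the later segments carry important content.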

Raises
ImportError If importing tensorflow_text failed.

Methods

bert_pack_inputs

Freestanding equivalent of the BertPackInputs layer.

call

Adds special tokens to pack a list of segments into BERT input Tensors.

Args
inputs A Python list of one or two RaggedTensors, each holding the batched values of one input segment. The j-th segment of the i-th input example is the slice inputs[j][i, ...].

Returns
A nest of Tensors for use as input to the BERT TransformerEncoder.
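
To make the input layout and the packed result concrete, here is a rough pure-Python sketch that uses nested lists in place of RaggedTensors. It is an illustration under stated assumptions, not the layer's code: pack_batch is a hypothetical helper, the ids 101, 102 and 0 are placeholder values for "[CLS]", "[SEP]" and "[PAD]", truncation is left out, and the output keys (input_word_ids, input_mask, input_type_ids) and the segment-id layout are assumed from common BERT usage, since the text above only promises a nest of Tensors.

```python
def pack_batch(inputs, seq_length, cls_id=101, sep_id=102, pad_id=0):
    """Pack inputs[j][i] (segment j of example i) into fixed-length rows.

    Hypothetical helper: plain lists stand in for RaggedTensors, and
    examples that do not fit raise instead of being truncated.
    """
    batch_size = len(inputs[0])
    packed = {"input_word_ids": [], "input_mask": [], "input_type_ids": []}
    for i in range(batch_size):
        word_ids = [cls_id]   # start-of-sequence token
        type_ids = [0]        # assumed: "[CLS]" belongs to segment 0
        for j, segment_batch in enumerate(inputs):
            tokens = list(segment_batch[i])
            word_ids += tokens + [sep_id]        # end-of-segment token
            type_ids += [j] * (len(tokens) + 1)
        if len(word_ids) > seq_length:
            raise ValueError("example does not fit; truncation is omitted in this sketch")
        n_pad = seq_length - len(word_ids)
        packed["input_word_ids"].append(word_ids + [pad_id] * n_pad)
        packed["input_mask"].append([1] * len(word_ids) + [0] * n_pad)
        packed["input_type_ids"].append(type_ids + [0] * n_pad)
    return packed


# Two segments, batch of two examples: inputs[j][i] is segment j of example i.
inputs = [
    [[7, 8], [5]],      # first segment of examples 0 and 1
    [[9], [6, 4]],      # second segment of examples 0 and 1
]
packed = pack_batch(inputs, seq_length=8)
print(packed["input_word_ids"][0])  # [101, 7, 8, 102, 9, 102, 0, 0]
```

The mask marks real tokens (including the special tokens) with 1 and padding with 0, so every row has exactly seq_length entries regardless of how long the original segments were.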