Text utilities.
tfds includes a set of TextEncoders as well as a Tokenizer to enable
expressive, performant, and reproducible natural language research.
Classes
class ByteTextEncoder: Byte-encodes text.
class SubwordTextEncoder: Invertible TextEncoder using word pieces with a byte-level fallback.
class TextEncoder: Abstract base class for converting between text and integers.
class TextEncoderConfig: Configuration for tfds.features.Text.
class TokenTextEncoder: TextEncoder backed by a list of tokens.
class Tokenizer: Splits a string into tokens, and joins them back.
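The "converting between text and integers" contract that these encoders share can be illustrated with a small pure-Python sketch. It is not the tfds implementation: the class name `SketchByteEncoder` is hypothetical, and reserving id 0 for padding (so byte values are shifted by one) is an assumption made for this example.

```python
# Illustrative sketch only: a tiny byte-level encoder, NOT the tfds code.
class SketchByteEncoder:
    """Hypothetical stand-in showing an invertible text <-> integer mapping."""

    # Assumption for this sketch: id 0 is reserved for padding,
    # so every UTF-8 byte value is shifted up by 1.
    OFFSET = 1

    @property
    def vocab_size(self):
        # 256 possible byte values plus the reserved padding id.
        return 256 + self.OFFSET

    def encode(self, text):
        # Text -> UTF-8 bytes -> shifted integer ids.
        return [b + self.OFFSET for b in text.encode("utf-8")]

    def decode(self, ids):
        # Drop padding ids, undo the shift, and decode the bytes back to text.
        data = bytes(i - self.OFFSET for i in ids if i >= self.OFFSET)
        return data.decode("utf-8", errors="replace")


encoder = SketchByteEncoder()
ids = encoder.encode("héllo")
# Encoding followed by decoding returns the original string.
assert encoder.decode(ids) == "héllo"
```

A byte-level scheme like this never meets an out-of-vocabulary input, which is why it is a natural fallback for a word-piece encoder. The cost is longer integer sequences than a token-level or subword-level vocabulary would produce.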