TensorFlow text processing guide

The TensorFlow text processing guide documents libraries and workflows for natural language processing (NLP) and introduces important concepts for working with text.

KerasNLP

KerasNLP is a high-level natural language processing (NLP) library that includes modern Transformer-based models as well as lower-level tokenization utilities. It's the recommended solution for most NLP use cases.

  • Getting Started with KerasNLP: Learn KerasNLP by performing sentiment analysis at progressive levels of complexity, from using a pre-trained model to building your own Transformer from scratch.

tf.strings

The tf.strings module provides operations for working with string Tensors.

  • Unicode strings: Represent Unicode strings in TensorFlow and manipulate them using Unicode equivalents of standard string ops.
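To see why Unicode-aware ops matter, it helps to compare the byte view and the character view of the same string. The snippet below is a plain-Python illustration of that distinction, not TensorFlow code. In TensorFlow, the analogous operations are `tf.strings.length` (with `unit="BYTE"` or `unit="UTF8_CHAR"`) and `tf.strings.unicode_decode`. The sample string is an arbitrary choice for this sketch.

```python
# Plain-Python illustration of byte length vs. character length.
# The sample text is an assumption chosen for this sketch.
text = "h\u00e9llo \U0001F60A"  # "héllo" followed by a space and an emoji

# Byte view: what a string tensor stores when the text is UTF-8 encoded.
utf8_bytes = text.encode("utf-8")
byte_length = len(utf8_bytes)

# Character view: one integer code point per Unicode character.
code_points = [ord(ch) for ch in text]
char_length = len(code_points)

# Encoding the code points back into text recovers the original string.
restored = "".join(chr(cp) for cp in code_points)

print(byte_length, char_length)  # the two lengths differ for non-ASCII text
print(code_points)
```

Because "é" takes two bytes and the emoji takes four in UTF-8, the byte length exceeds the character length. This is why operations such as length, substring, and splitting need an explicit unit when applied to Unicode text.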

TensorFlow Text

If you need access to lower-level text processing tools, you can use TensorFlow Text. It provides a collection of ops and libraries to help you work with input in text form, such as raw text strings or documents.

Preprocessing

  • BERT Preprocessing with TF Text: Use TensorFlow Text preprocessing ops to transform text data into inputs for BERT.
  • Tokenizing with TF Text: Understand the tokenization options provided by TensorFlow Text. Learn when you might want to use one option over another, and how these tokenizers are called from within your model.
  • Subword tokenizers: Generate a subword vocabulary from a dataset, and use it to build a text.BertTokenizer from the vocabulary.
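As a rough illustration of what the preprocessing guides above cover, the sketch below shows greedy longest-match-first subword splitting (the WordPiece scheme used by BERT-style tokenizers) followed by packing the pieces into fixed-length model inputs. It is written in plain Python and is not the TensorFlow Text implementation. The toy vocabulary, the helper names `wordpiece_tokenize` and `pack_bert_inputs`, and the `max_len` value are all assumptions made for this example. The output keys follow the naming commonly used for BERT inputs.

```python
# Simplified sketch of WordPiece-style tokenization and BERT input packing.
# The vocabulary and helper names are hypothetical, chosen for illustration.

def wordpiece_tokenize(word, vocab, unk_token="[UNK]", suffix_prefix="##"):
    """Split one word into subword pieces, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" marker.
                candidate = suffix_prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def pack_bert_inputs(pieces, vocab_ids, max_len):
    """Add [CLS]/[SEP], truncate, pad, and build the mask and type ids."""
    tokens = ["[CLS]"] + pieces[: max_len - 2] + ["[SEP]"]
    ids = [vocab_ids[token] for token in tokens]
    pad = max_len - len(ids)
    return {
        "input_word_ids": ids + [vocab_ids["[PAD]"]] * pad,
        "input_mask": [1] * len(ids) + [0] * pad,
        "input_type_ids": [0] * max_len,  # single-segment input
    }


# Toy vocabulary: the position in the list is the token id.
vocab_list = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "token", "##izer", "##s"]
vocab_ids = {token: index for index, token in enumerate(vocab_list)}

pieces = wordpiece_tokenize("tokenizers", vocab_ids)
unknown = wordpiece_tokenize("xyz", vocab_ids)
packed = pack_bert_inputs(pieces, vocab_ids, max_len=8)

print(pieces)
print(packed)
```

A production tokenizer also handles text normalization, whitespace and punctuation splitting, and batching. Those steps are left out here to keep the core lookup and packing logic visible.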

TensorFlow Models - NLP

The TensorFlow Models - NLP library provides Keras primitives that can be assembled into Transformer-based models, and scaffold classes that enable easy experimentation with novel architectures.