Attention: TensorFlow Lite is now part of Google AI Edge. The latest documentation is now at ai.google.dev/edge/lite.

tflite_model_maker.text_classifier.DataLoader

DataLoader for text classifier.

tflite_model_maker.text_classifier.DataLoader(
    dataset, size, index_to_label
)

Used in the tutorials
Text classification with TensorFlow Lite Model Maker

Args
`dataset`	A tf.data.Dataset object that contains a potentially large set of elements, where each element is a pair of (input_data, target). `input_data` is the raw input, such as an image or a piece of text, and `target` is the ground truth for that input, such as its classification label.
`size`	The size of the dataset. It is passed explicitly because tf.data.Dataset doesn't provide a way to get its length directly: it is lazily loaded and may be infinite.

Attributes
`num_classes`
`size`	The size of the dataset. This may be None, because the exact size isn't required to create an instance of this class, and tf.data.Dataset doesn't provide a way to get its length directly since it is lazily loaded and may be infinite. In most cases, however, when the instance is created by a helper such as `from_folder`, the size is computed during preprocessing and this attribute is an int.

Methods

`from_csv`

@classmethod
from_csv(
    filename,
    text_column,
    label_column,
    fieldnames=None,
    model_spec='average_word_vec',
    is_training=True,
    delimiter=',',
    quotechar='"',
    shuffle=False,
    cache_dir=None
)

Loads text with labels from a CSV file and preprocesses the text according to model_spec.

Args
`filename`	Name of the CSV file.
`text_column`	String, the column name for the input text.
`label_column`	String, the column name for the labels.
`fieldnames`	A sequence, passed to csv.DictReader. If fieldnames is omitted, the values in the first row of the file are used as the field names.
`model_spec`	Specification for the model.
`is_training`	Whether the loaded data is for training or not.
`delimiter`	Character used to separate fields.
`quotechar`	Character used to quote fields containing special characters.
`shuffle`	Boolean. If True, the data is randomly shuffled.
`cache_dir`	The cache directory in which to save preprocessed data. If None, a temporary directory is created to cache the preprocessed data.

Returns
TextDataset containing text, labels and other related info.
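
Since `fieldnames`, `delimiter` and `quotechar` are described in terms of csv.DictReader, the sketch below shows how those three settings behave using only the standard library. It does not call `from_csv`; the helper `read_text_and_labels`, the column names `sentence` and `label`, and the sample rows are all hypothetical and chosen for illustration.

```python
import csv
import io


def read_text_and_labels(stream, text_column, label_column,
                         fieldnames=None, delimiter=',', quotechar='"'):
    """Hypothetical helper: collect one text column and one label column."""
    reader = csv.DictReader(stream, fieldnames=fieldnames,
                            delimiter=delimiter, quotechar=quotechar)
    texts, labels = [], []
    for row in reader:
        texts.append(row[text_column])
        labels.append(row[label_column])
    return texts, labels


# Case 1: the first row is a header, so fieldnames can be omitted.
# The quoted field keeps its embedded comma because of quotechar.
with_header = 'sentence,label\n"Great, would watch again",1\nToo slow,0\n'
texts, labels = read_text_and_labels(
    io.StringIO(with_header), "sentence", "label")

# Case 2: no header row, so fieldnames must name the columns explicitly
# and every row (including the first) is treated as data.
without_header = '"Great, would watch again",1\nToo slow,0\n'
texts2, labels2 = read_text_and_labels(
    io.StringIO(without_header), "sentence", "label",
    fieldnames=["sentence", "label"])
```

Note that csv.DictReader returns every value as a string, so the labels here are `"1"` and `"0"` rather than integers.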

`from_folder`

@classmethod
from_folder(
    filename,
    model_spec='average_word_vec',
    is_training=True,
    class_labels=None,
    shuffle=True,
    cache_dir=None
)

Loads text with labels and preprocesses the text according to model_spec.

Assumes that the text data for each label is in its own subdirectory and that each file contains one text.

Args
`filename`	Name of the root folder. Despite the parameter name, this is the directory whose subdirectories each hold the text files for one label.
`model_spec`	Specification for the model.
`is_training`	Whether the loaded data is for training or not.
`class_labels`	Class labels that should be considered. Subdirectories whose names are not in `class_labels` are ignored. If None, all subdirectories are considered.
`shuffle`	Boolean. If True, the data is randomly shuffled.
`cache_dir`	The cache directory in which to save preprocessed data. If None, a temporary directory is created to cache the preprocessed data.

Returns
TextDataset containing text, labels and other related info.
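
The folder layout that `from_folder` assumes (one subdirectory per label, one text per file) can be illustrated with the standard library alone. The sketch below builds such a layout in a temporary directory and reads it back; it is not the library's implementation, and the helper `read_labeled_folder`, the label names and the file contents are hypothetical.

```python
import tempfile
from pathlib import Path


def read_labeled_folder(root, class_labels=None):
    """Hypothetical helper: return (text, label) pairs from root/<label>/<file>."""
    examples = []
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue
        # Subdirectories not named in class_labels are skipped.
        if class_labels is not None and label_dir.name not in class_labels:
            continue
        for path in sorted(label_dir.iterdir()):
            if path.is_file():
                text = path.read_text(encoding="utf-8")
                examples.append((text, label_dir.name))
    return examples


with tempfile.TemporaryDirectory() as root:
    # Build root/pos/0.txt, root/neg/0.txt and root/misc/0.txt.
    samples = {"pos": ["loved it"], "neg": ["hated it"], "misc": ["n/a"]}
    for label, docs in samples.items():
        (Path(root) / label).mkdir()
        for i, text in enumerate(docs):
            (Path(root) / label / f"{i}.txt").write_text(text, encoding="utf-8")

    # No filter: every subdirectory counts as a label.
    all_examples = read_labeled_folder(root)
    # With a filter: the "misc" subdirectory is ignored.
    filtered = read_labeled_folder(root, class_labels=["pos", "neg"])
```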

`gen_dataset`

gen_dataset(
    batch_size=1,
    is_training=False,
    shuffle=False,
    input_pipeline_context=None,
    preprocess=None,
    drop_remainder=False
)

Generates a sharded and batched tf.data.Dataset for training/evaluation.

Args
`batch_size`	An integer. The returned dataset is batched by this size.
`is_training`	A boolean. When True, the returned dataset is optionally shuffled and repeated as an endless dataset.
`shuffle`	A boolean. When True, the returned dataset is shuffled to create randomness during model training.
`input_pipeline_context`	An InputContext instance, used to shard the dataset among multiple workers when a distribution strategy is used.
`preprocess`	A function taking three arguments in order: feature, label, and the boolean is_training.
`drop_remainder`	Boolean, whether to drop the final batch if it has fewer than `batch_size` elements.

Returns
A TF dataset ready to be consumed by Keras model.
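
The effect of `batch_size` together with `drop_remainder` can be shown without TensorFlow. The following is a minimal pure-Python sketch of the batching behaviour described above, not the code `gen_dataset` runs; the helper name `batch` is hypothetical.

```python
def batch(items, batch_size=1, drop_remainder=False):
    """Hypothetical helper: group items into lists of batch_size elements."""
    items = list(items)
    batches = [items[i:i + batch_size]
               for i in range(0, len(items), batch_size)]
    # When requested, discard a short final batch.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# Ten elements in batches of four: the last batch holds the two leftovers.
kept = batch(range(10), batch_size=4)

# Same input with drop_remainder=True: the short final batch is dropped.
dropped = batch(range(10), batch_size=4, drop_remainder=True)
```

Dropping the remainder gives every batch the same shape, at the cost of discarding up to `batch_size - 1` elements per pass.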

`split`

split(
    fraction
)

Splits dataset into two sub-datasets with the given fraction.

Primarily used for splitting the data set into training and testing sets.

Args
`fraction`	Float, the fraction of the original data that goes into the first returned sub-dataset.

Returns
The two sub-datasets produced by the split.
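
As a rough illustration of what `fraction` means, the sketch below splits a plain Python sequence so that the first part holds that fraction of the elements. It assumes the cut point is rounded down to a whole number of elements, which is an assumption for illustration rather than documented behaviour; the helper name `split_by_fraction` is hypothetical.

```python
def split_by_fraction(examples, fraction):
    """Hypothetical helper: return (first, second) parts of examples."""
    examples = list(examples)
    # Assumption: the first part gets floor(len * fraction) elements.
    cut = int(len(examples) * fraction)
    return examples[:cut], examples[cut:]


# An 80/20 split of ten elements, e.g. for training and testing.
train, test = split_by_fraction(range(10), 0.8)
```

Every element lands in exactly one of the two parts, so their sizes always add up to the size of the original data.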

`__len__`

__len__()
