tflite_model_maker.text_classifier.DataLoader

DataLoader for text classifier.


Args

dataset A tf.data.Dataset object that contains a potentially large set of elements, where each element is a pair of (input_data, target). The input_data is the raw input data, such as an image or a piece of text, while the target is the ground truth for that input, such as the classification label of the image.
size The size of the dataset. tf.data.Dataset doesn't support a function to get the length directly since it's lazy-loaded and may be infinite.

Attributes

num_classes

size Returns the size of the dataset.

Note that this function may return None because the exact size of the dataset isn't required to create an instance of this class, and tf.data.Dataset doesn't support a function to get the length directly since it's lazy-loaded and may be infinite. In most cases, however, when an instance of this class is created by a helper function like from_folder, the size of the dataset is computed during preprocessing, and this function returns an int representing the size of the dataset.
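As a rough illustration of how these attributes behave, the sketch below assumes a hypothetical loader named data that was created by one of the helper methods below, so its size is known.

    print(data.size)         # e.g. 1000 examples after preprocessing
    print(data.num_classes)  # e.g. 2 for a binary sentiment task
    print(len(data))         # __len__ returns the same value as data.size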

Methods

from_csv

Loads text with labels from a CSV file and preprocesses the text according to model_spec.

Args
filename Name of the file.
text_column String, column name for the input text.
label_column String, column name for the labels.
fieldnames A sequence, used in csv.DictReader. If fieldnames is omitted, the values in the first row of the file will be used as the fieldnames.
model_spec Specification for the model.
is_training Whether the loaded data is for training or not.
delimiter Character used to separate fields.
quotechar Character used to quote fields containing special characters.
shuffle Boolean, whether to randomly shuffle the data.
cache_dir The cache directory to save preprocessed data. If None, generates a temporary directory to cache preprocessed data.

Returns
TextDataset containing text, labels and other related info.
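For example, a minimal sketch of loading training data from a CSV file; the file name, column names, and the choice of the average_word_vec spec are assumptions for illustration:

    from tflite_model_maker import model_spec
    from tflite_model_maker.text_classifier import DataLoader

    # Hypothetical CSV with a 'sentence' column and a 'label' column.
    spec = model_spec.get('average_word_vec')
    train_data = DataLoader.from_csv(
        filename='train.csv',
        text_column='sentence',
        label_column='label',
        model_spec=spec,
        is_training=True)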

from_folder

Loads text with labels and preprocesses the text according to model_spec.

Assumes the text data of the same label are in the same subdirectory; each file is one text sample.

Args
filename Name of the folder that contains one subdirectory per class.
model_spec Specification for the model.
is_training Whether the loaded data is for training or not.
class_labels Class labels that should be considered. Subdirectories whose names are not in class_labels will be ignored. If None, all the subdirectories will be considered.
shuffle Boolean, whether to randomly shuffle the data.
cache_dir The cache directory to save preprocessed data. If None, generates a temporary directory to cache preprocessed data.

Returns
TextDataset containing text, labels and other related info.
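As a sketch, assuming a hypothetical directory layout with one subdirectory per class:

    # text_data/
    #   positive/   one .txt file per example
    #   negative/
    from tflite_model_maker import model_spec
    from tflite_model_maker.text_classifier import DataLoader

    spec = model_spec.get('average_word_vec')
    data = DataLoader.from_folder('text_data', model_spec=spec, is_training=True)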

gen_dataset

Generates a sharded and batched tf.data.Dataset for training/evaluation.

Args
batch_size An integer, the returned dataset will be batched by this size.
is_training A boolean, when True, the returned dataset will be optionally shuffled and repeated as an endless dataset.
shuffle A boolean, when True, the returned dataset will be shuffled to create randomness during model training.
input_pipeline_context An InputContext instance, used to shard the dataset among multiple workers when a distribution strategy is used.
preprocess A function taking three arguments in order: feature, label, and a boolean is_training.
drop_remainder A boolean, whether to drop the final batch when it has fewer than batch_size elements.

Returns
A TF dataset ready to be consumed by Keras model.
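A minimal sketch, assuming train_data is a loader built by one of the helpers above; the batch size is arbitrary:

    tf_dataset = train_data.gen_dataset(
        batch_size=32,
        is_training=True,
        shuffle=True)

    # tf_dataset is a plain tf.data.Dataset and can be inspected or
    # passed to a Keras model's fit() method.
    for features, labels in tf_dataset.take(1):
        print(labels.shape)  # (32,) in this hypothetical setup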

split

Splits dataset into two sub-datasets with the given fraction.

Primarily used for splitting the dataset into training and testing sets.

Args
fraction Float, the fraction of the original data that goes into the first returned sub-dataset.

Returns
The two split sub-datasets.
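For example, a sketch assuming a hypothetical loader named data with a known size:

    train_data, test_data = data.split(0.9)
    # With 1000 examples, roughly 900 land in train_data and 100 in test_data.
    print(len(train_data), len(test_data))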

__len__
