tfds.deprecated.text.Tokenizer
Splits a string into tokens, and joins them back.
tfds.deprecated.text.Tokenizer(
alphanum_only=True, reserved_tokens=None
)
Args
alphanum_only: bool, if True, only parse out alphanumeric tokens (non-alphanumeric characters are dropped); otherwise, keep all characters (individual tokens will still be either all alphanumeric or all non-alphanumeric).
reserved_tokens: list<str>, a list of strings that, if any are in s, will be preserved as whole tokens, even if they contain mixed alphanumeric/non-alphanumeric characters.
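The effect of the two arguments can be illustrated with a small standard-library sketch. This is an approximation of the rules described above, not the library's implementation: the function name `sketch_tokenize` and both regexes are assumptions made for illustration, and `\W` treats the underscore as a word character, which may differ from the library's notion of "alphanumeric".

```python
import re

# Illustrative regexes (assumptions, not taken from the library source).
_NON_ALPHANUM = re.compile(r"\W+")          # runs of non-word characters, dropped on split
_NON_ALPHANUM_KEEP = re.compile(r"(\W+)")   # same runs, but kept as tokens by re.split


def sketch_tokenize(s, alphanum_only=True, reserved_tokens=None):
    """Approximates the splitting rules described for the two arguments."""
    reserved = list(reserved_tokens or [])
    if reserved:
        # Split around reserved tokens first so they survive as whole pieces.
        pattern = "(" + "|".join(re.escape(t) for t in reserved) + ")"
        pieces = re.split(pattern, s)
    else:
        pieces = [s]

    tokens = []
    for piece in pieces:
        if piece in reserved:
            tokens.append(piece)
        elif alphanum_only:
            tokens.extend(_NON_ALPHANUM.split(piece))
        else:
            tokens.extend(_NON_ALPHANUM_KEEP.split(piece))
    # re.split can yield empty strings at the edges; drop them.
    return [t for t in tokens if t]


print(sketch_tokenize("Hello, world!"))
print(sketch_tokenize("Hello, world!", alphanum_only=False))
print(sketch_tokenize("a <EOS> b", reserved_tokens=["<EOS>"]))
```

In this sketch, the default call drops the punctuation, the `alphanum_only=False` call keeps `", "` and `"!"` as separate all-non-alphanumeric tokens, and the reserved `<EOS>` marker stays intact even though it mixes both character classes.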
Attributes
alphanum_only
reserved_tokens
Methods
join
join(
tokens
)
Joins tokens into a string.
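One plausible way joining could work is sketched below. The page does not state which separator is used, so the choice here is an assumption: a single space when non-alphanumeric characters were dropped, and plain concatenation when every character was kept. The name `sketch_join` is hypothetical.

```python
def sketch_join(tokens, alphanum_only=True):
    """Joins tokens into a string; the separator choice is an assumption."""
    if not tokens:
        return ""
    # If separators were dropped during tokenization, glue tokens with a space.
    # If all characters were kept, concatenation restores the original text.
    return " ".join(tokens) if alphanum_only else "".join(tokens)


print(sketch_join(["Hello", "world"]))
print(sketch_join(["Hello", ", ", "world", "!"], alphanum_only=False))
```

Under this assumption, only the `alphanum_only=False` case round-trips exactly, because the original punctuation and whitespace are still present in the token list.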
load_from_file
@classmethod
load_from_file(
filename_prefix
)
save_to_file
save_to_file(
filename_prefix
)
tokenize
tokenize(
s
)
Splits a string into tokens.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.