tf.data.experimental.table_from_dataset
Returns a lookup table based on the given dataset.
tf.data.experimental.table_from_dataset(
    dataset=None,
    num_oov_buckets=0,
    vocab_size=None,
    default_value=None,
    hasher_spec=lookup_ops.FastHashSpec,
    key_dtype=tf.dtypes.string,
    name=None
)
This operation constructs a lookup table based on the given dataset of pairs
of (key, value).
Any lookup of an out-of-vocabulary token will return a bucket ID based on its
hash if num_oov_buckets is greater than zero. Otherwise it is assigned the
default_value. The bucket ID range is
[vocabulary size, vocabulary size + num_oov_buckets - 1].
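The out-of-vocabulary behaviour described above can be sketched as follows. This is a minimal illustration, assuming TensorFlow 2.x in eager mode; the fruit vocabulary and the variable names are made up for the example. Which of the two buckets an unknown key lands in depends on its hash, so only the range is checked here.

```python
import tensorflow as tf

# Hypothetical three-word vocabulary mapped to the IDs 0, 1, 2.
keys = tf.data.Dataset.from_tensor_slices(["apple", "banana", "cherry"])
values = tf.data.Dataset.range(3)
ds = tf.data.Dataset.zip((keys, values))

# Two OOV buckets: unknown keys should hash into IDs 3 or 4,
# i.e. [vocabulary size, vocabulary size + num_oov_buckets - 1].
table = tf.data.experimental.table_from_dataset(
    ds, num_oov_buckets=2, key_dtype=tf.string)

ids = table.lookup(tf.constant(["banana", "durian"])).numpy()
print(ids)  # first entry is the in-vocabulary ID, second is a bucket ID
```

Note that the values in this sketch are int64 IDs; the bucket IDs are integers, so a table with OOV buckets is assumed to map keys to integer IDs.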
Sample Usages:
import tensorflow as tf

# Build a dataset of (key, value) pairs: int64 keys 0..99, string values "0", "2", ...
keys = tf.data.Dataset.range(100)
values = tf.data.Dataset.range(100).map(
    lambda x: tf.strings.as_string(x * 2))
ds = tf.data.Dataset.zip((keys, values))
table = tf.data.experimental.table_from_dataset(
    ds, default_value='n/a', key_dtype=tf.int64)
table.lookup(tf.constant([0, 1, 2], dtype=tf.int64)).numpy()
# Output: array([b'0', b'2', b'4'], dtype=object)
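To complement the sample usage, the sketch below looks up keys that are not in the table while num_oov_buckets is left at zero, so the lookup falls back to default_value. It reuses the same dataset as the sample; the variable name result is only for illustration.

```python
import tensorflow as tf

# Same (key, value) pairs as in the sample usage: 0 -> "0", 1 -> "2", ..., 99 -> "198".
keys = tf.data.Dataset.range(100)
values = tf.data.Dataset.range(100).map(
    lambda x: tf.strings.as_string(x * 2))
ds = tf.data.Dataset.zip((keys, values))

# No OOV buckets, so missing keys return default_value.
table = tf.data.experimental.table_from_dataset(
    ds, default_value='n/a', key_dtype=tf.int64)

# Key 99 is present; keys 100 and 500 are not.
result = table.lookup(tf.constant([99, 100, 500], dtype=tf.int64)).numpy()
print(result)
```

Because the values here are strings, default_value is given explicitly as a string so that it matches the value dtype.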
| Args | Description |
|---|---|
| dataset | A dataset containing (key, value) pairs. |
| num_oov_buckets | The number of out-of-vocabulary buckets. |
| vocab_size | Number of elements in the vocabulary, if known. |
| default_value | The value to use for out-of-vocabulary feature values. Defaults to -1. |
| hasher_spec | A HasherSpec to specify the hash function to use for assignment of out-of-vocabulary buckets. |
| key_dtype | The key data type. |
| name | A name for this op (optional). |
| Returns |
|---|
| The lookup table based on the given dataset. |
| Raises | Description |
|---|---|
| ValueError | If dataset does not contain pairs; the 2nd item in the dataset pairs has a dtype which is incompatible with default_value; num_oov_buckets is negative; vocab_size is not greater than zero; or key_dtype is not integer or string. |
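Two of the argument checks listed under Raises can be exercised as in the sketch below. It is a minimal illustration, assuming TensorFlow 2.x; the two-element dataset and the errors dictionary are hypothetical helpers for the example, and the exact error messages may differ between versions.

```python
import tensorflow as tf

# A small valid dataset of (string key, int64 value) pairs.
ds = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(["a", "b"]),
    tf.data.Dataset.range(2),
))

# Collect the ValueError message raised for each invalid argument.
errors = {}
for label, kwargs in [
    ("negative num_oov_buckets", {"num_oov_buckets": -1}),
    ("zero vocab_size", {"vocab_size": 0}),
]:
    try:
        tf.data.experimental.table_from_dataset(ds, **kwargs)
    except ValueError as err:
        errors[label] = str(err)

for label, message in errors.items():
    print(f"{label}: {message}")
```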
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2023-03-17 UTC.