tf.lookup.TextFileInitializer

Table initializer that reads its entries from a text file.

This initializer assigns one entry in the table for each line in the file.

The key and value types of the table to initialize are given by key_dtype and value_dtype.

The content to use as the key and value of each line is specified by key_index and value_index:

  • TextFileIndex.LINE_NUMBER uses the line number, starting from zero. It expects data type int64.
  • TextFileIndex.WHOLE_LINE uses the whole line content. It expects data type string.
  • A value >= 0 uses the field at that index (starting at zero) after the line is split on delimiter.
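
The three index rules can be modeled in a few lines of plain Python. This is only an illustrative sketch of the semantics described above, not TensorFlow's implementation: the helper extract_field and the module-level constants LINE_NUMBER and WHOLE_LINE are hypothetical stand-ins for the TextFileIndex values, and the sample lines are made up.

```python
# Hypothetical stand-ins for the TextFileIndex constants (sentinel values assumed).
LINE_NUMBER = -1
WHOLE_LINE = -2


def extract_field(line, line_number, index, delimiter):
    """Return the key or value that `index` selects from one line."""
    if index == LINE_NUMBER:
        return line_number                    # zero-based line number
    if index == WHOLE_LINE:
        return line                           # entire line content
    if index >= 0:
        return line.split(delimiter)[index]   # zero-based field of the split line
    raise ValueError(f"unsupported index: {index}")


sample_lines = ["apple,3", "pear,7"]

# Field 0 as keys, field 1 as values (converted to int by the caller).
by_column = {
    extract_field(line, n, 0, ","): int(extract_field(line, n, 1, ","))
    for n, line in enumerate(sample_lines)
}

# Whole line as keys, line number as values.
by_line = {
    extract_field(line, n, WHOLE_LINE, ","): extract_field(line, n, LINE_NUMBER, ",")
    for n, line in enumerate(sample_lines)
}
```

Here by_column pairs each first field with its second field, while by_line pairs each full line with its position in the file.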

For example, suppose we have a file with the following content:

import tempfile
f = tempfile.NamedTemporaryFile(delete=False)
content = '\n'.join(["emerson 10", "lake 20", "palmer 30"])
f.file.write(content.encode('utf-8'))
f.file.close()

The following snippet initializes a table with the first column as keys and second column as values:

  • emerson -> 10
  • lake -> 20
  • palmer -> 30

import tensorflow as tf

init = tf.lookup.TextFileInitializer(
    filename=f.name,
    key_dtype=tf.string, key_index=0,
    value_dtype=tf.int64, value_index=1,
    delimiter=" ")
table = tf.lookup.StaticHashTable(init, default_value=-1)
# 'tarkus' is not in the file, so it maps to default_value (-1).
table.lookup(tf.constant(['palmer', 'lake', 'tarkus'])).numpy()

Similarly, the following snippet initializes a table with the whole line as keys and the line number as values:

  • emerson 10 -> 0
  • lake 20 -> 1
  • palmer 30 -> 2

init = tf.lookup.TextFileInitializer(
    filename=f.name,
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticHashTable(init, -1)
table.lookup(tf.constant('palmer 30')).numpy()
2

Args
filename The filename of the text file to be used for initialization. The path must be accessible from wherever the graph is initialized (e.g., trainer or eval workers). The filename may be a scalar Tensor.
key_dtype The key data type.
key_index The index that identifies which part of a line the table 'key' values are taken from.
value_dtype The value data type.
value_index The index that identifies which part of a line the table 'value' values are taken from.
vocab_size The number of elements in the file, if known.
delimiter The delimiter to separate fields in a line.
name A name for the operation (optional).
value_index_offset A number to add to all indices extracted from the file. This is useful when you want to reserve one or more low index values for control characters. For instance, to ensure that no vocabulary item is mapped to index 0 (so that 0 can be reserved for a masking value), set value_index_offset to 1; the first vocabulary element is then mapped to 1 instead of 0.
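
The effect of value_index_offset on line-number values can be illustrated with a small plain-Python sketch. It models only the behavior described above (the offset is added to each zero-based line number); the helper line_number_vocab is hypothetical and is not part of TensorFlow.

```python
def line_number_vocab(lines, value_index_offset=0):
    """Map each whole line to its zero-based line number plus an offset."""
    return {line: n + value_index_offset for n, line in enumerate(lines)}


vocab = ["emerson", "lake", "palmer"]

default_ids = line_number_vocab(vocab)                        # first item maps to 0
shifted_ids = line_number_vocab(vocab, value_index_offset=1)  # first item maps to 1

MASK_ID = 0  # with an offset of 1, index 0 is free to act as a masking value
assert MASK_ID not in shifted_ids.values()
```

With the default offset the first vocabulary item occupies index 0, so a masking value of 0 would collide with it; shifting by 1 avoids that collision.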

Raises
ValueError when the filename is empty, or when the table key and value data types do not match the expected data types.

Attributes
key_dtype The expected table key dtype.
value_dtype The expected table value dtype.

Methods

initialize

Initializes the table from a text file.

Args
table The table to be initialized.

Returns
The operation that initializes the table.

Raises
TypeError when the key and value data types do not match the table's key and value data types.