tf.data.experimental.bucket_by_sequence_length
A transformation that buckets elements in a Dataset by length.
tf.data.experimental.bucket_by_sequence_length(
    element_length_func, bucket_boundaries, bucket_batch_sizes, padded_shapes=None,
    padding_values=None, pad_to_bucket_boundary=False, no_padding=False,
    drop_remainder=False
)
Elements of the Dataset are grouped together by length and then are padded
and batched.
This is useful for sequence tasks in which the elements have variable length.
Grouping together elements that have similar lengths reduces the fraction of
padding in each batch, which increases training step efficiency.
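As a minimal usage sketch, the transformation can be applied to a dataset with `Dataset.apply`. The toy `sequences` list, the boundary values, and the batch sizes below are illustrative assumptions, not values from this page.

```python
import tensorflow as tf

# Hypothetical toy data: integer sequences of varying length.
sequences = [[1], [2, 3], [4, 5, 6], [7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17]]

# Build a dataset whose elements are 1-D int32 tensors of unknown length.
dataset = tf.data.Dataset.from_generator(
    lambda: sequences,
    output_types=tf.int32,
    output_shapes=tf.TensorShape([None]),
)

# Two boundaries define three buckets, so three batch sizes are required.
bucketed = dataset.apply(
    tf.data.experimental.bucket_by_sequence_length(
        element_length_func=lambda seq: tf.shape(seq)[0],  # returns tf.int32
        bucket_boundaries=[3, 5],
        bucket_batch_sizes=[2, 2, 2],
    )
)

# Each batch holds sequences from one bucket, padded with 0 to the
# longest sequence in that batch (the default when padded_shapes is None).
batch_shapes = [tuple(batch.shape) for batch in bucketed]
print(batch_shapes)
```

With these assumed values, sequences shorter than 3 share a bucket, sequences of length 3 or 4 share another, and longer sequences fall into the last bucket, so no batch mixes short and long sequences.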
| Args | Description |
|---|---|
| element_length_func | Function from an element in the Dataset to tf.int32; determines the length of the element, which will determine the bucket it goes into. |
| bucket_boundaries | list<int>, upper length boundaries of the buckets. |
| bucket_batch_sizes | list<int>, batch size per bucket. Length should be len(bucket_boundaries) + 1. |
| padded_shapes | Nested structure of tf.TensorShape to pass to tf.data.Dataset.padded_batch. If not provided, will use dataset.output_shapes, which will result in variable length dimensions being padded out to the maximum length in each batch. |
| padding_values | Values to pad with, passed to tf.data.Dataset.padded_batch. Defaults to padding with 0. |
| pad_to_bucket_boundary | bool, if False, will pad dimensions with unknown size to maximum length in batch. If True, will pad dimensions with unknown size to bucket boundary minus 1 (i.e., the maximum length in each bucket), and caller must ensure that the source Dataset does not contain any elements with length longer than max(bucket_boundaries). |
| no_padding | bool, indicates whether to pad the batch features (features need to be either of type tf.SparseTensor or of same shape). |
| drop_remainder | (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch. |
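The relationship between `bucket_boundaries`, bucket membership, and the `pad_to_bucket_boundary` target can be illustrated in plain Python. This is a sketch of the semantics described in the table, not the library's implementation; the helper names `bucket_index` and `padded_length` are hypothetical, and treating the last (unbounded) bucket as an error is an assumption of this sketch.

```python
from bisect import bisect_right

def bucket_index(length, bucket_boundaries):
    """Return the bucket for a length, treating each boundary as an
    exclusive upper bound. The result is in 0..len(bucket_boundaries),
    which is why len(bucket_boundaries) + 1 batch sizes are needed."""
    return bisect_right(bucket_boundaries, length)

def padded_length(length, bucket_boundaries):
    """Return the padding target when padding to the bucket boundary:
    the boundary minus 1, i.e. the longest length the bucket can hold."""
    idx = bucket_index(length, bucket_boundaries)
    if idx == len(bucket_boundaries):
        # Assumption of this sketch: the last bucket has no upper
        # boundary, so there is no fixed length to pad to.
        raise ValueError("length falls outside the bounded buckets")
    return bucket_boundaries[idx] - 1

boundaries = [3, 5]  # illustrative values
for n in (1, 2, 3, 4):
    print(n, bucket_index(n, boundaries), padded_length(n, boundaries))
```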
| Raises | Description |
|---|---|
| ValueError | if len(bucket_batch_sizes) != len(bucket_boundaries) + 1. |
  Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
  Last updated 2020-10-01 UTC.