  • Description:

Grounded SCAN (gSCAN) is a synthetic dataset for evaluating compositional generalization in situated language understanding. gSCAN pairs natural language instructions with action sequences, and requires the agent to interpret instructions within the context of a grid-based visual navigation environment.

More information can be found at:

  • Feature structure:

FeaturesDict({
    'command': Sequence(Text(shape=(), dtype=object)),
    'manner': Text(shape=(), dtype=object),
    'meaning': Sequence(Text(shape=(), dtype=object)),
    'referred_target': Text(shape=(), dtype=object),
    'situation': FeaturesDict({
        'agent_direction': int32,
        'agent_position': FeaturesDict({
            'column': int32,
            'row': int32,
        }),
        'direction_to_target': Text(shape=(), dtype=object),
        'distance_to_target': int32,
        'grid_size': int32,
        'placed_objects': Sequence({
            'object': FeaturesDict({
                'color': Text(shape=(), dtype=object),
                'shape': Text(shape=(), dtype=object),
                'size': int32,
            }),
            'position': FeaturesDict({
                'column': int32,
                'row': int32,
            }),
            'vector': Text(shape=(), dtype=object),
        }),
        'target_object': FeaturesDict({
            'object': FeaturesDict({
                'color': Text(shape=(), dtype=object),
                'shape': Text(shape=(), dtype=object),
                'size': int32,
            }),
            'position': FeaturesDict({
                'column': int32,
                'row': int32,
            }),
            'vector': Text(shape=(), dtype=object),
        }),
    }),
    'target_commands': Sequence(Text(shape=(), dtype=object)),
    'verb_in_command': Text(shape=(), dtype=object),
})
  • Feature documentation:
Feature                                   Class           Shape    Dtype   Description
command                                   Sequence(Text)  (None,)  object
manner                                    Text                     object
meaning                                   Sequence(Text)  (None,)  object
referred_target                           Text                     object
situation                                 FeaturesDict
situation/agent_direction                 Tensor                   int32
situation/agent_position                  FeaturesDict
situation/agent_position/column           Tensor                   int32
situation/agent_position/row              Tensor                   int32
situation/direction_to_target             Text                     object
situation/distance_to_target              Tensor                   int32
situation/grid_size                       Tensor                   int32
situation/placed_objects                  Sequence
situation/placed_objects/object           FeaturesDict
situation/placed_objects/object/color     Text                     object
situation/placed_objects/object/shape     Text                     object
situation/placed_objects/object/size      Tensor                   int32
situation/placed_objects/position         FeaturesDict
situation/placed_objects/position/column  Tensor                   int32
situation/placed_objects/position/row     Tensor                   int32
situation/placed_objects/vector           Text                     object
situation/target_object                   FeaturesDict
situation/target_object/object            FeaturesDict
situation/target_object/object/color      Text                     object
situation/target_object/object/shape      Text                     object
situation/target_object/object/size       Tensor                   int32
situation/target_object/position          FeaturesDict
situation/target_object/position/column   Tensor                   int32
situation/target_object/position/row      Tensor                   int32
situation/target_object/vector            Text                     object
target_commands                           Sequence(Text)  (None,)  object
verb_in_command                           Text                     object
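
The nested features listed above can be navigated with ordinary dictionary access. The following is a minimal sketch, not part of the dataset's own tooling. The `situation` dictionary and all of its values are hypothetical. The helper names `unstack_placed_objects` and `offset_to_target` are illustrative. The sketch assumes that a `Sequence` of dictionaries arrives as a dictionary of per-field lists, which is how TFDS generally represents such features, and it uses plain strings where a real example may yield bytes.

```python
# Hypothetical 'situation' dict shaped like the feature documentation above.
# All values are made up for illustration; real text fields may arrive as bytes.
situation = {
    "agent_direction": 0,
    "agent_position": {"row": 1, "column": 3},
    "grid_size": 6,
    "placed_objects": {
        # Assumed dict-of-lists layout: index i describes the i-th object.
        "object": {
            "color": ["red", "blue"],
            "shape": ["circle", "square"],
            "size": [2, 4],
        },
        "position": {"row": [3, 5], "column": [2, 0]},
    },
    "target_object": {
        "object": {"color": "red", "shape": "circle", "size": 2},
        "position": {"row": 3, "column": 2},
    },
}


def unstack_placed_objects(placed):
    """Turn the dict-of-lists layout into one flat record per placed object."""
    count = len(placed["object"]["color"])
    return [
        {
            "color": placed["object"]["color"][i],
            "shape": placed["object"]["shape"][i],
            "size": placed["object"]["size"][i],
            "row": placed["position"]["row"][i],
            "column": placed["position"]["column"][i],
        }
        for i in range(count)
    ]


def offset_to_target(situation):
    """Return (row, column) displacement from the agent to the target object."""
    agent = situation["agent_position"]
    target = situation["target_object"]["position"]
    return target["row"] - agent["row"], target["column"] - agent["column"]


objects = unstack_placed_objects(situation["placed_objects"])
row_offset, column_offset = offset_to_target(situation)
print(objects)
print(row_offset, column_offset)
```

Flattening to one record per object makes it easier to filter by color or shape, at the cost of copying the per-field lists.
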
grounded_scan/compositional_splits (default config)

  • Config description: Examples for compositional generalization.

  • Download size: 82.10 MiB

  • Dataset size: 998.11 MiB

  • Splits:

Split Examples
'adverb_1' 112,880
'adverb_2' 38,582
'contextual' 11,460
'dev' 3,716
'situational_1' 88,642
'situational_2' 16,808
'test' 19,282
'train' 367,933
'visual' 37,436
'visual_easier' 18,718
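
As a rough sketch of how these splits might be used, the snippet below copies the example counts from the table above, separates the standard `train`/`dev`/`test` splits from the remaining ones, and wraps `tfds.load` in a helper for loading a single split by name. The helper names `generalization_splits` and `load_split` are illustrative, and treating every split other than `train`, `dev`, and `test` as a held-out generalization set is an assumption rather than something this page states. Calling `load_split` requires the `tensorflow-datasets` package and triggers a download on first use.

```python
# Example counts copied from the split table above.
COMPOSITIONAL_SPLITS = {
    "adverb_1": 112_880,
    "adverb_2": 38_582,
    "contextual": 11_460,
    "dev": 3_716,
    "situational_1": 88_642,
    "situational_2": 16_808,
    "test": 19_282,
    "train": 367_933,
    "visual": 37_436,
    "visual_easier": 18_718,
}

STANDARD_SPLITS = {"train", "dev", "test"}


def generalization_splits(splits):
    """Names of splits outside train/dev/test (assumed to be held-out sets)."""
    return sorted(name for name in splits if name not in STANDARD_SPLITS)


def load_split(split_name):
    """Load one split of the default config; downloads data on first use."""
    if split_name not in COMPOSITIONAL_SPLITS:
        raise ValueError(f"unknown split: {split_name!r}")
    # Imported lazily so the rest of the sketch runs without the package.
    import tensorflow_datasets as tfds

    return tfds.load("grounded_scan/compositional_splits", split=split_name)


total_examples = sum(COMPOSITIONAL_SPLITS.values())
train_share = COMPOSITIONAL_SPLITS["train"] / total_examples
print(generalization_splits(COMPOSITIONAL_SPLITS))
print(total_examples, round(train_share, 3))
```

Validating the split name before importing the library keeps typos from surfacing only after a long download has started.
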


  • Config description: Examples for generalizing to larger target lengths.

  • Download size: 53.41 MiB

  • Dataset size: 546.73 MiB

  • Splits:

Split Examples
'dev' 1,821
'target_lengths' 198,588
'test' 37,784
'train' 180,301


  • Config description: Examples for spatial relation reasoning.

  • Download size: 89.59 MiB

  • Dataset size: 675.09 MiB

  • Splits:

Split Examples
'dev' 2,617
'referent' 30,492
'relation' 6,285
'relative_position_1' 41,576
'relative_position_2' 41,529
'test' 28,526
'train' 259,088
'visual' 62,250