• Description:

From the paper: We collected a 5003 image dataset automatically from popular Hollywood movies. The images were obtained by running a state-of-the-art person detector on every tenth frame of 30 movies. People detected with high confidence (roughly 20K candidates) were then sent to the crowdsourcing marketplace Amazon Mechanical Turk to obtain groundtruth labeling. Each image was annotated by five Turkers for $0.01 each to label 10 upperbody joints. The median-of-five labeling was taken in each image to be robust to outlier annotation. Finally, images were rejected manually by us if the person was occluded or severely non-frontal. We set aside 20% (1016 images) of the data for testing.
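
The median-of-five step quoted above can be illustrated with a minimal sketch. It assumes the median is taken independently per joint and per coordinate, which the description does not spell out; the function name `median_of_annotations` and the sample values are hypothetical.

```python
import numpy as np


def median_of_annotations(annotations):
    """Combine several annotators' joint labels into one consensus labeling.

    annotations: array-like of shape (num_annotators, num_joints, 2),
    holding (x, y) per joint. Assumption: the median is taken per joint
    and per coordinate across annotators.
    """
    annotations = np.asarray(annotations, dtype=np.float64)
    return np.median(annotations, axis=0)


# Five hypothetical annotators labeling a single joint; the last one is an outlier.
labels = np.array([
    [[100.0, 50.0]],
    [[102.0, 51.0]],
    [[101.0, 49.0]],
    [[99.0, 50.0]],
    [[300.0, 400.0]],
])

consensus = median_of_annotations(labels)
print(consensus)  # the outlier does not shift the result the way a mean would
```

With a mean, the single outlier would pull the x coordinate to about 140; the median stays with the four annotators who agree.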

Split      Examples
'test'     1,016
'train'    3,987
  • Feature structure:
FeaturesDict({
    'currframe': float64,
    'image': Image(shape=(480, 720, 3), dtype=uint8),
    'moviename': Text(shape=(), dtype=string),
    'poselet_hit_idx': Sequence(uint16),
    'torsobox': BBoxFeature(shape=(4,), dtype=float32),
    'xcoords': Sequence(float64),
    'ycoords': Sequence(float64),
})
  • Feature documentation:
Feature          Class             Shape          Dtype    Description
currframe        Tensor                           float64
image            Image             (480, 720, 3)  uint8
moviename        Text                             string
poselet_hit_idx  Sequence(Tensor)  (None,)        uint16
torsobox         BBoxFeature       (4,)           float32
xcoords          Sequence(Tensor)  (None,)        float64
ycoords          Sequence(Tensor)  (None,)        float64
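
As a rough illustration of working with these features, the sketch below converts a `torsobox` value to pixel coordinates and pairs up `xcoords` and `ycoords`. It relies on the TFDS `BBoxFeature` convention of normalized `[ymin, xmin, ymax, xmax]` values and on the fixed 480x720 image shape listed above. The helper names and sample values are hypothetical, and the units of `xcoords`/`ycoords` are not stated here, so the second helper only stacks them.

```python
import numpy as np

# Image shape taken from the feature table: (height, width, channels) = (480, 720, 3).
IMAGE_HEIGHT, IMAGE_WIDTH = 480, 720


def torsobox_to_pixels(torsobox, height=IMAGE_HEIGHT, width=IMAGE_WIDTH):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (x0, y0, x1, y1)."""
    ymin, xmin, ymax, xmax = np.asarray(torsobox, dtype=np.float64)
    return (xmin * width, ymin * height, xmax * width, ymax * height)


def stack_joints(xcoords, ycoords):
    """Pair the two coordinate sequences into an array of shape (num_points, 2)."""
    return np.stack([np.asarray(xcoords), np.asarray(ycoords)], axis=-1)


# Hypothetical feature values, not taken from the dataset.
box_px = torsobox_to_pixels([0.25, 0.5, 0.75, 0.75])
joints = stack_joints([1.0, 2.0], [3.0, 4.0])
print(box_px)
print(joints.shape)
```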
  • Citation:
    title={MODEC: Multimodal Decomposable Models for Human Pose Estimation},
    author={Sapp, Benjamin and Taskar, Ben},
    booktitle={In Proc. CVPR},

flic/small (default config)

  • Config description: Uses the 5003 examples used in the CVPR13 MODEC paper.

  • Download size: 286.35 MiB

  • Figure (tfds.show_examples):

flic/full

  • Config description: Uses 20928 examples, a superset of FLIC consisting of more difficult examples.

  • Download size: 1.10 GiB

  • Figure (tfds.show_examples):
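
A minimal loading sketch for the two configurations follows. It assumes the larger configuration is named `flic/full`, as in the TFDS catalog; the helper names `flic_config_name` and `load_flic` are hypothetical. `tensorflow_datasets` is imported lazily so that the name helper works without it installed; calling `load_flic` triggers the download sizes listed above.

```python
# Config names: 'small' is the default; 'full' is assumed from the TFDS catalog.
FLIC_CONFIGS = ("small", "full")


def flic_config_name(config="small"):
    """Build the '<dataset>/<config>' string that tfds.load expects."""
    if config not in FLIC_CONFIGS:
        raise ValueError(f"Unknown FLIC config: {config!r}")
    return f"flic/{config}"


def load_flic(config="small", split="train"):
    """Load one split of FLIC; downloads the data on first use."""
    import tensorflow_datasets as tfds  # lazy import, only needed when loading

    return tfds.load(flic_config_name(config), split=split)


print(flic_config_name())        # flic/small
print(flic_config_name("full"))  # flic/full
```

For example, `load_flic("small", split="test")` would return the 1,016-example test split as a `tf.data.Dataset`.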