
crema_d

  • Description:

CREMA-D is an audio-visual data set for emotion recognition. It consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips from 91 actors with diverse ethnic backgrounds were collected. This release contains only the audio stream from the original audio-visual recordings. The samples are split between train, validation, and test so that all samples from a given speaker belong to exactly one split.

  • Splits:

Split         Examples
'test'        1,556
'train'       5,144
'validation'  738
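The speaker-disjoint property described above can be checked with a small helper. This is a minimal sketch, not the procedure used to build the dataset: the function name `overlapping_speakers` and the toy speaker IDs are hypothetical, and the mapping from split name to speaker IDs is assumed to have been collected beforehand (for example from the `speaker_id` feature of each split).

```python
from itertools import combinations


def overlapping_speakers(speakers_by_split):
    """Return {(split_a, split_b): shared_ids} for each pair of splits sharing a speaker.

    `speakers_by_split` maps a split name to an iterable of speaker IDs.
    An empty result means every speaker appears in exactly one split.
    """
    overlaps = {}
    # Compare every unordered pair of splits, in alphabetical order of split name.
    for (name_a, ids_a), (name_b, ids_b) in combinations(
        sorted(speakers_by_split.items()), 2
    ):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy data with made-up speaker IDs (not taken from the dataset).
disjoint = {"train": {"s1", "s2"}, "validation": {"s3"}, "test": {"s4"}}
leaky = {"train": {"s1", "s2"}, "test": {"s2"}}

print(overlapping_speakers(disjoint))
print(overlapping_speakers(leaky))
```

An empty dictionary for the real splits would be consistent with the statement that each speaker belongs to exactly one split.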
  • Feature structure:
FeaturesDict({
    'audio': Audio(shape=(None,), dtype=tf.int64),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=6),
    'speaker_id': tf.string,
})
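The feature structure above can be read back with the standard `tfds.load` entry point. The sketch below is illustrative only: the helper names `load_crema_d` and `describe_example` are hypothetical, the import is deferred so the block can be read and run without triggering a download, and it assumes the `tensorflow-datasets` package is installed when `load_crema_d` is actually called.

```python
def load_crema_d(split="train"):
    """Load one split of crema_d through TFDS (downloads the data on first use)."""
    # Imported lazily: requires the tensorflow-datasets package.
    import tensorflow_datasets as tfds

    return tfds.load("crema_d", split=split)


def describe_example(example):
    """Summarise one decoded example, given as a dict of NumPy-style values.

    Expects the keys from the feature structure: 'audio' (1-D integer
    samples), 'label' (integer class index in 0..5), and 'speaker_id'
    (a byte string).
    """
    return {
        "num_samples": len(example["audio"]),
        "label": int(example["label"]),
        "speaker_id": example["speaker_id"].decode("utf-8"),
    }
```

As a usage note, iterating over `tfds.as_numpy(load_crema_d("train").take(1))` should yield dictionaries that `describe_example` accepts; the variable-length `audio` shape `(None,)` means `num_samples` differs from clip to clip.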
  • Feature documentation:
Feature     Class         Shape    Dtype      Description
            FeaturesDict
audio       Audio         (None,)  tf.int64
label       ClassLabel             tf.int64
speaker_id  Tensor                 tf.string
  • Citation:
@article{cao2014crema,
  title={{CREMA-D}: Crowd-sourced emotional multimodal actors dataset},
  author={Cao, Houwei and Cooper, David G and Keutmann, Michael K and Gur, Ruben C and Nenkova, Ani and Verma, Ragini},
  journal={IEEE transactions on affective computing},
  volume={5},
  number={4},
  pages={377--390},
  year={2014},
  publisher={IEEE}
}