• Description:

This classic dataset contains the physical attributes and prices of 53,940 diamonds.


  • price: Price in US dollars.
  • carat: Weight of the diamond.
  • cut: Cut quality (ordered worst to best).
  • color: Color of the diamond (ordered best to worst).
  • clarity: Clarity of the diamond (ordered worst to best).
  • x: Length in mm.
  • y: Width in mm.
  • z: Depth in mm.
  • depth: Total depth percentage: 100 * z / mean(x, y)
  • table: Width of the top of the diamond relative to the widest point.
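
The depth definition above can be checked numerically. The following is a minimal sketch, assuming that mean(x, y) is the arithmetic mean of length and width. The function name `total_depth_pct` and the sample measurements are hypothetical and are not taken from the dataset.

```python
def total_depth_pct(x: float, y: float, z: float) -> float:
    """Total depth percentage: 100 * z / mean(x, y), with all inputs in mm."""
    mean_xy = (x + y) / 2.0
    if mean_xy <= 0:
        # Assumption of this sketch: non-positive dimensions are treated as invalid.
        raise ValueError("x and y must have a positive mean")
    return 100.0 * z / mean_xy


# Hypothetical measurements in mm (illustrative, not a row from the dataset).
depth = total_depth_pct(x=4.0, y=4.0, z=2.5)
print(round(depth, 1))  # 62.5
```

Comparing a value computed this way against the stored depth column is one way to spot inconsistent measurements.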

  • Homepage:

  • Source code:

  • Versions:

    • 1.0.0 (default): Initial release.
  • Download size: 2.64 MiB

  • Dataset size: 13.01 MiB

  • Auto-cached (documentation): Yes

  • Splits:

    Split      Examples
    'train'    53,940
  • Feature structure:
    FeaturesDict({
        'features': FeaturesDict({
            'carat': float32,
            'clarity': ClassLabel(shape=(), dtype=int64, num_classes=8),
            'color': ClassLabel(shape=(), dtype=int64, num_classes=7),
            'cut': ClassLabel(shape=(), dtype=int64, num_classes=5),
            'depth': float32,
            'table': float32,
            'x': float32,
            'y': float32,
            'z': float32,
        }),
        'price': float32,
    })
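
Because the attributes are nested under 'features' while 'price' sits at the top level, a common first step is to split each example into an (inputs, target) pair. The sketch below assumes the dataset is registered under the name 'diamonds' and that the tensorflow_datasets package is installed. The helper names `split_example` and `load_diamonds` are hypothetical.

```python
from typing import Any, Dict, Tuple


def split_example(example: Dict[str, Any]) -> Tuple[Dict[str, Any], Any]:
    """Separate the nested 'features' dict from the 'price' target."""
    return example["features"], example["price"]


def load_diamonds():
    """Load the 'train' split and map each example to (features, price).

    Assumes the dataset name is 'diamonds'. The import is deferred so that
    split_example can be used without tensorflow_datasets installed.
    """
    import tensorflow_datasets as tfds

    ds = tfds.load("diamonds", split="train")
    return ds.map(split_example)
```

Calling `load_diamonds()` downloads the data on first use, so it is left uncalled here.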
  • Feature documentation:
    Feature             Class          Shape   Dtype     Description
    features            FeaturesDict
    features/carat      Tensor                 float32
    features/clarity    ClassLabel             int64
    features/color      ClassLabel             int64
    features/cut        ClassLabel             int64
    features/depth      Tensor                 float32
    features/table      Tensor                 float32
    features/x          Tensor                 float32
    features/y          Tensor                 float32
    features/z          Tensor                 float32
    price               Tensor                 float32
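
The three ClassLabel features are integer codes, so a model that expects a flat numeric input usually needs them expanded. The following is a rough sketch of one-hot encoding them alongside the six float features, using the class counts listed in the feature structure. The helper names, the column order and the sample values are assumptions made for illustration.

```python
from typing import Dict, List, Union

# Class counts taken from the feature structure (num_classes).
NUM_CLASSES = {"clarity": 8, "color": 7, "cut": 5}
# Scalar float features, in a fixed order chosen arbitrarily for this sketch.
NUMERIC_KEYS = ["carat", "depth", "table", "x", "y", "z"]


def one_hot(index: int, num_classes: int) -> List[float]:
    """Return a list of length num_classes with a single 1.0 at index."""
    if not 0 <= index < num_classes:
        raise ValueError(f"label {index} outside [0, {num_classes})")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def to_vector(features: Dict[str, Union[int, float]]) -> List[float]:
    """Concatenate the float features with one-hot encoded class labels."""
    vec = [float(features[k]) for k in NUMERIC_KEYS]
    for key in sorted(NUM_CLASSES):  # clarity, color, cut
        vec.extend(one_hot(int(features[key]), NUM_CLASSES[key]))
    return vec


# Hypothetical example with made-up values (not a row from the dataset).
example = {"carat": 0.3, "depth": 62.5, "table": 55.0,
           "x": 4.0, "y": 4.0, "z": 2.5,
           "clarity": 2, "color": 1, "cut": 4}
print(len(to_vector(example)))  # 26
```

The resulting vector has 6 + 8 + 7 + 5 = 26 entries. Since cut, color and clarity are ordered, keeping the integer codes as ordinal inputs is a reasonable alternative.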
  • Citation:
@Book{,
  author = {Hadley Wickham},
  title = {ggplot2: Elegant Graphics for Data Analysis},
  publisher = {Springer-Verlag New York},
  year = {2016},
  isbn = {978-3-319-24277-4},
  url = {},
}