

  • Description:

The batting averages of 18 Major League Baseball players through their first 45 at-bats of the 1970 season, along with their batting averages for the remainder of the season.

The data has been modified from the table in the paper, as used for case studies using Stan and PyMC3, by adding columns that explicitly list the number of early-season at-bats, as well as the at-bats and hits for the full season.
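
How the added full-season columns relate to the early-season and remainder-of-season columns can be sketched in plain Python. This is only an illustration, not the dataset's actual preprocessing: the helper name `add_season_columns` is hypothetical, the record uses made-up numbers rather than values from the dataset, and the sketch assumes that remainder-of-season hits can be recovered by rounding average times at-bats.

```python
def add_season_columns(record):
    """Return a copy of `record` with full-season columns filled in.

    Assumption: season totals are the early-season counts plus the
    remainder-of-season counts. Illustration only.
    """
    out = dict(record)
    # Remainder-of-season hits are implied by average * at-bats (rounded).
    remaining_hits = round(record["RemainingAverage"] * record["RemainingAt-Bats"])
    out["SeasonAt-Bats"] = record["At-Bats"] + record["RemainingAt-Bats"]
    out["SeasonHits"] = record["Hits"] + remaining_hits
    out["SeasonAverage"] = out["SeasonHits"] / out["SeasonAt-Bats"]
    return out


# Made-up numbers for a hypothetical player (not taken from the dataset).
example = {
    "At-Bats": 45,
    "Hits": 9,
    "BattingAverage": 9 / 45,
    "RemainingAt-Bats": 300,
    "RemainingAverage": 0.25,
}
season = add_season_columns(example)
print(season["SeasonAt-Bats"], season["SeasonHits"], season["SeasonAverage"])
```

The feature names are used as dictionary keys because names such as `At-Bats` contain a hyphen and cannot be Python attribute names.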

  • Splits:

Split    Examples
'train'  18
  • Feature structure:
    'At-Bats': int32,
    'BattingAverage': float32,
    'FirstName': object,
    'Hits': int32,
    'LastName': object,
    'RemainingAt-Bats': int32,
    'RemainingAverage': float32,
    'SeasonAt-Bats': int32,
    'SeasonAverage': float32,
    'SeasonHits': int32,
  • Feature documentation:
Feature           Class   Shape  Dtype    Description
At-Bats           Tensor         int32
BattingAverage    Tensor         float32
FirstName         Tensor         object
Hits              Tensor         int32
LastName          Tensor         object
RemainingAt-Bats  Tensor         int32
RemainingAverage  Tensor         float32
SeasonAt-Bats     Tensor         int32
SeasonAverage     Tensor         float32
SeasonHits        Tensor         int32
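
As a rough sketch of how the feature list above could be used, the following plain-Python check validates one record against the listed names and dtypes. The names `FEATURE_TYPES` and `check_record` are hypothetical, the sample record is made up, and mapping the `object` dtype to `str` or `bytes` is an assumption about how the name fields are decoded.

```python
# Expected Python-side type per feature, following the dtypes listed above.
# Assumption: `object` features (names) arrive as str or bytes.
FEATURE_TYPES = {
    "At-Bats": int,
    "BattingAverage": float,
    "FirstName": (str, bytes),
    "Hits": int,
    "LastName": (str, bytes),
    "RemainingAt-Bats": int,
    "RemainingAverage": float,
    "SeasonAt-Bats": int,
    "SeasonAverage": float,
    "SeasonHits": int,
}


def check_record(record):
    """Return a list of problems found in `record` (empty if it conforms)."""
    problems = []
    for name, expected in FEATURE_TYPES.items():
        if name not in record:
            problems.append(f"missing feature: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: {type(record[name]).__name__}")
    extra = sorted(set(record) - set(FEATURE_TYPES))
    problems.extend(f"unexpected feature: {name}" for name in extra)
    return problems


# Made-up record for a hypothetical player (not taken from the dataset).
sample = {
    "At-Bats": 45, "BattingAverage": 0.2, "FirstName": "Jane", "Hits": 9,
    "LastName": "Doe", "RemainingAt-Bats": 300, "RemainingAverage": 0.25,
    "SeasonAt-Bats": 345, "SeasonAverage": 84 / 345, "SeasonHits": 84,
}
print(check_record(sample))
```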
  • Citation:
  title={Data analysis using Stein's estimator and its generalizations},
  author={Efron, Bradley and Morris, Carl},
  journal={Journal of the American Statistical Association},
  publisher={Taylor \& Francis}