Filling the Gaps: A Multiple Imputation Approach to Estimating Aging Curves in Baseball

aging curves
baseball
multiple imputation
survival bias
Studying baseball aging curves in a missing data context and accounting for the different ways players drop out during their careers
Authors
Affiliations

Quang Nguyen

Department of Statistics & Data Science, Carnegie Mellon University

Gregory J. Matthews

Department of Mathematics and Statistics, Loyola University Chicago

Published

January 29, 2023

JSA arxiv code

@article{nguyen2024filling,
  title={Filling the Gaps: A Multiple Imputation Approach to Estimating Aging Curves in Baseball},
  author={Nguyen, Quang and Matthews, Gregory J.},
  journal={Journal of Sports Analytics},
  volume={10},
  number={1},
  pages={77--85},
  year={2024},
  publisher={IOS Press}
}

Abstract

In sports, an aging curve depicts the relationship between average performance and age in athletes’ careers. This paper investigates the aging curves for offensive players in Major League Baseball. We study this problem in a missing data context and account for different types of dropouts of baseball players during their careers. We employ a multiple imputation framework for multilevel data to impute the player performance associated with the missing seasons, and estimate the aging curves based on the imputed datasets. We then evaluate the effects of different dropout mechanisms on the aging curves through simulation, before applying our method to analyze MLB player data from past seasons. Results suggest that aging curves constructed without considering the unobserved seasons are overestimated, whereas estimates obtained from multiple imputation address this shortcoming.
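
The overestimation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' imputation procedure or their simulation design. It is a toy example under assumed values: a hypothetical quadratic aging curve peaking at age 27, a player-level random effect, and a dropout rule in which a player's seasons go unobserved after the first season below an arbitrary cutoff. The names `true_curve`, `simulate`, and `cutoff`, along with every numeric value, are illustrative assumptions.

```python
import random
from statistics import mean


def true_curve(age):
    # Hypothetical quadratic aging curve peaking at age 27 (an assumption).
    return 0.330 - 0.0005 * (age - 27) ** 2


def simulate(n_players=2000, ages=range(21, 38), cutoff=0.280, seed=0):
    """Return mean performance by age for all seasons and for observed seasons only."""
    rng = random.Random(seed)
    full = {a: [] for a in ages}      # every season, including unobserved ones
    observed = {a: [] for a in ages}  # seasons played before the player drops out
    for _ in range(n_players):
        talent = rng.gauss(0, 0.02)   # player-level effect, constant across ages
        active = True
        for a in ages:
            perf = true_curve(a) + talent + rng.gauss(0, 0.02)
            full[a].append(perf)
            if active:
                observed[a].append(perf)
                # Assumed dropout rule: a season below the cutoff ends the career,
                # so all later seasons are missing from the observed data.
                if perf < cutoff:
                    active = False
    full_mean = {a: mean(v) for a, v in full.items()}
    observed_mean = {a: mean(v) for a, v in observed.items() if v}
    return full_mean, observed_mean


full_mean, observed_mean = simulate()
for age in (21, 27, 35):
    print(age, round(full_mean[age], 3), round(observed_mean[age], 3))
```

At the first age every player is still observed, so the two means coincide. At later ages the observed mean is computed only from players who survived the dropout rule, so it sits above the mean over all seasons. The imputation framework in the paper is meant to correct for this gap by imputing the unobserved seasons.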