Extracting NFL tracking data from images to evaluate quarterbacks and pass defenses

football
player-tracking data
completion probability
image extraction
generalized additive models
How Sarah Mallepalle created next-gen-scraPy and became an NFL inspiration
Authors
Affiliations

Sarah Mallepalle

Department of Statistics & Data Science, Carnegie Mellon University

Ronald Yurko

Department of Statistics & Data Science, Carnegie Mellon University

Konstantinos Pelechrinis

School of Computing and Information, University of Pittsburgh

Samuel L. Ventura

Department of Statistics & Data Science, Carnegie Mellon University

Published

April 28, 2020

JQAS arxiv

@article{mallepalle2020extracting,
  title={Extracting NFL tracking data from images to evaluate quarterbacks and pass defenses},
  author={Mallepalle, Sarah and Yurko, Ronald and Pelechrinis, Konstantinos and Ventura, Samuel L},
  journal={Journal of Quantitative Analysis in Sports},
  volume={16},
  number={2},
  pages={95--120},
  year={2020},
  publisher={De Gruyter}
}

Abstract

The NFL collects detailed tracking data capturing the location of all players and the ball during each play. Although the raw form of this data is not publicly available, the NFL releases a set of aggregated statistics via its Next Gen Stats (NGS) platform. It also provides charts showing the locations and outcomes of pass attempts for individual quarterbacks. Our work aims to partially close the gap between the data available privately (to NFL teams) and the data available publicly, and our contribution is twofold. First, we introduce an image processing tool designed specifically for extracting the raw data from the NGS pass charts. We extract the outcome and coordinates of each pass, along with other metadata. Second, we analyze the resulting dataset, examining the spatial tendencies and performances of individual quarterbacks and defenses. We fit a generalized additive model to estimate completion percentage as a function of field location. We introduce a naive Bayes approach for estimating the 2-D completion percentage surfaces of individual teams and quarterbacks, and we provide a one-number summary, completion percentage above expectation (CPAE), for evaluating quarterbacks and team defenses. We find that our pass location data closely matches the NFL’s tracking data, and that our CPAE metric closely matches the NFL’s proprietary CPAE metric.
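
To make the one-number summary concrete, the following is a minimal sketch of how a CPAE-style value could be computed from extracted pass locations. It assumes CPAE is the observed completion percentage minus the average model-expected completion percentage over the same passes. The `Pass` record, the coordinate convention, and the toy logistic `expected_completion` surface are all hypothetical stand-ins introduced for illustration; they are not the generalized additive model or the naive Bayes surfaces described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Pass:
    """One extracted pass attempt (hypothetical record layout)."""
    x: float        # yards from the middle of the field (assumed convention)
    y: float        # yards downfield from the line of scrimmage (assumed)
    complete: bool  # True if the pass was caught


def expected_completion(x: float, y: float) -> float:
    """Toy completion-probability surface.

    ASSUMPTION: a made-up logistic curve in which deeper and wider passes
    are harder to complete. It only stands in for a fitted model.
    """
    z = 1.5 - 0.05 * y - 0.02 * abs(x)
    return 1.0 / (1.0 + math.exp(-z))


def cpae(passes: list[Pass]) -> float:
    """Observed minus expected completion rate, in percentage points."""
    if not passes:
        raise ValueError("need at least one pass")
    n = len(passes)
    observed = sum(p.complete for p in passes) / n
    expected = sum(expected_completion(p.x, p.y) for p in passes) / n
    return 100.0 * (observed - expected)


if __name__ == "__main__":
    sample = [
        Pass(x=-5.0, y=3.0, complete=True),
        Pass(x=12.0, y=25.0, complete=False),
        Pass(x=0.0, y=8.0, complete=True),
    ]
    print(f"CPAE: {cpae(sample):+.1f} percentage points")
```

Under this definition, a positive value means the quarterback completed more passes than the surface predicts for those locations; for a defense, a negative value computed over the passes it faced would indicate better-than-expected coverage.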