Theses Doctoral

Essays in Basketball Analytics

Keshri, Suraj Kumar

With the increasing popularity of and competition in professional basketball over the past decade, data-driven decision making has emerged as a major competitive edge. The advent of high-frequency player tracking data from SportVU has enabled a rigorous analysis of player abilities and interactions that was not possible before. The tracking data records two-dimensional x-y coordinates of the 10 players on the court as well as the x-y-z coordinates of the ball at a resolution of 25 frames per second, yielding over 1 billion space-time observations over the course of a full season. This dissertation offers a collection of spatio-temporal models and player evaluation metrics that provide insight into player interactions and performance, allowing teams to make better decisions.
Conventional approaches to simulating matches have ignored the fact that, in basketball, the dynamics of ball movement are very sensitive to the lineups on the court and to the unique identities of the players on both offense and defense. In chapter 2, we propose a simulation infrastructure that bridges the gap between player identity and the team-level network. We model the progression of a basketball match using a probabilistic graphical model, in which every touch event in a game is a sequence of transitions between discrete states. We treat the progression of a match as a graph, where each node represents the network structure of players on the court, their actions, events, etc., and edges denote possible moves in the game flow. Our results show that changes in either the team lineup or the opponent's lineup significantly affect the dynamics of match progression. Evaluation on the match data for the 2013-16 NBA season suggests that the graphical model approach is appropriate for modeling a basketball match.
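The idea of a possession as a walk over discrete states can be illustrated with a minimal sketch. It assumes a plain first-order Markov chain, which is far simpler than the graphical model described above, and every state name and probability in it is hypothetical rather than estimated from tracking data.

```python
import random

# Hypothetical states and transition probabilities, for illustration only.
# They are not estimated from data and are not taken from the dissertation.
TRANSITIONS = {
    "guard_has_ball": {"wing_has_ball": 0.5, "shot_attempt": 0.3, "turnover": 0.2},
    "wing_has_ball": {
        "guard_has_ball": 0.3,
        "center_has_ball": 0.3,
        "shot_attempt": 0.3,
        "turnover": 0.1,
    },
    "center_has_ball": {"wing_has_ball": 0.2, "shot_attempt": 0.7, "turnover": 0.1},
}
TERMINAL = {"shot_attempt", "turnover"}


def simulate_possession(start, rng, max_touches=50):
    """Walk the state graph from `start` until a terminal state is reached."""
    path = [start]
    state = start
    while state not in TERMINAL and len(path) <= max_touches:
        options = TRANSITIONS[state]
        # Sample the next state in proportion to its transition probability.
        state = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        path.append(state)
    return path


def estimate_shot_rate(start, n_sims=10_000, seed=0):
    """Monte Carlo estimate of the share of possessions ending in a shot attempt."""
    rng = random.Random(seed)
    shots = sum(
        simulate_possession(start, rng)[-1] == "shot_attempt" for _ in range(n_sims)
    )
    return shots / n_sims


if __name__ == "__main__":
    print(simulate_possession("guard_has_ball", random.Random(1)))
    print(estimate_shot_rate("guard_has_ball"))
```

In this toy setting, a lineup change would correspond to swapping in a different transition table, which is one simple way to see why lineup-specific dynamics matter for simulation.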
NBA teams value players who can "stretch" the floor, i.e., create space on the court by drawing their defender(s) closer to themselves. Clearly, this ability to attract defenders varies across players. The effect may also vary with the court location of the offensive player and with whether or not the player is the ball handler. For instance, a ball handler near the basket attracts a defender more strongly than a non-ball-handler at the 3-point line. This has a significant effect on the defensive assignment, which matters because defensive assignment has become the cornerstone of all tracking-data-based player evaluation models. In chapter 3, we propose a new model to learn player-specific and court-location-specific offensive attraction. We show that offensive players indeed vary in their ability to attract the defender in different parts of the court. Using this metric, teams can evaluate players to construct a roster or lineup that maximizes spacing. We also improve upon the existing defensive matchup inference algorithm for SportVU data.
While the ultimate goal of the offense is to shoot the ball, the strategy lies in creating good shot opportunities. Offensive play event detection has been a topic of research interest. Current research in this area has used a supervised learning approach to detect and classify such events. We take an unsupervised learning approach instead. This has two inherent benefits: first, there is no need for pre-tagged data to learn to identify these events, and tagging is a labor-intensive and error-prone task; second, an unsupervised approach allows us to detect events that have not been tagged yet, i.e., novel events. We use an HMM-based approach to detect these events at any point in time during a possession by specifying the functional form of the prior distribution on the player movement data. We test our framework on detecting ball screens, post-ups, and drives. However, it can easily be extended to events such as isolations, or to any new event that has a distinct defensive matchup or player movement feature compared to a non-event. This is the topic of chapter 4.
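To make the HMM idea concrete, the sketch below decodes the most likely sequence of hidden states ("no_event" or "ball_screen") from a single per-frame feature using the Viterbi algorithm. It is an illustration under stated assumptions, not the dissertation's model: the feature (distance in feet between a potential screener and the ball handler's defender), the Gaussian emission form, and all parameter values are hypothetical, and the parameters are fixed by hand here, whereas an unsupervised approach would learn them from data.

```python
import math

STATES = ("no_event", "ball_screen")

# Hypothetical parameters for illustration; none come from the dissertation.
LOG_START = {"no_event": math.log(0.9), "ball_screen": math.log(0.1)}
LOG_TRANS = {
    "no_event": {"no_event": math.log(0.95), "ball_screen": math.log(0.05)},
    "ball_screen": {"no_event": math.log(0.10), "ball_screen": math.log(0.90)},
}
# Assumed Gaussian emissions (mean, std) over the screener-to-defender distance in feet.
EMISSION = {"no_event": (15.0, 6.0), "ball_screen": (3.0, 1.5)}


def log_gaussian(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def viterbi(observations):
    """Return the most likely hidden state for each frame."""
    scores = {
        s: LOG_START[s] + log_gaussian(observations[0], *EMISSION[s]) for s in STATES
    }
    backpointers = []
    for x in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this frame.
            best_prev = max(STATES, key=lambda p: scores[p] + LOG_TRANS[p][s])
            new_scores[s] = (
                scores[best_prev]
                + LOG_TRANS[best_prev][s]
                + log_gaussian(x, *EMISSION[s])
            )
            pointers[s] = best_prev
        backpointers.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    distances = [18.0, 16.0, 14.0, 4.0, 3.0, 2.5, 3.5, 15.0, 17.0]
    for d, s in zip(distances, viterbi(distances)):
        print(f"{d:5.1f} ft -> {s}")
```

With these hand-picked parameters, the four frames where the distance drops to about 3 feet are labeled as a ball screen and the rest as no event. The high self-transition probabilities keep the decoded labels from flickering frame to frame.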
Accurate estimation of the offensive and defensive abilities of players in the NBA plays a crucial role in player selection and ranking. A typical approach is to learn the defensive assignment for each shot and then use a random effects model to estimate the offensive and defensive abilities of each player. The scalar estimates from the random effects model can then be used to rank players. In this approach, a shot has a binary outcome: it is either made or missed. Such an approach cannot take advantage of the "quality" of the shot trajectory. In chapter 5, we propose a new method for ranking players that infers the quality of a shot trajectory using a deep recurrent neural network, and then uses this quality measure in a random effects model to rank players while taking the defensive matchup into account. We show that the quality information significantly improves the player ranking. We also show that including the quality of shots increases the separation between the learned random effect coefficients and thus allows for better differentiation of player abilities. Further, we show that a trajectory-based model lets us infer changes in a player's ability on a game-by-game basis, whereas a shot-based model does not have enough information to detect such changes.
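One common way to write down the binary-outcome random effects model described above is a logistic model with a random effect for the shooter and one for the matched defender. The notation below is an assumption made for illustration (the symbols, the logit link, and the normal priors are not taken from the dissertation): $y_i$ is the outcome of shot $i$, $o(i)$ is the shooter, and $d(i)$ is the assigned defender.

```latex
\begin{align*}
  y_i &\sim \mathrm{Bernoulli}(p_i), \\
  \operatorname{logit}(p_i) &= \mu + \alpha_{o(i)} - \beta_{d(i)}, \\
  \alpha_j &\sim \mathcal{N}(0, \sigma_{\alpha}^2), \qquad
  \beta_k \sim \mathcal{N}(0, \sigma_{\beta}^2).
\end{align*}
```

Under this sketch, players would be ranked by the estimated $\alpha_j$ on offense and $\beta_k$ on defense. A trajectory-based variant would replace the binary $y_i$ with a continuous shot-quality score.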
A good defensive player prevents an opponent from making a shot, attempting a good shot, or making an easy pass, and more generally prevents scoring events, which eventually leads to wasted shot-clock time. The salient feature here is that a good defender prevents events. Consequently, event-driven metrics, such as box scores, cannot measure defensive abilities. Conventional wisdom in basketball is that "pesky" defenders continuously maintain a close distance to the ball handler. A closely guarded offensive player is less likely to take or make a shot, less likely to pass, and more likely to lose the ball. In chapter 6, we introduce the Defensive Efficiency Rating (DER), a new statistic that measures the defensive effectiveness of a player. DER is the effective distance a defender maintains from the ball handler during an interaction, where we control for the identity and wingspan of the defender, the shot efficiency of the ball handler, and the zone on the court. DER allows us to quantify the quality of a defensive interaction without being limited by the occurrence of discrete and infrequent events such as shots and rebounds. We show that the ranking from this statistic naturally picks out defenders known to perform well in particular zones.
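The raw ingredient of a distance-based defensive statistic can be sketched as the average defender-to-ball-handler distance over the frames of one interaction. The function below is a hypothetical helper written for illustration. It computes only this raw average and does not reproduce DER itself, which additionally controls for defender identity and wingspan, the ball handler's shot efficiency, and court zone.

```python
import math


def mean_guarding_distance(defender_xy, handler_xy):
    """Average Euclidean distance between defender and ball handler.

    Both arguments are equal-length sequences of (x, y) court coordinates,
    one pair per tracking frame of a single defensive interaction.
    """
    if not defender_xy or len(defender_xy) != len(handler_xy):
        raise ValueError("need two non-empty coordinate sequences of equal length")
    total = sum(
        math.hypot(dx - hx, dy - hy)
        for (dx, dy), (hx, hy) in zip(defender_xy, handler_xy)
    )
    return total / len(defender_xy)


if __name__ == "__main__":
    # Two made-up frames: the defender is 5 ft away, then 10 ft away.
    defender = [(0.0, 0.0), (0.0, 0.0)]
    handler = [(3.0, 4.0), (6.0, 8.0)]
    print(mean_guarding_distance(defender, handler))  # 7.5
```

Because the quantity is defined on every frame of an interaction, it is available even when no shot, pass, or rebound occurs.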



More About This Work

Academic Units
Industrial Engineering and Operations Research
Thesis Advisors
Iyengar, Garud N.
Ph.D., Columbia University
Published Here
September 24, 2019