Backfired

An NBA history podcast that celebrates bad teams, bad luck, and bad decisions.

Calculating NBA roster continuity in R


One thing I think Basketball Reference should add to its team pages is something Sports Reference already does on its college basketball site: showing the percentage of minutes and/or scoring returning from the previous year's team. Here is a picture of that in action on the 2023 Memphis Tigers roster:

[Screenshot: the 2023 Memphis Tigers roster page on Sports Reference, with the returning minutes shown at the bottom]

There at the bottom, you can see that after a rash of transfers and three players entering the NBA draft, the Tigers are only returning about a third of their minutes from last year's squad.

But on Basketball Reference, that kind of information is nowhere to be found. There is a hard-to-find list that tells you the percentage of minutes played by people who were on the roster the previous year, but not what percentage of minutes from last year's roster returned. (This may sound like a questionable distinction, but I promise it's different.) Instead, immediately beneath the roster on all team pages is a list of the assistant coaches.
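
To make that distinction concrete, here is a toy sketch with made-up players and minutes (nothing in it is real data, and the variable names are just for illustration). The same single holdover produces two different percentages depending on which season's minutes go in the denominator:

```r
# Toy rosters with made-up minutes (hypothetical players, not real data)
last_year <- data.frame(Player = c("A", "B", "C"), MP = c(2000, 1500, 500))
this_year <- data.frame(Player = c("A", "D", "E"), MP = c(1000, 1800, 1200))

# Players who appear on both rosters
returning <- intersect(last_year$Player, this_year$Player)

# Share of LAST year's minutes that came back (what this post calculates)
pct_minutes_returned <- 100 *
  sum(last_year$MP[last_year$Player %in% returning]) / sum(last_year$MP)

# Share of THIS year's minutes played by holdovers (the other measure described above)
pct_minutes_by_holdovers <- 100 *
  sum(this_year$MP[this_year$Player %in% returning]) / sum(this_year$MP)

pct_minutes_returned      # 50
pct_minutes_by_holdovers  # 25
```

Player A accounted for half of last year's minutes but only a quarter of this year's, so the two numbers answer different questions.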

So, I wrote a function in R that calculates for any team in any season:

(1) the number of returning players from the previous season, (2) the number of minutes returned from the previous season, and (3) the percent of minutes returned from the previous season.

The code is below. You'll need the rvest, stringr, and dplyr packages.

library(rvest)
library(stringr)
library(dplyr)

returning_minutes_calc <- function(team,year){

  #fill in the URLs for the year of interest and the previous year
  url_previous_year <- paste0('https://www.basketball-reference.com/teams/',team,'/',year-1,'.html')
  url_current_year <- paste0('https://www.basketball-reference.com/teams/',team,'/',year,'.html')

  #get total minutes played by each player from the previous year's team
  #(the [[4]] index depends on the page layout, so check it if the results look off)
  previous_year_html_tables <- read_html(url_previous_year) %>%
    html_nodes("table") %>%
    html_table(fill = TRUE)

  previous_year_minutes <- data.frame(previous_year_html_tables[[4]]) %>%
    select(Player=2,MP) %>%
    filter(Player!='')

  #do the same for this year's team, but in the second command, create a flag "returned" to help determine whether the player did return from last year's squad
  current_year_html_tables <- read_html(url_current_year) %>%
    html_nodes("table") %>%
    html_table(fill = TRUE)

  current_year_minutes <- data.frame(current_year_html_tables[[4]]) %>%
    select(Player=2) %>%
    mutate(returned=1) %>%
    filter(Player!='')

  #join the two tables together and summarize the returning players and minutes
  previous_year_minutes %>%
    left_join(current_year_minutes, by="Player") %>%
    mutate(returned=ifelse(is.na(returned),0,returned)) %>%  #(this is where the "returned" flag comes in handy: players missing from this year's table get 0)
    mutate(returning_minutes_from_previous_year=MP*returned,team=team,year=year) %>%
    group_by(team,year) %>%
    summarize(number_of_returning_players=sum(returned),
              returning_minutes_from_previous_year=sum(returning_minutes_from_previous_year),
              previous_year_total_minutes=sum(MP),
              .groups="drop") %>%
    mutate(percent_minutes_returning=100*(returning_minutes_from_previous_year/previous_year_total_minutes))

}

Here are a couple of examples of what the result of this function looks like:

returning_minutes_calc("LAL",2022)
team   year   number_of_returning_players   returning_minutes_from_previous_year   previous_year_total_minutes   percent_minutes_returning
LAL    2022   3                             3970                                   17456                         22.7

returning_minutes_calc("MEM",2023)
team   year   number_of_returning_players   returning_minutes_from_previous_year   previous_year_total_minutes   percent_minutes_returning
MEM    2023   11                            15659                                  19782                         79.2

So as we can see, this year's Memphis Grizzlies roster brought back a much larger share of last year's team than the 2022 Los Angeles Lakers brought back from 2021. Everyone on this year's roster, save for the 5 rookies and Danny Green, was on the team last year, while last season the Lakers only brought back LeBron James, Anthony Davis, and Talen Horton-Tucker. That's it. Almost everyone else was signed as a free agent that summer on a short-term contract, and as we know, that had disastrous results for LA last year.


Just for fun, let's check out the relationship between percent of minutes returned and current season winning percentage (through games that took place on December 21st, 2022). We'll just use the Atlantic Division, which is the Brooklyn Nets, the Boston Celtics, the New York Knicks, the Philadelphia 76ers, and the Toronto Raptors.

atlantic_teams <- c("BRK", "BOS", "NYK", "PHI", "TOR")
               
atlantic_teams_data <- sapply(atlantic_teams,returning_minutes_calc, year=2023)
atlantic_teams_data_frame <- as.data.frame(t(atlantic_teams_data))
(I am using sapply in lieu of a for-loop to make the code a little cleaner, but I like for-loops better. I've omitted the section of code where I converted this list result into a data frame and set each column to the proper type.)
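
For anyone who wants to skip that conversion step, one possible alternative (not the code I omitted) is to use lapply and stack the one-row results with rbind, which keeps each column's type intact. The sketch below uses a hypothetical stand-in function, stub_calc, with placeholder numbers so that it runs without hitting Basketball Reference; in practice you would pass returning_minutes_calc instead.

```r
# Hypothetical stand-in for returning_minutes_calc() so the sketch runs offline.
# The numbers are placeholders, not real team data.
stub_calc <- function(team, year) {
  data.frame(
    team = team,
    year = year,
    number_of_returning_players = 10L,
    returning_minutes_from_previous_year = 15000,
    previous_year_total_minutes = 20000,
    percent_minutes_returning = 75,
    stringsAsFactors = FALSE
  )
}

atlantic_teams <- c("BRK", "BOS", "NYK", "PHI", "TOR")

# lapply() returns a list of one-row data frames; rbind stacks them into
# a single data frame with ordinary (non-list) columns
rows <- lapply(atlantic_teams, stub_calc, year = 2023)
atlantic_teams_data_frame <- do.call(rbind, rows)

str(atlantic_teams_data_frame)
```

dplyr's bind_rows(rows) should do the same stacking if you would rather stay in dplyr.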

Then we left join in the standings. I had that in spreadsheet form already, so I'm just importing that into the object standings. This is what it looks like:

>head(standings)
team   wins   losses   win_pct
MIL    22     9        70.96774
BOS    22     10       68.75
CLE    22     11       66.67
BRK    20     12       62.5
PHI    18     12       60
NYK    18     14       56.25
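
If you don't have a standings spreadsheet handy, a minimal sketch of building the same object by hand from the six rows shown above might look like this. The win_pct formula, 100 * wins / (wins + losses), is an assumption, but it reproduces the values printed above.

```r
# Wins and losses copied from the six rows printed above
standings <- data.frame(
  team   = c("MIL", "BOS", "CLE", "BRK", "PHI", "NYK"),
  wins   = c(22, 22, 22, 20, 18, 18),
  losses = c(9, 10, 11, 12, 12, 14),
  stringsAsFactors = FALSE
)

# Win percentage on a 0-100 scale (assumed formula)
standings$win_pct <- 100 * standings$wins / (standings$wins + standings$losses)

head(standings)
```

You would need to add the rest of the league (or at least the Raptors) the same way before joining.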

So we join the table atlantic_teams_data_frame to standings and then plot them together. Here's a ggplot2 graph showing the relationship between % of minutes returned from last season and their current win %:

[Plot: % of minutes returned from previous season (x-axis) vs. team win % (y-axis) for the Atlantic Division, 2022-23]

Assuming the correlation between those two is positive when you look at the entire league, we see a massive outlier: the Toronto Raptors, who are returning over 90% of last season's minutes, but currently sit at 14-18 with Pascal Siakam, OG Anunoby, and Fred VanVleet in trade rumors.

Here's the code for the plot, if you're curious:

data_and_standings <- atlantic_teams_data_frame %>% left_join(standings, by="team")

library(ggplot2)
ggplot(data_and_standings,aes(x=percent_minutes_returning,y=win_pct, label=team)) + 
  geom_point() + 
  labs(title="Returning minutes and\n winning percentage, 2022-23",
       x ="% of minutes returned from previous season", y = "Team win %") +
  theme(
    plot.title = element_text(hjust=.5, face="bold"),
    axis.title.x = element_text(size=10, face="bold"),
    axis.title.y = element_text(size=10, face="bold")
  ) + 
  labs(caption="Source: basketball-reference.com") +
  geom_text(hjust=0, vjust=0)