Calculating NBA roster continuity in R
One of the things that I think Basketball Reference should add on their team pages is something they already do on their Sports Reference college basketball site: adding the % of minutes and/or scoring returned from the previous year's team. Here is a picture of that in action on the 2023 Memphis Tigers roster:
There at the bottom, you can see that after a rash of transfers and three players entering the NBA draft, the Tigers are only returning about a third of their minutes from last year's squad.
But on Basketball Reference, that kind of information is nowhere to be found. There is a hard-to-find list that tells you the percentage of minutes played by people who were on the roster the previous year, but not what percentage of minutes from last year's roster returned. (This may sound like a questionable distinction, but I promise it's different.) Instead, immediately beneath the roster on all team pages is a list of the assistant coaches.
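To make that distinction concrete, here is a small sketch with made-up numbers. The players (A through E) and their minutes are invented purely for illustration and don't correspond to any real team.

```r
# Made-up minutes for two seasons of a hypothetical team (not real data)
last_year <- c(A = 2000, B = 1500, C = 500)
this_year <- c(A = 1000, D = 2500, E = 500)

# Players who appear on both rosters
returning <- intersect(names(last_year), names(this_year))

# Share of LAST year's minutes that came back (the number I want)
pct_minutes_returned <- 100 * sum(last_year[returning]) / sum(last_year)

# Share of THIS year's minutes played by holdovers (the other number)
pct_minutes_by_holdovers <- 100 * sum(this_year[returning]) / sum(this_year)

pct_minutes_returned      # 50
pct_minutes_by_holdovers  # 25
```

Same single returning player, two very different percentages, because each one divides by a different season's total.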
So, I wrote a function in R that calculates for any team in any season:
(1) the number of returning players from the previous season, (2) the number of minutes returned from the previous season, and (3) the percent of minutes returned from the previous season.
The code is below. You'll need the rvest, stringr, and dplyr packages.
library(rvest)
library(dplyr)

returning_minutes_calc <- function(team, year) {
  #fill in the URLs for the year of interest and the previous year
  url_previous_year <- paste0('https://www.basketball-reference.com/teams/', team, '/', year - 1, '.html')
  url_current_year <- paste0('https://www.basketball-reference.com/teams/', team, '/', year, '.html')

  #get total minutes played by each player from the previous year's team
  #(the table is picked by position, so this will break if the page layout changes)
  previous_year_html_tables <- read_html(url_previous_year) %>%
    html_nodes("table") %>%
    html_table(fill = TRUE)
  previous_year_minutes <- data.frame(previous_year_html_tables[[4]]) %>%
    select(Player = 2, MP) %>%
    filter(Player != '')

  #do the same for this year's team, but in the second command, create a flag "return" to help determine whether the player did return from last year's squad
  current_year_html_tables <- read_html(url_current_year) %>%
    html_nodes("table") %>%
    html_table(fill = TRUE)
  current_year_minutes <- data.frame(current_year_html_tables[[4]]) %>%
    select(Player = 2) %>%
    mutate(return = 1) %>%
    filter(Player != '')

  #join the two tables together and summarize the returning players and minutes
  previous_year_minutes %>%
    left_join(current_year_minutes, by = "Player") %>%
    mutate(return = ifelse(is.na(return), 0, return)) %>% #(this is where the "return" flag comes in handy)
    mutate(returning_minutes_from_previous_year = MP * return, team = team, year = year) %>%
    group_by(team, year) %>%
    summarize(number_of_returning_players = sum(return),
              returning_minutes_from_previous_year = sum(returning_minutes_from_previous_year),
              previous_year_total_minutes = sum(MP),
              .groups = "drop") %>%
    mutate(percent_minutes_returning = 100 * (returning_minutes_from_previous_year / previous_year_total_minutes))
}
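If you want to check the join-and-flag step without hitting the website, here is a rough sketch of the same idea on toy data. The player names and minutes are placeholders I made up, and the object `flagged` is just a name for this illustration.

```r
library(dplyr)

# Made-up minutes totals for illustration only (not real players or data)
previous_year_minutes <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  MP = c(1200, 800, 1000),
  stringsAsFactors = FALSE
)
current_year_minutes <- data.frame(
  Player = c("Player A", "Player C", "Player D"),
  stringsAsFactors = FALSE
) %>%
  mutate(return = 1)

# Left join from last year's roster: anyone missing from this year's
# roster gets NA in "return", which we then turn into 0
flagged <- previous_year_minutes %>%
  left_join(current_year_minutes, by = "Player") %>%
  mutate(return = ifelse(is.na(return), 0, return))

returning_minutes <- sum(flagged$MP * flagged$return)
total_minutes <- sum(flagged$MP)
```

Because the join starts from last year's roster, newcomers like "Player D" drop out automatically, and only Player B ends up with a 0 flag.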
Here are a couple of examples of what the result of this function looks like:
returning_minutes_calc("LAL",2022)
team | year | number_of_returning_players | returning_minutes_from_previous_year | previous_year_total_minutes | percent_minutes_returning |
---|---|---|---|---|---|
LAL | 2022 | 3 | 3970 | 17456 | 22.7 |
returning_minutes_calc("MEM",2023)
team | year | number_of_returning_players | returning_minutes_from_previous_year | previous_year_total_minutes | percent_minutes_returning |
---|---|---|---|---|---|
MEM | 2023 | 11 | 15659 | 19782 | 79.2 |
So as we can see, this year's Memphis Grizzlies roster brought back a much larger share of last year's team than the 2022 Los Angeles Lakers brought back from 2021. Everyone on this year's roster, save for the 5 rookies and Danny Green, was on the team last year, while last season the Lakers only brought back LeBron James, Anthony Davis, and Talen Horton-Tucker. That's it. Almost everyone else was signed as a free agent that summer on a short-term contract, and as we know, that had disastrous results for LA last year.
Just for fun, let's check out the relationship between percent of minutes returned and current season winning percentage (through games that took place on December 21st, 2022). We'll just use the Atlantic Division, which is the Brooklyn Nets, the Boston Celtics, the New York Knicks, the Philadelphia 76ers, and the Toronto Raptors.
atlantic_teams <- c("BRK", "BOS", "NYK", "PHI", "TOR")
atlantic_teams_data <- sapply(atlantic_teams,returning_minutes_calc, year=2023)
atlantic_teams_data_frame <- as.data.frame(t(atlantic_teams_data))
(I am using sapply in lieu of a for-loop to make the code a little cleaner, but I like for-loops better. I've omitted the section of code where I converted this list result into a data frame and set each column to the proper type.)
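For what it's worth, one way to sidestep that conversion step is to collect the one-row results in a list and stack them with dplyr's bind_rows. This is only a sketch, not the code I actually ran: `fake_returning_minutes_calc` is a hypothetical stand-in that returns a one-row data frame with a missing value, so the example runs without scraping anything.

```r
library(dplyr)

# Hypothetical stand-in for returning_minutes_calc(); it fills the
# percentage with NA instead of scraping real numbers
fake_returning_minutes_calc <- function(team, year) {
  data.frame(team = team,
             year = year,
             percent_minutes_returning = NA_real_,
             stringsAsFactors = FALSE)
}

teams <- c("BRK", "BOS", "NYK", "PHI", "TOR")

# lapply keeps each result as its own data frame...
results_list <- lapply(teams, fake_returning_minutes_calc, year = 2023)

# ...and bind_rows stacks them into one data frame with ordinary columns
results <- bind_rows(results_list)
```

The columns come out as plain character and numeric vectors rather than lists, so there is nothing to re-type afterwards.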
Then we left join in the standings. I had that in spreadsheet form already, so I'm just importing that into the object standings. This is what it looks like:
>head(standings)
team wins losses win_pct
MIL 22 9 70.96774
BOS 22 10 68.75
CLE 22 11 66.67
BRK 20 12 62.5
PHI 18 12 60
NYK 18 14 56.25
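The win_pct column is just wins divided by games played, times 100. Here is a quick sketch that rebuilds those six rows by hand and shows the join on the team column. The names `standings_head` and `atlantic_keys` are made up for this example, and `atlantic_keys` holds only the join key, not the real returning-minutes data.

```r
library(dplyr)

# First six rows of the standings, copied from the head(standings) output
standings_head <- data.frame(
  team   = c("MIL", "BOS", "CLE", "BRK", "PHI", "NYK"),
  wins   = c(22, 22, 22, 20, 18, 18),
  losses = c(9, 10, 11, 12, 12, 14),
  stringsAsFactors = FALSE
) %>%
  mutate(win_pct = 100 * wins / (wins + losses))

# Hypothetical stand-in for atlantic_teams_data_frame: just the join key
atlantic_keys <- data.frame(
  team = c("BRK", "BOS", "NYK", "PHI", "TOR"),
  stringsAsFactors = FALSE
)

# Left join keeps all five Atlantic teams; TOR comes back NA here only
# because it isn't among the six rows shown above
joined <- atlantic_keys %>% left_join(standings_head, by = "team")
```

Naming the key with `by = "team"` keeps the join from silently matching on some other shared column.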
So we join the table atlantic_teams_data_frame to standings and then plot them together. Here's a ggplot2 graph showing the relationship between each team's % of minutes returned from last season and its current win %:
Assuming the correlation between those two is positive when you look at the entire league, we see a massive outlier: the Toronto Raptors, who are returning over 90% of last season's minutes, but currently sit at 14-18 with Pascal Siakam, OG Anunoby, and Fred VanVleet in trade rumors.
Here's the code for the plot, if you're curious:
library(ggplot2)

#join on the team abbreviation so the key is explicit
data_and_standings <- atlantic_teams_data_frame %>% left_join(standings, by = "team")

ggplot(data_and_standings, aes(x = percent_minutes_returning, y = win_pct, label = team)) +
  geom_point() +
  #label each point with its team abbreviation
  geom_text(hjust = 0, vjust = 0) +
  labs(title = "Returning minutes and\n winning percentage, 2022-23",
       x = "% of minutes returned from previous season", y = "Team win %",
       caption = "Source: basketballreference.com") +
  theme(
    plot.title = element_text(hjust = .5, face = "bold"),
    axis.title.x = element_text(size = 10, face = "bold"),
    axis.title.y = element_text(size = 10, face = "bold")
  )