Backfired

An NBA history podcast that celebrates bad teams, bad luck, and bad decisions.


Creating a soccer-style table for the NBA


As someone who only watches basketball and the English Premier League, I love finding ways to combine the two. Even though I'm not the biggest Boston Celtics fan in the world, seeing my favorite footballer, Bukayo Saka, take in a Celtics game was one of the bright spots in my malaise after Arsenal's late-season loss to Tottenham Hotspur ended their hopes of an improbable top-4 finish.

So when I had the idea to convert the NBA standings into a soccer-style table, I thought it would make a great little project. Below I break down the essential parts of the short R script I wrote to do it, which scrapes its data from Basketball Reference.

First off, I wanted the script to be portable. It will work for any recent NBA season, but to adapt it for a season from before the New Orleans Hornets became the Pelicans and the Charlotte Bobcats became the Hornets, you'll need to change the team abbreviations in the "teams" vector.

Here I will load the necessary packages:

library(dplyr)
library(stringr)
library(rvest)


This is a vector that holds all 30 teams' abbreviations on Basketball Reference:

teams <- c('BOS','TOR','PHI','NYK','BRK',
                    'ATL','CHO','WAS','MIA','ORL',
                    'MIL','CHI','IND','DET','CLE',
                    'MEM','DAL','HOU','SAS','NOP',
                    'LAL','LAC','SAC','GSW','PHO',
                    'MIN','UTA','DEN','OKC','POR')
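Since the script's portability hinges on this vector, a quick sanity check before scraping can save a confusing error later. This is a minimal base-R sketch, not part of the original script; `check_teams` is a hypothetical helper, and it only checks the vector's shape (the expected count, no duplicates, three uppercase letters each), not whether Basketball Reference actually uses each code for a given season.

```r
# Hypothetical helper: sanity-check team abbreviations before building URLs.
# It checks shape only; it cannot tell whether Basketball Reference
# actually uses a given code for a given season.
check_teams <- function(teams, expected = 30L) {
  problems <- character(0)
  if (length(teams) != expected) {
    problems <- c(problems,
                  sprintf("expected %d teams, got %d", expected, length(teams)))
  }
  dupes <- unique(teams[duplicated(teams)])
  if (length(dupes) > 0) {
    problems <- c(problems, paste("duplicated:", paste(dupes, collapse = ", ")))
  }
  bad <- teams[!grepl("^[A-Z]{3}$", teams)]
  if (length(bad) > 0) {
    problems <- c(problems,
                  paste("not three uppercase letters:", paste(bad, collapse = ", ")))
  }
  problems  # an empty vector means nothing looked wrong
}

# A deliberately broken example: a duplicate and a full team name
check_teams(c("BOS", "BOS", "Nets"), expected = 3L)
```

Running `check_teams(teams)` on the vector above should return an empty character vector.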


Now we set the year and initialize the "schedule" data frame. I'm running this for the current season, 2022-23; Basketball Reference labels each season by the year it ends, so that season is 2023. The resulting table is accurate through the games of Friday, December 16th, 2022.

year <- 2023

schedule <- data.frame()
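Because Basketball Reference names each season by the calendar year in which it ends, the script sets the year to 2023 for the 2022-23 season. If you'd rather type the season the way it's usually written, a small base-R sketch like this can convert it and build the same schedule URL the loop uses. Both `season_end_year` and `schedule_url` are hypothetical helpers, not part of the original script.

```r
# Hypothetical helper: turn a season label like "2022-23" into the
# ending year that Basketball Reference uses in its URLs.
season_end_year <- function(label) {
  start <- as.integer(substr(label, 1, 4))
  start + 1L
}

# Hypothetical helper: same URL pattern as the scraping loop.
schedule_url <- function(team, year) {
  paste0("https://www.basketball-reference.com/teams/", team, "/",
         year, "_games.html")
}

year <- season_end_year("2022-23")
schedule_url("BOS", year)
```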


Now I'll pull all 30 teams' schedules using this for-loop:

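for(i in seq_along(teams)){

   # each team's schedule and results live on a page like
   # https://www.basketball-reference.com/teams/BOS/2023_games.html
   url <- paste0('https://www.basketball-reference.com/teams/',teams[i],'/',year,'_games.html')
   
   webpage <- read_html(url)
   
   # html_elements() is the rvest 1.0 name for html_nodes(), and
   # html_table() no longer needs the deprecated fill argument
   tbls_ls <- webpage %>%
     html_elements("table") %>%
     html_table()
     
   # only pulling a few columns from the first table in each list;
   # these positions match the page layout when this was written,
   # so double-check them if Basketball Reference changes its table
   sched <- data.frame(tbls_ls[[1]])[c(2,6,7,9:11)]
   
   # drop the header rows that repeat partway down the table
   sched <- sched %>% filter(Date!="Date")
   
   names(sched) <- c("Date","Home_Away","Opponent","Overtime",
                     "Points_For","Points_Against")
                        
   sched$Team <- teams[i]
   
   schedule <- rbind(schedule,sched)
   
   # pause between requests; Sports Reference limits how fast bots can hit the site
   Sys.sleep(3)
 }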
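To see why the loop filters on `Date!="Date"`, it helps to look at a toy version of a scraped table. The rows below are made up for illustration (not real Basketball Reference data), and the sketch uses base R's `subset()` in place of dplyr's `filter()` so it runs without any packages. A header row repeated inside the table comes through as an ordinary data row, and its presence also forces every column to be text.

```r
# Made-up rows standing in for a scraped schedule table; the third row
# is a repeated header that came through as ordinary data.
sched <- data.frame(
  Date = c("Wed, Oct 19, 2022", "Fri, Oct 21, 2022", "Date", "Sat, Oct 22, 2022"),
  Tm   = c("110", "105", "Tm", ""),
  Opp  = c("98", "108", "Opp", ""),
  stringsAsFactors = FALSE
)

# Because of that header row, even the score columns are character.
class(sched$Tm)

# Same idea as filter(Date != "Date") in the loop
sched <- subset(sched, Date != "Date")
nrow(sched)
```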


Because we scraped everything from the web, it's all text, so we'll convert the dates to dates and the numbers to numbers. Revolutionary.

 schedule$Date <- as.Date(schedule$Date,format = "%a, %b %d, %Y")
 
 schedule$Points_For <- as.numeric(schedule$Points_For)
 
 schedule$Points_Against <- as.numeric(schedule$Points_Against)
 
 schedule$Overtime <- ifelse(is.na(schedule$Overtime)," ",schedule$Overtime)
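Here's what those conversions do on a couple of sample values (made up for illustration). One thing to watch: `%a` and `%b` match weekday and month names in your current locale, so if your R session isn't in English, the date parse may come back as `NA`. Unplayed games have blank scores, and those become `NA` when converted, which is what lets the later steps skip them.

```r
# Sample scraped values: one played game and one not yet played.
dates  <- c("Wed, Oct 19, 2022", "Sun, Apr 9, 2023")
scores <- c("126", "")

# Same format string as the script; %a and %b are locale-dependent.
parsed <- as.Date(dates, format = "%a, %b %d, %Y")
parsed

# A blank score can't become a number, so it turns into NA.
# suppressWarnings() just keeps the example's output clean.
points_for <- suppressWarnings(as.numeric(scores))
points_for
```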


Now comes the tough decision. Soccer is famous for its ties, while basketball will go to as many overtimes as it takes to produce a winner and a loser in every single game. If we don't incorporate a tie in some way, the "points" on the table will just be 3 for every win and 0 for every loss, and the table will be no different from the standings.

So I arbitrarily decided that a "draw" is any game that went to overtime or was decided by 3 points or fewer. The margin could just as easily be 4 or 5, or you could count only overtime games. Feel free to change the draw threshold and see how different the table becomes!

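 # a draw is any overtime game (grepl also catches "2OT", "3OT", etc.)
 # or any game decided by 3 points or fewer
 schedule$Result <- ifelse(grepl("OT", schedule$Overtime), 'D',
 ifelse(abs(schedule$Points_Against-schedule$Points_For)<=3, 'D', 
 ifelse(schedule$Points_For > schedule$Points_Against,'W','L')))
 
 # soccer scoring: 3 for a win, 1 for a draw, 0 for a loss;
 # unplayed games have NA scores, so they end up with NA points
 schedule$points <- ifelse(schedule$Result=='D',1,ifelse(schedule$Result=='W',3,0))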
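If you want to play with the draw threshold, it can help to wrap the rule in a function. This is a hypothetical base-R helper, not the original code: `draw_margin` is the largest margin (in points) that still counts as a draw, and any game whose overtime marker contains "OT" is a draw regardless of margin. Setting `draw_margin = 0` leaves only overtime games as draws, since NBA games can't end level.

```r
# Hypothetical helper: classify each game as a win, draw, or loss.
# pf / pa are points for and against; ot is the overtime column
# ("", "OT", "2OT", ...). Unplayed games (NA scores) come back as NA.
classify_result <- function(pf, pa, ot, draw_margin = 3) {
  ot <- ifelse(is.na(ot), "", ot)
  ifelse(grepl("OT", ot), "D",
         ifelse(abs(pf - pa) <= draw_margin, "D",
                ifelse(pf > pa, "W", "L")))
}

# Soccer scoring: 3 for a win, 1 for a draw, 0 for a loss.
result_points <- function(result) {
  unname(c(W = 3, D = 1, L = 0)[result])
}

# Made-up games: a 10-point win, a 2-point game, a loss,
# a double-overtime game, and one not yet played.
pf <- c(110, 101, 99, 120, NA)
pa <- c(100, 99, 111, 118, NA)
ot <- c("", "", "", "2OT", "")
res <- classify_result(pf, pa, ot)
res
result_points(res)
```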


Now we will put everything together in a nice clean table in the same column order as the soccer table. (Here's where I switch from base R to dplyr full-time.)

table <- schedule %>% 
  # keep only games that have been played
  filter(!is.na(points)) %>% 
  group_by(Team) %>% 
  summarize(
    `Matches Played` = n(),
    Wins = sum(points == 3),
    Draws = sum(points == 1),
    Losses = sum(points == 0),
    Scored = sum(Points_For, na.rm = TRUE),
    Conceded = sum(Points_Against, na.rm = TRUE),
    Differential = Scored - Conceded,
    Points = sum(points, na.rm = TRUE)) %>% 
  # rank on points, breaking ties on point differential
  arrange(desc(Points), desc(Differential)) %>% 
  mutate(Rank = row_number()) %>% 
  print(n = 30)
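For anyone who wants to see the summarize-and-rank step without dplyr, here's a base-R sketch on a few made-up games (toy teams "AAA", "BBB", and "CCC", not real data). It tallies each team's totals with `tapply()` and then orders by points, with point differential as the tiebreaker, the same ordering the dplyr pipeline uses.

```r
# Made-up played games, one row per team per game, already scored.
games <- data.frame(
  Team           = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  Points_For     = c(110, 100, 105, 99, 120, 90),
  Points_Against = c(100, 101, 110, 97, 100, 92),
  points         = c(3, 1, 0, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Per-team totals; tapply() groups by the sorted team names each time
pts <- tapply(games$points, games$Team, sum)
tab <- data.frame(
  Team     = names(pts),
  Wins     = as.vector(tapply(games$points == 3, games$Team, sum)),
  Draws    = as.vector(tapply(games$points == 1, games$Team, sum)),
  Losses   = as.vector(tapply(games$points == 0, games$Team, sum)),
  Scored   = as.vector(tapply(games$Points_For, games$Team, sum)),
  Conceded = as.vector(tapply(games$Points_Against, games$Team, sum)),
  Points   = as.vector(pts),
  stringsAsFactors = FALSE
)
tab$Differential <- tab$Scored - tab$Conceded

# Order by points, then differential, both descending
tab <- tab[order(-tab$Points, -tab$Differential), ]
tab$Rank <- seq_len(nrow(tab))
rownames(tab) <- NULL
tab
```

Here "AAA" and "CCC" both have 4 points, so the differential decides it and "CCC" ranks first.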

But we also want the "Form" column, which shows each team's 5 most recent results. We'll build that separately and join it into the final table, starting with a blank "form" data frame:

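form <- data.frame()

for(i in 1:length(teams)){
  # this team's last 5 completed games
  # (the scraped schedule is already in date order)
  L5 <- schedule %>% 
    filter(Team==teams[i] & !is.na(points)) %>% 
    tail(5) %>% 
    # string the results together, most recent game first
    mutate('Last 5' = paste0(Result[5],
                             Result[4],
                             Result[3],
                             Result[2],
                             Result[1]))
  form[i,1] <- teams[i]
  form[i,2] <- L5$`Last 5`[1]
}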

names(form) <- c("Team",'Form')
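The loop builds each team's form string by reading its last five results backwards. The same idea fits in a one-line function; this is a base-R sketch with `last_five_form` as a hypothetical name, and like the loop it assumes the results are already in date order (oldest first). Unlike indexing `Result[5]` directly, it also copes with a team that has played fewer than five games.

```r
# Hypothetical helper: most recent result first, at most 5 games.
last_five_form <- function(results) {
  results <- results[!is.na(results)]   # skip unplayed games
  paste(rev(tail(results, 5)), collapse = "")
}

# Made-up results in date order (oldest first)
last_five_form(c("W", "L", "D", "W", "W", "L"))

# Applied per team with tapply()
results <- c("W", "L", "D", "W", "W", "L", "D", "D")
team    <- c("AAA", "AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB")
f <- tapply(results, team, last_five_form)
f
```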


Now we left-join the table with "form" on the Team column and move Rank to the front to get our full result.

full_table <- table %>% 
  left_join(form, by = "Team") %>% 
  # Rank first, then the remaining columns in their existing order
  select(Rank, everything())
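The same join-and-reorder step can be done with base R's `merge()`. Here's a sketch on made-up two-team data; `all.x = TRUE` keeps every row of the standings, like `left_join()`, and naming `by = "Team"` explicitly avoids depending on which columns happen to share a name.

```r
# Made-up standings and form strings
standings <- data.frame(Team = c("AAA", "BBB"), Points = c(4, 1),
                        Rank = c(1L, 2L), stringsAsFactors = FALSE)
form <- data.frame(Team = c("BBB", "AAA"), Form = c("LD", "WD"),
                   stringsAsFactors = FALSE)

# Left join on Team, then sort by Rank and put Rank first
full <- merge(standings, form, by = "Team", all.x = TRUE)
full <- full[order(full$Rank), c("Rank", setdiff(names(full), "Rank"))]
full
```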


And so, as of the day I'm writing this up, here is the current soccer-style table for the NBA. If the NBA were run like a European soccer league, with promotion and relegation, the Boston Celtics would be clearly winning the league, the Memphis Grizzlies, Cleveland Cavaliers, and New Orleans Pelicans would be joining them in the Champions League, and the San Antonio Spurs, Detroit Pistons, and Charlotte Hornets would be facing relegation.

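| Rank | Team | Matches Played | Wins | Draws | Losses | Scored | Conceded | Differential | Points | Form |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Boston Celtics | 30 | 20 | 5 | 5 | 3572 | 3383 | 189 | 65 | LDLLW |
| 2 | Memphis Grizzlies | 28 | 17 | 4 | 7 | 3261 | 3117 | 144 | 55 | WWWWW |
| 3 | Cleveland Cavaliers | 30 | 16 | 6 | 8 | 3327 | 3144 | 183 | 54 | WWDWL |
| 4 | New Orleans Pelicans | 28 | 16 | 6 | 6 | 3287 | 3129 | 158 | 54 | DLDWW |
| 5 | Milwaukee Bucks | 28 | 17 | 3 | 8 | 3142 | 3047 | 95 | 54 | LWLDW |
| 6 | Phoenix Suns | 29 | 15 | 8 | 6 | 3343 | 3211 | 132 | 53 | WLDLL |
| 7 | Brooklyn Nets | 30 | 15 | 5 | 10 | 3381 | 3333 | 48 | 50 | DWDWW |
| 8 | Philadelphia 76ers | 28 | 14 | 5 | 9 | 3120 | 3027 | 93 | 47 | WWWDL |
| 9 | Sacramento Kings | 28 | 14 | 5 | 9 | 3298 | 3233 | 65 | 47 | WDLLW |
| 10 | Denver Nuggets | 28 | 13 | 7 | 8 | 3239 | 3213 | 26 | 46 | LWWDD |
| 11 | Los Angeles Clippers | 31 | 13 | 6 | 12 | 3322 | 3350 | -28 | 45 | LWWWL |
| 12 | Golden State Warriors | 30 | 13 | 5 | 12 | 3511 | 3505 | 6 | 44 | LLLWD |
| 13 | New York Knicks | 29 | 12 | 6 | 11 | 3315 | 3260 | 55 | 42 | WDWWW |
| 14 | Toronto Raptors | 29 | 12 | 6 | 11 | 3233 | 3214 | 19 | 42 | DDLLW |
| 15 | Indiana Pacers | 30 | 12 | 6 | 12 | 3447 | 3474 | -27 | 42 | LWLDW |
| 16 | Utah Jazz | 31 | 11 | 8 | 12 | 3654 | 3589 | 65 | 41 | DWLLD |
| 17 | Portland Trail Blazers | 29 | 11 | 7 | 11 | 3262 | 3256 | 6 | 40 | LWWWD |
| 18 | Atlanta Hawks | 30 | 12 | 4 | 14 | 3417 | 3451 | -34 | 40 | WLLDL |
| 19 | Minnesota Timberwolves | 29 | 12 | 4 | 13 | 3313 | 3360 | -47 | 40 | DLLLW |
| 20 | Los Angeles Lakers | 28 | 11 | 5 | 12 | 3230 | 3255 | -25 | 38 | WDWDL |
| 21 | Chicago Bulls | 28 | 11 | 5 | 12 | 3157 | 3187 | -30 | 38 | LDDWW |
| 22 | Dallas Mavericks | 29 | 8 | 13 | 8 | 3241 | 3188 | 53 | 37 | WLWLD |
| 23 | Washington Wizards | 29 | 9 | 6 | 14 | 3220 | 3303 | -83 | 33 | LLLLL |
| 24 | Miami Heat | 30 | 7 | 11 | 12 | 3241 | 3277 | -36 | 32 | DDWLW |
| 25 | Oklahoma City Thunder | 29 | 8 | 6 | 15 | 3344 | 3400 | -56 | 30 | DDLLL |
| 26 | Houston Rockets | 28 | 8 | 3 | 17 | 3086 | 3230 | -144 | 27 | DWWLW |
| 27 | Orlando Magic | 30 | 7 | 5 | 18 | 3281 | 3406 | -125 | 26 | WWWWD |
| 28 | San Antonio Spurs | 28 | 7 | 3 | 18 | 3085 | 3370 | -285 | 24 | LDWWL |
| 29 | Detroit Pistons | 31 | 5 | 4 | 22 | 3441 | 3649 | -208 | 19 | LDLLL |
| 30 | Charlotte Hornets | 29 | 4 | 7 | 18 | 3194 | 3403 | -209 | 19 | LDLLL |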
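Because the scoring is simple, the table can be spot-checked by hand: Points should equal 3 × Wins + Draws, Matches Played should equal Wins + Draws + Losses, and Differential should equal Scored − Conceded. Here's a small R check using the top five rows of the table above.

```r
# Top five rows of the table above
top5 <- data.frame(
  Team     = c("Boston Celtics", "Memphis Grizzlies", "Cleveland Cavaliers",
               "New Orleans Pelicans", "Milwaukee Bucks"),
  Played   = c(30, 28, 30, 28, 28),
  Wins     = c(20, 17, 16, 16, 17),
  Draws    = c(5, 4, 6, 6, 3),
  Losses   = c(5, 7, 8, 6, 8),
  Scored   = c(3572, 3261, 3327, 3287, 3142),
  Conceded = c(3383, 3117, 3144, 3129, 3047),
  Diff     = c(189, 144, 183, 158, 95),
  Points   = c(65, 55, 54, 54, 54)
)

# Each check should be TRUE for every row
with(top5, all(Points == 3 * Wins + Draws))
with(top5, all(Played == Wins + Draws + Losses))
with(top5, all(Diff == Scored - Conceded))
```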

Note: I cut out the parts of my script that automatically push the results to Google Sheets; to turn the table into clean HTML, I copied and pasted the spreadsheet result into this tool.

If you're curious, here are the top 4 teams of the past few seasons:

2021-22:

Phoenix Suns, 180 points

Memphis Grizzlies, 160 points

Miami Heat, 160 points

Golden State Warriors, 153 points

2020-21:

Utah Jazz, 155 points

Brooklyn Nets, 140 points

Los Angeles Clippers, 139 points

Philadelphia 76ers, 135 points

2019-20:

Milwaukee Bucks, 166 points

Toronto Raptors, 151 points

Los Angeles Lakers, 141 points

Boston Celtics, 138 points