Background

College basketball has among the largest home court advantages in all of sports.
Student sections have been getting increasingly creative in their attempts to affect the outcome of opposing team shooters making their free throws.

Previous Research

A Duke medical student tabulated the miss percentage in the 2020-2021 and 2021-2022 seasons for each of the distractions the graduate student section stages behind the basket in the second half.

Previous Research

Arizona State is known for its “Curtain of Distraction”, which rose to fame in 2015 and even featured a surprise appearance by a shirtless Michael Phelps, the Olympian.

Previous Research

The Harvard Sports Analytics Collective did a study in 2015 about the Arizona State “Curtain of Distraction”
Reported that visiting teams shot 60.6% from the free-throw line in the second half compared to 68.6% in the first half.
Estimated it was worth approximately 1.41 points per game.
Not a statistically significant difference.

Previous Research

Luke Benz, a Harvard PhD student and author of the ncaahoopR package, revisited the Arizona State distraction in 2019 and found insufficient evidence to conclude that teams shoot worse at Arizona State than they do in other road games.

Data

Data run from the 2015-2016 college basketball season through the current season (2023-2024).
Home teams with fewer than 15 games in the dataset are grouped into one collective team.
Only true road games are included, not neutral-site games.

Data

Group all seasons together for the home team when assessing a fan base/court effect.
For the shooting team's own performance, treat each season separately.

Methodology

Two methods I explore:

Second Half vs First Half (shooting into a rowdy student section)
Overall effect of a home court on free throw percentage

First Half vs Second

Not every team puts a large student section behind the basket the visiting team shoots at in the second half.
Want to assess whether there is an effect for individual teams.
Compare the entire country to what we would expect from randomness alone.

First Half vs Second

Only include home courts where visitors attempted at least 200 free throws in both the first half and the second half.
Results are sorted by the largest drop-off in visiting free-throw percentage from the 1st half to the 2nd half.

First Half vs Second

In fact, on average, visiting teams shoot about 1 percentage point better in the second half than in the first.
This could be due to end-of-game scenarios.
- A team that is winning and getting fouled does its best to get the ball into the hands of its best free throw shooters.
This theory is hard to test.
- It would require each player's share of free throws conditional on time remaining and score.
I adjust the z-scores and p-values by subtracting this average 1% difference from each home court's 2nd half minus 1st half FT% difference.
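
The adjustment can be written as a shifted two-proportion z-statistic. This is a sketch in notation introduced here for illustration: $\hat p_1$ and $\hat p_2$ are the visiting first-half and second-half free-throw percentages at a home court, $n_1$ and $n_2$ are the corresponding attempts, and $\bar d$ is the attempt-weighted average second-minus-first difference across all courts (about 1%).

\[\begin{align} z &= \frac{(\hat p_2 - \hat p_1) - \bar d}{\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}}\\ p &= 2\,\Phi(-|z|) \end{align}\]

A large negative $z$ means visitors shoot worse in the second half at that court than the league-wide pattern would suggest.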

First Half vs Second

Visiting Free Throw % by Half

2015-16 to 2023-24 Seasons

First Half vs Second

Arizona State: “Curtain of Distraction”
BYU also has a wild student section behind the basket in the second half.

BYU’s Student Section Attempting to Distract Opposing Shooter

First Half vs Second - Chance?

Significance Test?
Multiple test corrections
- Bonferroni: $\frac{\alpha}{n}$
- $n=355$ teams in our dataset, extremely high standard
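
To see how strict the Bonferroni standard is, here is a minimal sketch in base R. It assumes a conventional $\alpha = 0.05$ (the slides do not state a value), and the three p-values are hypothetical, made up only to show the adjustment.

```r
# Assumption: alpha = 0.05 is a conventional choice, not stated in the slides
alpha <- 0.05
n_teams <- 355

# Per-team significance threshold under Bonferroni
bonferroni_alpha <- alpha / n_teams

# Two-sided z-score a team would need to clear that threshold
z_cutoff <- qnorm(1 - bonferroni_alpha / 2)

# Hypothetical raw p-values (illustrative only, not results from the data)
raw_p <- c(0.0001, 0.004, 0.03)

# p.adjust multiplies by the number of tests and caps the result at 1
adj_p <- p.adjust(raw_p, method = "bonferroni", n = n_teams)

print(bonferroni_alpha)
print(z_cutoff)
print(adj_p)
```

Under these assumptions a single team needs a z-score of roughly 3.8 in absolute value to be called significant, which is why the QQ plot is a useful complement to team-by-team tests.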

First Half vs Second - Chance?

QQ Plot

Full Game Effect Model

Many Division I teams have no stands behind the basket, or no students seated there.
Instead, analyze the general effect of a specific home court on the visiting team's free-throw percentage relative to its true ability.

Full Game Effect Model

Sample sizes are too small for any player-level analysis.
Stick to team-level analysis.
Response: a team's FT% in a road game minus its average FT% in home games.

Full Game Effect Model

What is a better comparison:

Free Throw Percentage at home games?
Free Throw Percentage at other road games?

Full Game Effect Model

I used home games:

More consistent data set (other road games have many changing variables).
Power conference teams typically have many more home games than true road games.

Full Game Effect Model

This model doesn’t account for:

Travel to higher elevations (my previous work at ESPN showed this to matter a lot)
Days of rest for teams
Distance traveled
Individual player effects on FT% (assumed to average out over a season)

Full Game Effect Model

My model can be defined as

\[\begin{align} y_i &= \beta_0 + \gamma_{j[i]} + \epsilon_i\\ \gamma_j &\sim \text{Cauchy}(0,1)\\ \epsilon_i &\sim N(0,\sigma^2) \end{align}\]

$y_i$ is the difference between the visiting team’s free throw percentage in game $i$ and its free throw percentage in all its home games that season.
$\gamma_j$ is the effect of home team $j$’s court, and $j[i]$ is the home team in game $i$.
Observations are weighted by the total number of free throw attempts in the game.

Results

Use posterior draws for the team effects to calculate the estimated effect a team’s home court has on the opposing team’s free throw percentage.
Calculate the posterior probability that a team’s home court has an effect that decreases the probability of an opponent making a free throw.
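
Both summaries are simple functions of the posterior draws. The sketch below uses simulated draws for a single hypothetical home court; the mean, spread, and number of draws are assumptions chosen for illustration and are not output from the fitted model.

```r
set.seed(1)

# Hypothetical posterior draws of one home court's effect on visiting FT%
# (simulated here; in the analysis these come from the fitted model)
home_effect <- rnorm(1600, mean = -0.01, sd = 0.01)

# Estimated effect: posterior mean and standard deviation of the draws
avg_effect <- mean(home_effect)
sd_effect <- sd(home_effect)

# Posterior probability that the court lowers visiting FT%:
# the share of draws that fall below zero
prob_less_0 <- mean(home_effect < 0)

print(c(avg_effect = avg_effect, sd_effect = sd_effect, prob_less_0 = prob_less_0))
```

Reporting the share of draws below zero gives a direct probability statement about each court, which is easier to read across hundreds of teams than a set of interval estimates.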

Results

Home Team's Venue Effect on Visiting Free Throws

Hierarchical Bayesian Model Using 2015-16 to 2023-24 Seasons

Results

We can compare the evidence of 2nd half effects to the overall modeled effect of the home crowd/court.

Results

Order of Magnitude for 2nd Half FT% Effect:

Average FTA per game: about 20.2.
Assume roughly half of those, about 10, come in the 2nd half.
Impact per game in the 2nd half, in points, is about -1 for the largest team effect.

Results

Order of Magnitude for Whole Game FT% Effect:

Average FTA per game: about 20.2.
Impact per game, in points, is about -2 for the largest team effect.

Conclusion

A handful of teams show clear evidence of a 2nd half effect.
A handful of teams show clear evidence that visitors shoot free throws worse in their arena.
- Confounded with altitude and distance traveled.
More teams should try 2nd half distractions!

---
title: "Can Students Affect Free Throws?"
author: "Paul Sabin"
revealjs:
  theme: beige
editor: visual
execute:
  echo: false
  warning: false
  error: false
  message: false
  cache: true
---

```{r}
library(parallel)
library(tidyverse)
library(hoopR)
library(ggridges)
library(gt)
library(gtExtras)
library(viridis)
library(rstan)
library(tidybayes)
library(rstanarm)
library(broom.mixed)
library(ggimage)

options(mc.cores = detectCores())
options(tibble.width = Inf)

run_model <- FALSE # whether to run the model from scratch
start_season <- 2015
end_season <- 2023

# load play-by-play one season at a time, keeping only free throws
pbp <- NULL
for (s in start_season:end_season) {
  pbp <- load_mbb_pbp(s) |>
    select(
      tidyselect::any_of(
        c("game_id", "game_date_time", "period_number", "game_half",
          "start_half_seconds_remaining", "team_id", "season",
          "away_team_id", "home_team_id", "away_team_name", "home_team_name",
          "home_team_spread", "game_spread", "home_favorite",
          "type_id", "type_text", "text",
          "coordinate_x", "coordinate_y",
          "scoring_play", "home_score", "away_score")
      )
    ) |>
    mutate(across(contains("_id"), as.numeric)) |>
    mutate(across(contains("scor"), as.numeric)) |>
    # filter(coordinate_x == 25, coordinate_y == 0) |>
    filter(type_id == 540) |> # free throws
    mutate(make_ft = scoring_play) |>
    bind_rows(pbp)
  cat("finished season ", s, "\n")
}

mcbb_games <- hoopR::load_mbb_schedule(start_season:end_season)

# free throw attempts and makes by game, half, and shooting team
ft_half_summary <- pbp |>
  mutate(first_half = period_number == 1) |>
  group_by_at(vars(game_id, game_date_time, first_half, team_id:home_favorite)) |>
  summarize(fta = n(),
            ftm = sum(make_ft)) |>
  mutate(ftp = ftm / fta) |>
  left_join(mcbb_games |>
              distinct(game_id, neutral_site, home_logo, away_logo, attendance),
            by = "game_id")

# denote who is shooting the free throws
ft_half_summary <- ft_half_summary |>
  mutate(ft_visiting_team = ifelse(team_id == away_team_id & !neutral_site, 1, 0),
         ft_team = ifelse(team_id == home_team_id,
                          paste(home_team_name, season, sep = '-'),
                          paste(away_team_name, season, sep = '-')),
         half_chr = ifelse(first_half, "first", "second")) |>
  ungroup()

ft_half_summary_wide <- ft_half_summary |>
  select(-first_half) |>
  pivot_wider(names_from = half_chr,
              values_from = c("fta", "ftm", "ftp"),
              values_fill = 0)

# define team names; sparse teams are pooled into "replacement_team"
ft_team_tbl <- ft_half_summary_wide |>
  count(ft_team) |>
  mutate(adj_ft_team = ifelse(n >= 20, ft_team, "replacement_team"))

home_team_name_tbl <- ft_half_summary_wide |>
  count(home_team_name) |>
  mutate(adj_home_team_name = ifelse(n >= 15, home_team_name, "replacement_team"))

ft_half_summary_wide <- ft_half_summary_wide |>
  left_join(ft_team_tbl |> select(-n), by = "ft_team") |>
  left_join(home_team_name_tbl |> select(-n), by = "home_team_name")
```

# Background

-   College basketball has among the largest home court advantages in all of sports.
-   Student sections have been getting increasingly creative in their attempts to affect the outcome of opposing team shooters making their free throws.

# Previous Research

-   A [Duke medical student](https://www.dukechronicle.com/article/2023/11/duke-mens-basketball-free-throw-distractions-graduate-student-section) tabulated the miss percentage in the 2020-2021 and 2021-2022 seasons for each of the distractions the graduate student section stages behind the basket in the second half.

![](images/clipboard-1490644583.png)

# Previous Research

![Duke's "Cameron Crazies"](images/clipboard-3722888607.png)

# Previous Research

-   Arizona State is known for its "Curtain of Distraction", which rose to fame in 2015 and even featured a surprise appearance by a shirtless Michael Phelps, the Olympian.

![](images/clipboard-2783188797.png)

# Previous Research

-   The Harvard Sports Analytics Collective did a [study](https://harvardsportsanalysis.org/2015/02/how-distracting-is-asus-curtain-of-distraction/) in 2015 about the Arizona State "Curtain of Distraction"
-   Reported that visiting teams shot 60.6% from the free-throw line in the second half compared to 68.6% in the first half.
-   Estimated it was worth approximately 1.41 points per game.
-   Not a statistically significant difference.

# Previous Research

-   Luke Benz, a Harvard PhD student and author of the `ncaahoopR` package, [revisited](https://lukebenz.com/post/asu_curtain/) the Arizona State distraction in 2019 and found insufficient evidence to conclude that teams shoot worse at Arizona State than they do in other *road* games.

# Data

-   Data run from the 2015-2016 college basketball season through the current season (2023-2024).
-   Home teams with fewer than 15 games in the dataset are grouped into one collective team.
-   Only true road games are included, not neutral-site games.

# Data

-   Group all seasons together for the home team when assessing a fan base/court effect.
-   For the shooting team's own performance, treat each season separately.

# Methodology

Two methods I explore:

1.  Second Half vs First Half (shooting into a rowdy student section)
2.  Overall effect of a home court on free throw percentage

## First Half vs Second

-   Not every team puts a large student section behind the basket the visiting team shoots at in the second half.
-   Want to assess whether there is an effect for individual teams.
-   Compare the entire country to what we would expect from randomness alone.

## First Half vs Second

-   Only include home courts where visitors attempted at least 200 free throws in *both* the first half and the second half.
-   Results are sorted by the **largest drop-off in free-throw percentage** from the 1st half to the 2nd half.

```{r}
statistical_test_first_second_df <- ft_half_summary_wide |>
  filter(ft_visiting_team == 1) |>
  group_by(adj_home_team_name, home_logo) |>
  summarize(fta_first = sum(fta_first),
            ftm_first = sum(ftm_first),
            fta_second = sum(fta_second),
            ftm_second = sum(ftm_second)) |>
  ungroup() |>
  mutate(ftp_first = ftm_first / fta_first,
         ftp_second = ftm_second / fta_second,
         ftp_diff = ftp_second - ftp_first) |>
  arrange(ftp_diff)
```

```{r}
# saving off data with at least 200 fta in both halves
statistical_test_first_second_df_for_plotting <- statistical_test_first_second_df |>
  filter(fta_first >= 200, fta_second >= 200)

# adjusting for the higher second half ft%
avg_ftp_half_diff <- statistical_test_first_second_df_for_plotting |>
  mutate(tot_fta = fta_first + fta_second) |>
  summarize(avg_ftp_diff = weighted.mean(ftp_diff, w = tot_fta)) |>
  pull(avg_ftp_diff)

statistical_test_first_second_df_for_plotting <- statistical_test_first_second_df_for_plotting |>
  mutate(
    # subtract the average second-minus-first difference (about 1 percent),
    # since on average visiting teams shoot better in the 2nd half
    ftp_diff_z = (ftp_diff - avg_ftp_half_diff) /
      sqrt(ftp_first * (1 - ftp_first) / fta_first +
             ftp_second * (1 - ftp_second) / fta_second),
    ftp_diff_pvalue = 2 * pnorm(-abs(ftp_diff_z))
  )
```

```{r}
min_pct <- statistical_test_first_second_df_for_plotting |>
  select(ftp_first, ftp_second) |>
  min()
max_pct <- statistical_test_first_second_df_for_plotting |>
  select(ftp_first, ftp_second) |>
  max()

min_pct_plot <- floor(min_pct * 10) / 10
max_pct_plot <- ceiling(max_pct * 10) / 10

first_vs_second_half_scatter_plot <- statistical_test_first_second_df_for_plotting |>
  ggplot(aes(x = ftp_first, y = ftp_second)) +
  geom_smooth(method = 'glm', se = FALSE) +
  geom_abline() +
  geom_image(aes(image = home_logo)) +
  ggtitle("Visiting Team's Free Throw Percentage by Half",
          subtitle = "2015-16 to 2023-24 Seasons (Min 200 FTA in Each Half)") +
  xlab("First Half %") +
  ylab("Second Half %") +
  xlim(min_pct_plot, max_pct_plot) +
  ylim(min_pct_plot, max_pct_plot) +
  theme_bw() +
  theme(aspect.ratio = 1)

first_vs_second_half_scatter_plot
```

## First Half vs Second

```{r}
statistical_test_first_second_df_for_plotting |>
  ggplot(aes(x = ftp_diff)) +
  geom_histogram() +
  geom_vline(xintercept = 0, color = 'red', linetype = 2) +
  theme_bw() +
  xlab("% Difference (2nd - 1st)") +
  ggtitle("Visiting Team's Free Throw Percentage Change by Half",
          subtitle = "2015-16 to 2023-24 Seasons (Min 200 FTA in Each Half)")
```

## First Half vs Second

-   In fact, on average, visiting teams shoot about `r round(avg_ftp_half_diff*100)` percentage point *better* in the second half than in the first.
-   This could be due to end-of-game scenarios.
    -   A team that is winning and getting fouled does its best to get the ball into the hands of its best free throw shooters.
-   This theory is hard to test.
    -   It would require each player's share of free throws conditional on time remaining and score.
-   I adjust the z-scores and p-values by subtracting this average 1% difference from each home court's 2nd half minus 1st half FT% difference.

## First Half vs Second

```{r}
ftp_half_diff_gt_table <- statistical_test_first_second_df_for_plotting %>%
  dplyr::select(
    home_team = adj_home_team_name,
    home_logo,
    ftp_first,
    ftp_second,
    ftp_diff,
    ftp_diff_pvalue
  ) %>%
  gt() %>%
  cols_label(
    home_team = "Home Court",
    home_logo = "",
    ftp_first = "1st",
    ftp_second = "2nd",
    ftp_diff = "Diff",
    ftp_diff_pvalue = "PValue"
  ) %>%
  # # text_transform(
  # #   locations = cells_body(c("fbref_logo_url")),
  # #   fn = function(x) {
  # #     map(x, ~ web_image(.x))
  # #   }
  # # ) %>%
  # tab_style(
  #   style = cell_text(weight = "bold"),
  #   locations = cells_column_labels()
  # ) %>%
  cols_align(
    align = c("center"),
    columns = everything()
  ) %>%
  fmt_percent(columns = ftp_first:ftp_diff, decimals = 0) %>%
  fmt_number(columns = ftp_diff_pvalue, decimals = 2) %>%
  gt_img_rows(columns = home_logo, img_source = "web", height = 30) %>%
  tab_header(
    title = "Visiting Free Throw % by Half",
    subtitle = "2015-16 to 2023-24 Seasons"
  ) %>%
  # gt_theme_538() %>%
  data_color(
    columns = c(ftp_first, ftp_second, ftp_diff, ftp_diff_pvalue),
    colors = scales::col_numeric(
      palette = viridis(10),
      domain = NULL) # You can adjust the number of colors as needed
  ) %>%
  opt_interactive()

ftp_half_diff_gt_table
```

## First Half vs Second

-   Arizona State: "Curtain of Distraction"
-   BYU also has a wild student section behind the basket in the second half.

![BYU's Student Section Attempting to Distract Opposing Shooter](https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.deseret.com%2F2023%2F3%2F21%2F23641639%2Fbyu-cougars-basketball-what-does-future-hold&psig=AOvVaw3DWKVx28U_NJNm1rbdg-ux&ust=1708696758651000&source=images&cd=vfe&opi=89978449&ved=0CBMQjRxqFwoTCKC3ksyNv4QDFQAAAAAdAAAAABAE)

## First Half vs Second - Chance?

-   Significance Test?
-   Multiple test corrections
    -   Bonferroni: $\frac{\alpha}{n}$
    -   $n=355$ teams in our dataset, extremely high standard

## First Half vs Second - Chance?

QQ Plot

```{r}
qq_plot_data <- statistical_test_first_second_df_for_plotting |>
  select(ftp_diff_z, ftp_diff_pvalue) |>
  arrange(ftp_diff_z) |>
  # plotting positions (i - 0.5) / n keep the largest quantile finite
  mutate(normal_quantile_z = qnorm((row_number() - 0.5) / n()))
```

```{r}
qq_plot <- qq_plot_data |>
  ggplot(aes(x = ftp_diff_z, y = normal_quantile_z)) +
  geom_point() +
  geom_abline() +
  xlab("Observed Z-values") +
  ylab("Theoretical Z-values") +
  theme_bw() +
  xlim(-3.3, 3.3) +
  ylim(-3.3, 3.3) +
  theme(aspect.ratio = 1) +
  ggtitle("QQ Plot")

qq_plot
```

## Full Game Effect Model

-   Many Division I teams have no stands behind the basket, or no students seated there.
-   Instead, analyze the general effect of a specific home court on the visiting team's free-throw percentage relative to its true ability.

## Full Game Effect Model

-   Sample sizes are too small for any player-level analysis.
-   Stick to team-level analysis.
-   Response: a team's FT% in a road game minus its average FT% in home games.

## Full Game Effect Model

What is a better comparison:

-   Free Throw Percentage at home games?
-   Free Throw Percentage at other road games?

## Full Game Effect Model

I used home games:

-   More consistent data set (other road games have many changing variables).
-   Power conference teams typically have many more home games than true road games.

## Full Game Effect Model

This model doesn't account for:

-   Travel to higher elevations (my [previous work](https://www.espn.com/blog/statsinfo/post/_/id/130346/bracket-tips-from-the-basketball-power-index) at ESPN showed this to matter a lot)
-   Days of rest for teams
-   Distance traveled
-   Individual player effects on FT% (assumed to average out over a season)

## Full Game Effect Model

My model can be defined as

\begin{align}
y_i &= \beta_0 + \gamma_{j[i]} + \epsilon_i\\
\gamma_j &\sim \text{Cauchy}(0,1)\\
\epsilon_i &\sim N(0,\sigma^2)
\end{align}

-   $y_i$ is the difference between the visiting team's free throw percentage in game $i$ and its free throw percentage in all its home games that season.
-   $\gamma_j$ is the effect of home team $j$'s court, and $j[i]$ is the home team in game $i$.
-   Observations are weighted by the total number of free throw attempts in the game.

# Results

-   Use posterior draws for the team effects to calculate the estimated effect a team's home court has on the opposing team's free throw percentage.
-   Calculate the posterior *probability* that a team's home court has an effect that decreases the probability of an opponent making a free throw.

```{r}
### For simpler method: calculate home team ft% for each season
## model only away games the difference in ft% vs home
ft_half_summary_wide <- ft_half_summary_wide |>
  mutate(fta_game = fta_second + fta_first,
         ftm_game = ftm_second + ftm_first)

# ft percentage in non-away games
team_season_home_ft_pct <- ft_half_summary_wide |>
  filter(ft_visiting_team == 0) |>
  group_by(adj_ft_team) |>
  summarize(tot_fta = sum(fta_game),
            tot_ftm = sum(ftm_game),
            games = n(),
            avg_fta = mean(fta_game),
            avg_ftm = mean(ftm_game),
            avg_ftp = sum(ftm_game) / sum(fta_game))

away_team_game_summary <- ft_half_summary_wide |>
  filter(ft_visiting_team == 1) |>
  left_join(team_season_home_ft_pct |>
              rename_at(vars(tot_fta:last_col()),
                        ~paste0("home_court_", .x)),
            by = c("adj_ft_team")) |>
  mutate(ftp_game = ftm_game / fta_game,
         diff_ftp_from_home = ftp_game - home_court_avg_ftp)

if (run_model) {
  diff_ftp_model <- stan_lmer(
    diff_ftp_from_home ~ 1 + (1 | adj_home_team_name), # the home court team
    data = away_team_game_summary,
    prior = cauchy(),
    weights = fta_game,
    iter = 800,
    chains = 4)
  diff_ftp_model |> write_rds("diff_ftp_model.rds")
} else {
  diff_ftp_model <- read_rds("diff_ftp_model.rds")
}
```

# Results

```{r}
# summary(diff_ftp_model)

# avg home court effects
home_team_effect_draws <- tidybayes::spread_draws(diff_ftp_model, b[term,group]) |>
  mutate(home_team_name = str_remove(group, "adj_home_team_name:") |>
           str_replace_all("_", " ")) |>
  rename(home_effect = b) |>
  ungroup() |>
  dplyr::select(-term, -group)

home_team_ft_effect_summary <- home_team_effect_draws |>
  group_by(home_team_name) |>
  summarize(prob_less_0 = mean(home_effect < 0),
            avg_effect = mean(home_effect),
            sd_effect = sd(home_effect)) |>
  ungroup() |>
  arrange(desc(prob_less_0), avg_effect) |>
  left_join(ft_half_summary_wide |>
              distinct(home_team_name, home_logo) |>
              na.omit(),
            by = c("home_team_name"))

home_team_ft_effect_summary_gt_table <- home_team_ft_effect_summary %>%
  dplyr::select(
    home_team = home_team_name,
    home_logo,
    avg_effect,
    sd_effect,
    prob_less_0
  ) %>%
  gt() %>%
  cols_label(
    home_team = "Home Court",
    home_logo = "",
    avg_effect = "Effect on FT%",
    sd_effect = "Std. Dev.",
    prob_less_0 = "Prob(Effect < 0)"
  ) %>%
  # # text_transform(
  # #   locations = cells_body(c("fbref_logo_url")),
  # #   fn = function(x) {
  # #     map(x, ~ web_image(.x))
  # #   }
  # # ) %>%
  # tab_style(
  #   style = cell_text(weight = "bold"),
  #   locations = cells_column_labels()
  # ) %>%
  cols_align(
    align = c("center"),
    columns = everything()
  ) %>%
  fmt_percent(columns = avg_effect:prob_less_0, decimals = 1) %>%
  gt_img_rows(columns = home_logo, img_source = "web", height = 30) %>%
  tab_header(
    title = "Home Team's Venue Effect on Visiting Free Throws",
    subtitle = "Hierarchical Bayesian Model Using 2015-16 to 2023-24 Seasons"
  ) %>%
  # gt_theme_538() %>%
  data_color(
    columns = c(avg_effect, sd_effect, prob_less_0),
    colors = scales::col_numeric(
      palette = viridis(10),
      domain = NULL) # You can adjust the number of colors as needed
  ) %>%
  opt_interactive()

home_team_ft_effect_summary_gt_table
```

# Results

We can compare the evidence of 2nd half effects to the overall modeled effect of the home crowd/court.

```{r}
both_ft_impact_plot <- home_team_ft_effect_summary |>
  left_join(statistical_test_first_second_df_for_plotting |>
              select(home_team_name = adj_home_team_name, ftp_diff_z),
            by = c("home_team_name")) |>
  filter(!is.na(ftp_diff_z)) |>
  ggplot(aes(y = avg_effect, x = ftp_diff_z)) +
  geom_image(aes(image = home_logo)) +
  geom_smooth(method = 'glm', se = FALSE) +
  theme_bw() +
  ylab("Home Court Effect per Game (Pct.)") +
  xlab("Z-Score Second Half minus First") +
  annotate("text", x = -2.25, y = -0.036, label = "Whole Game Effect & 2nd Half") +
  ggtitle("Impact on Visiting Team Free Throws by Home Court",
          subtitle = "2015-16 to 2023-24")

both_ft_impact_plot
```

# Results

Order of Magnitude for 2nd Half FT% Effect:

-   Average FTA per game: about `r round(mean(away_team_game_summary$home_court_avg_fta), 1)`.
-   Assume roughly half of those, about `r round(mean(away_team_game_summary$home_court_avg_fta)/2)`, come in the 2nd half.
-   Impact per game in the 2nd half, in points, is about `r round((mean(away_team_game_summary$home_court_avg_fta)/2) * min(statistical_test_first_second_df_for_plotting$ftp_diff))` for the largest team effect.

# Results

Order of Magnitude for Whole Game FT% Effect:

-   Average FTA per game: about `r round(mean(away_team_game_summary$home_court_avg_fta), 1)`.
-   Impact per game, in points, is about `r floor(mean(away_team_game_summary$home_court_avg_fta) * min(statistical_test_first_second_df_for_plotting$ftp_diff))` for the largest team effect.

# Conclusion

-   A handful of teams show clear evidence of a 2nd half effect.
-   A handful of teams show clear evidence that visitors shoot free throws worse in their arena.
    -   Confounded with altitude and distance traveled.
-   More teams should try 2nd half distractions!