Baseball Analytics with RShiny

Jack Werner
Nov 17, 2021

In baseball, fast, dynamic visualizations can support strategic decision-making. RShiny is an R package for building web apps and dashboards. This article walks through the development of a pitch usage heatmap for a given pitcher. It pulls pitch data from Baseball Savant and can be used in real time by MLB decision-makers.

App Setup

To begin, I copied the Hello Shiny template provided by RShiny on their website. We will modify this template to fit our needs as baseball analysts.

An RShiny app is made up of two major parts: User Interface and Server. I have included some libraries that will come in handy later, as well as a function call at the bottom that actually runs our app. Run the template app using RStudio’s built-in “Run App” button at the top of the page to get a feel for how things look.

library(shiny)
library(ggplot2)
library(dplyr)

# Define UI for app that draws a histogram ----
ui <- fluidPage(
  # App title ----
  titlePanel("Hello Shiny!"),
  # Sidebar layout with input and output definitions ----
  sidebarLayout(
    # Sidebar panel for inputs ----
    sidebarPanel(
      # Input: Slider for the number of bins ----
      sliderInput(inputId = "bins",
                  label = "Number of bins:",
                  min = 1,
                  max = 50,
                  value = 30)
    ),
    # Main panel for displaying outputs ----
    mainPanel(
      # Output: Histogram ----
      plotOutput(outputId = "distPlot")
    )
  )
)

# Define server logic required to draw a histogram ----
server <- function(input, output) {
  output$distPlot <- renderPlot({
    x <- faithful$waiting
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = bins, col = "#75AADB", border = "white",
         xlab = "Waiting time to next eruption (in mins)",
         main = "Histogram of waiting times")
  }, width = 900, height = 700)
}

# Run the app ----
shinyApp(ui, server)

User Interface

First, we are going to work on the User Interface. We are going to modify the sidebarPanel to accept some search parameters for our pitcher. Inside the sidebarPanel, delete the sliderInput and add a textInput.

textInput("fullname", h3("Pitcher Full Name"), "Gerrit Cole", placeholder = "Firstname Lastname"), width=3

The first argument of RShiny inputs is usually the inputId, which we call “fullname.” Then we are setting the label to read “Pitcher Full Name,” the initial value to Gerrit Cole (arbitrary choice), and the placeholder to “Firstname Lastname.” I use an initial value so something shows up as soon as I load the app, and a placeholder just so things look nice when I change the pitcher. The trailing width=3 comes after textInput’s closing parenthesis, so it is an argument to the enclosing sidebarPanel, not to textInput.
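For context, here is a minimal sketch of where that line could sit in the UI. The surrounding layout is carried over from the Hello Shiny template; the title text is a placeholder of my own and is not taken from the finished app.

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Pitch Usage Heatmap"),  # placeholder title (assumption)
  sidebarLayout(
    sidebarPanel(
      # Arguments in order: inputId, label, initial value, then placeholder
      textInput("fullname", h3("Pitcher Full Name"), "Gerrit Cole",
                placeholder = "Firstname Lastname"),
      width = 3  # belongs to sidebarPanel: 3 of the 12 grid columns
    ),
    mainPanel(
      # Same output ID as the template, so the server code can keep using it
      plotOutput(outputId = "distPlot")
    )
  )
)
```

Whatever the user types is then available in the server as input$fullname.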

Click “Run App” to see how the layout looks. Be careful not to lose any commas or parentheses along the way or the app might not load. We still need to actually manage and visualize our data, so now we will move on to the Server.

Server

Now that we have the UI set up, we need to gather our pitcher’s historical data from Baseball Savant. Here, we will do some tricky stuff with the URLs from Baseball Savant.

# Season leaderboard, used here only to map a pitcher's name to a player ID
bm = read.csv("https://baseballsavant.mlb.com/leaderboard/custom?year=2021&type=pitcher&filter=&sort=1&sortDir=desc&min=50&selections=b_total_pa,b_home_run,b_strikeout,b_walk,&chart=false&x=b_total_pa&y=b_total_pa&r=no&chartType=beeswarm&csv=true")

# Split "Firstname Lastname" into its two parts
firstname = strsplit(input$fullname, " ")[[1]][1]
lastname = strsplit(input$fullname, " ")[[1]][2]

# Match first_name with a leading space prepended to the typed first name
temp = bm %>% filter_at(vars("first_name"), any_vars(. %in% paste(" ", firstname, sep = "")))

temp = temp %>% filter_at(vars("last_name"), any_vars(. %in% lastname))

id = unique(temp$player_id)
beginning = "https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=FF%7CFT%7CFC%7CSI%7CFS%7CSL%7CCH%7CCU%7CKC%7CCS%7CKN%7CFO%7CEP%7CSC%7CIN%7CPO%7CAB%7CUN%7C&hfAB=&hfGT=R%7C&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfPull=&hfC=&hfSea=2021%7C&hfSit=&player_type=pitcher&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=&game_date_lt=&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&hfFlag=&hfBBT=&pitchers_lookup%5B%5D="
end = paste(id, "&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=api_p_release_speed&sort_order=desc&min_pas=0&type=details&", sep="")
path = paste(beginning, end, sep="")
sp = read.csv(path)

By filtering the bm data by our pitcher’s name, we can extract an ID. Then, we build a path variable around the pitcher’s ID number to get info about thousands of pitches thrown by our pitcher. For this tool, I am using URLs rather than local CSV files so our data is always up-to-date, which is more useful in a gametime context.
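The name-to-ID step can be tried offline with a small stand-in table. The sketch below is illustrative only: bm_toy, its player IDs, the lookup_id helper, and the example.com URL are all made up, and the helper trims whitespace instead of prepending a space as the filter above does.

```r
# Toy stand-in for the Savant leaderboard; the IDs are invented for illustration.
bm_toy <- data.frame(
  last_name  = c("Cole", "Scherzer"),
  first_name = c(" Gerrit", " Max"),  # leading space, as the filter above expects
  player_id  = c(111111, 222222),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "Firstname Lastname" and return the matching IDs.
lookup_id <- function(fullname, leaderboard) {
  parts <- strsplit(trimws(fullname), " +")[[1]]
  hit <- trimws(leaderboard$first_name) == parts[1] &
    leaderboard$last_name == parts[2]
  # which() drops NA comparisons, so a one-word name returns no match
  unique(leaderboard$player_id[which(hit)])
}

id <- lookup_id("Gerrit Cole", bm_toy)

# Placeholder prefix and suffix; the real query strings are the ones shown above.
prefix <- "https://example.com/statcast_search/csv?pitchers_lookup%5B%5D="
suffix <- "&type=details&"
path <- paste0(prefix, id, suffix)
```

A misspelled name returns a zero-length ID, which is worth checking for before building the URL.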

The Savant data we are interested in is the ball-strike count and how often each pitch type is thrown at that count. Below the URL code, I loop through all possible counts (0 to 3 balls, 0 to 2 strikes) and calculate each pitch type’s share of the pitches thrown at that count.

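l = c()
for (i in 0:3) {
  for (j in 0:2) {
    # Pitches thrown with i balls and j strikes
    at_count = filter(sp, balls == i, strikes == j)
    if (nrow(at_count) == 0) next  # skip counts the pitcher never reached
    # factor() makes summary() return one tally per pitch name even when
    # read.csv() loads pitch_name as character (the default since R 4.0)
    li = data.frame(summary(factor(at_count$pitch_name)) / nrow(at_count), paste(i, j))
    li <- tibble::rownames_to_column(li, "Pitch")
    l = rbind(l, li)
  }
}
colnames(l) = c("Pitch", "Percentage", "Count")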

I am filtering the pitch data sp down to the current count (i balls, j strikes) and then using the summary() function to tally each pitch type, dividing by the number of pitches at that count to get a usage percentage. The pitch names end up as row names, so I use the rownames_to_column function to move them into a regular column called Pitch.
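For comparison, the same table can be built without a loop. This is an alternative sketch using dplyr’s count() and group_by(), not the approach used in the app, and sp_toy is a made-up sample containing only the three columns the loop reads.

```r
library(dplyr)

# Made-up pitches for illustration; real data would come from the Savant CSV.
sp_toy <- data.frame(
  balls      = c(0, 0, 0, 0, 1, 1),
  strikes    = c(0, 0, 0, 0, 2, 2),
  pitch_name = c("4-Seam Fastball", "4-Seam Fastball", "4-Seam Fastball",
                 "Slider", "Slider", "Curveball"),
  stringsAsFactors = FALSE
)

usage <- sp_toy %>%
  mutate(Count = paste(balls, strikes)) %>%  # label such as "0 0" or "1 2"
  count(Count, Pitch = pitch_name) %>%       # pitches of each type at each count
  group_by(Count) %>%
  mutate(Percentage = n / sum(n)) %>%        # share within that count
  ungroup() %>%
  select(Pitch, Percentage, Count)
```

Counts that never occur simply produce no rows, so no special case is needed for them.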

Lastly, below the loop, we will use ggplot2 to visualize the percentage table l.

ggplot(l, aes(Pitch, Count)) +
  geom_tile(aes(fill = Percentage)) +
  geom_text(aes(label = round(Percentage, 2)), size = 7) +
  scale_fill_gradient(low = "lightblue", high = "red") +
  theme(text = element_text(size = 25))
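The snippets in this section all belong to the Server part of the app. As a rough sketch of how they might fit together, the example below assumes everything sits inside the template’s renderPlot call, with the plot as the last expression. The usage_heatmap helper and the l_toy table are hypothetical stand-ins so the example runs without a download.

```r
library(shiny)
library(ggplot2)

# Hypothetical helper wrapping the heatmap code from this section.
usage_heatmap <- function(l) {
  ggplot(l, aes(Pitch, Count)) +
    geom_tile(aes(fill = Percentage)) +
    geom_text(aes(label = round(Percentage, 2)), size = 7) +
    scale_fill_gradient(low = "lightblue", high = "red") +
    theme(text = element_text(size = 25))
}

# Made-up usage table in the same shape as l (Pitch, Percentage, Count).
l_toy <- data.frame(
  Pitch      = c("4-Seam Fastball", "Slider", "Slider", "Curveball"),
  Percentage = c(0.75, 0.25, 0.5, 0.5),
  Count      = c("0 0", "0 0", "1 2", "1 2")
)

server <- function(input, output) {
  output$distPlot <- renderPlot({
    # In the real app, the ID lookup, Savant download, and count loop
    # would run here and produce l from input$fullname.
    l <- l_toy
    usage_heatmap(l)  # renderPlot draws the value of the last expression
  }, width = 900, height = 700)
}
```

Because input$fullname is read inside renderPlot, the plot is recomputed whenever the name in the text box changes.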

By this point, we should have a working dashboard. To improve this tool, we could add filters for batter handedness, the inning, or the number of outs. It also breaks easily if the pitcher’s name is spelled wrong.
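One possible way to soften the misspelling problem is Shiny’s validate() and need(), which replace the output with a message instead of an error. The sketch below is an assumption about how that could look, not part of the published app; bm_toy and its ID are made up, and a text output stands in for the heatmap.

```r
library(shiny)

# Made-up lookup table standing in for the Savant leaderboard.
bm_toy <- data.frame(
  last_name = "Cole", first_name = " Gerrit", player_id = 111111,
  stringsAsFactors = FALSE
)

ui <- fluidPage(
  textInput("fullname", "Pitcher Full Name", "Gerrit Cole"),
  textOutput("status")
)

server <- function(input, output) {
  output$status <- renderText({
    parts <- strsplit(trimws(input$fullname), " +")[[1]]
    hit <- which(trimws(bm_toy$first_name) == parts[1] &
                   bm_toy$last_name == parts[2])
    id <- unique(bm_toy$player_id[hit])
    # Stop here with a readable message rather than failing further down
    validate(need(length(id) == 1,
                  "No unique pitcher found; check the spelling."))
    paste("Found player_id", id)
  })
}

app <- shinyApp(ui, server)  # launch with runApp(app) in an interactive session
```

The same validate(need(...)) line could sit right after the ID lookup inside renderPlot, before the pitch data is requested.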

Thank you for reading. I hope you learned a little about RShiny and how to keep user familiarity/context in mind as a data analyst.

The full code is published at https://github.com/jackwerner/Baseball-Analysis

If you or anyone you know is hiring baseball analysts, feel free to connect with me via https://linkedin.com/in/jack-werner

