Introduction to the CalledStrike Package

Introduction

The CalledStrike package (version 0.5.6) provides several visualizations of smoothed measures over the zone using Statcast data. The package is also helpful for visualizing pitch locations and smoothed pitch values over the zone.

This package relies on several other packages.

The baseballr package is useful for collecting Statcast data.
The mgcv package contains the gam() function for fitting generalized additive models.
The ggplot2 package is used to construct the visualizations.
The metR package contains the special geom_contour_fill() function for displaying the filled contour plots.

Installation

The package is currently on GitHub and can be installed using the install_github() function in the remotes package:

library(remotes)
install_github("bayesball/CalledStrike")

Producing the Graphs

Here are the basic steps to produce these graphs.

Using one of the setup functions, define the relevant dataset and any new variables that are needed. For example, to graph the probability of contact on a swing, one considers only the pitches where the batter takes a swing (function setup_swing()). To graph the probability of a called strike, one restricts the pitches to the balls or strikes that are called (function setup_called()).
Use a generalized additive model to fit a smooth model to the measure over the (plate_x, plate_z) surface. In the case where the response is binary (1 or 0), fit a binary gam with logit link. When the response is continuous, such as launch velocity, use a continuous-response gam.
Use the model estimates to predict the measure over a 50 by 50 grid over the zone. (The grid_predict() function is used.)
Use the geom_tile() or geom_contour_fill() geometric objects to graph the predicted values.
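
The steps above can be sketched outside the package with mgcv alone. This is a minimal illustration on simulated data, not the package's own code: the data frame d, the Strike column, the grid limits, and the true probability surface are all assumptions made up for the example.

```r
library(mgcv)  # provides gam()

# Step 1 stand-in: simulated called pitches (NOT real Statcast data).
# plate_x / plate_z are pitch locations; Strike is 1 for a called strike.
set.seed(1)
n <- 2000
d <- data.frame(plate_x = runif(n, -1.5, 1.5),
                plate_z = runif(n, 1, 4))
p_true <- plogis(4 - 4 * d$plate_x^2 - 3 * (d$plate_z - 2.5)^2)
d$Strike <- rbinom(n, size = 1, prob = p_true)

# Step 2: binary gam (binomial family, default logit link) over the surface
fit <- gam(Strike ~ s(plate_x, plate_z), family = binomial, data = d)

# Step 3: predict the probability on a 50 by 50 grid
grid <- expand.grid(plate_x = seq(-1.5, 1.5, length.out = 50),
                    plate_z = seq(1, 4, length.out = 50))
grid$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Step 4: tile plot of the predictions (skipped if ggplot2 is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(grid, ggplot2::aes(plate_x, plate_z, fill = prob)) +
    ggplot2::geom_tile() +
    ggplot2::coord_fixed()
  print(p)
}
```

The package functions wrap these steps, so in practice one calls them rather than writing the fit by hand.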

Collecting Statcast Data

There are several functions that aid in the collection of Statcast data – these functions are wrappers to the collection functions in the baseballr package.

Collecting	Function
collect data for a single player	`collect_player()`
collect data for several players	`collect_many_players()`
collect data for single player for four seasons	`collect_4_years()`
collect data for single player for several seasons	`collect_many_years()`

Example – Launch Speeds of the 2021 Bryce Harper

Here is an example of applying these functions. Suppose we are interested in graphing Bryce Harper’s launch speed on balls put in play over the zone for the 2021 season.

Collect the data

First I use the collect_player() function to collect the Statcast data for Bryce Harper.

library(CalledStrike)
bh <- collect_player("Bryce Harper",
                     Season = 2021)

## [1] "Be patient, this may take a few seconds..."
## [1] "Data courtesy of the Chadwick Bureau Register (https://github.com/chadwickbureau/register)"

Data setup

Next I use the setup_inplay() function to construct a data frame containing only the balls put into play.

bh_ip <- setup_inplay(bh)

Gam fit

The function ls_gam_fit() will construct a generalized additive smooth fit of the launch speed values over the zone.

ls_fit <- ls_gam_fit(bh_ip)

Predict over the zone

The function grid_predict() will predict launch speeds over a fine grid of values about the zone.

ls_grid <- grid_predict(ls_fit)
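
For readers curious about what a continuous-response fit and grid prediction might look like, here is a rough stand-in using mgcv directly. The internals of ls_gam_fit() and grid_predict() are not shown in this document, so the formula, the grid limits, and the simulated data frame ip below are assumptions for illustration only.

```r
library(mgcv)

# Simulated balls in play (NOT Bryce Harper's data)
set.seed(2)
n <- 500
ip <- data.frame(plate_x = runif(n, -1, 1),
                 plate_z = runif(n, 1.5, 3.5))
ip$launch_speed <- 95 - 8 * ip$plate_x^2 -
  6 * (ip$plate_z - 2.5)^2 + rnorm(n, sd = 5)

# Continuous response: default gaussian family
fit <- gam(launch_speed ~ s(plate_x, plate_z), data = ip)

# Predict launch speed on a 50 by 50 grid over the sampled region
grid <- expand.grid(plate_x = seq(-1, 1, length.out = 50),
                    plate_z = seq(1.5, 3.5, length.out = 50))
grid$lp <- as.numeric(predict(fit, newdata = grid))
range(grid$lp)  # span of the smoothed launch speeds
```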

Plot

Last, the function tile_plot() will construct a tile plot of the smoothed launch speed values.

tile_plot(ls_grid,
  "Smoothed Launch Speeds 
of the 2021 Bryce Harper")

Single function

The single function ls_plot() runs the setup function, implements the gam fit, predicts over the zone, and graphs the result by means of a tile plot.

ls_plot(bh)

Graph Types

There are 13 possible measures on the zone. For each type of measure, one can construct a tile plot or a filled contour plot. For a filled contour plot, one has the option of specifying a vector of contour line values by use of the L argument. For all graphs, one has the option of specifying a title by use of the title argument.

Graph Type	Tile Plot Function	Contour Plot Function
called strikes	`called_strike_plot()`	`called_strike_contour()`
swing	`swing_plot()`	`swing_contour()`
contact on swing	`contact_swing_plot()`	`contact_swing_contour()`
miss on swing	`miss_swing_plot()`	`miss_swing_contour()`
in-play on swing	`inplay_swing_plot()`	`inplay_swing_contour()`
launch speed	`ls_plot()`	`ls_contour()`
launch angle	`la_plot()`	`la_contour()`
spray angle	`sa_plot()`	`sa_contour()`
batting average	`hit_plot()`	`hit_contour()`
home run	`home_run_plot()`	`home_run_contour()`
wOBA	`woba_plot()`	`woba_contour()`
expected batting average	`ehit_plot()`	`ehit_contour()`
expected wOBA	`ewoba_plot()`	`ewoba_contour()`
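
To get a feel for what a vector of contour values passed to L looks like, the base R sketch below builds one with seq() and assigns a few predicted values to the bands between consecutive levels. The predicted values are made up, and using cut() here is only a conceptual analogy for filled contour bands, not a description of how geom_contour_fill() works internally.

```r
# Contour levels for launch speed: 60, 62, ..., 100
levels_ls <- seq(60, 100, by = 2)
length(levels_ls)

# Hypothetical smoothed launch speeds at three grid points
pred <- c(71.3, 88.0, 95.6)

# Band (interval between consecutive levels) containing each value
bands <- cut(pred, breaks = levels_ls, include.lowest = TRUE)
as.character(bands)
```

A finer step in seq() gives more, narrower bands; a coarser step gives fewer, wider ones.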

Input is a Data Frame or a List

The input to each function is a Statcast data frame or a list of Statcast data frames. If the input is a list of data frames, one will see a paneled graphical display which is useful for comparison.
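
As a small illustration of the list input, base R's split() turns one data frame into a named list of data frames, one per group. The toy data frame sc_toy and its values are invented for this sketch; p_throws and launch_speed are used as column names only because they appear in Statcast data.

```r
# Toy stand-in for a Statcast data frame (values are made up)
sc_toy <- data.frame(
  p_throws = c("R", "L", "R", "L", "R"),
  launch_speed = c(98.1, 85.4, 101.2, 90.0, 76.5)
)

# One data frame per pitcher throwing arm, named by the group value
by_hand <- split(sc_toy, sc_toy$p_throws)
names(by_hand)
sapply(by_hand, nrow)
```

A list built this way, or by hand with list() and names(), can then be passed to any of the plotting functions in place of a single data frame.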

Example: Nolan Arenado

The package contains the dataset sc_sample containing Statcast data for all swings for 10 hitters in the 2019 season.

First load the CalledStrike package and the dplyr package, which supplies the filter() function.

library(CalledStrike)
library(dplyr)

One of the hitters is Nolan Arenado and I will extract Arenado’s data for this demonstration.

na <- filter(sc_sample,
             player_name == "Nolan Arenado")

I’ll look at locations of called strikes.

called_strike_contour(na, L = c(0.25, .5, .75))

I’ll look at where Arenado swung at pitches.

swing_contour(na, L = seq(0, 1, by = 0.05))

Did he make contact with the pitch?

contact_swing_contour(na, L = seq(0, 1, by = 0.05))

For the balls where he made contact, we look at the launch velocity.

ls_contour(na, L = seq(60, 100, by = 2))

What was the expected batting average on the balls in play?

ehit_contour(na, L = seq(0, .5, by = 0.01))

What was the expected wOBA on balls in play?

ewoba_contour(na, L = seq(0, .5, by = 0.01))

Example: Comparison Graphs

Here are several types of comparison graphs.

How does Arenado perform against left- and right-handed pitchers using the expected wOBA measure?

na_both <- split(na, na$p_throws)
ewoba_contour(na_both, L = seq(0, .5, by = 0.01),
              title = "Expected wOBA Against Pitchers of Two Sides")

Compare Manny Machado with Matt Chapman with respect to launch velocity.

mm <- filter(sc_sample,
             player_name == "Manny Machado")
mc <- filter(sc_sample,
             player_name == "Matt Chapman")
two_players <- list(mm, mc)
names(two_players) <- c("Machado", "Chapman")
ls_contour(two_players, L = seq(60, 100, by = 2),
           title = "Launch Velocities of 2 Hitters")

Graphing Pitch Locations

There are two functions for visualizing pitch locations.

The function location_count() will show the locations of pitches for a specific pitcher on a particular count.

The package includes the dataset sc_pitchers_2019 that contains Statcast data for 20 pitchers for the 2019 season.

Suppose we want to look at the locations of Aaron Nola’s pitches on a 0-0 count. I can find Nola’s MLBAM id number by use of the chadwick dataset (also included in the package) that contains the id numbers for all players.

chadwick %>% 
  filter(name_last == "Nola", name_first == "Aaron")

##   name_first name_last key_mlbam
## 1      Aaron      Nola    605400
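
If one prefers to store the id in a variable rather than copying it from the printed output, a base R lookup along these lines would work. The table ids below is a toy stand-in for the chadwick dataset: the Nola row repeats the output shown above, and the second row is a made-up placeholder.

```r
# Toy stand-in for the chadwick table (second row is fictitious)
ids <- data.frame(
  name_first = c("Aaron", "Sample"),
  name_last  = c("Nola", "Player"),
  key_mlbam  = c(605400, 999999)
)

# Pull out the single matching id
nola_id <- ids$key_mlbam[ids$name_last == "Nola" &
                           ids$name_first == "Aaron"]
nola_id
```

The stored value can then be supplied wherever the numeric id is typed by hand.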

To produce the graph, type

location_count(sc_pitchers_2019, 
               605400, "Aaron Nola", "0-0")

Note that it shows the location of both fastballs and off-speed pitches against right- and left-handed hitters.

The function location_count_compare() will contrast the locations of pitches for a specific pitcher on several counts.

For example, suppose we wish to compare the location of Nola’s off-speed pitches against right-handers across the counts “0-0”, “0-1”, “1-0”, and “0-2”.

Then you would type the following:

location_count_compare(sc_pitchers_2019, 605400, 
              "Aaron Nola", "R", "Offspeed", 
              c("0-0", "0-1", "1-0", "0-2"))

Pitch Values

There are several functions for computing and visualizing pitch values.

The function compute_pitch_values() will compute pitch values for a Statcast dataset.
The function pitch_value_contour() will produce a contour graph of smoothed pitch values over the zone.
The dataset sc2019_pv contains pitch values for all pitches in the 2019 season.

As an example, here is a contour plot of the smoothed pitch values of four-seam fastballs thrown by right-handed pitchers on a 0-0 count.

d00 <- sc2019_pv %>% 
  filter(pitch_type == "FF",
         Count == "0-0",
         p_throws == "R")
pitch_value_contour(d00,
   title = "Pitch Values on Four-Seamers 
   Thrown by Right-Handers on a 0-0 Count")
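
The filter above relies on a Count column holding strings such as "0-0". Raw Statcast data record the count in separate balls and strikes columns, so when working with one's own data a similar column could be built as sketched below. How the package itself constructs Count is not shown in this document; the toy rows and the paste() construction are assumptions.

```r
# Toy pitches with the Statcast-style balls and strikes columns
pitches <- data.frame(balls   = c(0, 1, 3),
                      strikes = c(0, 2, 2))

# Combine into a "balls-strikes" label
pitches$Count <- paste(pitches$balls, pitches$strikes, sep = "-")

# Keep only first-pitch (0-0) rows
first_pitch <- subset(pitches, Count == "0-0")
first_pitch
```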

Shiny Apps

The package contains six Shiny apps to facilitate the use of these functions.

The function ShinyDemo() produces a Shiny app that illustrates many of the smoothed measure graphs on a group of 10 hitters from the 2019 season.
The function PitchLocation() produces a Shiny app that can be used to compare the locations of pitches across several counts for one of a group of 2019 starting pitchers.
The function PitchLocation2() produces a Shiny app similar to PitchLocation() but one is able to import a Statcast dataset and choose a pitcher of interest by inputting his name.
The function PitchValues() produces a Shiny app to compare pitch value locations across several counts for a specific batting side and pitching side using 2019 data stored in sc2019_pv.
The function PitchValues2() produces a Shiny app to compare pitch value locations across several counts for a specific batting side and pitching side using Statcast data that is imported as a csv file.
The function BattingMeasure() produces a Shiny app to compare smoothed batting measures for a specific player against left- and right-handed pitchers and by pitch type (fastball or off-speed). One enters the Statcast data as a csv data file.