The purpose of the CalledStrike package (version 0.5.6) is to provide several visualizations of smoothed measures over the zone using Statcast data. The package is also helpful for visualizing pitch locations and smoothed pitch values over the zone.
This package relies on several other packages.
The baseballr package is useful for collecting Statcast data.
The mgcv package contains the gam() function for fitting generalized additive models.
The ggplot2 package is used to construct the visualizations.
The metR package contains the special geom_contour_fill() function for displaying the filled contour plots.
The package is currently on GitHub and can be installed with the install_github() function in the remotes package:
library(remotes)
install_github("bayesball/CalledStrike")
Here are the basic steps to produce these graphs.
Using one of the setup functions, define the relevant dataset and any new variables that are needed. For example, to graph the probability of contact on a swing, one considers only pitches where the batter takes a swing (function setup_swing()). To graph the probability of a called strike, one restricts the pitches to the balls or strikes that are called (function setup_called()).
Use a generalized additive model to fit a smooth model to the measure over the (plate_x, plate_z) surface. In the case where the response is binary (1 or 0), fit a binary gam with logit link. When the response is continuous, such as a launch velocity, use a continuous-response gam.
Use the model estimates to predict the measure over a 50 by 50 grid over the zone. (The grid_predict() function is used.)
Use the geom_tile() or geom_contour_fill() geometric objects to graph the predicted values.
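The fit and predict steps above can be sketched with mgcv alone on simulated data. This is only an illustration under stated assumptions: the response name Strike, the model formula, the grid limits, and the simulated strike probabilities are my own choices, not necessarily what the package functions do internally.

```r
library(mgcv)  # provides gam()

set.seed(1)
n <- 2000

# Simulated pitch locations in feet; column names follow Statcast
sim <- data.frame(
  plate_x = runif(n, -1.5, 1.5),
  plate_z = runif(n, 1, 4)
)

# Hypothetical truth: strike probability drops with distance from (0, 2.5)
d <- sqrt(sim$plate_x^2 + (sim$plate_z - 2.5)^2)
sim$Strike <- rbinom(n, 1, plogis(4 - 4 * d))

# Binary GAM with logit link, one smooth over both coordinates
fit <- gam(Strike ~ s(plate_x, plate_z), family = binomial, data = sim)

# Predict on a 50 by 50 grid (grid limits are an assumption)
grid <- expand.grid(
  plate_x = seq(-1.5, 1.5, length.out = 50),
  plate_z = seq(1, 4, length.out = 50)
)
grid$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))
```

The grid data frame has one row per cell, so it can be handed directly to geom_tile() or geom_contour_fill() with prob as the fill or z variable.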
Several functions aid in the collection of Statcast data; these functions are wrappers around the collection functions in the baseballr package.
Collecting | Function |
---|---|
collect data for a single player | collect_player() |
collect data for several players | collect_many_players() |
collect data for single player for four seasons | collect_4_years() |
collect data for single player for several seasons | collect_many_years() |
Here is an example of applying these functions. Suppose we are interested in graphing Bryce Harper’s launch speed on balls put in play over the zone for the 2021 season.
Collect the data
First I use the collect_player() function to collect the Statcast data for Bryce Harper.
library(CalledStrike)
bh <- collect_player("Bryce Harper",
                     Season = 2021)
## [1] "Be patient, this may take a few seconds..."
## [1] "Data courtesy of the Chadwick Bureau Register (https://github.com/chadwickbureau/register)"
Data setup
Next I use the setup_inplay() function to construct a data frame containing only the balls put into play.
bh_ip <- setup_inplay(bh)
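To make the idea of a setup step concrete, here is a base R sketch on a toy data frame. Statcast codes each pitch in the type column as "B" (ball), "S" (strike), or "X" (in play); I am assuming that a filter on this column is roughly what the setup step amounts to, and the toy values are made up for illustration.

```r
# Toy Statcast-style rows (made-up values, for illustration only)
toy <- data.frame(
  type         = c("B", "S", "X", "X", "S"),
  launch_speed = c(NA, NA, 101.3, 88.7, NA),
  plate_x      = c(0.9, -0.2, 0.1, -0.5, 0.3),
  plate_z      = c(1.2, 2.6, 2.4, 3.0, 2.0)
)

# Keep only the balls put into play
toy_ip <- subset(toy, type == "X")
```

Only the in-play rows carry a launch speed, which is why the restriction has to happen before the smooth is fit.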
Gam fit
The function ls_gam_fit() will construct a generalized additive smooth fit of the launch speed values over the zone.
ls_fit <- ls_gam_fit(bh_ip)
Predict over the zone
The function grid_predict() will predict launch speeds over a fine grid of values about the zone.
ls_grid <- grid_predict(ls_fit)
Plot
Last, the function tile_plot() will construct a tile plot of the smoothed launch speed values.
tile_plot(ls_grid,
"Smoothed Launch Speeds
of the 2021 Bryce Harper")
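For readers who want to see what the fit, predict, and plot steps might look like without the package helpers, here is a sketch on simulated balls in play. The formula, the grid limits, and the column name pred are assumptions of mine, and the plot is only built when ggplot2 happens to be installed.

```r
library(mgcv)

set.seed(2)
n <- 1500
bip <- data.frame(
  plate_x = runif(n, -1.5, 1.5),
  plate_z = runif(n, 1, 4)
)
# Hypothetical truth: hardest contact near the middle of the zone
bip$launch_speed <- 95 - 8 * (bip$plate_x^2 + (bip$plate_z - 2.5)^2) +
  rnorm(n, sd = 5)

# Continuous response, so the default gaussian family is used
fit <- gam(launch_speed ~ s(plate_x, plate_z), data = bip)

# Predict over a 50 by 50 grid (grid limits are an assumption)
grid <- expand.grid(
  plate_x = seq(-1.5, 1.5, length.out = 50),
  plate_z = seq(1, 4, length.out = 50)
)
grid$pred <- as.numeric(predict(fit, newdata = grid))

# Build a tile plot if ggplot2 is available; print(p) draws it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(grid, aes(plate_x, plate_z, fill = pred)) +
    geom_tile() +
    coord_fixed() +
    ggtitle("Smoothed launch speed (simulated data)")
}
```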
Single function
The single function ls_plot() runs the setup function, performs the gam fit, predicts over the zone, and graphs the result by means of a tile plot.
ls_plot(bh)
There are 13 possible measures on the zone. For each type of measure, one can construct a tile plot or a filled contour plot. For a filled contour plot, one has the option of specifying a vector of contour line values by use of the L argument. For all graphs, one has the option of specifying a title by use of the title argument.
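A vector of contour line values is just a numeric vector of cut points. The base R sketch below builds two such vectors and shows which band a few smoothed values would fall into; that the package passes L on as contour breaks is my assumption, and the pred values are made up.

```r
# Contour levels for a probability and for launch speed (mph)
L_prob  <- c(0.25, 0.5, 0.75)
L_speed <- seq(60, 100, by = 2)

# Which band does each (made-up) smoothed probability fall into?
# 0 means below the first level, 3 means above the last one
pred <- c(0.10, 0.30, 0.60, 0.90)
band <- findInterval(pred, L_prob)
```

A coarse vector such as L_prob gives a few wide bands that are easy to label, while a fine sequence gives a display that looks nearly continuous.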
Graph Type | Tile Plot Function | Contour Plot Function |
---|---|---|
called strikes | called_strike_plot() | called_strike_contour() |
swing | swing_plot() | swing_contour() |
contact on swing | contact_swing_plot() | contact_swing_contour() |
miss on swing | miss_swing_plot() | miss_swing_contour() |
in-play on swing | inplay_swing_plot() | inplay_swing_contour() |
launch speed | ls_plot() | ls_contour() |
launch angle | la_plot() | la_contour() |
spray angle | sa_plot() | sa_contour() |
batting average | hit_plot() | hit_contour() |
home run | home_run_plot() | home_run_contour() |
wOBA | woba_plot() | woba_contour() |
expected batting average | ehit_plot() | ehit_contour() |
expected wOBA | ewoba_plot() | ewoba_contour() |
The input to each function is a Statcast data frame or a list of Statcast data frames. If the input is a list of data frames, one will see a paneled graphical display which is useful for comparison.
The package contains the dataset sc_sample containing Statcast data for all swings for 10 hitters in the 2019 season.
First load the CalledStrike package and the dplyr package, which will be used for the filter() function.
library(CalledStrike)
library(dplyr)
One of the hitters is Nolan Arenado and I will collect Arenado’s data for this demonstration.
na <- filter(sc_sample,
player_name == "Nolan Arenado")
I’ll look at locations of called strikes.
called_strike_contour(na, L = c(0.25, .5, .75))
I’ll look where Arenado swung at the pitch.
swing_contour(na, L = seq(0, 1, by = 0.05))
Did he make contact with the pitch?
contact_swing_contour(na, L = seq(0, 1, by = 0.05))
For the balls where he made contact, we look at the launch velocity.
ls_contour(na, L = seq(60, 100, by = 2))
What was the expected batting average on the balls in play?
ehit_contour(na, L = seq(0, .5, by = 0.01))
What was the expected wOBA on balls in play?
ewoba_contour(na, L = seq(0, .5, by = 0.01))
Here are several types of comparison graphs.
na_both <- split(na, na$p_throws)
ewoba_contour(na_both, L = seq(0, .5, by = 0.01),
title = "Expected wOBA Against Pitchers of Two Sides")
mm <- filter(sc_sample,
player_name == "Manny Machado")
mc <- filter(sc_sample,
player_name == "Matt Chapman")
two_players <- list(mm, mc)
names(two_players) <- c("Machado", "Chapman")
ls_contour(two_players, L = seq(60, 100, by = 2),
title = "Launch Velocities of 2 Hitters")
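When more than a couple of players are compared, building the list by hand gets tedious. As a sketch on a toy data frame (made-up launch speeds), split() returns a named list of data frames in one step, the same shape of input used in the comparisons above.

```r
# Toy data: a few balls in play for three hitters (made-up values)
toy <- data.frame(
  player_name  = c("Manny Machado", "Matt Chapman", "Manny Machado",
                   "Matt Chapman", "Nolan Arenado"),
  launch_speed = c(98.1, 91.4, 87.0, 102.6, 95.2)
)

keep <- c("Manny Machado", "Matt Chapman")
sub <- toy[toy$player_name %in% keep, ]

# One data frame per player, named by player
two <- split(sub, sub$player_name)
```

The list names come from player_name, so use names(two) <- ... afterwards if shorter labels are wanted.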
There are two functions for visualizing pitch locations.
The function location_count() will show the locations of pitches for a specific pitcher on a particular count.
The package includes the dataset sc_pitchers_2019 that contains Statcast data for 20 pitchers for the 2019 season.
Suppose we want to look at the locations of Aaron Nola’s pitches on a 0-0 count. I can find Nola’s MLBAM id number by use of the chadwick dataset (also included in the package) that contains the id numbers for all players.
chadwick %>%
filter(name_last == "Nola", name_first == "Aaron")
## name_first name_last key_mlbam
## 1 Aaron Nola 605400
To produce the graph, type
location_count(sc_pitchers_2019,
605400, "Aaron Nola", "0-0")
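Rather than copying the id by hand, one can pull it out of the lookup table programmatically. The sketch below uses a tiny stand-in table with the same three columns as the output above; the second row is a hypothetical placeholder, not a real player.

```r
# Stand-in for the chadwick table (second row is a placeholder)
ids <- data.frame(
  name_first = c("Aaron", "Sample"),
  name_last  = c("Nola", "Player"),
  key_mlbam  = c(605400, 0)
)

# Match on both names, since last names are often shared
nola_id <- ids$key_mlbam[ids$name_last == "Nola" &
                         ids$name_first == "Aaron"]

# Guard against zero or multiple matches before using the id
stopifnot(length(nola_id) == 1)
```

The resulting nola_id can then take the place of the literal 605400 in the call above.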
Note that it shows the location of both fastballs and off-speed pitches against right- and left-handed hitters.
The function location_count_compare() will contrast the locations of pitches for a specific pitcher on several counts.
For example, suppose we wish to compare the location of Nola’s off-speed pitches against right-handers across the counts “0-0”, “0-1”, “1-0”, and “0-2”.
Then you would type the following:
location_count_compare(sc_pitchers_2019, 605400,
"Aaron Nola", "R", "Offspeed",
c("0-0", "0-1", "1-0", "0-2"))
There are several functions for computing and visualizing pitch values.
The function compute_pitch_values() will compute pitch values for a Statcast dataset.
The function pitch_value_contour() will produce a contour graph of smoothed pitch values over the zone.
The dataset sc2019_pv contains pitch values for all pitches in the 2019 season.
As an example, here is a contour plot of the smoothed pitch values of four-seam fastballs thrown by right-handed pitchers on a 0-0 count.
d00 <- sc2019_pv %>%
filter(pitch_type == "FF",
Count == "0-0",
p_throws == "R")
pitch_value_contour(d00,
title = "Pitch Values on Four-Seamers
Thrown by Right-Handers on a 0-0 Count")
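The filter above relies on a Count column holding strings such as "0-0". Raw Statcast data stores the count in separate balls and strikes columns, so if an imported dataset lacks Count, it could be built along these lines. This is a sketch on toy rows; whether the package constructs Count exactly this way is an assumption.

```r
# Toy rows with Statcast-style balls and strikes columns
toy <- data.frame(
  balls      = c(0, 1, 3),
  strikes    = c(0, 2, 2),
  pitch_type = c("FF", "SL", "FF"),
  p_throws   = c("R", "R", "L")
)

# Build the count label as "balls-strikes"
toy$Count <- paste(toy$balls, toy$strikes, sep = "-")

# Same selection as above: four-seamers by right-handers on 0-0
d00_toy <- subset(toy, pitch_type == "FF" & Count == "0-0" & p_throws == "R")
```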
The package contains six Shiny apps to facilitate the use of these functions.
The function ShinyDemo() produces a Shiny app that illustrates many of the smoothed measure graphs on a group of 10 hitters from the 2019 season.
The function PitchLocation() produces a Shiny app that can be used to compare the locations of pitches across several counts for one of a group of 2019 starting pitchers.
The function PitchLocation2() produces a Shiny app similar to PitchLocation(), but one is able to import a Statcast dataset and choose a pitcher of interest by inputting his name.
The function PitchValues() produces a Shiny app to compare pitch value locations across several counts for a specific batting side and pitching side using 2019 data stored in sc2019_pv.
The function PitchValues2() produces a Shiny app to compare pitch value locations across several counts for a specific batting side and pitching side using Statcast data that is imported as a csv file.
The function BattingMeasure() produces a Shiny app to compare smoothed batting measures for a specific player against left- and right-handed pitchers and pitch type (fastball or off-speed). One enters the Statcast data as a csv data file.