library(dplyr)
library(ggplot2)
library(stringr)

The Data

Using the pitchRx package, I downloaded the pitch data for all games in the 2016 season. From this large dataset, I extracted the 2044 pitches thrown by Clayton Kershaw.

Here I read in the PITCHf/x data and show the first few lines.

library(readr)
CK <- read_csv("https://bayesball.github.io/VB/data/kershaw2016.csv")
head(CK)
## # A tibble: 6 x 15
##   pitch_type     px     pz             des   num
##        <chr>  <dbl>  <dbl>           <chr> <int>
## 1         FF  0.089  2.750   Called Strike     7
## 2         FF  0.083  2.721 Swinging Strike     7
## 3         FF -2.651  2.690            Ball     7
## 4         CU -0.644 -0.231            Ball     7
## 5         FF  0.642  4.521            Ball     7
## 6         FF -1.410  2.327 Swinging Strike     7
## # ... with 10 more variables: gameday_link <chr>, start_speed <dbl>,
## #   spin_dir <dbl>, spin_rate <dbl>, pfx_x <dbl>, pfx_z <dbl>, type <chr>,
## #   pitcher_name <chr>, event <chr>, stand <chr>

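As the output shows, the data frame CK contains 15 variables for each pitch, including the pitch type (pitch_type), the location (px, pz), the outcome description (des), the speed (start_speed), and the break (pfx_x, pfx_z).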

Load several packages.

library(ggplot2)
library(dplyr)

Pitch Types Thrown

To understand which pitch types are thrown, we construct a dotplot of the frequencies of the pitch types (variable pitch_type). We restrict attention to four types: slider (SL), four-seam fastball (FF), curveball (CU), and changeup (CH).

S_CK <- filter(summarize(group_by(CK, pitch_type),
                  N=n()),
            pitch_type %in% c("SL", "FF", "CU", "CH"))
ggplot(S_CK, aes(pitch_type, N)) +
  geom_point(size=3, color="blue") +
  coord_flip() +
  ggtitle("Frequencies of Pitch Type of Clayton Kershaw") +
  theme(plot.title = element_text(size = 14,
                hjust = 0.5))
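
Raw counts are easier to compare when they are also expressed as proportions. The following is a minimal sketch of that step. It uses only the six pitches printed by head(CK) above, so the numbers are not representative of the season; the names first6 and pitch_prop are introduced here for illustration.

```r
library(dplyr)

# Pitch types of the six rows printed by head(CK); far too few to be
# representative, they only illustrate the computation.
first6 <- data.frame(
  pitch_type = c("FF", "FF", "FF", "CU", "FF", "FF")
)

# Count each pitch type, then divide by the total number of pitches
pitch_prop <- first6 %>%
  count(pitch_type, name = "N") %>%
  mutate(Proportion = N / sum(N))
pitch_prop
```

Applying the same two steps to the full CK data frame would give the share of each pitch type over the season.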

Pitch Speeds

These different pitch types are thrown at different speeds. The following display is a boxplot of the speeds (variable start_speed) of the four types of pitches thrown by Kershaw.

ggplot(filter(CK, pitch_type %in%
                c("SL", "FF", "CU", "CH")),
       aes(pitch_type, start_speed)) +
  geom_boxplot() + coord_flip() +
  ggtitle("Pitch Speeds") +
  theme(plot.title = element_text(size = 14,
                                  hjust = 0.5)) +
  ylim(70, 100)
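
A boxplot is built from the median and quartiles of each group, and it can help to see those numbers directly. Here is a rough sketch of such a summary. The speeds in the toy data frame are invented for illustration and are not Kershaw's actual measurements.

```r
library(dplyr)

# Hypothetical speeds in mph, invented for illustration only
toy <- data.frame(
  pitch_type  = rep(c("FF", "SL", "CU"), each = 3),
  start_speed = c(93.1, 94.0, 92.5, 87.8, 88.4, 88.0, 73.5, 74.2, 72.9)
)

# Median and interquartile range of speed for each pitch type
speed_summary <- toy %>%
  group_by(pitch_type) %>%
  summarize(Median = median(start_speed),
            IQR = IQR(start_speed))
speed_summary
```

Replacing toy with the filtered CK data frame would give the numerical counterpart of the boxplot above.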

Pitch Breaks

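These pitch types are also distinguished by their movement or break. The variables pfx_x and pfx_z give the horizontal and vertical break amounts. (The perspective is from the catcher behind the plate.) The following graph shows the movements for each type of pitch. Note that the code first redefines CK to keep only the curveballs (CU), four-seam fastballs (FF), and sliders (SL), so the changeup is excluded from this point on.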

CK <- filter(CK, pitch_type %in% c("CU",
                          "FF", "SL"))
ggplot(CK,
  aes(pfx_x, pfx_z, shape=pitch_type)) +
  geom_point(color="blue", size=2, alpha=0.5) +
  ggtitle("Pitch Breaks") +
  theme(plot.title = element_text(size = 14,
                                  hjust = 0.5)) +
  xlab("Horizontal Break") + ylab("Vertical Break")

Pitch Locations

The variables px and pz give the horizontal and vertical locations of the pitch viewed from the catcher’s perspective. The strike zone for an average hitter is added to the plots so we can see which pitches are inside and outside of the zone.

topKzone <- 3.5
botKzone <- 1.6
inKzone <- -0.85
outKzone <- 0.85
kZone <- data.frame(
  x=c(inKzone, inKzone, outKzone, outKzone, inKzone),
  y=c(botKzone, topKzone, topKzone, botKzone, botKzone)
)
ggplot(CK) +
  geom_point(data= filter(CK, pitch_type=="CU"),
             aes(px, pz), shape=1) +
  geom_point(data= filter(CK, pitch_type=="FF"),
             aes(px, pz), shape=2) +
  geom_point(data= filter(CK, pitch_type=="SL"),
             aes(px, pz), shape=3) +
  geom_path(aes(x, y), data=kZone, lwd=1, col="blue") +
  facet_wrap(~ pitch_type, ncol=2) +
  xlim(-2, 2) + ylim(-0.5, 5) +
  theme(strip.text = element_text(size = rel(1.5),
                                  hjust=0.5,
                                  color = "black")) +
  ggtitle("Pitch Locations") +
  theme(plot.title = element_text(size = 14,
                                  hjust = 0.5))
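
The same zone boundaries can be used to flag pitches numerically instead of judging them by eye. Below is a small sketch using the locations of the six pitches printed by head(CK). The in_zone column is a name introduced here, and the check treats the pitch as a point, ignoring the width of the ball.

```r
# Strike-zone boundaries copied from the code above
# (PITCHf/x locations are measured in feet)
topKzone <- 3.5
botKzone <- 1.6
inKzone <- -0.85
outKzone <- 0.85

# Locations of the six pitches printed by head(CK)
first6 <- data.frame(
  px = c(0.089, 0.083, -2.651, -0.644, 0.642, -1.410),
  pz = c(2.750, 2.721, 2.690, -0.231, 4.521, 2.327)
)

# A pitch is in the zone when both coordinates fall inside the rectangle
first6$in_zone <- first6$px >= inKzone & first6$px <= outKzone &
  first6$pz >= botKzone & first6$pz <= topKzone
first6

# Share of these pitches inside the zone
mean(first6$in_zone)
```

Only the first two of these six pitches land inside the rectangle; the others miss horizontally or vertically.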

Two-dimensional contour plots (from fitting a two-dimensional density estimate) are helpful for visualizing the locations of the different types of pitches.

ggplot(CK) +
  geom_density_2d(aes(px, pz), color="black") +
  geom_path(aes(x, y), data=kZone, lwd=1, col="blue") +
  facet_wrap(~ pitch_type, ncol=2) +
  xlim(-2, 2) + ylim(-0.5, 5) +
  theme(strip.text = element_text(size = rel(1.5),
                                  hjust=0.5,
                                  color = "black")) +
  ggtitle("Pitch Locations") +
  theme(plot.title = element_text(size = 14,
                                  hjust = 0.5))
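
Behind the scenes, geom_density_2d() relies on the kde2d() function from the MASS package to compute the density estimate. The sketch below calls kde2d() directly on simulated locations, which are invented for illustration, and draws the contours with base graphics. The grid size of 50 is an arbitrary choice.

```r
set.seed(1)
# Simulated pitch locations, invented for illustration only
px <- rnorm(200, mean = 0, sd = 0.7)
pz <- rnorm(200, mean = 2.5, sd = 0.8)

# Kernel density estimate on a 50 x 50 grid over the plotting window;
# MASS::kde2d is called with its namespace so dplyr::select is not masked
dens <- MASS::kde2d(px, pz, n = 50, lims = c(-2, 2, -0.5, 5))

# dens$x and dens$y hold the grid, dens$z the estimated density
dim(dens$z)

# Contour lines of the estimated density
contour(dens, xlab = "px", ylab = "pz")
```

Each contour line connects grid points with the same estimated density, so tightly packed inner contours mark the locations where pitches are most concentrated.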

Pitch Outcomes

What are the outcomes of these different types of pitches? We use the variable des, which describes the outcome of each pitch.

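# Count pitches for each combination of pitch type and description
SO <- summarize(group_by(CK, pitch_type, des), N=n())
# Collapse the detailed descriptions into a few broad outcomes
SO <- mutate(SO,
      Outcome=ifelse(str_detect(des, "Foul"), "Foul",
      ifelse(str_detect(des, "Swing") |
               des == "Missed Bunt", "Swing and Miss",
      ifelse(str_detect(des, "Ball"), "Ball",
      ifelse(str_detect(des, "In play"), "In play",
             des)))))
# Frequency of each outcome and total pitches within each pitch type
# (the column is named Freq because F is shorthand for FALSE in R)
SOS <- summarize(group_by(SO, pitch_type, Outcome),
                 Freq=sum(N))
SOS1 <- summarize(group_by(SO, pitch_type),
                 Total=sum(N))
# Join on pitch type and convert frequencies to percentages
inner_join(SOS, SOS1, by = "pitch_type") %>%
  mutate(Percentage = 100 * Freq / Total) -> SOS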
ggplot(SOS,
        aes(Outcome, Percentage)) +
  geom_point(size=3, color="blue") +
  coord_flip() + facet_wrap(~ pitch_type, ncol=1) +
  theme(strip.text = element_text(size = rel(1.5),
                                  hjust=0.5,
                                  color = "black"))
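
The nested ifelse() calls above can be hard to read. One possible alternative is dplyr's case_when(), which checks the conditions in order and uses the first one that matches. In the sketch below, the first three descriptions appear in the head(CK) output, while the other three are assumed values chosen to match the patterns used above.

```r
library(dplyr)
library(stringr)

# Example descriptions: the first three appear in the head(CK) output,
# the rest are assumed values matching the patterns used above
toy <- data.frame(
  des = c("Called Strike", "Swinging Strike", "Ball",
          "Foul", "Missed Bunt", "In play, out(s)")
)

# Conditions are tested top to bottom; TRUE acts as the fallback
toy <- mutate(toy,
  Outcome = case_when(
    str_detect(des, "Foul")    ~ "Foul",
    str_detect(des, "Swing") | des == "Missed Bunt" ~ "Swing and Miss",
    str_detect(des, "Ball")    ~ "Ball",
    str_detect(des, "In play") ~ "In play",
    TRUE                       ~ des
  ))
toy
```

As with the nested version, any description that matches none of the patterns (here "Called Strike") is kept unchanged.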

Outcome of a Swing

What happens when the batter swings at the pitch? We classify each swing as one of three outcomes, “Foul”, “In play”, or “Miss”, and plot the locations of the swings for each pitch type, with the misses shown in blue.

CK <- mutate(CK,
             Foul = str_detect(des, "Foul"),
             InPlay = str_detect(des, "In play"),
             Miss = str_detect(des, "Swing"),
             Swing = Foul | InPlay | Miss)
# Keep only the pitches where the batter swung
CK_swing <- filter(CK, Swing)
ggplot(CK_swing, aes(px, pz, color=Miss)) +
  geom_point(alpha=0.75) +
  geom_path(aes(x, y), data=kZone, lwd=1, col="black") +
  facet_wrap(~ pitch_type, ncol=2) +
  xlim(-2, 2) + ylim(-0.5, 5) +
  scale_colour_manual(values = c("gray60", "blue")) +
  theme(strip.text = element_text(size = rel(1.5),
                                  hjust=0.5,
                                  color = "black"))
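
The plot shows where the misses occur, but not how often each swing outcome happens. The following sketch tabulates the percentages of fouls, balls in play, and misses among swings for each pitch type. It uses the same indicator definitions as above on a toy data frame whose rows are invented for illustration; swing_rates and the Pct column names are introduced here.

```r
library(dplyr)
library(stringr)

# Hypothetical pitches, invented for illustration only
toy <- data.frame(
  pitch_type = c("FF", "FF", "FF", "FF", "SL", "SL", "SL", "CU"),
  des = c("Foul", "In play, out(s)", "Swinging Strike", "Foul",
          "Swinging Strike", "Swinging Strike", "Foul", "Called Strike")
)

swing_rates <- toy %>%
  mutate(Foul = str_detect(des, "Foul"),
         InPlay = str_detect(des, "In play"),
         Miss = str_detect(des, "Swing")) %>%
  # Keep only the swings; the called strike is dropped here
  filter(Foul | InPlay | Miss) %>%
  group_by(pitch_type) %>%
  # The mean of a logical indicator is the proportion of TRUE values
  summarize(Swings = n(),
            FoulPct = 100 * mean(Foul),
            InPlayPct = 100 * mean(InPlay),
            MissPct = 100 * mean(Miss))
swing_rates
```

Running the summarize step on CK_swing would give the corresponding percentages for Kershaw's actual swings.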