Plate Discipline Statistics for Batters

Load several useful packages.

library(tidyverse)
library(ggplot2)

The Data

Collect several useful tables from Fangraphs. The first dataset contains basic hitting statistics and the second dataset has stats related to plate discipline. We merge the two datasets, creating a single data frame, 146 observations and 33 variables.

d1 <- read_csv("https://bayesball.github.io/VB/data/Dashboard_2016.csv")
d2 <- read_csv("https://bayesball.github.io/VB/data/Plate_Discipline_2016.csv")
d <- inner_join(d1, d2, by="playerid")
vars <- c(14, 25:33)
d_subset <- d[, vars]
names(d_subset) <- c("OBP", "O_Swing", "Z_Swing", "Swing",
                     "O_Contact", "Z_Contact",
                     "Contact", "Zone",
                     "F_Strike", "SwStr")
names(d)[c(14, 25:33)] <- names(d_subset)

Swing and Contact Rates

Following graph constructs a scatterplot of the swing and contact rates for all hitters with a smoothing curve added.

ggplot(d, aes(Swing, Contact)) +
  geom_point(size=2) +
  geom_smooth(se=FALSE) +
  xlab("Swing Rate") + ylab("Contact Rate")

Relationship with Strikeout Rate

We divide the players into “high” and “low” strikeout groups. We use contact and swing rates to predict (by a logistic model) the probability a hitter is in the high strikeout group. A line is added to the scatterplot – points above (below) the line are predicted to be in the low (high) K groups.

d$K_Rate <- with(d, ifelse(K > .1875, "HI", "LO"))
d$y <- ifelse(d$K_Rate=="HI", 1, 0)
glm(y ~ Contact + Swing, data=d, family=binomial) -> F
ggplot(d, aes(Swing, Contact, 
              color=K_Rate)) +
  geom_point(size=3) +
  xlab("Swing Rate") + ylab("Contact Rate") +
  geom_abline(intercept = coef(F)[1] / (-coef(F)[2]),
              slope = coef(F)[3] / (-coef(F)[2])) +
  scale_shape(solid = FALSE) +
  scale_colour_manual(values = c("black", "grey60"))

Relationship with Walk Rate

We divide the players into “high” and “low” walk groups. We use contact and swing rates to predict (by a logistic model) the probability a hitter is in the high walk group. A line is added to the scatterplot – points to the left (to the right) of the line are predicted to be in the high (low) walk groups.

d$BB_Cat <- with(d, ifelse(BB > .082, "HI", "LO"))
d$y <- ifelse(d$BB_Cat=="HI", 1, 0)
glm(y ~ Contact + Swing, data=d, family=binomial) -> F
ggplot(d, aes(Swing, Contact, 
              color=BB_Cat)) +
  xlab("Swing Rate") + ylab("Contact Rate") +
  geom_point(size=3) +
  geom_abline(intercept = coef(F)[1] / (-coef(F)[2]),
              slope = coef(F)[3] / (-coef(F)[2])) +
  scale_shape(solid = FALSE) +
  scale_colour_manual(values = c("black", "grey60"))

Contrasting the top and bottom K hitters

We first identify the players who have the smallest (TOP) and largest (BOTTOM) strikeout rates.

d <- mutate(d,
            K_Type=ifelse(K < .12, "TOP",
                  ifelse(K > .25, "BOTTOM", NA)))
select(filter(d, K_Type == "TOP"),
       Name.x, Team.x, K)
## # A tibble: 16 x 3
##              Name.x    Team.x     K
##               <chr>     <chr> <dbl>
##  1     Mookie Betts   Red Sox 0.110
##  2      Jose Altuve    Astros 0.098
##  3    Adrian Beltre   Rangers 0.103
##  4    Daniel Murphy Nationals 0.098
##  5   Dustin Pedroia   Red Sox 0.105
##  6     Jose Ramirez   Indians 0.100
##  7     Buster Posey    Giants 0.111
##  8   Ender Inciarte    Braves 0.118
##  9     Martin Prado   Marlins 0.105
## 10    Yadier Molina Cardinals 0.108
## 11        Joe Panik    Giants 0.089
## 12    Jose Iglesias    Tigers 0.097
## 13    Yunel Escobar    Angels 0.118
## 14    Melky Cabrera White Sox 0.107
## 15 Brandon Phillips      Reds 0.116
## 16    Albert Pujols    Angels 0.115
select(filter(d, K_Type == "BOTTOM"),
       Name.x, Team.x, K)
## # A tibble: 16 x 3
##              Name.x       Team.x     K
##               <chr>        <chr> <dbl>
##  1  Jonathan Villar      Brewers 0.256
##  2      Adam Duvall         Reds 0.270
##  3      Chris Davis      Orioles 0.329
##  4      Khris Davis    Athletics 0.272
##  5        Jake Lamb Diamondbacks 0.259
##  6    Leonys Martin     Mariners 0.259
##  7      Mark Trumbo      Orioles 0.255
##  8   Russell Martin    Blue Jays 0.277
##  9   Danny Espinosa    Nationals 0.290
## 10      Travis Shaw      Red Sox 0.251
## 11 Michael Saunders    Blue Jays 0.281
## 12     Justin Upton       Tigers 0.286
## 13      Alex Gordon       Royals 0.292
## 14 Melvin Upton Jr.        - - - 0.288
## 15      Mike Napoli      Indians 0.301
## 16     Chris Carter      Brewers 0.320

Similarly we identify the players with the largest (TOP) and smallest (BOTTOM) walk rates

d <- mutate(d,
            BB_Type=ifelse(BB > .13, "TOP",
                  ifelse(BB < .05, "BOTTOM", NA)))
select(filter(d, BB_Type == "TOP"),
       Name.x, Team.x, BB)
## # A tibble: 13 x 3
##              Name.x       Team.x    BB
##               <chr>        <chr> <dbl>
##  1       Mike Trout       Angels 0.170
##  2   Josh Donaldson    Blue Jays 0.156
##  3       Joey Votto         Reds 0.160
##  4 Paul Goldschmidt Diamondbacks 0.156
##  5    Dexter Fowler         Cubs 0.143
##  6     Brandon Belt       Giants 0.159
##  7      Ben Zobrist         Cubs 0.152
##  8   Carlos Santana      Indians 0.144
##  9     Bryce Harper    Nationals 0.172
## 10   Matt Carpenter    Cardinals 0.143
## 11      Chris Davis      Orioles 0.132
## 12    Jose Bautista    Blue Jays 0.168
## 13        Joe Mauer        Twins 0.137
select(filter(d, BB_Type == "BOTTOM"),
       Name.x, Team.x, BB)
## # A tibble: 15 x 3
##              Name.x    Team.x    BB
##               <chr>     <chr> <dbl>
##  1   Starling Marte   Pirates 0.043
##  2     Kevin Pillar Blue Jays 0.041
##  3    Eduardo Nunez     - - - 0.049
##  4   Didi Gregorius   Yankees 0.032
##  5    Freddy Galvis  Phillies 0.040
##  6   Salvador Perez    Royals 0.040
##  7  Jonathan Schoop   Orioles 0.032
##  8     Rougned Odor   Rangers 0.030
##  9    Josh Harrison   Pirates 0.034
## 10   Starlin Castro   Yankees 0.039
## 11 Brandon Phillips      Reds 0.031
## 12    Adonis Garcia    Braves 0.043
## 13  Alcides Escobar    Royals 0.040
## 14  Marwin Gonzalez    Astros 0.042
## 15   Alexei Ramirez     - - - 0.042

Comparing Top and Bottom Strikeout Hitters

This scatterplot compares the top and bottom K groups with respect to the contact rates in the zone and outside of the zone.

ggplot(filter(d, K_Type %in% c("TOP", "BOTTOM")),
        aes(Z_Contact, O_Contact, color=K_Type)) +
         geom_point(size=3) +
  scale_shape(solid = FALSE) +
  scale_colour_manual(values = c("grey50", "black" ))

Comparing Top and Bottom Walk Hitters

This scatterplot compares the top and bottom BB groups with respect to the swing rates in the zone and outside of the zone.

ggplot(filter(d, BB_Type %in% c("TOP", "BOTTOM")),
        aes(Z_Swing, O_Swing, color=BB_Type)) +
         geom_point(size=3) +
  scale_shape(solid = FALSE) +
  scale_colour_manual(values = c("grey50", "black" ))