Load several useful packages.

```
library(tidyverse)
library(ggplot2)
```

Collect several useful tables from Fangraphs. The first dataset contains basic hitting statistics and the second dataset has stats related to plate discipline. We merge the two datasets, creating a single data frame, 146 observations and 33 variables.

```
d1 <- read_csv("https://bayesball.github.io/VB/data/Dashboard_2016.csv")
d2 <- read_csv("https://bayesball.github.io/VB/data/Plate_Discipline_2016.csv")
d <- inner_join(d1, d2, by="playerid")
vars <- c(14, 25:33)
d_subset <- d[, vars]
names(d_subset) <- c("OBP", "O_Swing", "Z_Swing", "Swing",
"O_Contact", "Z_Contact",
"Contact", "Zone",
"F_Strike", "SwStr")
names(d)[c(14, 25:33)] <- names(d_subset)
```

Following graph constructs a scatterplot of the swing and contact rates for all hitters with a smoothing curve added.

```
ggplot(d, aes(Swing, Contact)) +
geom_point(size=2) +
geom_smooth(se=FALSE) +
xlab("Swing Rate") + ylab("Contact Rate")
```

We divide the players into “high” and “low” strikeout groups. We use contact and swing rates to predict (by a logistic model) the probability a hitter is in the high strikeout group. A line is added to the scatterplot – points above (below) the line are predicted to be in the low (high) K groups.

```
d$K_Rate <- with(d, ifelse(K > .1875, "HI", "LO"))
d$y <- ifelse(d$K_Rate=="HI", 1, 0)
glm(y ~ Contact + Swing, data=d, family=binomial) -> F
ggplot(d, aes(Swing, Contact,
color=K_Rate)) +
geom_point(size=3) +
xlab("Swing Rate") + ylab("Contact Rate") +
geom_abline(intercept = coef(F)[1] / (-coef(F)[2]),
slope = coef(F)[3] / (-coef(F)[2])) +
scale_shape(solid = FALSE) +
scale_colour_manual(values = c("black", "grey60"))
```

We divide the players into “high” and “low” walk groups. We use contact and swing rates to predict (by a logistic model) the probability a hitter is in the high walk group. A line is added to the scatterplot – points to the left (to the right) of the line are predicted to be in the high (low) walk groups.

```
d$BB_Cat <- with(d, ifelse(BB > .082, "HI", "LO"))
d$y <- ifelse(d$BB_Cat=="HI", 1, 0)
glm(y ~ Contact + Swing, data=d, family=binomial) -> F
ggplot(d, aes(Swing, Contact,
color=BB_Cat)) +
xlab("Swing Rate") + ylab("Contact Rate") +
geom_point(size=3) +
geom_abline(intercept = coef(F)[1] / (-coef(F)[2]),
slope = coef(F)[3] / (-coef(F)[2])) +
scale_shape(solid = FALSE) +
scale_colour_manual(values = c("black", "grey60"))
```