Introduction

ggplot2 is an R package for graphing data based on the “Grammar of Graphics” framework introduced by Leland Wilkinson. This package is used to construct all of the graphs for the book Visualizing Baseball. The purpose of this document is to introduce ggplot2 using a familiar baseball dataset. I describe the basic framework and illustrate the use of ggplot2 to construct graphs for different types of variables.

Some Baseball Data

I collect hitting data for all teams in the 2015 baseball season. For each team, I compute its slugging percentage (SLG) and its on-base percentage (OBP).

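library(dplyr)
library(Lahman)
# Keep the 2015 rows of the Lahman Teams table
teams2015 <- filter(Teams, yearID == 2015)
# Columns 18 and 19 hold doubles and triples
names(teams2015)[18:19] <- c("X2B", "X3B")
# Make sure SF and HBP are numeric before doing arithmetic
teams2015$SF <- as.numeric(teams2015$SF)
teams2015$HBP <- as.numeric(teams2015$HBP)
# Singles, total bases, slugging percentage, and on-base percentage
teams2015 <- mutate(teams2015,
                    X1B = H - X2B - X3B - HR,
                    TB = X1B + 2 * X2B + 3 * X3B + 4 * HR,
                    SLG = TB / AB,
                    OBP = (H + BB + HBP) /
                      (AB + BB + HBP + SF))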
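
To check the arithmetic by hand, here is a small sketch that applies the same formulas to a single made-up team line. The function name batting_rates and the totals passed to it are my own illustrative assumptions; they are not part of Lahman or dplyr and are not real 2015 data.

```r
# Hypothetical helper: same formulas as the mutate() call, for one team
batting_rates <- function(AB, H, X2B, X3B, HR, BB, HBP, SF) {
  X1B <- H - X2B - X3B - HR                      # singles
  TB  <- X1B + 2 * X2B + 3 * X3B + 4 * HR        # total bases
  c(X1B = X1B,
    TB  = TB,
    SLG = TB / AB,                               # slugging percentage
    OBP = (H + BB + HBP) / (AB + BB + HBP + SF)) # on-base percentage
}

# Made-up season totals, chosen only to make the arithmetic easy to follow
rates <- batting_rates(AB = 5500, H = 1400, X2B = 280, X3B = 30,
                       HR = 170, BB = 500, HBP = 50, SF = 40)
round(rates, 3)
```

With these totals the team has 920 singles and 2250 total bases, so SLG is 2250 / 5500, about 0.409, and OBP is 1950 / 6090, about 0.320.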

Three Basic Components of a ggplot2 Graph

To construct a graph using ggplot2, you need three things:

  1. A data frame that contains the data that you want to graph.

  2. Aesthetics or roles assigned to particular variables in the data frame.

  3. A geometric object (or geom for short), which is the kind of mark you are plotting, such as points, lines, or bars.

For example, suppose we wish to construct a scatterplot of the on-base percentages and the slugging percentages for all teams in the 2015 season.

  1. The data frame teams2015 contains the data, and OBP and SLG are the variables of interest.

  2. To construct a scatterplot, you need a variable on the horizontal axis (x) and a variable on the vertical axis (y). If I want OBP to be the horizontal axis variable and SLG the vertical axis variable, I map OBP to the x aesthetic and SLG to the y aesthetic.

Steps 1 and 2 are communicated by the command

library(ggplot2)
ggplot(data=teams2015, aes(x=OBP, y=SLG))
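
To see what this command does and does not specify, the sketch below saves the plot object and inspects it. I use a tiny made-up data frame, toy, as a stand-in for teams2015 so the example runs on its own; the values are illustrative and are not real team data. Reading p$mapping and p$layers directly is only a convenient way to peek inside the object, and I am assuming these fields are accessible in your version of ggplot2.

```r
library(ggplot2)

# Made-up stand-in for teams2015 (illustrative values, not real 2015 data)
toy <- data.frame(OBP = c(0.310, 0.325, 0.340),
                  SLG = c(0.390, 0.410, 0.450))

# Steps 1 and 2: a data frame plus an aesthetic mapping, with no geom yet
p <- ggplot(data = toy, aes(x = OBP, y = SLG))
names(p$mapping)   # the aesthetics that have been assigned
length(p$layers)   # the number of geoms added so far

# A geom supplies the third component; geom_point() draws a scatterplot
p2 <- p + geom_point()
length(p2$layers)
```

Printing p at this stage draws only the axes and background, since no geom has been added. Printing p2 adds one point for each row of the data frame.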