Here is a function plot_hr_trajectory that will graph a specific player’s home run trajectory. It uses four packages: Lahman contains the season-to-season data, dplyr helps with data management, stringr helps with one string operation, and ggplot2 does the graphing.
Here is some insight into how plot_hr_trajectory works:
The input is the player’s full name in quotes.
Using the Master data frame in the Lahman package, I find the playerID and birth information for that player.
From the Batting data frame of hitting data, I collect HR and AB for all seasons of the player’s career.
I find the Age variable by first finding the player’s birth year, adding one to it if the birth month is July or later, and then defining Age as the season year minus the adjusted birth year.
I use ggplot2 to construct a scatterplot and smoothing curve for the home run rate HR / AB.
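The age step can be tried on its own. The following is a minimal base R sketch, assuming the same July cutoff described above; the helper name season_age and its argument names are hypothetical and do not appear in the function below, and the birth dates are illustrative values only.

```r
# Hypothetical helper (not part of plot_hr_trajectory): season age using
# a July cutoff for the birth-year adjustment.
season_age <- function(year_id, birth_year, birth_month) {
  # Players born in July or later are assigned the following birth year
  adjusted_birthyear <- ifelse(birth_month >= 7, birth_year + 1, birth_year)
  year_id - adjusted_birthyear
}

season_age(1956, 1931, 10)  # October birth: 1956 - 1932 = 24
season_age(1956, 1931, 3)   # March birth:   1956 - 1931 = 25
```

Two players born in the same calendar year can therefore differ by one year of Age in the same season, depending on which side of the cutoff the birth month falls.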
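plot_hr_trajectory <- function(playername){
  require(Lahman)
  require(dplyr)
  require(stringr)
  require(ggplot2)
  # Split "First Last" into first and last name
  names <- unlist(str_split(playername, " "))
  # Look up the player's ID and birth information
  # (newer Lahman releases call this table People)
  info <- filter(Master, nameLast==names[2],
                 nameFirst==names[1])
  # Keep that player's season-by-season batting records
  bdata <- filter(Batting, playerID==info$playerID)
  # Players born in July or later are assigned the next birth year
  bdata <- mutate(bdata,
                  birthyear = ifelse(info$birthMonth >= 7,
                                     info$birthYear + 1, info$birthYear),
                  Age = yearID - birthyear)
  # Scatterplot of the home run rate by season with a loess smooth
  ggplot(bdata, aes(yearID, HR / AB)) +
    geom_point() +
    geom_smooth(method="loess", se=FALSE)
}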
I illustrate this function for two players. Note that I am saving the ggplot2 plotting object in a variable. By just typing the variable name, I see the graph.
p1 <- plot_hr_trajectory("Mickey Mantle")
p1
p2 <- plot_hr_trajectory("Mike Schmidt")
p2
The ggplot2 object contains the plotting data, so I combine the data from the two earlier plotting objects to construct a graph that compares the two trajectories.
ggplot(rbind(p1$data, p2$data), aes(Age, HR / AB)) +
  geom_point() +
  geom_smooth(method="loess", se=FALSE) +
  facet_wrap(~ playerID, ncol=1)
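The idea of reusing the data stored in a plotting object can be checked on a small example without the Lahman data. This is only a sketch: the data frames toy_a and toy_b, the player IDs, and the rate values are made up for illustration.

```r
library(ggplot2)

# Made-up rates for two hypothetical players, for illustration only
toy_a <- data.frame(playerID = "playerA", Age = c(25, 26), rate = c(0.05, 0.06))
toy_b <- data.frame(playerID = "playerB", Age = c(25, 26), rate = c(0.04, 0.07))

pa <- ggplot(toy_a, aes(Age, rate)) + geom_point()
pb <- ggplot(toy_b, aes(Age, rate)) + geom_point()

# Each plotting object carries its data frame in the data element,
# so the two data frames can be stacked and plotted together
combined <- rbind(pa$data, pb$data)
ggplot(combined, aes(Age, rate)) +
  geom_point() +
  facet_wrap(~ playerID, ncol=1)
```

Because rbind stacks rows, this works as long as both plotting objects were built from data frames with the same columns, which is the case when both come from the same function.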