Over the years, a number of people have struggled to use our R functions to download the Retrosheet play-by-play files. Since getting these files in csv format is actually pretty simple, I thought it might help to describe the steps.
First I visit the Retrosheet play-by-play download page at https://www.retrosheet.org/game.htm. I’m interested in the 2018 files, so I click the “2018” link, and the zip file is saved to my Downloads folder.
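If you would rather script this step, the download can also be done from R. This is a minimal sketch that assumes the season zip files live at https://www.retrosheet.org/events/<season>eve.zip. That matches the 2018eve.zip name, but it is an assumption and could change if Retrosheet reorganizes its site. The helper names retrosheet_zip_url() and get_retrosheet_season() are hypothetical. The second helper fetches the zip and extracts it in one go.

```r
# Hypothetical helper: build the (assumed) URL of a Retrosheet season zip file
retrosheet_zip_url <- function(season) {
  paste0("https://www.retrosheet.org/events/", season, "eve.zip")
}

# Hypothetical helper: download one season's zip into dest_dir and extract it there
get_retrosheet_season <- function(season, dest_dir = getwd()) {
  zip_path <- file.path(dest_dir, paste0(season, "eve.zip"))
  # mode = "wb" keeps the binary zip file intact on Windows
  download.file(retrosheet_zip_url(season), destfile = zip_path, mode = "wb")
  # unzip() returns the paths of the extracted files
  unzip(zip_path, exdir = dest_dir)
}
```

For example, get_retrosheet_season(2018, path.expand("~/Downloads")) would place the 2018 files in the Downloads folder.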
I navigate to my computer’s Downloads folder and find 2018eve.zip. I double-click on this file to unzip it, which extracts the individual Retrosheet event files, one for each home team.
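When the unzipping is done, I should see these files in the folder: the event files (extension .EVA for American League home teams and .EVN for National League home teams), along with the roster and team files.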
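The event files are the ones the 2018*.EV* pattern picks up when running Chadwick. As a quick check, you can list them from R. This is a sketch: list_event_files() is a hypothetical helper, and its regular expression assumes the usual Retrosheet naming of a four-digit season, a three-character team code, and an .EVA or .EVN extension.

```r
# Hypothetical helper: list the Retrosheet event files for one season in a folder.
# Assumes names like 2018ANA.EVA (AL home team) or 2018ARI.EVN (NL home team);
# roster (.ROS) and team (TEAM2018) files are deliberately not matched.
list_event_files <- function(dir, season) {
  pattern <- paste0("^", season, "[A-Z0-9]{3}\\.EV[AN]$")
  sort(list.files(dir, pattern = pattern))
}
```

Calling list_event_files("~/Downloads", 2018) should return one file name per home team.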
I go to my Terminal program and navigate to the folder containing these Retrosheet files. Assuming I have the Chadwick tools installed, I type:
cwevent -y 2018 -f 0-96 2018*.EV* > all2018.csv
(Here -y 2018 gives the season, -f 0-96 asks for all 97 standard cwevent fields, 2018*.EV* matches every event file for the season, and > redirects the output to the file all2018.csv.)
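The same command can be launched from R with system2(), which is handy if you want the whole workflow in one script. This sketch assumes cwevent is on your PATH and that R’s working directory is the folder holding the event files, together with the team and roster files from the same zip. The names cwevent_args() and run_cwevent() are hypothetical.

```r
# Hypothetical helper: build the argument vector for
#   cwevent -y 2018 -f 0-96 <event files>
cwevent_args <- function(season, event_files, fields = "0-96") {
  c("-y", as.character(season), "-f", fields, event_files)
}

# Hypothetical helper: run cwevent and send its standard output to out_file,
# mirroring the shell redirect "> all2018.csv"
run_cwevent <- function(season, event_files, out_file) {
  system2("cwevent", args = cwevent_args(season, event_files), stdout = out_file)
}
```

Usage, from the folder with the unzipped files:

files <- list.files(pattern = "^2018.*\\.EV[AN]$")
run_cwevent(2018, files, "all2018.csv")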
If this worked, I should now see the file “all2018.csv” in the Downloads folder.
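Beyond checking that the file exists, a useful sanity check is that every row has one value per requested field; with -f 0-96 that should be 97 columns. Here is a sketch using base R’s count.fields(); check_cwevent_output() is a hypothetical helper. Comment handling is turned off because Retrosheet event text can contain the # character.

```r
# Hypothetical helper: confirm the cwevent output exists and every line
# has the expected number of comma-separated fields (97 for -f 0-96)
check_cwevent_output <- function(path, n_fields = 97) {
  if (!file.exists(path)) return(FALSE)
  # comment.char = "" so a "#" inside event text is not treated as a comment
  counts <- count.fields(path, sep = ",", quote = "\"", comment.char = "")
  length(counts) > 0 && isTRUE(all(counts == n_fields))
}
```

check_cwevent_output("all2018.csv") should return TRUE if the conversion went through.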
Now I can read this Retrosheet csv file into RStudio. The file doesn’t contain the variable names yet, so I download the field information from my website and use it for the column names.
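library(readr)
# all2018.csv has no header row, so read every line as data
# (this assumes the working directory is the folder holding the file)
data <- read_csv("all2018.csv", col_names = FALSE)
# The "Header" column of fields.csv holds the name of each cwevent field
fields <- read.csv("http://bayesball.github.io/baseball/fields.csv",
                   stringsAsFactors = FALSE)
names(data) <- fields[, "Header"]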
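The renaming only makes sense if fields.csv supplies exactly one name per column in the data. A small sketch of that check follows, run on a toy data frame so it works without the real files. add_field_names() is a hypothetical helper, and the toy values and the names GAME_ID and INN_CT are placeholders for illustration.

```r
# Hypothetical helper: attach field names to a header-less data frame,
# stopping early if the number of names doesn't match the number of columns
add_field_names <- function(data, field_names) {
  field_names <- as.character(field_names)
  if (length(field_names) != ncol(data)) {
    stop("expected ", ncol(data), " names but got ", length(field_names))
  }
  names(data) <- field_names
  data
}

# Toy example: two rows read without a header, like read_csv(col_names = FALSE)
toy <- read.csv(text = "x,1\ny,2", header = FALSE)
toy <- add_field_names(toy, c("GAME_ID", "INN_CT"))
```

With the real files, data <- add_field_names(data, fields[, "Header"]) replaces the last line of the code above and fails loudly if the counts disagree.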