Introduction

Over the years, a number of people have struggled to use our R functions for downloading the Retrosheet play-by-play files. Since the underlying steps are straightforward, I thought it might help to describe a simple process for getting the files in csv format.

STEP ONE

First I visit the Retrosheet play-by-play download page. I’m interested in downloading the 2018 files, so I click on the “2018” link and the zip file is placed in my Downloads folder.

https://www.retrosheet.org/game.htm

STEP TWO

I navigate to my computer’s Downloads folder and find the file 2018eve.zip. I double-click on this filename to extract all of the individual Retrosheet files, one for each home team.

When the unzipping process is done, I should see the extracted files, including event files with names such as 2018ANA.EVA.
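If double-clicking is not convenient, the same extraction can be done from the Terminal. The following is only a sketch: the function name unpack_season and the destination folder are my own choices for illustration, and it assumes the unzip utility is installed.

```shell
# Extract a Retrosheet season archive into its own folder.
# $1: path to the zip file
# $2: destination folder (a hypothetical name chosen by the user)
unpack_season() {
  mkdir -p "$2" && unzip -o -q "$1" -d "$2"
}

# Example call (paths are assumptions; adjust to your machine):
# unpack_season ~/Downloads/2018eve.zip ~/Downloads/retrosheet2018
```

Keeping each season in its own folder makes it easier to run the next step on just that season's files.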

STEP THREE

I open my Terminal program and navigate to the folder containing these Retrosheet files. Assuming I have the Chadwick software tools installed, I type:

cwevent -y 2018 -f 0-96 2018*.EV* > all2018.csv

(The -y 2018 option gives the season, and -f 0-96 indicates that I want to include all of the standard Retrosheet fields, numbered 0 through 96. The output is redirected to the file all2018.csv.)
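If the command fails with a "command not found" message, the Chadwick tools are probably not on the search path. Here is a small sketch of a check that can be run first; the function name require_tool is hypothetical and only for illustration.

```shell
# Report whether a command-line tool can be found on the PATH.
# $1: name of the tool, for example cwevent
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Example call:
# require_tool cwevent
```

If the tool is reported as missing, installing Chadwick (or adding its folder to the PATH) should come before retrying the cwevent command.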

To check that this worked, I should now see the file “all2018.csv” in the same folder as the Retrosheet files.
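Beyond checking that the file exists, a quick look at its shape can catch an incomplete run. The sketch below counts rows and fields with awk; the function name csv_shape is my own, and it assumes that quoted values contain no embedded double quotes. With the -f 0-96 option, I would expect every row to have 97 fields.

```shell
# Report how many rows of a CSV file have each field count.
# $1: path to the CSV file
csv_shape() {
  awk '{
    gsub(/"[^"]*"/, "")       # blank out quoted values, which may hold commas
    n = gsub(/,/, ",") + 1    # number of fields = number of commas + 1
    count[n]++
  }
  END {
    for (n in count) printf "%d rows with %d fields\n", count[n], n
  }' "$1"
}

# Example call:
# csv_shape all2018.csv
```

A single line of output means every row has the same number of fields; more than one line suggests that something went wrong in the conversion.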

STEP FOUR

Now I can read this Retrosheet csv file into R using RStudio. The file does not yet contain the names of the variables, so I download the field names from my website and use them as the column headers.

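library(readr)
# Read the cwevent output, which has no header row
# (assumes the working directory is the folder containing all2018.csv)
data <- read_csv("all2018.csv",
   col_names = FALSE)
# Download the field names and assign them as the column names
fields <- read.csv("http://bayesball.github.io/baseball/fields.csv")
names(data) <- fields[, "Header"]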