BayesTestStreak Package

Author

Jim Albert

Published

July 20, 2025

This is an introduction to the functions in the BayesTestStreak R package. A general reference to the exploration of streakiness in sequences of binary response data can be found at https://bayesball.github.io/BLOG/Streaky.html

Installation

One can install the BayesTestStreak package from my GitHub site:

remotes::install_github("bayesball/BayesTestStreak")

Once the package is installed, it can be loaded with the library() function.

library(BayesTestStreak)

Collect Some Streak Data

Let’s focus on the hitting streak patterns of Jose Abreu in the 2016 season. Retrosheet play-by-play data from the 2016 baseball season is available in the dataset pbp2016 in the package.

We first find Abreu’s Retrosheet id by use of the find_id() function.

(ja_id <- find_id("Jose Abreu"))
[1] "abrej003"

The streak_data() function will create the streak data for this exercise. The inputs are the batter id, the Retrosheet play-by-play data, the indication of a “success” (here “H” for hit), and the argument AB = TRUE, which indicates that only at-bats will be considered.

ja_data <- streak_data(ja_id, pbp2016, "H", AB = TRUE)
head(ja_data)
    BAT_ID N      GAME_ID INN_CT Outcome
1 abrej003 1 OAK201604040      3       1
2 abrej003 2 OAK201604040      5       0
3 abrej003 3 OAK201604040      7       0
4 abrej003 4 OAK201604050      1       1
5 abrej003 5 OAK201604050      4       0
6 abrej003 6 OAK201604050      5       0

In the data frame ja_data, there are two key variables: N is the at-bat number and Outcome indicates if the at-bat resulted in a hit (1) or an out (0).
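As a quick illustration of how the Outcome variable can be used, the sketch below builds a small stand-in data frame from the six at-bats printed above (the name toy is hypothetical, and only the N and Outcome columns are reproduced) and computes a batting average directly from the 0/1 values.

```r
# Toy stand-in for ja_data: the six at-bats shown in the head() output
toy <- data.frame(N = 1:6, Outcome = c(1, 0, 0, 1, 0, 0))

hits <- sum(toy$Outcome)                   # number of hits
at_bats <- nrow(toy)                       # number of at-bats
avg <- mean(toy$Outcome)                   # batting average = hits / at-bats
hit_positions <- toy$N[toy$Outcome == 1]   # at-bat numbers where a hit occurred

print(c(hits = hits, at_bats = at_bats, avg = avg))
print(hit_positions)
```

Because Outcome is coded 0/1, its mean over any set of at-bats is the batting average over those at-bats.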

Plot the Data

A basic rug plot of the occurrences of hits can be produced by the plot_streak_data() function.

plot_streak_data(ja_data)

Moving Averages

One way to visualize the streaky patterns in the hit/out data is by use of a moving average plot.

The moving_average() function will compute moving batting averages using a width of 20 at-bats.

ja_ma <- moving_average(ja_data, width = 20)
head(ja_ma) 
    BAT_ID N      GAME_ID INN_CT Outcome Index Average       AVG
1 abrej003 1 OAK201604040      3       1    NA      NA 0.2932692
2 abrej003 2 OAK201604040      5       0    NA      NA 0.2932692
3 abrej003 3 OAK201604040      7       0    NA      NA 0.2932692
4 abrej003 4 OAK201604050      1       1    NA      NA 0.2932692
5 abrej003 5 OAK201604050      4       0    NA      NA 0.2932692
6 abrej003 6 OAK201604050      5       0    NA      NA 0.2932692
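To make the idea of a moving average concrete, here is a minimal base R sketch on a short 0/1 sequence. It uses a trailing window, which is an assumption: the windowing that moving_average() uses internally (trailing or centered) is not shown here, and the helper name moving_avg is hypothetical. The leading NA values in the sketch arise because the first few windows are not yet full.

```r
# Trailing moving average of a 0/1 outcome vector (illustrative sketch only;
# not necessarily how moving_average() is implemented)
moving_avg <- function(y, width) {
  # stats::filter with sides = 1 averages the current and previous width - 1 values
  as.numeric(stats::filter(y, rep(1 / width, width), sides = 1))
}

y <- c(1, 0, 0, 1, 0, 0, 1, 1)   # made-up hit/out sequence
ma <- moving_avg(y, width = 4)
print(ma)
```

A wider window gives a smoother curve but hides short streaks, while a narrower window reacts faster but is noisier.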

The moving_average_plot() function takes the moving averages data frame as input and produces a graph of these averages.

# ggtitle() comes from ggplot2, so that package needs to be attached
library(ggplot2)
moving_average_plot(ja_ma) +
  ggtitle("Jose Abreu Moving AVG")

Spacings

The find_spacings() function will find all spacings (number of at-bats between consecutive hits). The output is a data frame of the spacings.

ja_sp <- find_spacings(ja_data)
head(ja_sp)
  N Spacing
1 1       2
2 2       3
3 3       3
4 4       2
5 5       4
6 6       6
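The spacing computation can be sketched in a few lines of base R. The helper name spacings is hypothetical, and the sketch assumes that a spacing counts only the at-bats strictly between two consecutive hits, ignoring any at-bats before the first hit or after the last one; find_spacings() may treat those end runs differently.

```r
# Number of at-bats between consecutive hits in a 0/1 outcome vector
# (illustrative sketch; end runs before the first and after the last hit are ignored)
spacings <- function(y) {
  hit_pos <- which(y == 1)   # positions of the hits
  diff(hit_pos) - 1          # at-bats strictly between neighboring hits
}

y <- c(1, 0, 0, 1, 0, 0, 0, 1, 1)   # made-up hit/out sequence
sp <- spacings(y)
print(sp)
```

Back-to-back hits give a spacing of 0, so a streaky hitter tends to show many small spacings together with a few very large ones.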

If the hit data are independent Bernoulli random variables with a constant hitting probability, then the spacings will be independent with a Geometric distribution. One can graphically check if the spacings are Geometric by the use of a Geometric probability plot. If the points line up along a straight line, then the Geometric assumption is reasonable.

geometric_plot(ja_sp)
`geom_smooth()` using formula = 'y ~ x'
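The reason a straight line is expected can be checked numerically. If S is a Geometric spacing with hit probability p, then P(S >= k) = (1 - p)^k, so the log tail probability is linear in k. The sketch below verifies this with pgeom() and then simulates independent Bernoulli at-bats to compare the average spacing with its Geometric mean (1 - p) / p. The value p = 0.3 and the simulated sequence are made up for illustration, and the exact quantities drawn by geometric_plot() are not assumed here.

```r
p <- 0.3   # assumed constant hit probability (illustrative value)

# Tail probabilities P(S >= k) of a Geometric(p) count of outs between hits
k <- 0:10
tail_prob <- pgeom(k - 1, prob = p, lower.tail = FALSE)
log_tail <- log(tail_prob)   # equals k * log(1 - p), a straight line in k

# Simulate independent at-bats and compute spacings between consecutive hits
set.seed(2025)
y <- rbinom(100000, size = 1, prob = p)
sim_sp <- diff(which(y == 1)) - 1

print(mean(sim_sp))   # should be close to the Geometric mean
print((1 - p) / p)
```

Marked curvature in such a plot for real data would suggest that the constant-probability, independent-trials model does not fit well.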

Permutation Test

A more formal approach to testing streakiness is by use of a permutation test. One runs this test by the permutation_test() function where the input is the streak data.

permutation_test(ja_data)
[1] 0.516

A p-value close to 0 would indicate some support for streakiness. Here the p-value of 0.516 is far from 0, so the test gives little evidence that Abreu's hitting was streaky.
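The logic of a permutation test can be sketched as follows. The test statistic used by permutation_test() is not described here, so this sketch assumes one plausible choice, the sum of squared spacings between consecutive hits, which is large when hits cluster and leave long gaps. The helper names clumpiness and perm_test and the two example sequences are hypothetical.

```r
# Assumed streakiness statistic: sum of squared spacings between consecutive hits
clumpiness <- function(y) {
  sp <- diff(which(y == 1)) - 1
  sum(sp^2)
}

# Permutation p-value: share of random reorderings at least as clumpy as the data
perm_test <- function(y, n_perm = 1000) {
  observed <- clumpiness(y)
  permuted <- replicate(n_perm, clumpiness(sample(y)))
  mean(permuted >= observed)
}

set.seed(1)
# Two bunches of hits separated by a long drought (very streaky pattern)
y_streaky <- rep(c(1, 0, 1), times = c(5, 30, 5))
# Hits spread out as evenly as possible (not streaky)
y_even <- rep(c(1, 0, 0, 0), times = 10)

p_streaky <- perm_test(y_streaky)
p_even <- perm_test(y_even)
print(c(streaky = p_streaky, even = p_even))
```

Shuffling keeps the number of hits fixed while destroying any ordering, so the permuted statistics show what the statistic looks like when there is no streakiness.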

Bayesian Test

Another way of testing for streakiness is a Bayes factor that measures the support of a streaky model over a consistent model. The streaky model is indexed by a log K parameter and the function bayes_factor_logK() computes the log Bayes factor for a range of values of log K.

out <- bayes_factor_logK(ja_data)
head(out)
  log_K    log_BF
1   1.0 -25.59440
2   1.1 -23.01366
3   1.2 -20.64765
4   1.3 -18.48361
5   1.4 -16.50907
6   1.5 -14.71188

The log Bayes factor values are graphed below. Since the log Bayes factors are negative for all log K values, there is little evidence supporting the streaky model.

ggplot(out, aes(log_K, log_BF)) +
  geom_line()
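One can also summarize the Bayes factor output numerically. The sketch below re-enters only the six rows printed above (the full grid returned by bayes_factor_logK() is not reproduced, and the name out_head is hypothetical), finds the log K value with the largest log Bayes factor among them, and converts it to the Bayes factor scale. The conversion assumes the values are natural logarithms, which is not stated in the output.

```r
# The six rows shown by head(out), entered by hand for illustration
out_head <- data.frame(
  log_K  = c(1.0, 1.1, 1.2, 1.3, 1.4, 1.5),
  log_BF = c(-25.59440, -23.01366, -20.64765, -18.48361, -16.50907, -14.71188)
)

# Row with the strongest support for the streaky model among these values
best <- out_head[which.max(out_head$log_BF), ]

# Bayes factor on the original scale, assuming natural logs
bf <- exp(best$log_BF)

# Are all of these log Bayes factors negative?
all_negative <- all(out_head$log_BF < 0)

print(best)
print(bf)
print(all_negative)
```

A log Bayes factor below 0 corresponds to a Bayes factor below 1, meaning the data favor the consistent model over the streaky model at that value of log K.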