Mosaic Package

This brief guide introduces the mosaic package. The authors wrote the mosaic package to make R commands accessible to students taking introductory statistics. A menu-driven package such as StatCrunch may be easier for a student to use than R. But if the instructor does decide on R for computation, the student needs to learn only a few commands from the mosaic package. This guide demonstrates the use of these commands for the data analysis material in Chapters 2, 3 and 4.

Getting Started

I am assuming the student has R and RStudio installed on his or her computer. If not, the student needs to download and install R from https://cran.r-project.org/ (choose the correct version – Mac or Windows) and also download and install RStudio from https://www.rstudio.com/ (again choose the correct version – Mac or Windows).

One starts R by opening up RStudio. Generally one gets output by typing commands in the Console window.

The mosaic Package: installation and general information

One can install the mosaic package from CRAN by typing

install.packages("mosaic")

This installation needs to be done only once. Every time one starts R, one loads the package by typing

library(mosaic)

Resources for using the mosaic package can be found at

http://mosaic-web.org/r-packages/

tsub Package

All of the datasets used in the Case Studies and the Exercises in the 2nd edition of Teaching Statistics Using Baseball are available in the tsub package. One installs the package by means of the install_github function in the devtools package. (Likely you will first need to install the devtools package.)

library(devtools)
install_github("bayesball/tsub")
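Since each installation needs to be done only once, it can be convenient to check which packages are already present before installing anything. The following is a minimal sketch in base R. The helper name missing_packages is hypothetical and not part of mosaic, devtools or tsub, and the interactive() guard is my own assumption to keep the code from installing anything when it is run as a script.

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
# requireNamespace() returns FALSE (without an error) for a missing package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages(c("devtools", "mosaic"))

# Install only what is missing, and only in an interactive session.
if (length(needed) > 0 && interactive()) {
  install.packages(needed)
}
```

The tsub package itself is not on CRAN, so it would still be installed with install_github as shown above.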

Each dataset from Teaching Statistics Using Baseball starts with “exer” or “case”. For example, data from Exercise 3.7 is found in “exer_3_7” and data from Case Study 4.2 is found in “case_4_2”.
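The naming rule can be written out as a small function. This is only a sketch of the convention described above. The function dataset_name is hypothetical (it is not part of tsub), and it assumes every dataset follows the pattern shown in the two examples.

```r
# Hypothetical helper: build a tsub dataset name from its parts.
# kind is "exer" (Exercise) or "case" (Case Study).
dataset_name <- function(kind, chapter, number) {
  kind <- match.arg(kind, c("exer", "case"))
  paste(kind, chapter, number, sep = "_")
}

dataset_name("exer", 3, 7)  # Exercise 3.7
dataset_name("case", 4, 2)  # Case Study 4.2
```

Once tsub is loaded, get(dataset_name("case", 4, 2)) should retrieve the corresponding data frame, assuming the datasets are available under these names.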

Chapter 2

To start, one loads the tsub and mosaic packages using the library function. This needs to be done every time the student starts R through RStudio.

library(tsub)
library(mosaic)

The dotPlot function constructs a dotplot of some numeric data. Below we construct a dotplot of Derek Jeter’s home run counts contained in the case_2_2 dataset.

dotPlot(~ HR, data=case_2_2)

The histogram function constructs a histogram of the same home run counts.

histogram(~ HR, data=case_2_2)

The favstats function finds the five number summary and other useful statistics for a numeric variable.

favstats(~ HR, data=case_2_2)
##  min Q1 median    Q3 max mean       sd  n missing
##    0 10     13 18.25  24   13 6.882472 20       0
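To see where these numbers come from, the same quantities can be computed one at a time in base R. This is a sketch, not the favstats implementation. The home run counts below are hypothetical (they are not the case_2_2 data), and the quartiles from quantile's default method may differ slightly from a hand calculation in a textbook.

```r
# Hypothetical season home run counts (NOT the case_2_2 data).
hr <- c(10, 15, 24, 19, 0, 13)

# Base R versions of the quantities shown in the favstats output.
summary_row <- data.frame(
  min     = min(hr),
  Q1      = quantile(hr, 0.25, names = FALSE),
  median  = median(hr),
  Q3      = quantile(hr, 0.75, names = FALSE),
  max     = max(hr),
  mean    = mean(hr),
  sd      = sd(hr),
  n       = sum(!is.na(hr)),   # number of non-missing values
  missing = sum(is.na(hr))     # number of missing values
)
summary_row
```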

The tally function displays a tally of a categorical variable. Below I tally the teams that Rickey Henderson played for in his career.

tally(~ Tm, data=case_1_1)
## Tm
## BOS LAD NYM NYY OAK SDP TOT 
##   1   1   1   4  12   2   4
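A tally is simply a count of how often each category appears. As a sketch of the same idea in base R, the table function counts the values of a vector and prop.table converts the counts to proportions. The team codes below are hypothetical and are not the case_1_1 data.

```r
# Hypothetical team codes, one per season (NOT the case_1_1 data).
teams <- c("OAK", "OAK", "NYY", "OAK", "SDP", "NYY")

# Count the seasons with each team.
counts <- table(teams)
counts

# Convert the counts to proportions of all seasons.
prop.table(counts)
```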

The bargraph function displays a bar plot of categorical data. Here we display a graph of the teams that Henderson has played for.

bargraph(~ Tm, data=case_1_1)

Chapter 3

The bwplot function constructs parallel boxplots of different groups of numeric data. Here I compare the HR counts of Pujols and Ramirez (spelled “Rameriz” in the dataset) contained in the case_3_1 dataset.

bwplot(Player ~ HR, data=case_3_1)

The histogram function can be used to construct histograms of groups of numeric data.

histogram(~ HR | Player, data=case_3_1, layout=c(1, 2))

Similarly, the dotPlot function can be used to display dotplots of groups of numeric data.

dotPlot(~ HR | Player, data=case_3_1, layout=c(1, 2))

The favstats function is used to compute summary statistics for each group of numeric data. Here I use favstats to compare the HR counts of Pujols and Ramirez.

favstats(HR ~ Player, data=case_3_1)
##    Player min Q1 median   Q3 max     mean        sd  n missing
## 1  Pujols  17 33     37 42.5  49 37.33333  8.338094 15       0
## 2 Rameriz   9 26     35 41.0  45 32.52941 10.840420 17       0
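The formula HR ~ Player asks for a summary of HR within each value of Player. A rough base R sketch of the same grouping idea uses tapply, which applies a function to one variable separately for each group. The data frame below is hypothetical, with made-up players A and B, and is not the case_3_1 data.

```r
# Hypothetical home run counts for two made-up players (NOT the case_3_1 data).
season <- data.frame(
  Player = c("A", "A", "A", "B", "B", "B"),
  HR     = c(30, 40, 35, 20, 45, 25)
)

# Apply a summary function to HR separately for each player.
medians <- tapply(season$HR, season$Player, median)
means   <- tapply(season$HR, season$Player, mean)

medians
means
```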

Chapter 4

The xyplot function is used to construct a scatterplot. Here I use this function to construct a graph of R.G (y) against OPS (x) for the team data in Case 4.1.

xyplot(R.G ~ OPS, data=case_4_1)

The lm function is used to fit a line to these data. I save the fit in a variable called model and the equation of the line is displayed by typing “model”.

model <- lm(R.G ~ OPS, data=case_4_1)
model
## 
## Call:
## lm(formula = R.G ~ OPS, data = case_4_1)
## 
## Coefficients:
## (Intercept)          OPS  
##      -3.227       10.421
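The two coefficients give the fitted line R.G = -3.227 + 10.421 × OPS, which can be used to predict runs per game for a given OPS. The sketch below types in the rounded coefficients from the output above. The function name predict_rg and the OPS value of 0.750 are hypothetical choices for illustration.

```r
# Rounded coefficients copied from the lm output above.
intercept <- -3.227
slope     <- 10.421

# Hypothetical helper: predicted R.G for a given OPS under the fitted line.
predict_rg <- function(ops) {
  intercept + slope * ops
}

# Prediction for a hypothetical team with an OPS of 0.750.
predict_rg(0.750)
```

With the saved model object, predict(model, newdata = data.frame(OPS = 0.750)) does the same calculation using the unrounded coefficients.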

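Below we display the scatterplot and overlay the best-fitting line on top of it.

# Fit the line and convert the fit into a function of OPS
model <- lm(R.G ~ OPS, data=case_4_1)
ln <- makeFun(model)
# Draw the scatterplot, then add the fitted line to it
xyplot(R.G ~ OPS, data=case_4_1)
plotFun(ln(w) ~ w, add=TRUE)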

The cor function is used to compute a correlation coefficient between two numeric variables.

with(case_4_1, cor(R.G, OPS))
## [1] 0.9052043
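To connect cor with the usual textbook formula, the correlation can also be computed by hand as the sum of the products of the standardized x and y values, divided by n - 1. This is a sketch of that formula, and the x and y values below are hypothetical; they are not the case_4_1 data.

```r
# Hypothetical paired measurements (NOT the case_4_1 data).
x <- c(0.70, 0.72, 0.75, 0.78, 0.80)
y <- c(4.0, 4.3, 4.5, 5.0, 5.1)

# Standardize each variable: subtract the mean, divide by the standard deviation.
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Correlation = sum of products of standardized values, divided by n - 1.
r_manual  <- sum(zx * zy) / (length(x) - 1)
r_builtin <- cor(x, y)

# The two calculations agree up to rounding error.
all.equal(r_manual, r_builtin)
```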