Plotting baseball pitches as qualitative variable by color
I was thinking of doing this in R but am new to it and would appreciate any help
I have a dataset (pitches) of baseball pitches identified by 'pitchNumber' and 'outcome' e.g S = swin开发者_运维知识库ging strike, B = ball, H= hit etc.
e.g. 1 B ; 2 H ; 3 S ; 4 S ; 5 X ; 6 H; etc.
All I want to do is have a graph that plots them in a line cf BHSSXB but replacing the letter with a small bar colored to represent the letter, with a legend, and optionally having the pitch number above the color . Somewhat like a sparkline.
Any suggestion on how to implement this much appreciated
And the same graph using ggplot
.
Data courtesy of @GavinSimpson.
ggplot(baseball, aes(x=pitchNumber, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
opts(
panel.background=theme_blank(),
panel.grid.minor=theme_blank(),
axis.text.y = theme_blank()
)
Here is a base graphics idea from which to work. First some dummy data:
set.seed(1)
baseball <- data.frame(pitchNumber = seq_len(50),
outcome = factor(sample(c("B","H","S","S","X","H"),
50, replace = TRUE)))
> head(baseball)
pitchNumber outcome
1 1 H
2 2 S
3 3 S
4 4 H
5 5 H
6 6 H
Next we define the colours we want:
## better colours - like ggplot for the cool kids
##cols <- c("red","green","blue","yellow")
cols <- head(hcl(seq(from = 0, to = 360,
length.out = nlevels(with(baseball, outcome)) + 1),
l = 65, c = 100), -1)
then plot the pitchNumber
as a height 1 histogram-like bar (type = "h"
), suppressing the normal axes, and we add on points to the tops of the bars to help visualisation:
with(baseball, plot(pitchNumber, y = rep(1, length(pitchNumber)), type = "h",
ylim = c(0, 1.2), col = cols[outcome],
ylab = "", xlab = "Pitch", axes = FALSE, lwd = 2))
with(baseball, points(pitchNumber, y = rep(1, length(pitchNumber)), pch = 16,
col = cols[outcome]))
Add on the x-axis and the plot frame, plus a legend:
axis(side = 1)
box()
## note: this assumes that the levels are in alphabetical order B,H,S,X...
legend("topleft", legend = c("Ball","Hit","Swinging Strike","X??"), lty = 1,
pch = 16, col = cols, bty = "n", ncol = 2, lwd = 2)
Gives this:
This is in response to your last comment on @Gavin's answer. I'm going to build off of the data provided by @Gavin and the ggplot2 plot by @Andrie. ggplot()
supports the concept of faceting by a variable or variables. Here you want to facet by pitcher and at the pitch limit of 50 per row. We'll create a new variable that corresponds to each row we want to plot separately. The equivalent code in base graphics would entail adjusting mfrow
or mfcol
in par()
and calling separate plots for each group of data.
#150 pitches represents a somewhat typical 9 inning game.
#Thanks to Gavin for sample data.
longGame <- rbind(baseball, baseball, baseball)
#Starter goes 95 pitches, middle relief throws 35, closer comes in for 20 and the glory
longGame$pitcher <- c(rep("S", 95), rep("M", 35), rep("C",20))
#Adjust pitchNumber accordingly
longGame$pitchNumber <- c(1:95, 1:35, 1:20)
#We want to show 50 pitches at a time, so will combine the pitcher name
#with which set of pitches this is
longGame$facet <- with(longGame, paste(pitcher, ceiling(pitchNumber / 50), sep = ""))
#Create the x-axis in increments of 1-50, by pitcher
longGame <- ddply(longGame, "facet", transform, pitchFacet = rep(1:50, 5)[1:length(facet)])
#Convert facet to factor in the right order
longGame$facet <- factor(longGame$facet, levels = c("S1", "S2", "M1", "C1"))
#Thanks to Andrie for ggplot2 function. I change the x-axis and add a facet_wrap
ggplot(longGame, aes(x=pitchFacet, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
facet_wrap(~facet, ncol = 1) +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
opts(
panel.background=theme_blank(),
panel.grid.minor=theme_blank(),
axis.text.y = theme_blank()
)
You can obviously change the labels for the facet variable, but the above code will produce:
精彩评论