adding text to ggplot geom_jitter points that match a condition

How can I add text labels to points drawn with geom_jitter? geom_text will not work directly because I don't know the coordinates of the jittered points. Is there a way to capture the positions of the jittered points so I can pass them to geom_text?

My practical use case is a boxplot with geom_jitter drawn over it to show the data distribution. I would like to label the outlier points, or the ones that match a certain condition (for example, the lowest 10% of the values used to color the points).

One solution would be to capture the xy positions of the jittered points and use them later in another layer. Is that possible?

[update]

Following Joran's answer, one solution is to calculate the jittered values with the jitter function from base R, add them to a data frame, and plot them with geom_point. For filtering, he used ddply to build a filter column (a logical vector) and used it to subset the data passed to geom_text.
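The two steps described above can be sketched with base R alone. This is only an illustration under assumptions: ave() stands in for ddply here, and the 10% cut-off and the column names xj and grp are picked to match the answer's code.

```r
# Example data in the same shape as the question's (values are random)
set.seed(1)
dat <- data.frame(x = rep(letters[1:3], times = 100),
                  y = runif(300),
                  lab = paste0("id_", 1:300))

# Step 1: jitter the numeric codes (1, 2, 3) of the categorical x
dat$xj <- jitter(as.numeric(factor(dat$x)))

# Step 2: within each level of x, flag points at or below the
# 10% quantile of y (ave() repeats the group quantile for each row)
dat$grp <- dat$y <= ave(dat$y, dat$x, FUN = function(v) quantile(v, 0.1))

# These are the rows that would be passed to geom_text
head(subset(dat, grp))
```

Because xj is stored in the data frame, geom_point and geom_text can both read the same jittered positions.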

He asked for a minimal dataset. I just modified his example to put a unique identifier in the label column:

dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
                      lab=paste('id_',1:300,sep='')) 

This is the result of joran's example with my data, with the displayed ids restricted to the lowest 1%:

[plot image not included]

And this is a modification of the code to color the points by another variable and display some values of that variable (the lowest 1% within each group):

library("ggplot2")
library("plyr")  # provides ddply

#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
                  lab=paste('id_',1:300,sep=''),quality=rnorm(300))

#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))

#Create indicator variables, computed within each level of x:
# grp   flags obs in the lowest 1% of y
# top_q flags obs in the lowest 1% of quality
datJit <- ddply(datJit,.(x),.fun=function(g){
               g$grp <- g$y <= quantile(g$y,0.01)
               g$top_q <- g$quality <= quantile(g$quality,0.01)
               g})

#Create a boxplot, overlay the jittered points coloured by quality,
# label the lowest 1% of y with their id and the lowest 1% of
# quality with the quality value
ggplot(dat,aes(x=x,y=y)) +
  geom_boxplot() +
  geom_point(data=datJit,aes(x=xj,colour=quality)) +
  geom_text(data=subset(datJit,grp),aes(x=xj,label=lab)) +
  geom_text(data=subset(datJit,top_q),aes(x=xj,label=sprintf("%0.2f",quality)))

[plot image not included]


Your question isn't completely clear. For example, you mention labeling points in one place but coloring points in another, so I'm not sure which you really mean, or perhaps both. A reproducible example would be very helpful. With a little guesswork on my part, the following code does what I think you're describing:

library(ggplot2)
library(plyr)  # provides ddply

#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
        lab=rep('label',300))

#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))

#Create an indicator variable that picks out those
# obs that are in the lowest 10% of y within each level of x
datJit <- ddply(datJit,.(x),.fun=function(g){
             g$grp <- g$y <= quantile(g$y,0.1); g})

#Create a boxplot, overlay the jittered points and 
# label the bottom 10% points
ggplot(dat,aes(x=x,y=y)) + 
    geom_boxplot() + 
    geom_point(data=datJit,aes(x=xj)) + 
    geom_text(data=subset(datJit,grp),aes(x=xj,label=lab))        
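
A different route, not the one used above, is to let ggplot2 do the jittering and give both layers the same seeded position. This is a minimal sketch that assumes a ggplot2 version in which position_jitter() accepts a seed argument; the column name low10 and the width of 0.2 are made up for the illustration. Both layers must receive the same rows in the same order, so unflagged points get an empty label instead of being subset away.

```r
# Sketch only: low10 is a hypothetical column name
set.seed(42)
dat <- data.frame(x = rep(letters[1:3], times = 100),
                  y = runif(300),
                  lab = paste0("id_", 1:300))

# Flag the lowest 10% of y within each level of x (base R, no plyr)
dat$low10 <- dat$y <= ave(dat$y, dat$x, FUN = function(v) quantile(v, 0.1))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One position object with a fixed seed, horizontal jitter only
  pos <- position_jitter(width = 0.2, height = 0, seed = 1)
  p <- ggplot(dat, aes(x = x, y = y)) +
    geom_boxplot(outlier.shape = NA) +
    geom_point(position = pos) +
    # Same rows as geom_point, so the same seed yields the same offsets
    geom_text(aes(label = ifelse(low10, lab, "")),
              position = pos, vjust = -0.5)
  print(p)
}
```

The trade-off is that the jittered coordinates are never stored in the data frame, so they cannot be reused outside the plot.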


Just an addition to Joran's wonderful solution: I ran into trouble with the x-axis positioning when I tried to use it in a faceted plot with facet_wrap(). The problem is that ggplot2 uses 1 as the x-value on every facet. The solution is to create a vector of jittered 1s:

datJit$xj <- jitter(rep(1,length(dat$x)),amount=0.1)
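
For context, here is how that line might fit into a full faceted plot. It is a sketch under an assumption: the facets use scales = "free_x", which is the situation where each panel holds a single group at x position 1. The answer does not show its facet_wrap() call, so that argument is a guess.

```r
set.seed(1)
dat <- data.frame(x = rep(letters[1:3], times = 100), y = runif(300))

# Every point gets an x position near 1, whatever its group
datJit <- dat
datJit$xj <- jitter(rep(1, nrow(dat)), amount = 0.1)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x = x, y = y)) +
    geom_boxplot() +
    geom_point(data = datJit, aes(x = xj)) +
    facet_wrap(~ x, scales = "free_x")  # assumption: free x scales
  print(p)
}
```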