开发者

In ggplot restrict y to be >0 in LOESS

Here's my code:

#data
sites <- 
  structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L), 
                 date = c(13493L, 13534L, 13566L, 13611L, 13723L,
                      开发者_开发百科    13752L, 13804L, 13837L, 13927L, 14028L,
                          14082L, 14122L, 14150L, 14182L, 14199L,
                          16198L, 16279L, 16607L, 16945L, 17545L,
                          17650L, 17743L, 17868L, 17941L, 18017L, 18092L),
                 y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L, 
                       11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L,
                       6L, 1L, 0L, 5L, 0L)), 
            .Names = c("site", "date", "y"),
            class = "data.frame", row.names = c(NA, -26L))

#convert to date
x<-as.Date(sites$date, origin="1960-01-01") 

#plot smooth, line goes below zero!
qplot(data=sites, x, y, main="Site 349") 
(p <- qplot(data = sites, x, y, xlab = "", ylab = ""))
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5))

Some of the LOESS lines and confidence intervals go below zero, I would like to restrict the graphics to 0 and positive numbers (because negative do not make sense).

In ggplot restrict y to be >0 in LOESS

How can I do this?


I am seconding Matt Parker's suggestion that you have to change the fitting procedure. One option that often works for positive-only data, is to do the fit on log-scale, and then exponentiate to get results on the original scale. This will guarantee positive only values.

Generating random data that has some of this issues:

 d <- data.frame(x=0:100)
 d$y <- exp(rnorm(nrow(d), mean=-d$x/40, sd=0.8))
 qplot(x,y,data=d) + stat_smooth() 

Now we can use ggplot's transformation capabilites to log-transform the y-values, but display the results on an exponential scale (which corresponds to the original one):

qplot(x,y,data=d) + stat_smooth() + scale_y_log10()+coord_trans(ytrans="pow10")

You can see examples like this on the coord_trans help page. If you don't like the y-axis, you can manipulate the breaks and labels.

EDIT based on the question update

There have been some changes in ggplot2 since the question was originally asked, and the original answer did not deal with 0's.

Option 1

The main idea of the solution is the same: find a transformation that will map the range of possible values to -Inf to Inf, do the loess smooth there, and then backtransform the result. The log-transformation would be great if there were no zeroes. I don't think the required function exists if 0 is included, but a possibility that often works is the log(1+x) transformation. That is built-in, but we need to have the inverse transformation exp(x)-1 as well.

library(scales)
#create exp(x)-1 transformation, the inverse of log(1+p)
expm1_trans <-  function() trans_new("expm1", "expm1", "log1p")

qplot(x, y, data=sites) + stat_smooth(method="loess") +
  scale_y_continuous(trans=log1p_trans()) +
  coord_trans(ytrans=expm1_trans())

In ggplot restrict y to be >0 in LOESS

Option 2

The second option extends the suggestion in the comments to Matt Parker's answer: use a regression method that incorporates the integer nature of the outcomes. That means overdispersed (just in case) Poisson regression for counts. While you can't do loess, you can do a spline fit. You can play with the degrees of freedom to control the smoothness.

library(splines)
qplot(x, y, data=sites) + stat_smooth(method="glm", family="quasipoisson", 
                                      formula = y ~ ns(x, 3))

In ggplot restrict y to be >0 in LOESS

The two options give quite similar results, which is a good thing.


I can't test it without some example data, but

qplot(data=sites, x, y, main="Site 349")  
(p <- qplot(data = sites, x, y, xlab = "", ylab = "")) 
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5)) 
p1 + theme_bw() + opts(title = "Site 349") + ylim(0, foo)

(where foo is a suitable upper limit for your plot) might do the trick. Unlike in base graphics, the xlim() and ylim() commands in ggplot actually restrict the data that are used in making the plot, rather than just the plot window. It might also restrict the geom_smooth() (though I'm not certain).

Edit: After reading a bit more, you might also want to consider switching out the model that geom_smooth is using. Again, not being able to see your data is a problem. But, for example, if it's binary - you can add stat_smooth(method="glm", family="binomial") to get a logit-smoothed line. See ?stat_smooth for more.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜