Can I avoid using data frames in ggplot2?
I'm running a Monte Carlo simulation and the output is in the form:
> d = data.frame(iter=seq(1, 2), k1 = c(0.2, 0.6), k2=c(0.3, 0.4))
> d
  iter  k1  k2
1    1 0.2 0.3
2    2 0.6 0.4
The plots I want to generate are:
plot(d$iter, d$k1)
plot(density(d$k1))
I know how to make the equivalent plots with ggplot2 by first converting to a long-format data frame:
new_d = data.frame(iter = rep(d$iter, 2),
                   k = c(d$k1, d$k2),
                   label = rep(c('k1', 'k2'), each = 2))
Then plotting is easy. However, the number of iterations can be very large, and so can the number of k's. This means messing about with a very large data frame.
Is there any way I can avoid creating this new data frame?
Thanks
The short answer is "no": you can't avoid creating a data frame. ggplot requires the data to be in a data frame. If you use qplot, you can give it separate vectors for x and y, but internally it still creates a data frame from the arguments you pass in.
I agree with juba's suggestion: learn to use the reshape function, or better yet the reshape package with its melt/cast functions. Once you get fast at putting your data in long format, creating amazing ggplot graphs is one step closer!
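As a rough sketch of the melt route, here is how the question's data could be put in long format and plotted. It assumes the reshape2 package (the successor to reshape) is installed; the column names label and k are simply carried over from the question's new_d.

```r
library(reshape2)  # assumption: reshape2 is installed
library(ggplot2)

d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))

# Stack every column except iter into (label, k) pairs.
long_d <- melt(d, id.vars = "iter", variable.name = "label", value.name = "k")

# One layer now covers all k columns, however many there are.
p_points  <- ggplot(long_d, aes(x = iter, y = k, colour = label)) + geom_point()
p_density <- ggplot(long_d, aes(x = k, colour = label)) + geom_density()
```

Printing p_points or p_density draws the plot. The melt call stays the same no matter how many k columns the simulation produces.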
Yes, it is possible for you to avoid creating a data frame: just give an empty argument list to the base layer, ggplot(). Here is a complete example based on your code:
library(ggplot2)
d = data.frame(iter=seq(1, 2), k1 = c(0.2, 0.6), k2=c(0.3, 0.4))
# desired plots:
# plot(d$iter, d$k1)
# plot(density(d$k1))
ggplot() + geom_point(aes(x = d$iter, y = d$k1))
# there is not enough data for a good density plot,
# but this is how you would do it:
ggplot() + geom_density(aes(d$k1))
Note that although this lets you skip creating a data frame yourself, one may still be created internally. See, e.g., the following extract from ?geom_point:
All objects will be fortified to produce a data frame.
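One way to check this for yourself is to look at what a layer ends up holding once the plot is built. The sketch below uses ggplot2's layer_data() helper. The variable names p and built are only illustrative.

```r
library(ggplot2)

d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))
p <- ggplot() + geom_point(aes(x = d$iter, y = d$k1))

# layer_data() builds the plot and returns the computed data for one layer.
built <- layer_data(p, 1)
is.data.frame(built)  # the vectors passed to aes() now sit in x and y columns
```

So the vectors are still gathered into a data frame, just not one you have to construct by hand.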
You can use the reshape function to transform your data frame to "long" format. It may be a bit faster than your code.
R> reshape(d, direction = "long", varying = list(c("k1", "k2")), v.names = "k", times = c("k1", "k2"))
     iter time   k id
1.k1    1   k1 0.2  1
2.k1    2   k1 0.6  2
1.k2    1   k2 0.3  1
2.k2    2   k2 0.4  2
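The reshaped result can go straight into ggplot. A minimal sketch, assuming you want one colour per original column (long_d and p are illustrative names):

```r
library(ggplot2)

d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))
long_d <- reshape(d, direction = "long", varying = list(c("k1", "k2")),
                  v.names = "k", times = c("k1", "k2"))

# "time" holds the original column name, so it can drive the colour scale.
p <- ggplot(long_d, aes(x = iter, y = k, colour = time)) + geom_point()
```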
Just to add to the previous answers: with qplot you could do
p <- qplot(y=d$k2, x=d$k1)
and then build on it from there, e.g. with
p + theme_bw()
But I agree: melt/cast is generally the way forward.
Just pass NULL as the data frame, and define the necessary aesthetics using the data vectors. Quick example:
library(MASS)
library(tidyverse)
library(ranger)
rf <- ranger(medv ~ ., data = Boston, importance = "impurity")
rf$variable.importance
ggplot(NULL, aes(x = fct_reorder(names(rf$variable.importance), rf$variable.importance),
                 y = rf$variable.importance)) +
  geom_col(fill = "navy blue", alpha = 0.7) +
  coord_flip() +
  labs(x = "Predictor", y = "Importance", title = "Random Forest") +
  theme_bw()