Dealing with Factors
I have an object that is a factor with a number of levels:
x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
As concisely as possible, I would like to use ggplot to create a bar graph of the percentage frequency of each level of the factor.
I keep finding that there are a lot of little annoyances that get in between summarizing and plotting when I have a factor. Here are a few examples of what I mean by annoyances:
as.data.frame(summary(x))
In the example above, you have to rename the columns, and the values that belong in the first column end up as row names. In the next example, you have to cheat to use cast(), and then you have to relabel, because it defaults to a column name of "(all)".
library(reshape)
dat <- as.data.frame(q1$com.preferred)
dat$value <- 1
colnames(dat) <- c("pref", "value")
dat <- cast(dat, pref ~ .)
colnames(dat)[2] <- "value"
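For comparison, here is a minimal base-R sketch of the same count. It assumes the factor x from the top of the question stands in for q1$com.preferred (whose contents aren't shown). aggregate() with a formula keeps the grouping column's name, so there is no "(all)" column to relabel.

```r
# Stand-in data: the factor from the question (assumed to play the role of q1$com.preferred)
x <- as.factor(c(rep("A", 20), rep("B", 10), rep("C", 15)))

# One row per observation, with a constant 1 to be summed (mirrors dat$value <- 1)
dat <- data.frame(pref = x, value = 1)

# Sum the 1s within each level; the result already has columns "pref" and "value"
counts <- aggregate(value ~ pref, data = dat, FUN = sum)
print(counts)
```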
Here's another example, somewhat better, but less than ideal.
data.frame(x=names(summary(x)),y=summary(x))
If there's a quick way to do this within ggplot, I'd be more than interested to see it. So far, my biggest problem is changing counts to frequencies.
Following up on @dirk's and @joran's suggestions (@joran really gets credit: I thought as.data.frame(), and not just data.frame(), was necessary, but it turns out @joran's right):
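library(ggplot2)

x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
t1 <- table(x)
t2 <- data.frame(t1)                 # columns: x, Freq (counts)
t3 <- data.frame(prop.table(t1))     # columns: x, Freq (proportions)
# geom_col() plots Freq as given; geom_bar() would try to count rows itself
ggplot(t2, aes(x, Freq)) + geom_col() + ylab("Count")
ggplot(t3, aes(x, Freq)) + geom_col() + ylab("Proportion")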
Edit: shortened slightly (incorporated @Chase's prop.table() suggestion too).
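A sketch that keeps counts and proportions in one data frame, so either can be plotted, and labels the axis in percent. It assumes a current ggplot2, where geom_col() draws precomputed heights and scale_y_continuous(labels = scales::percent) does the percent formatting; the Prop column name is just an illustrative choice.

```r
library(ggplot2)

x <- as.factor(c(rep("A", 20), rep("B", 10), rep("C", 15)))
t1 <- table(x)

# data.frame() on a table gives one row per level: columns "x" and "Freq"
freq <- data.frame(t1)
# Add the proportion alongside the count (prop.table divides each count by the total)
freq$Prop <- as.vector(prop.table(t1))

# Plot the proportions, formatting the y axis as percentages
p <- ggplot(freq, aes(x, Prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percent")
print(p)
```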
You can have qplot do the summary work for you, without the outside computations; try any of the following:
library(ggplot2)

x <- rep(c('A','B','C'), c(20,10,15))
qplot(x, weight=1/length(x), ylab='Proportion')
qplot(x, weight=100/length(x), ylab='Percent')
# the old formatter= argument is gone from ggplot2; labels= does the formatting now
qplot(x, weight=1/length(x), ylab='Percent') + scale_y_continuous(labels=scales::percent)
ggplot(data.frame(x=x),aes(x, weight=1/length(x))) + geom_bar() + ylab('Proportion')
There is probably a way to do this using transformations inside the ggplot functions as well, but I have not found it yet.
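Newer ggplot2 releases (3.3.0 and later) do offer one: after_stat() lets the y aesthetic refer to the count that geom_bar() computes, replacing the older ..count.. notation, so dividing by its sum gives proportions without any weights. A sketch, assuming ggplot2 >= 3.3.0:

```r
library(ggplot2)

x <- rep(c("A", "B", "C"), c(20, 10, 15))

# geom_bar() counts rows per level; after_stat() rescales those counts to proportions
p <- ggplot(data.frame(x = x), aes(x, y = after_stat(count / sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percent")
print(p)
```

The division happens on the stat's output, so the raw data frame never needs a weight or frequency column.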
Did you try the ggplot-equivalent of just calling barplot(table(x)/length(x))? I.e.
R> x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
R> table(x)
x
A B C
20 10 15
which we turn into percentages easily
R> table(x)/length(x)*100
x
A B C
44.4444 22.2222 33.3333
and can then plot
R> barplot(table(x)/length(x)*100)
just fine.
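The same table-then-divide idea carries over to ggplot2 once the percentage table is turned into a data frame. A sketch, assuming ggplot2 >= 2.2.0 (for geom_col()); the names pct and pct_df are illustrative only.

```r
library(ggplot2)

x <- as.factor(c(rep("A", 20), rep("B", 10), rep("C", 15)))

# Percentages per level, computed exactly as in the barplot() call above
pct <- table(x) / length(x) * 100

# as.data.frame() on a table yields columns "x" (the levels) and "Freq" (the values)
pct_df <- as.data.frame(pct)

# Bar heights are already percentages, so geom_col() draws them as given
p <- ggplot(pct_df, aes(x, Freq)) +
  geom_col() +
  ylab("Percent")
print(p)
```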