Dealing with Factors

I have an object that is a factor with a number of levels:

x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))

In the shortest manner possible, I would like to use ggplot to create a bar graph of the percentage frequency of each factor level.

I keep finding that there are a lot of little annoyances that get in between summarizing and plotting when I have a factor. Here are a few examples of what I mean by annoyances:

as.data.frame(summary(x))

In the example above, you have to rename the columns, and the first column's values are now row names. In the next one, you have to cheat to use cast, and then you have to relabel because it defaults to a column name of "(all)".

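dat <- as.data.frame(q1$com.preferred)
dat$value <- 1                        # dummy column so cast has something to aggregate
colnames(dat) <- c("pref", "value")
dat <- cast(dat, pref ~ .)            # cast() is from the reshape package
colnames(dat)[2] <- "value"           # replace the default "(all)" column name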

Here's another example, somewhat better, but less than ideal.

data.frame(x=names(summary(x)),y=summary(x))

If there's a quick way to do this within ggplot, I'd be more than interested to see it. So far, my biggest problem is changing counts to frequencies.


Following up on @dirk's and @joran's suggestions (@joran really gets the credit: I thought as.data.frame(), and not just data.frame(), was necessary, but it turns out @joran's right):

x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
t1 <- table(x)
t2 <- data.frame(t1)
t3 <- data.frame(prop.table(t1))
qplot(x,Freq,data=t2,geom="bar",ylab="Count")
qplot(x,Freq,data=t3,geom="bar",ylab="Proportion")

edit: shortened slightly (incorporated @Chase's prop.table too)
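
For readers on a recent ggplot2, where qplot() is deprecated and geom="bar" may reject a y aesthetic, the same two plots can be sketched with ggplot() and geom_col(). This is only a sketch under that assumption; the names counts, props, p_count and p_prop are made up for illustration.

```r
library(ggplot2)

x <- factor(rep(c("A", "B", "C"), c(20, 10, 15)))

# as.data.frame() on a table yields two columns: the factor (named "x" here) and "Freq"
counts <- as.data.frame(table(x))
props  <- as.data.frame(prop.table(table(x)))

# geom_col() draws bar heights straight from the y column, with no counting step
p_count <- ggplot(counts, aes(x, Freq)) + geom_col() + ylab("Count")
p_prop  <- ggplot(props,  aes(x, Freq)) + geom_col() + ylab("Proportion")

print(p_count)
print(p_prop)
```

Multiplying props$Freq by 100 before plotting would give percentages rather than proportions.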


You can have qplot do the summary work for you, without the outside computations. Try any of the following:

x <- rep(c('A','B','C'), c(20,10,15))

qplot(x, weight=1/length(x), ylab='Proportion')
qplot(x, weight=100/length(x), ylab='Percent')
qplot(x, weight=1/length(x), ylab='Percent') + scale_y_continuous(labels=scales::percent)  # formatter= was replaced by labels= in later ggplot2 versions

ggplot(data.frame(x=x),aes(x, weight=1/length(x))) + geom_bar() + ylab('Proportion')

There is probably a way to do this using transformations inside the ggplot functions as well, but I have not found it yet.
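
One possible way to do the transformation inside ggplot itself, assuming ggplot2 3.3.0 or later (where after_stat() is available), is to rescale the count computed by geom_bar(). This is a sketch rather than a definitive answer; the name p_pct is made up for illustration.

```r
library(ggplot2)

x <- rep(c("A", "B", "C"), c(20, 10, 15))

# geom_bar() counts rows per level; after_stat() lets us rescale that
# computed count to a share of the total before it is drawn.
p_pct <- ggplot(data.frame(x = x), aes(x, y = after_stat(count / sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent) +  # show 0.25 as 25%
  ylab("Percent")

print(p_pct)
```

With this approach no weight or pre-computed table is needed; the raw vector goes straight into the plot.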


Did you try the ggplot equivalent of just calling barplot(table(x)/length(x))? That is, start from

R> x <- as.factor(c(rep("A",20),rep("B",10),rep("C",15)))
R> table(x)
x
 A  B  C 
20 10 15 

which we can easily turn into percentages

R> table(x)/length(x)*100
x
      A       B       C 
44.4444 22.2222 33.3333 

and can then plot

R> barplot(table(x)/length(x)*100)

just fine.
