Bubble Chart in R with # of Occurrences / Sums of Values

I'm playing around with drawing bubble charts in R. The current project is to graph a bubble chart of political donations with the following characteristics:

x-axis: size of donation, in ranges, e.g. $10-$19, $20-$29, $30-$49, etc.
y-axis: number of donations of that amount
area of bubble: total amount of donations in that range

I'm not planning anything complex, just something like:

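# circles= takes radii, so use the square root to make bubble area proportional to the sum
symbols(amount_ranges, amount_occurrences, circles = sqrt(sums / pi))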

The data is pretty granular: there is a separate entry for each donation, so the entries need to be summed to get the values I'm looking for.

For example, the data looks like this (extraneous columns removed):

CTRIB_NAML    CTRIB_NAMF    CTRIB_AMT    FILER_ID
John          Smith         $49          123456789
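In the sample row, CTRIB_AMT appears as `$49`. If the column is read in as text like that, `sum()` and any numeric binning will not work on it, so the dollar signs (and any thousands separators) need stripping first. A minimal sketch, assuming the data frame is called `dfr` (the name used in the answers below); the two rows here are made up for illustration:

```r
# Hypothetical sample in the same shape as the question's data
dfr <- data.frame(CTRIB_AMT = c("$49", "$1,250"),
                  FILER_ID  = c(123456789, 987654321),
                  stringsAsFactors = FALSE)

# Drop "$" and "," so the amounts parse as numbers
dfr$CTRIB_AMT <- as.numeric(gsub("[$,]", "", dfr$CTRIB_AMT))
```

Inside the brackets `$` is a literal character, so `"[$,]"` removes every dollar sign and comma.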

This is not that complex, but is there a simple way in R to count the number of occurrences of a certain value (for the y-axis)? And to add up the sum of those donations (which is derived from the axes)? Or do I need to write a function that iterates through the data and compiles these numbers separately? Or pre-process the data in some way?


This is easy when you use the ggplot2 package with geom_point.

One of many benefits of using ggplot is that the built-in statistics means you don't have to pre-summarise your data. geom_point in combination with stat_sum is all you need.

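Here is the example from ?geom_point. (Note that mtcars is one of R's built-in datasets, from the datasets package, so the code runs as-is once ggplot2 is loaded.)

library(ggplot2)
# bubble size is mapped to the qsec column
ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(size = qsec))

See the ggplot website and the geom_point documentation for more detail.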



You can use ddply from package plyr here. If your original data.frame was called dfr, then something close to this should work:

library(plyr)  # for ddply
result <- ddply(dfr, .(CTRIB_AMT), summarise, sm = sum(CTRIB_AMT), n = length(CTRIB_AMT))  # total and count per amount
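If you would rather not depend on plyr, base R's `aggregate()` can build the same per-amount table. This is a sketch, assuming `CTRIB_AMT` is already numeric; the sample amounts are made up for illustration:

```r
# Made-up donation amounts, one row per donation
dfr <- data.frame(CTRIB_AMT = c(10, 10, 25, 49, 49, 49))

# Total donated at each distinct amount
sm <- aggregate(list(sm = dfr$CTRIB_AMT), by = list(amt = dfr$CTRIB_AMT), FUN = sum)
# Number of donations at each distinct amount
n <- aggregate(list(n = dfr$CTRIB_AMT), by = list(amt = dfr$CTRIB_AMT), FUN = length)

# One row per amount with both statistics
result <- merge(sm, n, by = "amt")
result
```

Each `aggregate()` call returns one row per distinct amount, and `merge()` lines the two summaries up on the `amt` column.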

In fact, a base R solution is also rather simple:

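vals <- sort(unique(dfr$CTRIB_AMT))                     # distinct donation amounts
sums <- tapply(dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)       # total donated at each amount
counts <- tapply(dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)  # number of donations at each amount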
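The question's x-axis uses ranges rather than exact amounts. One way to get there in base R (an assumption on my part, not something either answer shows) is to bin the amounts with `cut()` first, reuse the same `tapply()` idea on the bins, and then draw with `symbols()`. The break points and sample amounts below are hypothetical:

```r
# Made-up numeric donation amounts
amt <- c(10, 15, 25, 49, 49, 120)

# Hypothetical range boundaries; right = FALSE gives [10,20), [20,30), [30,50), ...
breaks <- c(10, 20, 30, 50, 100, Inf)
bins <- cut(amt, breaks = breaks, right = FALSE)

counts <- as.vector(table(bins))  # donations per range (0 for empty ranges)
sums <- tapply(amt, bins, sum)    # total per range (NA for empty ranges)
sums[is.na(sums)] <- 0

# Plot only non-empty ranges; circles= takes radii, so sqrt() makes area proportional to the sum
keep <- counts > 0
x <- seq_along(counts)
symbols(x[keep], counts[keep], circles = sqrt(sums[keep] / pi),
        inches = 0.4, xaxt = "n",
        xlim = c(0.5, length(x) + 0.5), ylim = c(0, max(counts) + 1),
        xlab = "Donation size", ylab = "Number of donations")
axis(1, at = x, labels = levels(bins))  # label each position with its range
```

Using `table()` for the counts keeps empty ranges at 0, whereas `tapply()` returns `NA` for them, which is why the sums are patched afterwards.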

I'm sure more elegant solutions exist.
