Convert absolute values to ranges for charting in R

Warning: still new to R.

I'm trying to construct a bubble chart in R that shows political donations to a campaign. The x-axis will show the contribution amount, the y-axis the number of contributions at that amount, and the area of each circle the total amount contributed at that level.

The data looks like this:

CTRIB_NAML    CTRIB_NAMF    CTRIB_AMT    FILER_ID
John          Smith         $49          123456789

The FILER_ID field is used to filter the data for a particular candidate.

I've used the following functions to convert this data frame into a bubble chart (thanks to help here and here).

vals<-sort(unique(dfr$CTRIB_AMT))
sums<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, sum)
counts<-tapply( dfr$CTRIB_AMT, dfr$CTRIB_AMT, length)

symbols(vals,counts, circles=sums, fg="white", bg="red", xlab="Amount of Contribution", ylab="Number of Contributions")
text(vals, counts, sums, cex=0.75)

However, this results in far too many distinct values on the x-axis. There are several million records in all, and even after filtering to a single candidate the amount of data can still be overwhelming. How can I convert the absolute contributions into ranges, for instance 0-10, 11-20, 21-30, and so on?

----EDIT----

Following comments, I can convert vals to numeric and then slice into intervals, but I'm not sure then how I combine that back into the bubble chart syntax.

new_vals <- as.numeric(as.character(sub("\\$","",vals)))
new_vals <- cut(new_vals,100)

But regraphing:

symbols(new_vals,counts, circles=sums)

is nonsensical: all the values line up at zero on the x-axis.


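Once the amounts are binned into a factor with cut, you can use tapply again to find the counts and the sums for each bin. Note that the binning has to be done on the full numeric CTRIB_AMT column rather than on the unique values in vals, so that the grouping factor has one entry per record. For example:

amt     = as.numeric(sub("\\$", "", dfr$CTRIB_AMT))
amt_cut = cut(amt, 100)
counts  = tapply(amt, amt_cut, length)
sums    = tapply(amt, amt_cut, sum)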
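
If you would rather have round ranges such as 0-10 and 10-20 than 100 equal-width slices, cut also accepts an explicit vector of break points. Here is a minimal sketch, assuming the amounts fall between 0 and 100 and using a small made-up vector amt in place of the real numeric column:

```r
# Hypothetical amounts standing in for the numeric CTRIB_AMT column
amt <- c(5, 10, 12, 49, 50, 75, 99.5)

# Bins [0,10], (10,20], ..., (90,100]; include.lowest keeps a value of 0
bins <- cut(amt, breaks = seq(0, 100, by = 10), include.lowest = TRUE)

# Number of contributions and total amount per bin (NA for empty bins)
counts <- tapply(amt, bins, length)
sums   <- tapply(amt, bins, sum)
```

The intervals are closed on the right by default, so a contribution of exactly 10 lands in the first bin; pass right=FALSE to cut if you want it in the second one.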

For this type of thing, though, you might find the plyr and ggplot2 packages helpful. Here is a complete reproducible example:

library(plyr)     # provides ddply
library(ggplot2)

# Options
n = 1000
breaks = 10

# Generate data
set.seed(12345)
CTRIB_NAML = replicate(n, paste(letters[sample(10)], collapse=''))
CTRIB_NAMF = replicate(n, paste(letters[sample(10)], collapse=''))
CTRIB_AMT  = paste('$', round(runif(n, 0, 100), 2), sep='')
FILER_ID   = replicate(10, paste(as.character((0:9)[sample(9)]), collapse=''))[sample(10, n, replace=TRUE)]

dfr = data.frame(CTRIB_NAML, CTRIB_NAMF, CTRIB_AMT, FILER_ID)

# Format data
dfr$CTRIB_AMT = as.numeric(sub('\\$', '', dfr$CTRIB_AMT))
dfr$CTRIB_AMT_cut = cut(dfr$CTRIB_AMT, breaks)

# Summarize data for plotting
plot_data = ddply(dfr, 'CTRIB_AMT_cut', function(x) data.frame(count=nrow(x), total=sum(x$CTRIB_AMT)))

# Make plot
dev.new(width=4, height=4)
# Point size tracks the bin total; rotate the bin labels so they do not overlap
ggplot(plot_data, aes(CTRIB_AMT_cut, count, size=total)) + geom_point() + theme(axis.text.x=element_text(angle=90, hjust=1))
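
To stay with the base symbols call from the question, the binned counts and sums can be plotted against the numeric bin midpoints instead of the factor itself. This is a rough sketch under a few assumptions: the breaks are chosen explicitly (every 10 units between 0 and 100), empty bins are dropped, the amounts are simulated, and the radius is set to sqrt(total / pi) so that the circle area, not the radius, tracks the total. The helper name bin_summary is made up for illustration.

```r
# Summarise numeric amounts into bins; returns midpoints, counts and totals.
bin_summary <- function(amt, breaks) {
  bins   <- cut(amt, breaks = breaks, include.lowest = TRUE)
  counts <- as.vector(table(bins))              # contributions per bin
  sums   <- as.vector(tapply(amt, bins, sum))   # total amount per bin
  mids   <- head(breaks, -1) + diff(breaks) / 2 # numeric x position per bin
  keep   <- counts > 0                          # drop empty bins
  data.frame(mid = mids[keep], count = counts[keep], total = sums[keep])
}

set.seed(1)
amt <- round(runif(1000, 0, 100), 2)   # simulated amounts, already numeric
s   <- bin_summary(amt, seq(0, 100, by = 10))

# inches sets the radius of the largest circle; the others scale relative to it
symbols(s$mid, s$count, circles = sqrt(s$total / pi), inches = 0.35,
        fg = "white", bg = "red",
        xlab = "Amount of Contribution", ylab = "Number of Contributions")
text(s$mid, s$count, round(s$total), cex = 0.75)
```

Plotting against midpoints keeps the x-axis numeric, so the axis spacing reflects the bin positions rather than factor codes.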
