Binning data, finding results by group, and plotting using R

The pre-installed quakes dataset has 5 variables and 1000 observations.

The simple graph I'm trying to create should show the average earthquake magnitude by earthquake depth category (i.e. Y-axis = Magnitude, X-axis = Depth Categories).

In this dataset, the earthquake depth values range from 40 to 680. I would like to turn the 1000 observations of earthquake depth into 8 categories, e.g. 40 - 120, 121 - 200, ... 600 - 680. Then I'd like to take the average earthquake magnitude by depth category and plot it on a line chart.

I appreciate any help with this. Thanks!


First classify into depth classes with cut:

depth.class <- cut(quakes$depth, c(40, 120, 200, 300, 400, 500, 600, 680), include.lowest = TRUE)

(Note that these breaks give seven classes of unequal width. Adjust them to the classes you are actually after, keeping in mind how cut() treats interval endpoints.)
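
For the eight 80-unit classes described in the question, a minimal sketch might look like the following. The name depth.class8 is only illustrative, and the break points are an assumption read off the question's "40 - 120, ..., 600 - 680" example:

```r
# Sketch: eight equal-width depth classes, each 80 units wide.
# The break points are assumed from the question's example ranges.
breaks <- seq(40, 680, by = 80)   # 9 break points -> 8 intervals
depth.class8 <- cut(quakes$depth, breaks = breaks, include.lowest = TRUE)

nlevels(depth.class8)   # number of classes
table(depth.class8)     # observations per class
anyNA(depth.class8)     # TRUE would mean some depths fell outside the breaks
```

By default cut() builds intervals that are open on the left and closed on the right, so include.lowest = TRUE is what keeps the minimum depth of 40 in the first class.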

Find the mean magnitude within each depth.class (assumes no NAs):

mean.mag <- tapply(quakes$mag, depth.class, mean)

(Add na.rm, e.g. mean.mag <- tapply(quakes$mag, depth.class, mean, na.rm = TRUE), for data sets with missing values where appropriate.)

Plot as a line:

plot(mean.mag, type = "l", xlab = "depth class")

It's a little extra work to put the class labels on the X-axis, but at that point you might question if a line plot is really appropriate here.

A quick stab: turn off the axes, then add the class labels directly from the cut factor:

plot(mean.mag, type = "l", xlab = "depth class", axes = FALSE)
axis(1, 1:nlevels(depth.class), levels(depth.class))
axis(2)
box()


A line plot is not useful here; what relationship do the lines between the points represent in the data? Perhaps a dotchart might be useful instead?

cats <- with(quakes, cut(depth, breaks = seq(40L, max(depth), by = 80), 
                         include.lowest = TRUE))
dat <- aggregate(mag ~ cats, data = quakes, FUN = mean)
with(dat, dotchart(mag, group = cats, xlab = "Mean Magnitude"))

Which produces:

[Figure: dot chart of mean magnitude for each depth class]


Are you sure that you want a line plot? I'm not sure it is the most appropriate plot for this data. Regardless, the trick is to use cut to bin the data appropriately, then use one of the many aggregation tools to find the average magnitude within each bin, and finally plot those aggregated values. I like the tools in ggplot2 and plyr for tasks like this:

library(ggplot2)
library(plyr)  # provides ddply() and .()
df <- quakes
df$bins <- with(df, cut(depth, breaks = c(0,40, 120, 200, 280, 360, 440, 520, 600, 680)))
df.plot <- ddply(df, .(bins), summarise, avg.mag = mean(mag))
qplot(bins, avg.mag, data = df.plot)

#If you want a line plot, here's one approach:
qplot(as.numeric(bins), avg.mag, data = df.plot, geom = "line") + 
xlim(levels(df.plot$bins))


I agree that you likely don't want a line plot but rather a dotplot() or a box plot of some kind.

You can easily do this using shingles from the lattice package:

library(lattice)
x <- runif(100)
y <- runif(100)
bwplot(~x|equal.count(y))

Substituting shingle() for equal.count() will let you specify the intervals instead of allowing R to choose them for you.
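
As a rough sketch of that substitution on the quakes data: shingle() accepts the intervals as a two-column matrix of lower and upper bounds. The 80-unit ranges and the name depth.shingle below are assumptions chosen to match the question, not part of the original answer:

```r
library(lattice)

# Hypothetical intervals: eight 80-unit depth ranges, one row per interval
lower <- seq(40, 600, by = 80)
upper <- lower + 80
depth.shingle <- shingle(quakes$depth, intervals = cbind(lower, upper))

# One box-and-whisker panel of magnitude per depth interval
bwplot(~ mag | depth.shingle, data = quakes)
```

Unlike the levels of a factor produced by cut(), shingle intervals may overlap, so a depth that falls exactly on a shared boundary (e.g. 120) is shown in both neighbouring panels.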