R Grouping/Aggregation where the condition involves other rows in the table, not just the current row
Using R, what is the best way to aggregate rows on a condition that spans multiple rows? For example, to aggregate any run of consecutive rows where z = 0 occurs n or more times in a row.
Here is what this would look like when run on the following sample table with n = 3.
Sample Table x:
x y z
0 0 6
5 5 0
40 2 0
4 0 0
10 0 1
0 0 2
11 7 0
0 4 0
0 0 0
0 0 0
0 0 2
18 0 4
Results Table:
x y z
0 0 6
49 7 0 <- Three rows (sample rows 2-4) got aggregated
10 0 1
0 0 2
11 11 0 <- Four rows (sample rows 7-10) got aggregated
0 0 2
18 0 4
Since it seems like you're still in the "leaRning phase", I thought an example using the plyr package would be helpful. plyr is an extremely handy library which allows you to slice/dice datasets and summarize their subgroups in a flexible (and terse -- as you'll see below) manner, so it would likely be worth your time to get to know. If you find yourself needing to do similar operations on extremely large data sets, you might also consider looking into the data.table package.
I'm assuming you've done Roman's textConnection trick to get your data into a data.frame named mmf.
I'm adding an idx column to mmf so you can subset it and process the results group by group:
library(plyr)
# mmf <- read.table(textConnection( ...
rle.idx <- rle(mmf$z)
mmf$idx <- rep(seq_along(rle.idx$lengths), rle.idx$lengths)
ans <- ddply(mmf, .(idx), colwise(sum))
And ans looks like:
x y z idx
0 0 6 1
49 7 0 6
10 0 1 3
0 0 2 4
11 11 0 20
0 0 2 6
18 0 4 7
Just remove the idx column and you're done, e.g.:
ans <- ans[, -4]
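For anyone unsure what rle() contributes here, this is a minimal sketch on the z column alone (the vector is typed in by hand from the sample table). It shows how the run lengths turn into one group id per row. Note that this id groups every run of equal values, whatever its value or length, so it does not apply the n threshold from the question by itself.

```r
# z column of the sample table, entered by hand
z <- c(6, 0, 0, 0, 1, 2, 0, 0, 0, 0, 2, 4)

r <- rle(z)
r$lengths  # 1 3 1 1 4 1 1  (length of each run of equal values)
r$values   # 6 0 1 2 0 2 4  (the value repeated in each run)

# Repeat each run's position once per row it covers: one group id per row
idx <- rep(seq_along(r$lengths), r$lengths)
idx        # 1 2 2 2 3 4 5 5 5 5 6 7
```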
This is the code I used to produce your result. If you have any questions, fire away.
mmf <- read.table(textConnection("x y z # read in your example data
0 0 6
5 5 0
40 2 0
4 0 0
10 0 1
0 0 2
11 7 0
0 4 0
0 0 0
0 0 0
0 0 2
18 0 4"), header = TRUE)
# find the runs of equal values (including the zeros) in the z column
mmf.rle <- rle(mmf$z)
mmf.rle <- data.frame(lengths = mmf.rle$lengths, values = mmf.rle$values)
merge.rows <- 3
# select the runs of zeros that are at least merge.rows long
mmf.zero <- which(mmf.rle$values == 0 & mmf.rle$lengths >= merge.rows)
for (i in mmf.zero) {
# find which positions are zero, calculate sums and insert the result into a data.frame where the rows in question were turned to NA
m.mmf <- mmf.rle$lengths[1:i] # select elements from 1 to where the zero appears
select.rows <- (sum(head(m.mmf, -1)) + 1):sum(m.mmf) # rows covered by run i: from one past the end of the previous runs to the end of this run
mmf.sum <- colSums(mmf[select.rows, ]) # sum values column-wise for rows that have at least three zeros in z
mmf[select.rows,] <- NA # now that we have a sum by columns, we turn those numbers into NAs...
mmf[select.rows[1], ] <- mmf.sum # ... and insert summed result into the first NA row
}
# remove any left over NA rows
mmf <- mmf[complete.cases(mmf),]
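The row arithmetic inside the loop can also be read off a cumulative sum of the run lengths. This is only a sketch on the z column, typed in by hand, and the names starts, ends, hits and rows are mine rather than part of the code above.

```r
z <- c(6, 0, 0, 0, 1, 2, 0, 0, 0, 0, 2, 4)
r <- rle(z)

ends   <- cumsum(r$lengths)       # last row of each run:  1 4 5 6 10 11 12
starts <- ends - r$lengths + 1L   # first row of each run: 1 2 5 6 7 11 12

# Runs of zeros that are at least 3 rows long
hits <- which(r$values == 0 & r$lengths >= 3)

# Row indices covered by each of those runs
rows <- Map(seq, starts[hits], ends[hits])
rows  # a list holding 2:4 and 7:10
```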
DATA
# read in your example data
mmf <- read.table(textConnection("x y z
0 0 6
5 5 0
40 2 0
4 0 0
10 0 1
0 0 2
11 7 0
0 4 0
0 0 0
0 0 0
0 0 2
18 0 4"), header = TRUE)
CODE
agg_n <- function(dat=mmf,coln="z",n=3){
agg <- function(.x) {
# Sum values if first n=3 records in column coln="z" are 0
if(all(.x[[coln]][seq(n)] == 0)) {
y <- rbind(colSums(.x[seq(n),]),.x[-1*seq(n),])
} else y <- .x
return(y)
}
# Groups of records starting with 0 in column coln="z"
G <- cumsum(diff(c(0L,dat[[coln]] == 0))==1)
new_dat <- do.call(rbind,lapply(split(dat,G),agg))
return(new_dat)
}
OUTPUT
> agg_n()
x y z
0 0 0 6
1.1 49 7 0
1.5 10 0 1
1.6 0 0 2
2.1 11 11 0
2.10 0 0 0
2.11 0 0 2
2.12 18 0 4
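Note that row 2.10 (0 0 0) is still present in the output above, because only the first n rows of each zero run are summed. If the whole run should collapse into one row, as in the question's Results Table, something along these lines might work. It is a sketch only: agg_zero_runs is a hypothetical name, and it assumes every column is numeric and that the z column contains no NA values.

```r
# Sample data from the question
mmf <- data.frame(
  x = c(0, 5, 40, 4, 10, 0, 11, 0, 0, 0, 0, 18),
  y = c(0, 5, 2, 0, 0, 0, 7, 4, 0, 0, 0, 0),
  z = c(6, 0, 0, 0, 1, 2, 0, 0, 0, 0, 2, 4)
)

# Hypothetical helper: collapse every run of consecutive zeros in `coln`
# that is at least `n` rows long into a single row of column sums.
agg_zero_runs <- function(dat, coln = "z", n = 3) {
  r <- rle(dat[[coln]] == 0)                        # runs of zero / non-zero rows
  in_long_run <- rep(r$values & r$lengths >= n, r$lengths)
  run_id <- rep(seq_along(r$lengths), r$lengths)
  # A row opens a new group unless it continues a qualifying zero run
  starts <- !in_long_run | !duplicated(run_id)
  out <- rowsum(dat, cumsum(starts))                # column sums per group
  rownames(out) <- NULL
  out
}

res <- agg_zero_runs(mmf)
```

Rows outside a qualifying run each get their own group, so they pass through unchanged, and zero runs shorter than n are left alone.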