Merge list of data.frames with list element name as factor in merged data frame

2023-03-23 21:22

I have a data.frame, like the following, where location is a factor and sample is some measurement sample:

  location sample
1      'A'   0.10
2      'A'   0.20
3      'A'   0.15
4      'B'   0.15
5      'B'   0.99
6      'B'   0.54
...

I have a function ECCDFpts(df), where df is a data.frame, that returns a set of <x,y> points on the empirical CCDF of df$sample, like so:

    x     y
1 0.0  1.00
2 0.1  0.99
3 0.2  0.75
...

Note that the number of <x,y> points returned is "arbitrary". There is not a one-to-one mapping between input samples and output <x,y> rows.
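For experimenting with the answers below, a stand-in for ECCDFpts can be sketched in base R. This is not the asker's implementation, which is not shown; the name eccdf_points_sketch, the grid step of 0.1, and the toy data are assumptions made purely for illustration.

```r
# Hypothetical stand-in for ECCDFpts(); the real function is not shown in the question.
# Returns points (x, y) with y = P(sample > x), evaluated on a fixed grid.
eccdf_points_sketch <- function(df, step = 0.1) {
  Fn <- ecdf(df$sample)                    # empirical CDF of the samples
  x  <- seq(0, max(df$sample), by = step)  # grid length does not depend on nrow(df)
  data.frame(x = x, y = 1 - Fn(x))         # CCDF = 1 - CDF
}

# Toy input with a single location (values taken from the table above)
toy <- data.frame(location = "A", sample = c(0.10, 0.20, 0.15))
pts <- eccdf_points_sketch(toy)
```

The number of rows in pts is set by the grid, not by the number of samples, which mirrors the "arbitrary" row count described above.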

I would like to compute this CCDF data on a per factor (e.g., location) basis, yielding a data.frame like this:

  location    x    y
1      'A'  0.0  1.0
2      'A'  0.1  1.0
3      'A'  0.2  0.3
4      'B'  0.0  1.0
5      'B'  0.1  1.0
6      'B'  0.2  0.7
...

My current approach is to split the initial data frame on factor location:

eccdfs_by_factor <- by(data, data$location, ECCDFpts)

This yields a list of data.frames:

data$location: A
    x    y
1 0.0  1.0
2 0.1  1.0
3 0.2  0.3
-----------------
data$location: B
    x    y
1 0.0  1.0
2 0.1  1.0
3 0.2  0.7

I don't know how to merge or unsplit this back into my desired form, shown previously. I want to merge such that the names of the elements (data.frames) in the list become a factor column in the combined data.frame.

Solution:

This is a classic split-apply-combine problem, apparently. The cleanest solutions below use the plyr function ddply(...) to do the splitting, applying, and combining in one line! There's no need for the base by function I used above.

Update: If I understand what you want correctly...

library(plyr)
ldply(your_data)

For example:

x <- list(a=data.frame(x=c(1,2,3,4),y=c(2,3,4,5)),
          b=data.frame(x=c(4,3,2,1),y=c(5,4,3,2)))
ldply(x)

  .id x y
1   a 1 2
2   a 2 3
3   a 3 4
4   a 4 5
5   b 4 5
6   b 3 4
7   b 2 3
8   b 1 2
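The same stacking can be done without plyr. A minimal base R sketch, assuming the list is named and every element has the same columns; the column name location is chosen here to match the question, whereas ldply calls it .id:

```r
# Same toy list as in the ldply example
x <- list(a = data.frame(x = c(1, 2, 3, 4), y = c(2, 3, 4, 5)),
          b = data.frame(x = c(4, 3, 2, 1), y = c(5, 4, 3, 2)))

# Prepend each element's name as a column, then stack the pieces
pieces   <- Map(function(df, nm) cbind(location = nm, df), x, names(x))
combined <- do.call(rbind, pieces)

rownames(combined) <- NULL                       # drop the "a.1", "a.2", ... row names
combined$location  <- factor(combined$location)  # the question asks for a factor
```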

A one-shot solution uses the plyr package. Since I don't know your ECCDFpts function, I am going to write my own to illustrate the usage.

# DEFINE DUMMY DATA
mydata = data.frame(
  location = rep(LETTERS[1:3], each = 3),
  sample   = runif(9)
)

# DEFINE DUMMY FUNCTION
myfunc = function(dat){
   x = dat - mean(dat)
   y = dat - median(dat)
   return(data.frame(x, y)) 
}

# USE PLYR TO APPLY FUNCTION BY LOCATION
library(plyr)
ans = ddply(mydata, .(location), transform, x = myfunc(sample)$x, 
         y = myfunc(sample)$y)

  location sample       x      y
1        A  0.911  0.3279  0.232
2        A  0.678  0.0958  0.000
3        A  0.159 -0.4237 -0.520
4        B  0.908  0.3096  0.048
5        B  0.860  0.2615  0.000
6        B  0.027 -0.5711 -0.833
7        C  0.745  0.0694  0.000
8        C  0.343 -0.3327 -0.402
9        C  0.939  0.2633  0.194

EDIT. As identified in the comments by @David, the code can be further simplified as:

# DEFINE DUMMY FUNCTION
myfunc = function(dat){
   x = with(dat, sample - mean(sample))
   y = with(dat, sample - median(sample))
   return(data.frame(x, y)) 
}

ans = ddply(mydata, .(location), myfunc)

  location       x        y
1        A -0.0308 -0.00564
2        A -0.0251  0.00000
3        A  0.0559  0.08102
4        B -0.4985 -0.69084
5        B  0.3062  0.11392
6        B  0.1923  0.00000
7        C -0.2894 -0.31495
8        C  0.0255  0.00000
9        C  0.2639  0.23838

The answers you've received are more than adequate, but for completeness I'd like to add a solution that explains how to get your desired result starting from your output from the by command. I'm going to use a slightly modified version of Ramnath's example for illustration:

mydata = data.frame(
  location = rep(LETTERS[1:3], each = 3),
  sample   = runif(9)
)

# DEFINE DUMMY FUNCTION - slightly different from ramnath's
myfunc = function(dat){
    temp <- data.frame(x = runif(3), y = rnorm(3))
    return(temp) 
}

You're splitting the data by location and applying your function using by:

rs <- by(mydata,mydata$location,FUN = myfunc)

mydata$location: A
          x           y
1 0.2730105 -0.06923224
2 0.9354096 -0.18336131
3 0.6359926 -0.04054326
----------------------------------------------------------- 
mydata$location: B
          x           y
1 0.5621529 -0.26404739
2 0.8098687  0.07912883
3 0.7334650  0.38287794
----------------------------------------------------------- 
mydata$location: C
          x          y
1 0.8443924 -0.9055125
2 0.7922256  0.1757586
3 0.4923929 -0.1931579

Now, a very handy thing to know is that we can put everything back together again using do.call and rbind:

result <- do.call(rbind,rs)

            x           y
A.1 0.2730105 -0.06923224
A.2 0.9354096 -0.18336131
A.3 0.6359926 -0.04054326
B.1 0.5621529 -0.26404739
B.2 0.8098687  0.07912883
B.3 0.7334650  0.38287794
C.1 0.8443924 -0.90551251
C.2 0.7922256  0.17575858
C.3 0.4923929 -0.19315789

But wait, you say! What about adding my location column? Well, notice what do.call(rbind,rs) did to the row names of your result! We can add the location column by just extracting the first character from the row names:

result$location <- substr(row.names(result),1,1)

This assumes, of course, that your locations are coded using a single character. But in general, the resulting row names should be in the form location.x, so you could always use strsplit or regular expressions to extract the location names.
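For location names longer than one character, the numeric suffix can be stripped with a regular expression instead. A sketch, assuming the row names have the form name.index as in the output above; the toy data frame and its location names are made up to mimic that shape:

```r
# Toy result mimicking the row names produced by do.call(rbind, rs)
result <- data.frame(x = c(0.1, 0.2, 0.3, 0.4),
                     row.names = c("North.1", "North.2", "St.Louis.1", "St.Louis.2"))

# Remove only the final ".<digits>", so dots inside a name survive
result$location <- factor(sub("\\.[0-9]+$", "", row.names(result)))
rownames(result) <- NULL
```

Anchoring the pattern at the end of the string is what keeps a name such as St.Louis intact; splitting on the first dot would cut it short.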

Finally, you can always simply modify the function you apply to each piece to add the location name as a column before returning the result, like so:

#Output not shown
myfunc1 = function(dat){
    temp <- data.frame(x = runif(3), y = rnorm(3))
    temp$location <- dat$location[1]
    return(temp) 
}
rs1 <- by(mydata,mydata$location,FUN = myfunc1)
result1 <- do.call(rbind,rs1)

So you'd just have to modify your ECCDFpts function in a similar manner.
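If editing ECCDFpts itself is inconvenient, a thin wrapper can add the column instead. A sketch, where fake_pts is a made-up placeholder for ECCDFpts and with_location is a hypothetical helper name:

```r
# Placeholder standing in for ECCDFpts(); returns a fixed two-row result
fake_pts <- function(dat) {
  data.frame(x = c(0, 1), y = c(1, mean(dat$sample > 1)))
}

# Hypothetical helper: call f on one piece and tag the result with that piece's location
with_location <- function(dat, f) {
  cbind(location = dat$location[1], f(dat))
}

mydata <- data.frame(location = factor(rep(c("A", "B"), each = 3)),
                     sample   = c(0.5, 1.5, 2.5, 0.2, 0.4, 3.0))

# Extra arguments to by() are passed on to the function applied to each piece
rs2     <- by(mydata, mydata$location, with_location, f = fake_pts)
result2 <- do.call(rbind, rs2)
rownames(result2) <- NULL
```

Because dat$location[1] is already a factor, the combined location column stays a factor without any conversion.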
