
How to take the union of element in a nested list in R

I have a nested list, say lst (all the elements are of class integer). I don't know the length of lst in advance; however, I do know that each element of lst is a list of length, say, k:

length(lst[[i]]) # this equals k and is known in advance, 
                 # this is true for i = 1 ... length(lst)

How do I take the union of the 1st elements, the 2nd elements, ..., the kth elements across all the elements of lst?

Specifically, if the length of lst is n, I want (not R code):

# I know that union can only be taken for 2 elements;
# the following is for illustration purposes
listUnion1 <- union(lst[[1, 1]], lst[[2, 1]], ..., lst[[n, 1]])
listUnion2 <- union(lst[[1, 2]], lst[[2, 2]], ..., lst[[n, 2]])
.
.
.
listUnionk <- union(lst[[1, k]], lst[[2, k]], ..., lst[[n, k]])

Any help or pointers are greatly appreciated.

Here is a dataset that can be used (n = 3 and k = 2):

lst <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")),
    structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")),
    structure(list(a = 12, b = 12), .Names = c("a", "b")))


Here is a general solution, similar in spirit to that of @Ramnath, but avoiding union(), which is a binary function. The trick is to note that, for plain vectors, union() is essentially implemented as:

unique(c(as.vector(x), as.vector(y)))

and the bit inside unique() can be achieved by unlisting the nth component of each list.
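A quick sanity check of this idea, as a sketch with made-up vectors x, y and z: folding the binary union() over several vectors with Reduce() should give the same result, in the same order, as a single unique(unlist(...)) call.

```r
# Hypothetical example vectors (not from the question)
vecs <- list(x = 1:5, y = 6:11, z = c(3L, 12L))

# Pairwise: union(union(x, y), z)
pairwise <- Reduce(union, vecs)

# One pass: concatenate everything, then drop repeated values
one_pass <- unique(unlist(vecs, use.names = FALSE))

identical(pairwise, one_pass)
# [1] TRUE
```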

The full solution then is:

unionFun <- function(n, obj) {
    unique(unlist(lapply(obj, `[[`, n)))
}
lapply(seq_along(lst[[1]]), FUN = unionFun, obj = lst)

which gives:

[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

[[2]]
 [1]  6  7  8  9 10 11  1  2  3  4  5 12

on the data you showed.

A couple of useful features of this are:

  • we use `[[` to subset obj in unionFun. This is similar to function(x) x$a in @Ramnath's answer; however, we don't need an anonymous function, because `[[` does the job directly. The equivalent of @Ramnath's answer is: lapply(lst, `[[`, 1)
  • to generalise the above, we replace the 1 above with n in unionFun(), and allow our list to be passed in as argument obj.

Now that we have a function that provides the union of the nth elements of a given list, we can lapply() over the indices 1, ..., k, applying unionFun() to each component position of lst. This uses the fact that length(lst[[1]]) equals length(lst[[i]]) for all i, so seq_along(lst[[1]]) yields exactly 1, ..., k.
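Since this relies on every sublist having the same length k, a cheap check of that assumption could be run first. This is a sketch, not part of the original answer; lengths() returns the length of each element of a list.

```r
lst <- list(list(a = 1:5, b = 6:11),
            list(a = 6:11, b = 1:5),
            list(a = 12, b = 12))

# k is taken from the first sublist
k <- length(lst[[1]])

# Fail fast if any sublist has a different number of components
stopifnot(all(lengths(lst) == k))
```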

If it helps to have the names of the nth elements in the returned object, we can do:

> unions <- lapply(seq_along(lst[[1]]), FUN = unionFun, obj = lst)
> names(unions) <- names(lst[[1]])
> unions
$a
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

$b
 [1]  6  7  8  9 10 11  1  2  3  4  5 12
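A variant sketch, not part of the original answer: instead of setting the names afterwards, iterate over the component names themselves. Because `[[` also accepts a name, unionFun() then matches components by name rather than by position; this assumes every sublist uses the same names, though possibly in a different order. The data is rebuilt with plain list(), which is equivalent to the structure(..., .Names = ...) form in the question.

```r
lst <- list(list(a = 1:5, b = 6:11),
            list(a = 6:11, b = 1:5),
            list(a = 12, b = 12))

unionFun <- function(n, obj) {
    unique(unlist(lapply(obj, `[[`, n)))
}

# setNames(nm = ...) gives c(a = "a", b = "b"), so lapply() keeps those names
# and unionFun() looks each component up by name instead of by position
unions <- lapply(setNames(nm = names(lst[[1]])), unionFun, obj = lst)
unions
```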


Here is one solution

# generate dummy data
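x1 = sample(letters[1:5], 20, replace = TRUE)
x2 = sample(letters[1:5], 20, replace = TRUE)
df = data.frame(x1, x2, stringsAsFactors = FALSE)

# find unique elements in each column
# (lapply() always returns a list; apply(df, 2, unique) would return a
# matrix whenever every column has the same number of unique values)
union_df = lapply(df, unique)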

Let me know if this works

EDIT: Here is a solution for lists using the data you provided

mylist = list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")), 
              structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")), 
              structure(list(a = 12, b = 12), .Names = c("a", "b")))
list_a = lapply(mylist, function(x) x$a)
list_b = lapply(mylist, function(x) x$b)

union_a = Reduce(union, list_a)
union_b = Reduce(union, list_b)

If each sublist has more than 2 elements, this code could be generalized.
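One way to do that generalization, sketched here under the assumption that every sublist has the same length k: loop over the positions 1, ..., k and fold union() over the j-th components with Reduce().

```r
mylist = list(list(a = 1:5, b = 6:11),
              list(a = 6:11, b = 1:5),
              list(a = 12, b = 12))

# k: number of components per sublist (assumed equal for all sublists)
k = length(mylist[[1]])

# For each position j in 1..k, collect the j-th component of every
# sublist and fold union() over them: union(union(v1, v2), v3), ...
unions = lapply(seq_len(k), function(j) Reduce(union, lapply(mylist, `[[`, j)))
names(unions) = names(mylist[[1]])
unions
```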


Here's another way: use do.call(rbind, ...) to stack the sublists into a matrix of mode list (one row per sublist, one column per component, bound by position, with the column names taken from the sublists), then apply unique()/do.call() to each column of that matrix. (I modified your data slightly so that the 'a' and 'b' unions have different lengths, to make sure it works correctly.)

lst <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")), 
    structure(list(a = 6:10, b = 1:5), .Names = c("a", "b")), 
    structure(list(a = 12, b = 12), .Names = c("a", "b")))

> apply(do.call(rbind, lst),2, function( x ) unique( do.call( c, x)))
$a
 [1]  1  2  3  4  5  6  7  8  9 10 12

$b
 [1]  6  7  8  9 10 11  1  2  3  4  5 12
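With the question's original data, both unions have length 12, so apply() would simplify its result to a 12 x 2 matrix rather than a list. A sketch that keeps the list shape regardless of the union lengths iterates over the columns of the same list-matrix with lapply() instead (the names m and unions are illustrative):

```r
lst <- list(list(a = 1:5, b = 6:11),
            list(a = 6:11, b = 1:5),
            list(a = 12, b = 12))

# One row per sublist, one column per component; each cell holds a vector
m <- do.call(rbind, lst)

# Iterate over the column names so the result is always a named list,
# whatever the lengths of the individual unions
unions <- lapply(setNames(nm = colnames(m)), function(j) unique(unlist(m[, j])))
unions
```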


Your data

df <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")), 
           structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")), 
           structure(list(a = 12, b = 12), .Names = c("a", "b")))

This gives you the unique values within each of the nested lists:

library(plyr)
# unlist() each sublist first, then drop repeated values
df.l <- llply(df, function(x) unique(unlist(x)))

R> df.l
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11

[[2]]
 [1]  6  7  8  9 10 11  1  2  3  4  5

[[3]]
[1] 12
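The order of unique() and unlist() matters here. A small sketch with a made-up sublist x whose components overlap shows the difference; with the question's data both orders happen to give the same result, because no sublist has partially overlapping components. use.names = FALSE only keeps the printed results free of generated names.

```r
# Hypothetical sublist whose components overlap but are not identical
x <- list(a = 1:3, b = 2:5)

# unique() on a list only drops components that are identical as a whole,
# so the overlapping values 2 and 3 survive the unlist()
unlist(unique(x), use.names = FALSE)
# [1] 1 2 3 2 3 4 5

# unlist() first, then unique(), removes the repeated values
unique(unlist(x, use.names = FALSE))
# [1] 1 2 3 4 5
```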

EDIT

Thanks to Ramnath, I changed the code a bit and hope this answer now fits your question better. For illustration, I have kept the previous answer as well. The slightly changed data now has an additional component, c, in the third sublist.

df <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")), 
           structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")), 
           structure(list(a = 12, b = 12, c = 10:14), .Names = c("a", "b", "c")))


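f.x <- function(x.list) {
  x.names <- names(x.list)
  # every pair of component names, one pair per column
  i <- combn(x.names, 2)
  # for each pair, extract those two components as a sub-list
  l <- apply(i, 2, function(y) x.list[y])
  # concatenate each pair of components into a single vector
  llply(l, unlist)
}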

Now you can apply the function to your data.

all.l <- llply(df, f.x)
llply(all.l, function(x) llply(x, unique))

[[1]]
[[1]][[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11


[[2]]
[[2]][[1]]
 [1]  6  7  8  9 10 11  1  2  3  4  5


[[3]]
[[3]][[1]]
[1] 12

[[3]][[2]]
[1] 12 10 11 13 14

[[3]][[3]]
[1] 12 10 11 13 14

However, the nested structure is not very user-friendly; it could be simplified a bit...


According to the documentation, unlist() is recursive by default (recursive = TRUE), so regardless of how deeply the lists are nested, you can get all of their elements by passing them to unlist(). You can get the union of the components within each sublist as follows:

lst <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")), 
structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")), 
structure(list(a = 12, b = 12), .Names = c("a", "b")))

lapply(lst, function(sublst) unique(unlist(sublst)))

[[1]]
[1]  1  2  3  4  5  6  7  8  9 10 11

[[2]]
[1]  6  7  8  9 10 11  1  2  3  4  5

[[3]]
[1] 12
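To illustrate the recursive behaviour of unlist() mentioned above, here is a sketch with a made-up list nested several levels deep; unlist() flattens every level, so unique() sees all of the values. use.names = FALSE only keeps the result free of generated names.

```r
# Hypothetical list nested three levels deep
deep <- list(list(a = 1:3, b = list(c = 3:5, d = list(e = 5:7))),
             list(a = 7:9))

# unlist() is recursive by default, so every leaf vector is collected
unique(unlist(deep, use.names = FALSE))
# [1] 1 2 3 4 5 6 7 8 9
```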