How to create a new data frame with original data separated by ; and with different counts per category?

I have a table with the following format.

df1 <- data.frame (A=c("aaa", "bbb", "ccc", "ddd"),
                   B=c("111; 222", "333", "444; 555; 666; 777", "888; 999"))

    A                  B
1 aaa           111; 222
2 bbb                333
3 ccc 444; 555; 666; 777
4 ddd           888; 999

I want to have a dataframe like this:

aaa 111
aaa 222
bbb 333
ccc 444
ccc 555
ccc 666
ccc 777
ddd 888
ddd 999

I found a wonderful solution for converting a similar list to a data frame in previous Stack Overflow questions. However, I am struggling to adapt it to a data frame with multiple entries per row. How can I do this?


Here is a simple base R solution (explanation below):

spl <- with(df1, strsplit(as.character(B), split = "; ", fixed = TRUE))
lens <- sapply(spl, length)
out <- with(df1, data.frame(A = rep(A, lens), B = unlist(spl)))

Which gives us:

R> out
    A   B
1 aaa 111
2 aaa 222
3 bbb 333
4 ccc 444
5 ccc 555
6 ccc 666
7 ccc 777
8 ddd 888
9 ddd 999

What is the code doing? Line 1:

spl <- with(df1, strsplit(as.character(B), split = "; ", fixed = TRUE))

breaks apart each of the strings in B using "; " as the characters to split on. We use fixed = TRUE (as suggested by @Marek in the comments) to speed up the matching and splitting: in this case we do not need a regular expression, we simply want to match the literal string. This gives us a list with the various elements split out:

R> spl
[[1]]
[1] "111" "222"

[[2]]
[1] "333"

[[3]]
[1] "444" "555" "666" "777"

[[4]]
[1] "888" "999"
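
As a quick check of the fixed = TRUE point, the sketch below compares the literal split with the default regular-expression split on the same data. The names spl_fixed and spl_regex are made up for this illustration. Because neither ";" nor a space is a regex metacharacter, both calls should return the same list here; the two would only diverge for a separator such as "." or "|".

```r
df1 <- data.frame(A = c("aaa", "bbb", "ccc", "ddd"),
                  B = c("111; 222", "333", "444; 555; 666; 777", "888; 999"))

# Literal match on "; " (the separator is not treated as a pattern)
spl_fixed <- strsplit(as.character(df1$B), split = "; ", fixed = TRUE)

# Default behaviour: "; " is interpreted as a regular expression
spl_regex <- strsplit(as.character(df1$B), split = "; ")

# Compare the two results element by element
identical(spl_fixed, spl_regex)
```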

The next line simply counts how many elements there are in each component of the list spl:

lens <- sapply(spl, length)

which gives us a vector of lengths:

R> lens
[1] 2 1 4 2
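
If you are on R 3.2.0 or later, base R also provides lengths(), which does the same counting without sapply(). This is an optional alternative rather than part of the original answer; lens_alt is a name made up for the comparison.

```r
df1 <- data.frame(A = c("aaa", "bbb", "ccc", "ddd"),
                  B = c("111; 222", "333", "444; 555; 666; 777", "888; 999"))
spl <- strsplit(as.character(df1$B), split = "; ", fixed = TRUE)

# Count the elements in each list component, as in the answer
lens <- sapply(spl, length)

# lengths() (base R >= 3.2.0) returns the same integer vector
lens_alt <- lengths(spl)

identical(lens, lens_alt)
```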

The final line of the solution plugs the outputs from the two previous steps into a new data frame. The trick is to repeat each element of df1$A the number of times given by the corresponding element of lens, for which we use the rep() function. We also need to unwrap the list spl into a vector, which we do with unlist():

out <- with(df1, data.frame(A = rep(A, lens), B = unlist(spl)))
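
If you need this more than once, the three steps can be wrapped in a small function. The sketch below is one way to do it; split_rows() and its arguments key, value and sep are hypothetical names chosen for this example, not an existing base R function. It assumes the key and value columns are given by name and that the separator is a literal string.

```r
df1 <- data.frame(A = c("aaa", "bbb", "ccc", "ddd"),
                  B = c("111; 222", "333", "444; 555; 666; 777", "888; 999"))

# split_rows() is a hypothetical helper, not part of base R
split_rows <- function(df, key = "A", value = "B", sep = "; ") {
  # Step 1: split each string in the value column on the literal separator
  spl <- strsplit(as.character(df[[value]]), split = sep, fixed = TRUE)
  # Step 2: count how many pieces each row produced
  lens <- sapply(spl, length)
  # Step 3: repeat each key once per piece and flatten the pieces
  out <- data.frame(rep(df[[key]], lens), unlist(spl),
                    stringsAsFactors = FALSE)
  names(out) <- c(key, value)
  out
}

out <- split_rows(df1)
out
```

Passing a vector as the second argument to rep() is what pairs each key with its own count, so row 3 ("ccc") is repeated four times while row 2 ("bbb") appears once.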


Literally the same as step one in my answer to your previous question:

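library(reshape)
# melt() on a list gives one row per element; L1 is the index of the list component
x <- melt(strsplit(as.character(df1$B), "; "))
# Use L1 to look up the matching value of A for each split element
x <- data.frame(A = df1[x$L1, 1], B = x$value)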