How to create a new data frame with original data separated by ; and with different counts per category?
I have a table with the following format.
df1 <- data.frame (A=c("aaa", "bbb", "ccc", "ddd"),
B=c("111; 222", "333", "444; 555; 666; 777", "888; 999"))
A B
1 aaa 111; 222
2 bbb 333
3 ccc 444; 555; 666; 777
4 ddd 888; 999
I want to have a dataframe like this:
aaa 111
aaa 222
bbb 333
ccc 444
ccc 555
ccc 666
ccc 777
ddd 888
ddd 999
I found a wonderful solution for converting a similar list to a data frame in previous Stack Overflow questions. However, I am struggling to adapt it to a data frame in which some cells hold multiple entries. How can I do this?
Here is a simple base R solution (explanation below):
spl <- with(df1, strsplit(as.character(B), split = "; ", fixed = TRUE))
lens <- sapply(spl, length)
out <- with(df1, data.frame(A = rep(A, lens), B = unlist(spl)))
Which gives us:
R> out
A B
1 aaa 111
2 aaa 222
3 bbb 333
4 ccc 444
5 ccc 555
6 ccc 666
7 ccc 777
8 ddd 888
9 ddd 999
What is the code doing? Line 1:
spl <- with(df1, strsplit(as.character(B), split = "; ", fixed = TRUE))
breaks apart each of the strings in B using "; " as the characters to split on. We use fixed = TRUE (as suggested by @Marek in the comments) to speed up the matching and splitting: in this case we do not need a regular expression, we simply want to match the stated string. This gives us a list with the various elements split out:
R> spl
[[1]]
[1] "111" "222"
[[2]]
[1] "333"
[[3]]
[1] "444" "555" "666" "777"
[[4]]
[1] "888" "999"
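To make the effect of fixed = TRUE concrete, here is a small sketch comparing a literal split with a regular-expression split. The vector x is made-up illustration data (not from the question), and the pattern ";\\s*" is just one possible choice if the spacing after the semicolon is inconsistent.

```r
# Hypothetical input with inconsistent spacing after the semicolon
x <- c("111; 222", "333;444", "555;  666")

# Literal match on the exact string "; " (no regular expression involved)
fixed_split <- strsplit(x, split = "; ", fixed = TRUE)
# fixed_split[[2]] is "333;444" (no "; " present, so nothing is split)
# fixed_split[[3]] is "555" " 666" (the second space is kept)

# Regular expression: a semicolon followed by any amount of whitespace
regex_split <- strsplit(x, split = ";\\s*")
# regex_split[[2]] is "333" "444"
# regex_split[[3]] is "555" "666"
```

For the data in the question, where every separator is exactly "; ", both calls give the same result, so the literal match is sufficient.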
The next line simply counts how many elements there are in each component of the list spl:
lens <- sapply(spl, length)
which gives us a vector of lengths:
R> lens
[1] 2 1 4 2
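As a side note, newer versions of base R (3.2.0 and later, as far as I know) provide lengths(), which should return the same counts without the sapply() call. A minimal sketch, with spl rebuilt by hand so the snippet runs on its own:

```r
# The split list from the previous step, written out by hand
spl <- list(c("111", "222"), "333", c("444", "555", "666", "777"), c("888", "999"))

lens_sapply  <- sapply(spl, length)  # approach used in this answer
lens_builtin <- lengths(spl)         # base R shortcut (assumes R >= 3.2.0)

same <- identical(lens_sapply, lens_builtin)
```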
The final line of the solution plugs the outputs from the two previous steps into a new data frame. The trick is to repeat each element of df1$A lens number of times, for which we use the rep() function. We also need to unwrap the list spl into a vector, which we do with unlist():
out <- with(df1, data.frame(A = rep(A, lens), B = unlist(spl)))
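If you need this more than once, the three steps can be wrapped in a small helper. This is only a sketch: the function name split_rows and its arguments key, value and sep are made up for illustration, and it assumes the separator is a fixed string.

```r
# Hypothetical helper: one output row per separated entry in column `value`
split_rows <- function(df, key, value, sep = "; ") {
  # Step 1: split each string on the literal separator
  spl <- strsplit(as.character(df[[value]]), split = sep, fixed = TRUE)
  # Step 2: count the pieces per original row
  lens <- sapply(spl, length)
  # Step 3: repeat the key column to match, and flatten the pieces
  out <- data.frame(rep(df[[key]], lens), unlist(spl),
                    stringsAsFactors = FALSE)
  names(out) <- c(key, value)
  out
}

df1 <- data.frame(A = c("aaa", "bbb", "ccc", "ddd"),
                  B = c("111; 222", "333", "444; 555; 666; 777", "888; 999"),
                  stringsAsFactors = FALSE)
out <- split_rows(df1, key = "A", value = "B")
```

Here stringsAsFactors = FALSE is set explicitly so the columns stay character vectors regardless of the R version's default.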
Literally the same as step one in my answer to your previous question:
library(reshape)
x <- melt((strsplit(as.character(df1$B), "; ")))
x <- data.frame("A"=df1[x$L1,1],"B"=x$value)
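If the reshape package is not available, a roughly analogous result can be had in base R with stack(), which turns a named list into a two-column data frame (values and ind). This is a different technique from melt(), shown only as a sketch. It assumes the entries of A can serve as list names, and the names long and out2 are mine.

```r
df1 <- data.frame(A = c("aaa", "bbb", "ccc", "ddd"),
                  B = c("111; 222", "333", "444; 555; 666; 777", "888; 999"),
                  stringsAsFactors = FALSE)

# Split B, then name each list component after the matching entry of A
spl <- strsplit(as.character(df1$B), split = "; ", fixed = TRUE)
long <- stack(setNames(spl, as.character(df1$A)))

# stack() returns columns `values` and `ind`; rename and reorder them
out2 <- data.frame(A = as.character(long$ind), B = long$values,
                   stringsAsFactors = FALSE)
```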