How to create a new data frame with original data separated by ; and with different counts per category?

2023-03-10 07:57

I have a table with the following format.

df1 <- data.frame (A=c("aaa", "bbb", "ccc", "ddd"),
                   B=c("111; 222", "333", "444; 555; 666; 777", "888; 999"))

    A                  B
1 aaa           111; 222
2 bbb                333
3 ccc 444; 555; 666; 777
4 ddd           888; 999

I want to have a dataframe like this:

aaa 111
aaa 222
bbb 333
ccc 444
ccc 555
ccc 666
ccc 777
ddd 888
ddd 999

I found a wonderful solution to convert a similar list to a dataframe in previous Stack Overflow questions. However, it is difficult for me to convert it from a dataframe with multiple entries. How can I do this?

Here is a simple base R solution (explanation below):

spl <- with(df1, strsplit(as.character(B), split = "; ", fixed = TRUE))
lens <- sapply(spl, length)
out <- with(df1, data.frame(A = rep(A, lens), B = unlist(spl)))

Which gives us:

R> out
    A   B
1 aaa 111
2 aaa 222
3 bbb 333
4 ccc 444
5 ccc 555
6 ccc 666
7 ccc 777
8 ddd 888
9 ddd 999

What is the code doing? Line 1:

spl <- with(df1, strsplit(as.character(B), split = "; ", fixed = TRUE))

breaks apart each of the strings in B using "; " as the characters to split on. We use fixed = TRUE (as suggested by @Marek in the comments) to speed up the matching and splitting: in this case we do not need a regular expression, we simply want to match the stated string. This gives us a list with the various elements split out:

R> spl
[[1]]
[1] "111" "222"

[[2]]
[1] "333"

[[3]]
[1] "444" "555" "666" "777"

[[4]]
[1] "888" "999"
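If the spacing after the semicolons is not consistent, a fixed split on "; " will leave some values joined together. The sketch below uses a made-up vector b (not part of the question's data) to compare two ways of handling that: a regular-expression split, and a fixed split on ";" followed by trimws().

```r
# Hypothetical input with inconsistent spacing after the semicolons
b <- c("111;222", "333", "444;  555; 666")

# Regular-expression split: a semicolon followed by any amount of whitespace
spl_regex <- strsplit(b, split = ";\\s*")

# Fixed split on ";" only, then trim the leftover spaces from each piece
spl_fixed <- lapply(strsplit(b, split = ";", fixed = TRUE), trimws)

# Both approaches produce the same list of character vectors
identical(spl_regex, spl_fixed)
```

For the data in the question the separator is always "; ", so the plain fixed = TRUE split is enough.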

The next line simply counts how many elements there are in each component of the list spl:

lens <- sapply(spl, length)

which gives us a vector of lengths:

R> lens
[1] 2 1 4 2
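As a side note, the same counts can be obtained in other ways. This sketch rebuilds spl by hand so that it runs on its own, and compares sapply() with vapply() and with lengths(), which is available in base R from version 3.2.0 onwards.

```r
# The split list from the previous step, rebuilt by hand for a standalone example
spl <- list(c("111", "222"), "333", c("444", "555", "666", "777"), c("888", "999"))

# Original approach: apply length() to every list component
lens_sapply <- sapply(spl, length)

# vapply() does the same but states the expected return type up front
lens_vapply <- vapply(spl, length, integer(1))

# lengths() is a dedicated base R function for exactly this task
lens_builtin <- lengths(spl)

# All three give the same integer vector
identical(lens_sapply, lens_builtin) && identical(lens_vapply, lens_builtin)
```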

The final line of the solution plugs the outputs from the two previous steps into a new data frame. The trick is to repeat each element of df1$A the number of times given by the matching element of lens, for which we use the rep() function. We also need to unwrap the list spl into a vector, which we do with unlist():

out <- with(df1, data.frame(A = rep(A, lens), B = unlist(spl)))
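The three steps can also be wrapped into a small function. The following is only a sketch: split_rows and its arguments key, value and sep are names made up for this illustration, and stringsAsFactors = FALSE is added so that the new columns stay character vectors on older versions of R.

```r
# Hypothetical helper: expand a delimited column into one row per value
split_rows <- function(df, key, value, sep = "; ") {
  # Step 1: split each string of the value column on the separator
  spl <- strsplit(as.character(df[[value]]), split = sep, fixed = TRUE)
  # Steps 2 and 3: repeat each key once per piece and flatten the list
  out <- data.frame(rep(df[[key]], lengths(spl)), unlist(spl),
                    stringsAsFactors = FALSE)
  names(out) <- c(key, value)
  out
}

df1 <- data.frame(A = c("aaa", "bbb", "ccc", "ddd"),
                  B = c("111; 222", "333", "444; 555; 666; 777", "888; 999"),
                  stringsAsFactors = FALSE)

out <- split_rows(df1, "A", "B")
out
```

One thing to be aware of with this approach: strsplit() returns a zero-length vector for an empty string, so a row whose B value is "" would not appear in the result at all.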

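An alternative using the reshape package, literally the same as step one in my answer to your previous question:

library(reshape)
# Split B on "; " and melt the resulting list into a long data frame;
# L1 holds the index of the list element each value came from
x <- melt(strsplit(as.character(df1$B), "; "))
# Use L1 to look up the matching entry of column A for every value
x <- data.frame("A" = df1[x$L1, 1], "B" = x$value)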
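To see what the index lookup is doing without installing reshape, here is a base R sketch of the same idea. It assumes, as the code above relies on, that melting an unnamed list yields a value column plus an L1 column holding the position of the list element each value came from; the data frame x below is built by hand to mimic that shape and is not the output of melt() itself.

```r
df1 <- data.frame(A = c("aaa", "bbb", "ccc", "ddd"),
                  B = c("111; 222", "333", "444; 555; 666; 777", "888; 999"),
                  stringsAsFactors = FALSE)

spl <- strsplit(as.character(df1$B), "; ")

# Hand-built stand-in for the melted list: one row per value plus its list index
x <- data.frame(value = unlist(spl),
                L1 = rep(seq_along(spl), lengths(spl)),
                stringsAsFactors = FALSE)

# Indexing A by L1 repeats each key once for every value that came from its row
out <- data.frame(A = df1$A[x$L1], B = x$value, stringsAsFactors = FALSE)
out
```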
