apply strsplit to specific column in a data.frame

2023-04-12 11:34

I have a large dataframe with classification information. Here is an example:

> d <- data.frame(x = c(1,2,3,4), classification = c("cl1.scl1", "cl2", "cl3-bla", "cl4.subclass2"))
> d
  x classification
1 1       cl1.scl1
2 2            cl2
3 3        cl3-bla
4 4  cl4.subclass2

Before I do any further processing I need to aggregate the classification information, which means that I have to split the classification strings by "." and take the first token. This is the result I need:

> d
  x classification
1 1            cl1
2 2            cl2
3 3        cl3-bla
4 4            cl4

At the moment I am computing this as follows:

d$classification = unlist(lapply(d$classification, function (x) strsplit(as.character(x), ".", fixed=TRUE)[[1]][1]))

This works, but it took me quite a while to figure this out. I assume there is a more elegant solution, which I probably missed. Any suggestions? Thanks!

A slightly shorter solution is

sapply(strsplit(as.character(d$class), "\\."), `[`, 1)
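For reference, here is a minimal sketch of the same idea applied to the example data from the question. It swaps `sapply` for `vapply`, which is my own substitution rather than part of the answer above, so the result is guaranteed to be a character vector. The name `first_token` is made up for illustration.

```r
# Example data from the question
d <- data.frame(
  x = c(1, 2, 3, 4),
  classification = c("cl1.scl1", "cl2", "cl3-bla", "cl4.subclass2")
)

# Split each string on a literal "." (fixed = TRUE avoids regex escaping)
parts <- strsplit(as.character(d$classification), ".", fixed = TRUE)

# Take the first piece of every split; vapply checks that each result
# is a single character value (assumption: no special handling of NA needed)
first_token <- vapply(parts, `[`, character(1), 1)

d$classification <- first_token
print(d$classification)
# [1] "cl1"     "cl2"     "cl3-bla" "cl4"
```

Using `fixed = TRUE` and using the escaped pattern `"\\."` should give the same split here; the fixed version just skips the regular expression engine.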

You can use regular expressions with back-references.

gsub("(.*)\\.(.*)","\\1",d$classification)

There are two capture groups (the portions of the regular expression in parentheses), separated by a literal period. We replace whatever matches that pattern with the contents of the first group.
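One thing to watch with this pattern, shown in the sketch below: the first `(.*)` is greedy, so for a string with more than one period it keeps everything up to the last period, not the first. The input `"a.b.c"` and the anchored alternative pattern are my own additions for illustration, not part of the answer above.

```r
# Two strings from the question plus a hypothetical one with two periods
x <- c("cl1.scl1", "cl2", "a.b.c")

# Pattern from the answer: the greedy first group stops at the LAST period
greedy <- gsub("(.*)\\.(.*)", "\\1", x)
print(greedy)
# [1] "cl1" "cl2" "a.b"

# Possible alternative: capture only non-period characters from the start,
# so the group always ends at the FIRST period
first_only <- sub("^([^.]*)\\..*$", "\\1", x)
print(first_only)
# [1] "cl1" "cl2" "a"
```

For the data in the question, where each string has at most one period, both patterns give the same result.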

Just delete the stuff that follows the "."

> sub("\\..+$", "", d$class)
[1] "cl1"     "cl2"     "cl3-bla" "cl4"  

d$classification <-  sub("\\..+$", "", d$classification)
 # I've never been very comfortable with partial name matching.
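A small sketch of how this pattern behaves on edge cases. The inputs `"a.b.c"` and `"cl5."` are hypothetical and not from the question. Because `.+` needs at least one character after the period, a trailing period is left in place; swapping in `.*` (my variation, not the answer's) removes it as well.

```r
# Strings from the question plus two hypothetical edge cases
x <- c("cl1.scl1", "cl2", "cl3-bla", "cl4.subclass2", "a.b.c", "cl5.")

# Pattern from the answer: drop the first period and everything after it,
# but only when at least one character follows the period
with_plus <- sub("\\..+$", "", x)
print(with_plus)
# [1] "cl1"     "cl2"     "cl3-bla" "cl4"     "a"       "cl5."

# Variation with .* also strips a bare trailing period
with_star <- sub("\\..*$", "", x)
print(with_star)
# [1] "cl1"     "cl2"     "cl3-bla" "cl4"     "a"       "cl5"
```

If the real data never ends in a period, the two patterns should be interchangeable.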
