
Fastest way to get class vector from names in R

If I have the following vector in R (my levels obviously being A, B, and C)

c("A_1", "A_2", "B_1", "C_1", "C_2")

what is the most efficient way to transform it into a class vector of numbers like

c(1, 1, 2, 3, 3)

I feel like this should be a one-liner (likely a combination of factor and grep) but was unable to come up with one.

Thanks!


A simple solution would be:

x <- c("A_1", "A_2", "B_1", "C_1", "C_2")


x.out <- as.numeric(factor(substr(x, 1, 1)))

If your data is more varied, let me know and we can work to make it a more robust solution.
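
To make the steps visible, here is a small sketch of the same idea broken into pieces. The intermediate names (prefix, f, cls) are only for illustration and are not part of the answer above; as.integer is used instead of as.numeric simply because factor codes are integers already.

```r
x <- c("A_1", "A_2", "B_1", "C_1", "C_2")

# Take the first character of each name (R strings are indexed from 1)
prefix <- substr(x, 1, 1)   # "A" "A" "B" "C" "C"

# factor() creates one level per distinct prefix, in sorted order
f <- factor(prefix)         # levels: A B C

# The integer codes of the factor are the class numbers
cls <- as.integer(f)        # 1 1 2 3 3
```

Note that this only works as long as the class is identified by a single leading character.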


There's a (more general) regular expression approach that would not require specifying the width of the leading string:

Either delete everything from the underscore onward:

> as.numeric(factor(sub("_.+", "" , x)))
[1] 1 1 2 3 3

Or keep only the characters that precede the underscore (in R regular expressions, the parts of a pattern enclosed in parentheses can be referred to in the replacement string by "\\" followed by a digit):

> as.numeric(factor(sub("(^.+)_.+$", "\\1" , x)))
[1] 1 1 2 3 3
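
One thing worth keeping in mind with any of the factor-based versions: factor() numbers the classes by sorted level, not by order of appearance. If the input is not already sorted and you want the first prefix seen to be class 1, a possible alternative is match() against unique(). The following is only a sketch; the reordered input and the variable names are made up for illustration.

```r
# Hypothetical input where the prefixes do not appear in sorted order
x <- c("C_1", "A_1", "A_2", "B_1", "C_2")

# Strip the underscore and everything after it
prefix <- sub("_.*$", "", x)                 # "C" "A" "A" "B" "C"

# Numbering by sorted factor level: A = 1, B = 2, C = 3
by_sorted <- as.integer(factor(prefix))      # 3 1 1 2 3

# Numbering by order of first appearance: C = 1, A = 2, B = 3
by_first <- match(prefix, unique(prefix))    # 1 2 2 3 1
```

For the vector in the question both give the same result, since the prefixes already appear in sorted order.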