
Fastest way to get class vector from names in R

Suppose I have the following vector in R (the levels obviously being A, B, and C):

c("A_1", "A_2", "B_1", "C_1", "C_2")

What is the most efficient way to transform it into a vector of numeric class labels like

c(1, 1, 2, 3, 3)

I feel like this should be a one-liner (likely a combination of factor and grep), but I was unable to come up with one.


A simple solution would be:

x <- c("A_1", "A_2", "B_1", "C_1", "C_2")

x.out <- as.numeric(factor(substr(x, 1, 1)))

If your data is more varied, let me know and we can work to make it a more robust solution.
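
To make the steps visible, here is a small sketch of the same idea broken apart. It assumes every name starts with a single-character class label; the variable names prefix and f are only for illustration. Note that factor() sorts its levels, so the codes follow the sorted order of the labels rather than their order of appearance.

```r
x <- c("A_1", "A_2", "B_1", "C_1", "C_2")

prefix <- substr(x, 1, 1)   # first character of each name: "A" "A" "B" "C" "C"
f <- factor(prefix)         # levels are the sorted unique labels

levels(f)                   # "A" "B" "C"
as.integer(f)               # 1 1 2 3 3 (integer codes of the levels)
```

as.integer() returns the same codes as as.numeric() here, just stored as integers.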

There's a more general regular-expression approach that does not require specifying the width of the leading string.

Either delete everything from the underscore onward:

> as.numeric(factor(sub("_.+", "" , x)))
[1] 1 1 2 3 3

Or capture the characters that precede the underscore (in R regular expressions, a part of the pattern enclosed in parentheses can be referred to in the replacement string by "\\" followed by a digit):

> as.numeric(factor(sub("(^.+)_.+$", "\\1" , x)))
[1] 1 1 2 3 3
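
As a rough check of how the two patterns behave on less uniform input, here is a sketch using made-up vectors (y, z, and w are hypothetical, not from the question). It shows variable-width prefixes, the one case where the two patterns disagree (more than one underscore), and an alternative using match() and unique() in case the codes should follow order of first appearance instead of sorted order.

```r
# Variable-width prefixes: both patterns give the same labels
y <- c("AA_1", "AA_2", "B_1", "CCC_1", "CCC_2")
p1 <- sub("_.+", "", y)               # drop the underscore and what follows
p2 <- sub("(^.+)_.+$", "\\1", y)      # keep what precedes the underscore
identical(p1, p2)                     # TRUE
as.numeric(factor(p1))                # 1 1 2 3 3

# With several underscores the patterns differ:
# the first cuts at the first underscore, the second (greedy) at the last
w <- "A_B_1"
sub("_.+", "", w)                     # "A"
sub("(^.+)_.+$", "\\1", w)            # "A_B"

# factor() numbers by sorted level; match() numbers by first appearance
z <- c("B_1", "A_1", "A_2")
pz <- sub("_.+", "", z)
as.numeric(factor(pz))                # 2 1 1
match(pz, unique(pz))                 # 1 2 2
```

Which numbering is wanted depends on the use case; the question's example is already sorted, so both give the same result there.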





