Binning a numeric variable
I have a vector X that contains positive numbers that I want to bin/discretize. For this vector, I want the numbers [0, 10) to show up just as they exist in the vector, but numbers [10,∞) to be 10+.
I'm using:
x <- c(0,1,3,4,2,4,2,5,43,432,34,2,34,2,342,3,4,2)
binned.x <- as.factor(ifelse(x > 10,"10+",x))
but this feels klugey to me. Does anyone know a bett开发者_高级运维er solution or a different approach?
How about cut
:
binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), '10+'))
Which yields:
# [1] 0 1 3 4 2 4 2 5 10+ 10+ 10+ 2 10+ 2 10+ 3 4 2
# Levels: 0 1 2 3 4 5 6 7 8 9 10+
You question is inconsistent.
In description 10
belongs to "10+" group, but in code 10
is separated level.
If 10
should be in the "10+" group then you code should be
as.factor(ifelse(x >= 10,"10+",x))
In this case you could truncate data to 10 (if you don't want a factor):
pmin(x, 10)
# [1] 0 1 3 4 2 4 2 5 10 10 10 2 10 2 10 3 4 2 10
x[x>=10]<-"10+"
This will give you a vector of strings. You can use as.numeric(x)
to convert back to numbers ("10+" become NA
), or as.factor(x)
to get your result above.
Note that this will modify the original vector itself, so you may want to copy to another vector and work on that.
精彩评论