What is an efficient way to map unique values of a vector to sequential integers?
I have a data frame in R with a vector of non-sequential numbers (data$SiteID) that I would like to map to a vector of sequential integers (data$site), one for each unique value of data$SiteID. Within each site, I would like to map data$TrtID to 0 where data$TrtID == 'control', or to the next sequential integer for each of the other unique values of data$TrtID:
data <- data.frame(SiteID = c(1,1,1,9,'108','108','15', '15'),
                   TrtID = c('N', 'control', 'N', 'control', 'P', 'control', 'N', 'P'))
data$site should be c(1,1,1,2,3,3,4,4).
data$trt should be c(1,0,1,0,1,0,0,1).
Just treat them as factors:
as.numeric(factor(data$SiteID, levels = unique(data$SiteID)))
[1] 1 1 1 2 3 3 4 4
For TrtID, since you want a 0-based value, subtract one. Note that this numbers the treatments across the whole data frame, not within each site:
as.numeric(factor(data$TrtID, levels = sort(unique(data$TrtID))))-1
[1] 1 0 1 0 2 0 1 2
Notice that the levels arguments differ: SiteID uses the order of first appearance, while TrtID is sorted first, which is convenient because 'control' comes alphabetically before 'N' and 'P' and therefore maps to 0. If you want a non-standard ordering, you can explicitly specify the levels in the order you want them.
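As a small illustration of specifying the levels explicitly, here is a sketch using the TrtID values from the question. The ordering in custom_levels (control, then P, then N) is a made-up example, not something the question asks for.

```r
# TrtID values from the question
trt_id <- c("N", "control", "N", "control", "P", "control", "N", "P")

# Hypothetical custom order: control first, then P before N
custom_levels <- c("control", "P", "N")

# Integer codes follow the position in custom_levels; subtract 1 for 0-based
trt <- as.integer(factor(trt_id, levels = custom_levels)) - 1L
trt
# [1] 2 0 2 0 1 0 2 1
```

Any value missing from custom_levels would become NA, so the levels vector should cover every treatment present in the data.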
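Convert the factors to integers (this assumes both columns are factors, which was the data.frame() default before R 4.0.0; with character columns, wrap them in factor() first):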
transform(data, site=as.integer(SiteID), trt=as.integer(TrtID))
If the ordering is important, you can give specific orders to the levels:
transform(data,
          site = as.integer(factor(SiteID, unique(SiteID))),
          trt = as.integer(factor(TrtID, unique(c('control', as.character(TrtID))))) - 1L)
Modified version that numbers the treatments within each site:
transform(data,
          site = as.integer(factor(SiteID, unique(SiteID))),
          trt = unsplit(tapply(TrtID, SiteID, function(x)
                  as.integer(factor(x))), SiteID) - 1L)
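To check the per-site numbering against the output requested in the question, the following sketch rebuilds the example data and applies the same factor-based idea. The helper name trt_within_site is hypothetical, and stringsAsFactors = FALSE and simplify = FALSE are assumptions added here so that the behavior does not depend on the R version or on group sizes.

```r
# Example data from the question, kept as character columns
data <- data.frame(SiteID = c(1, 1, 1, 9, "108", "108", "15", "15"),
                   TrtID  = c("N", "control", "N", "control",
                              "P", "control", "N", "P"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: 0-based treatment codes, numbered within each site.
# Within a site the levels are sorted, so 'control' (if present) maps to 0.
trt_within_site <- function(trt_id, site_id) {
  codes <- tapply(trt_id, site_id,
                  function(x) as.integer(factor(x)) - 1L,
                  simplify = FALSE)
  unsplit(codes, site_id)
}

# Sites numbered in order of first appearance
data$site <- as.integer(factor(data$SiteID, levels = unique(data$SiteID)))
data$trt  <- trt_within_site(data$TrtID, data$SiteID)

data$site
# [1] 1 1 1 2 3 3 4 4
data$trt
# [1] 1 0 1 0 1 0 0 1
```

Site 15 has no control row, so its first treatment in sorted order (N) receives 0, which matches the expected vector given in the question.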