
What is an efficient way to map unique values of a vector to sequential integers?

I have a data frame in R with a vector of non-sequential numbers (data$SiteID) that I would like to map to a vector of sequential numbers (data$site), one per unique value of data$SiteID. Within each site, I would like to map data$TrtID to 0 where data$TrtID == 'control', and to the next sequential integer for each of the other unique values of data$TrtID:

data <- data.frame(SiteID = c(1,1,1,9,'108','108','15', '15'), 
                   TrtID = c('N', 'control', 'N', 'control', 'P', 'control', 'N', 'P'))
  1. data$site should be c(1,1,1,2,3,3,4,4).
  2. data$trt should be c(1,0,1,0,1,0,0,1).


Just treat them as factors:

as.numeric(factor(data$SiteID, levels = unique(data$SiteID)))
[1] 1 1 1 2 3 3 4 4
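If you only need the integer codes and not a factor, base R's `match()` against the unique values is an equivalent idiom (a swapped-in alternative, not the approach in this answer): for each element it returns the position of its first match in `unique(x)`, which is exactly the first-appearance numbering.

```r
data <- data.frame(SiteID = c(1, 1, 1, 9, '108', '108', '15', '15'),
                   TrtID  = c('N', 'control', 'N', 'control', 'P', 'control', 'N', 'P'))

# Position of each SiteID within the unique SiteIDs, in order of first appearance
site <- match(data$SiteID, unique(data$SiteID))
print(site)
# [1] 1 1 1 2 3 3 4 4
```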

For Trt, since you want 0-based values, subtract one:

as.numeric(factor(data$TrtID, levels = sort(unique(data$TrtID))))-1
[1] 1 0 1 0 2 0 1 2

Notice that the levels arguments differ: for Trt the levels are sorted, which is convenient because 'control' sorts before 'N' and 'P' in most locales (in the C locale, uppercase letters sort before lowercase ones, so it would not). If you want a non-standard order, specify the levels explicitly in the order you want them.
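For example, listing the levels yourself fixes the coding regardless of locale. The order `control`, `P`, `N` below is only an illustration; any value missing from `levels` would become `NA`.

```r
trt_id <- c('N', 'control', 'N', 'control', 'P', 'control', 'N', 'P')

# Explicit level order (illustrative): control -> 0, P -> 1, N -> 2
trt <- as.integer(factor(trt_id, levels = c('control', 'P', 'N'))) - 1L
print(trt)
# [1] 2 0 2 0 1 0 2 1
```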


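Use conversion of factors to integers. Since R 4.0.0, data.frame() no longer turns strings into factors by default, so wrap the columns in factor() explicitly:

transform(data, site = as.integer(factor(SiteID)), trt = as.integer(factor(TrtID)))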
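With default levels, `factor()` sorts the unique values, so the codes follow sorted (character) order rather than order of first appearance. A quick check on the question's SiteID values:

```r
site_id <- c(1, 1, 1, 9, '108', '108', '15', '15')  # c() coerces everything to character

# Default levels are sort(unique(site_id))
print(levels(factor(site_id)))
# [1] "1"   "108" "15"  "9"
site <- as.integer(factor(site_id))
print(site)
# [1] 1 1 1 4 2 2 3 3
```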

If the ordering is important, you can give specific orders to the levels:

transform(data,
  site = as.integer(factor(SiteID, unique(SiteID))),
  trt  = as.integer(factor(TrtID, unique(c('control', as.character(TrtID))))) - 1L)

A modified version that codes trt separately within each site:

transform(data,
  site = as.integer(factor(SiteID, unique(SiteID))),
  trt  = unsplit(tapply(TrtID, SiteID, function(x)
         as.integer(factor(x))), SiteID) - 1L)
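A minimal alternative sketch uses base R's `ave()` instead of `tapply()` plus `unsplit()` (a swapped-in technique, not the answer's). The helper `code_within_site` is hypothetical: it puts `control` first when a site has one, independent of locale sorting, and otherwise numbers treatments in sorted order, which reproduces the question's expected `trt` vector.

```r
data <- data.frame(SiteID = c(1, 1, 1, 9, '108', '108', '15', '15'),
                   TrtID  = c('N', 'control', 'N', 'control', 'P', 'control', 'N', 'P'))

# Hypothetical helper: 0-based codes within one site, 'control' first when present
code_within_site <- function(trt) {
  trt <- as.character(trt)                       # works for factor or character input
  lev <- sort(unique(trt))
  lev <- c(intersect('control', lev), setdiff(lev, 'control'))
  as.integer(factor(trt, levels = lev)) - 1L
}

# ave() splits the row indices by SiteID, applies FUN to each group,
# and writes the results back in the original row order
data$site <- match(data$SiteID, unique(data$SiteID))
data$trt  <- ave(seq_along(data$TrtID), data$SiteID,
                 FUN = function(i) code_within_site(data$TrtID[i]))
print(data$trt)
# [1] 1 0 1 0 1 0 0 1
```

Passing row indices rather than the TrtID values keeps the result an integer vector, since ave() returns a vector of the same type as its first argument.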