Recoding using R

I have a data set with dam, sire, and other variables, and I need to recode my dam and sire IDs. The dam column is sorted and each animal appears only once. The sire column, on the other hand, is unsorted and some animals appear more than once.

I would like to start numbering the dams from 50,000, so that the first animal gets 50001, the second 50002, and so on. I have this script, which numbers each dam from 1 to N, and I am wondering whether it can be modified to begin from 50,000.

mydf$dam2 <- as.numeric(factor(paste(mydf$dam,sep=""))) 

EDITED: my data set is similar to this, but with more variables.

dam <- c("1M521","1M584","1M790","1M871","1M888","1M933")
sire <- c("1X057","1T456","1W865","1W209","1W209","1W648")
wt <- c(369,300,332,351,303,314)
p2 <- c(NA,16,18,NA,NA,15)
mydf <- data.frame(dam,sire,wt,p2)

For the sire column, I would like to start numbering from 10,000.

Any help would be very much appreciated.

Baz


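At the moment, those sire and dam columns are factor variables (data.frame() converted strings to factors by default before R 4.0.0), which means you can just add the as.numeric() results to your base number. Wrapping each column in factor() keeps this working when the columns are plain character vectors:

> mydf$dam_n <- 50000 + as.numeric(factor(mydf$dam))
> mydf$sire_n <- 10000 + as.numeric(factor(mydf$sire))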
> mydf
    dam  sire  wt p2 dam_n sire_n
1 1M521 1X057 369 NA 50001  10005
2 1M584 1T456 300 16 50002  10001
3 1M790 1W865 332 18 50003  10004
4 1M871 1W209 351 NA 50004  10002
5 1M888 1W209 303 NA 50005  10002
6 1M933 1W648 314 15 50006  10003
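
The same idea can be wrapped in a small helper so that it works whether the ID columns are factors or character vectors. This is only a sketch: recode_ids is a hypothetical name, not a base R function, and it assumes that numbering the IDs in sorted order (rather than in order of first appearance) is acceptable.

```r
# Hypothetical helper (not part of base R): map each distinct ID to
# offset + its position among the sorted unique IDs.
recode_ids <- function(ids, offset) {
  # factor() sorts the unique values; as.integer() returns the level codes
  offset + as.integer(factor(ids))
}

dam  <- c("1M521", "1M584", "1M790", "1M871", "1M888", "1M933")
sire <- c("1X057", "1T456", "1W865", "1W209", "1W209", "1W648")
mydf <- data.frame(dam, sire, stringsAsFactors = FALSE)

mydf$dam_n  <- recode_ids(mydf$dam, 50000)   # dams start at 50001
mydf$sire_n <- recode_ids(mydf$sire, 10000)  # sires start at 10001
```

A sire that appears more than once (here 1W209) receives the same code on every row, because the code is attached to the factor level rather than to the row.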


Why not use:

names(mydf$dam2) <- 50000:whatEverYourLengthIs

I am not sure I fully understood your data structures, but usually the names() function is used to set names.

EDIT:

You can use dimnames to name columns and rows. For example, given:

  [,1] [,2]
a    1    2
b    4    5
c    7    8

and

dimnames(mymatrix) <- list(c("Jan", "Feb", "Mar"), c("2005", "2006"))

yields

          2005     2006
Jan          1        2
Feb          4        5
Mar          7        8
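
For completeness, here is a self-contained version of the dimnames example. The construction of mymatrix is an assumption, since the answer only shows the printed matrix; the values and the row labels a, b, c are taken from that printout.

```r
# Assumed starting matrix: 3 rows labelled a, b, c and two unnamed columns
mymatrix <- matrix(c(1, 4, 7, 2, 5, 8), nrow = 3,
                   dimnames = list(c("a", "b", "c"), NULL))

# Replace the row names and add column names in one step
dimnames(mymatrix) <- list(c("Jan", "Feb", "Mar"), c("2005", "2006"))

# Cells can now be addressed by name
feb_2006 <- mymatrix["Feb", "2006"]
```

Note that dimnames only labels the rows and columns; the values stored in the matrix are unchanged.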