recoding using R
I have a data set with dam, sire, plus other variables, but I need to recode my dam and sire IDs. The dam column is sorted and each animal appears only once. The sire column, on the other hand, is unsorted and some animals appear more than once.
I would like to start numbering the dams from 50,000, so that the first animal gets 50001, the second 50002, and so on. I have this script that numbers each dam from 1 to N, and I am wondering whether it can be modified to begin from 50,000.
mydf$dam2 <- as.numeric(factor(paste(mydf$dam,sep="")))
EDITED: my data set is similar to this, but with more variables:
dam <- c("1M521","1M584","1M790","1M871","1M888","1M933")
sire <- c("1X057","1T456","1W865","1W209","1W209","1W648")
wt <- c(369,300,332,351,303,314)
p2 <- c(NA,16,18,NA,NA,15)
mydf <- data.frame(dam,sire,wt,p2)
For the sire column, I would like to start numbering from 10,000.
Any help would be very much appreciated.
Baz
At the moment, those sire and dam columns are factor variables (the default for strings in data.frame() before R 4.0.0), so in this case you can just add the as.numeric() results to your base number:
> mydf$dam_n <- 50000 + as.numeric(mydf$dam)
> mydf$sire_n <- 10000 + as.numeric(mydf$sire)
> mydf
    dam  sire  wt p2 dam_n sire_n
1 1M521 1X057 369 NA 50001  10005
2 1M584 1T456 300 16 50002  10001
3 1M790 1W865 332 18 50003  10004
4 1M871 1W209 351 NA 50004  10002
5 1M888 1W209 303 NA 50005  10002
6 1M933 1W648 314 15 50006  10003
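On R 4.0.0 and later, data.frame() keeps strings as character by default, so as.numeric(mydf$dam) would return NA with a warning. A minimal sketch that should work in either case, assuming the mydf from the question, wraps each column in factor() first. The match()/unique() variant is an alternative that numbers sires by order of first appearance rather than by sorted ID; the column names dam_n, sire_n and sire_seen are illustrative only.

```r
# Sample data from the question
dam  <- c("1M521", "1M584", "1M790", "1M871", "1M888", "1M933")
sire <- c("1X057", "1T456", "1W865", "1W209", "1W209", "1W648")
wt   <- c(369, 300, 332, 351, 303, 314)
p2   <- c(NA, 16, 18, NA, NA, 15)
mydf <- data.frame(dam, sire, wt, p2)

# factor() works whether the column is character or already a factor;
# the integer codes follow the sorted order of the unique IDs
mydf$dam_n  <- 50000L + as.integer(factor(mydf$dam))
mydf$sire_n <- 10000L + as.integer(factor(mydf$sire))

# Alternative (assumption: codes should follow order of first appearance)
mydf$sire_seen <- 10000L + match(mydf$sire, unique(mydf$sire))

mydf
```

Either way, a sire that appears more than once (1W209 here) receives the same code on every row.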
Why not use:
names(mydf$dam2) <- 50001:(50000 + nrow(mydf))
I am not sure I understood your data structures completely, but usually the names() function is used to set names.
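As a small sketch of what names() does, assuming codes built the same way as the script in the question: it attaches a label to each element and leaves the values untouched, so the codes themselves change only when the offset is added. The variable names codes, labelled and shifted are illustrative.

```r
# Codes 1..N, built the same way as the script in the question
dam   <- c("1M521", "1M584", "1M790", "1M871", "1M888", "1M933")
codes <- as.numeric(factor(dam))

# names() attaches a label to each element; the values stay 1..N
labelled <- codes
names(labelled) <- 50000 + seq_along(labelled)

# Adding the offset changes the values themselves
shifted <- codes + 50000

labelled
shifted
```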
EDIT:
You can use dimnames to name columns and rows. For example, given a matrix mymatrix:
  [,1] [,2]
a    1    2
b    4    5
c    7    8
the call
dimnames(mymatrix) <- list(c("Jan", "Feb", "Mar"), c("2005", "2006"))
yields
    2005 2006
Jan    1    2
Feb    4    5
Mar    7    8
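The answer does not show how mymatrix was created. A self-contained sketch, assuming it is a 3 x 2 numeric matrix with row names a, b, c and no column names, could look like this:

```r
# Build the 3 x 2 matrix shown above (row names a, b, c; no column names)
mymatrix <- matrix(c(1, 2, 4, 5, 7, 8), nrow = 3, byrow = TRUE,
                   dimnames = list(c("a", "b", "c"), NULL))

# Replace both sets of names in one step: row names first, then column names
dimnames(mymatrix) <- list(c("Jan", "Feb", "Mar"), c("2005", "2006"))

mymatrix
```

Note that dimnames() only relabels rows and columns; the values stored in the matrix are unchanged.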