
R Reshape function is transforming integers into characters

Hi, I am trying to use the reshape function for the first time. I have a data.frame with lots of information and daily rainfall, and I want to put it in long format so that I have one row per daily rainfall value. However, when I use the reshape function, my rainfall values are converted to character. Here is a sample of my data (the columns actually go all the way to P31; P is the rainfall per day):

code year month station ALTITUD NOM_PROV LONGITUD LATITUD P1 P2  P3  P4    
2011 1932     7 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  0  0   0  54   
2011 1932     8 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  0  0   0   0   

and my code is:

CET <- read.table("H:/METEO_data/AEMET_2/2011.csv", sep=",", header=F)

colnames(CET)<-c("code","year","month","station","ALTITUD","NOM_PROV","LONGITUD","LATITUD","P1","P2","P3","P4","P5","P6","P7","P8","P9","P10","P11","P12","P13","P14","P15","P16","P17","P18","P19","P20","P21","P22","P23","P24","P25","P26","P27","P28","P29","P30","P31")

aa<- reshape(CET, timevar="day", varying = list(c("P1","P2","P3","P4","P5","P6","P7","P8","P9","P10","P11","P12","P13","P14",
"P15","P16","P17","P18","P19","P20","P21","P22","P23","P24","P25","P26","P27","P28","P29","P30","P31")),direction="long")   

The end result is the data in the shape I wanted:

   code year month                 station ALTITUD NOM_PROV LONGITUD LATITUD NA day P1 id  
1.1 2011 1932     7 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235 NA   1  0  1  
2.1 2011 1932     8 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235 NA   1  0  2  
3.1 2011 1932     9 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235 NA   1  0  3  
4.1 2011 1932    10 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235 NA   1  0  4  

but I can't use it because:

class(aa$P1)  
[1] "character"  

when in the original data.frame:

class(CET$P1)  
[1] "integer"

Could anybody tell me why? Also, why is there a column of NAs before "day"?

Cheers


I find the reshape function in base R very difficult to use. It was designed for panel data, so the parameters are difficult to interpret for most general cases. (Your data is in panel data format, so you are lucky.)
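For reference, here is a minimal sketch of a base R reshape call for this kind of layout. The data frame `wide`, its station labels, and its rainfall values are made up for illustration and are not the asker's file. The point is that `v.names` names the stacked value column (without it, the column keeps the name of the first varying column, here P1), while `timevar` and `times` control the day column.

```r
# Hypothetical wide data: one row per station/month, one column per day
wide <- data.frame(
  station = c("A", "B"),
  month   = c(7L, 8L),
  P1      = c(0L, 0L),
  P2      = c(0L, 3L),
  P3      = c(54L, 0L)
)

long <- reshape(
  wide,
  direction = "long",
  varying   = list(paste0("P", 1:3)),  # the columns to stack
  v.names   = "rainfall",              # name of the stacked value column
  timevar   = "day",                   # name of the new day column
  times     = 1:3,                     # values written into the day column
  idvar     = c("station", "month")    # existing columns that identify a row
)

class(long$rainfall)
```

Because the input columns are integer, the stacked column stays integer; the reshape itself does not change the type.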

Instead, I recommend using the functions melt and dcast in the reshape2 package. melt reshapes a data frame from wide to tall format, and dcast does the reverse, i.e. reshapes from tall to wide format. Here is an example using the snippet of data you provided:

First, recreate the data:

x <- "code year month station ALTITUD NOM_PROV LONGITUD LATITUD P1 P2  P3  P4    
2011 1932     7 'EMBALSE CUERDA DEL POZO'    1150    SORIA   242172  415235  0  0   0  54   
2011 1932     8 'EMBALSE CUERDA DEL POZO'    1150    SORIA   242172  415235  0  0   0   0"

CET <- read.table(textConnection(x), header=TRUE, quote="'")

Now load the reshape2 package and use melt. (Note the use of paste to easily refer to all of the measurement variables, rather than crafting a long list by hand.)

library(reshape2)
# reshape2's melt() spells these arguments with dots: variable.name, value.name
mCET <- melt(CET, measure.vars=paste("P", 1:4, sep=""), 
  variable.name="day", value.name="rainfall")

The results:

mCET

  code year month                 station ALTITUD NOM_PROV LONGITUD LATITUD day rainfall
1 2011 1932     7 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P1        0
2 2011 1932     8 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P1        0
3 2011 1932     7 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P2        0
4 2011 1932     8 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P2        0
5 2011 1932     7 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P3        0
6 2011 1932     8 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P3        0
7 2011 1932     7 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P4       54
8 2011 1932     8 EMBALSE CUERDA DEL POZO    1150    SORIA   242172  415235  P4        0

str(mCET)

'data.frame': 8 obs. of  10 variables:
 $ code    : int  2011 2011 2011 2011 2011 2011 2011 2011
 $ year    : int  1932 1932 1932 1932 1932 1932 1932 1932
 $ month   : int  7 8 7 8 7 8 7 8
 $ station : Factor w/ 1 level "EMBALSE CUERDA DEL POZO": 1 1 1 1 1 1 1 1
 $ ALTITUD : int  1150 1150 1150 1150 1150 1150 1150 1150
 $ NOM_PROV: Factor w/ 1 level "SORIA": 1 1 1 1 1 1 1 1
 $ LONGITUD: int  242172 242172 242172 242172 242172 242172 242172 242172
 $ LATITUD : int  415235 415235 415235 415235 415235 415235 415235 415235
 $ day     : Factor w/ 4 levels "P1","P2","P3",..: 1 1 2 2 3 3 4 4
 $ rainfall: int  0 0 0 0 0 0 54 0
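The day column in the melted result holds labels such as P1 and P2 rather than numbers. If a numeric day is needed, one possible follow-up step is to strip the prefix and convert. This is only a sketch on a small made-up data frame (`long` and its values are hypothetical), using base R only:

```r
# Hypothetical melted data with a factor day column like the one above
long <- data.frame(
  month    = c(7L, 8L, 7L, 8L),
  day      = factor(c("P1", "P1", "P2", "P2")),
  rainfall = c(0L, 0L, 54L, 0L)
)

# Convert the factor to its labels first, drop the leading "P",
# then convert the remaining digits to integer
long$day <- as.integer(sub("^P", "", as.character(long$day)))

str(long)
```

Going through as.character first matters: calling as.integer directly on a factor returns the internal level codes, which only match the day numbers by coincidence of level ordering.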

PS. @Joris Mey suggested that there might be some underlying issues with your original data. If this is the case you may still have to fix this, either before or after using melt.


You might have answered your own question. It is possible that when the table is first read in, the P1 column becomes a factor and is later converted to character. In your initial call to read.table, you can set stringsAsFactors = FALSE so that text columns stay as character instead of becoming factors, and then check that the columns you expect to be numeric really are.
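A quick way to check what read.table actually produced is to look at the class of every column. The sketch below uses a small inline CSV (made up for illustration, not the asker's file) read with header = FALSE, as in the question, and compares the two stringsAsFactors settings:

```r
# Hypothetical CSV text whose first line is a header row
csv_text <- "code,month,P1,P2
2011,7,0,54
2011,8,0,0"

# With header = FALSE the header line is read as the first data row,
# so no column can be parsed as numeric
as_factor <- read.table(text = csv_text, sep = ",", header = FALSE,
                        stringsAsFactors = TRUE)
as_char   <- read.table(text = csv_text, sep = ",", header = FALSE,
                        stringsAsFactors = FALSE)

sapply(as_factor, class)  # every column is a factor
sapply(as_char, class)    # every column is character
```

In this sketch stringsAsFactors only switches the columns between factor and character; neither setting makes them integer while the header line is still part of the data.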

jim


Try passing header=TRUE in your call to read.table. It looks like the first row gets included in the data, so the columns are then read as factors. Or pass skip=1 to discard the first row entirely.
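Both suggestions can be tried on a small inline CSV. The text and the shortened column list below are made up for illustration; the real file has columns up to P31.

```r
# Hypothetical CSV text whose first line is a header row
csv_text <- "code,year,month,P1,P2
2011,1932,7,0,54
2011,1932,8,0,0"

# Option 1: let read.table take the column names from the first line
with_header <- read.table(text = csv_text, sep = ",", header = TRUE)

# Option 2: skip the first line and assign the names by hand
skipped <- read.table(text = csv_text, sep = ",", header = FALSE, skip = 1)
names(skipped) <- c("code", "year", "month", "P1", "P2")

class(with_header$P1)
class(skipped$P1)
```

In both cases the rainfall columns contain only numbers, so read.table parses them as integer.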
