R Reshape function is transforming integers into characters
Hi, I am trying to use the reshape function for the first time. I have a data.frame with lots of information and daily rainfall. I am trying to put it in a long format so that I will have one row per daily rainfall value. However, when I use the reshape function, my rainfall is converted to character. Here is a bit of my data (it actually goes all the way to P31; P is the rainfall per day):
code year month station ALTITUD NOM_PROV LONGITUD LATITUD P1 P2 P3 P4
2011 1932 7 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 0 0 0 54
2011 1932 8 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 0 0 0 0
and my code is:
CET <- read.table("H:/METEO_data/AEMET_2/2011.csv", sep=",", header=F)
colnames(CET)<-c("code","year","month","station","ALTITUD","NOM_PROV","LONGITUD","LATITUD","P1","P2","P3","P4","P5","P6","P7","P8","P9","P10","P11","P12","P13","P14","P15","P16","P17","P18","P19","P20","P21","P22","P23","P24","P25","P26","P27","P28","P29","P30","P31")
aa<- reshape(CET, timevar="day", varying = list(c("P1","P2","P3","P4","P5","P6","P7","P8","P9","P10","P11","P12","P13","P14",
"P15","P16","P17","P18","P19","P20","P21","P22","P23","P24","P25","P26","P27","P28","P29","P30","P31")),direction="long")
The end result is the data in the shape I wanted:
code year month station ALTITUD NOM_PROV LONGITUD LATITUD NA day P1 id
1.1 2011 1932 7 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 NA 1 0 1
2.1 2011 1932 8 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 NA 1 0 2
3.1 2011 1932 9 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 NA 1 0 3
4.1 2011 1932 10 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 NA 1 0 4
but I can't use it because:
class(aa$P1)
[1] "character"
when in the original data.frame:
class(CET$P1)
[1] "integer"
Could anybody tell me why? Also, why is there a column of NAs before "day"?
Cheers
I find the reshape function in base R very difficult to use. It was designed for panel data, so the parameters are difficult to interpret for most general cases. (Your data is in panel data format, so you are lucky.)
Instead, I recommend using the functions melt and dcast in package reshape2 (dcast is the data-frame counterpart of cast from the older reshape package). melt is used to reshape a data frame from wide to tall format, and dcast does the reverse, i.e. reshapes from tall to wide format. Here is an example using the snippet of data you provide:
First, recreate the data:
x <- "code year month station ALTITUD NOM_PROV LONGITUD LATITUD P1 P2 P3 P4
2011 1932 7 'EMBALSE CUERDA DEL POZO' 1150 SORIA 242172 415235 0 0 0 54
2011 1932 8 'EMBALSE CUERDA DEL POZO' 1150 SORIA 242172 415235 0 0 0 0"
CET <- read.table(textConnection(x), header=TRUE, quote="'")
Now load the reshape2 package and use melt. (Note the use of paste to easily refer to all of the measurement variables, rather than crafting a long list by hand.)
library(reshape2)
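mCET <- melt(CET, measure.vars=paste("P", 1:4, sep=""),
             variable.name="day")
# reshape2 spells these arguments variable.name and value.name;
# add value.name="rainfall" to rename the "value" column shown below.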
The results:
mCET
code year month station ALTITUD NOM_PROV LONGITUD LATITUD day value
1 2011 1932 7 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P1 0
2 2011 1932 8 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P1 0
3 2011 1932 7 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P2 0
4 2011 1932 8 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P2 0
5 2011 1932 7 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P3 0
6 2011 1932 8 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P3 0
7 2011 1932 7 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P4 54
8 2011 1932 8 EMBALSE CUERDA DEL POZO 1150 SORIA 242172 415235 P4 0
str(mCET)
'data.frame': 8 obs. of 10 variables:
$ code : int 2011 2011 2011 2011 2011 2011 2011 2011
$ year : int 1932 1932 1932 1932 1932 1932 1932 1932
$ month : int 7 8 7 8 7 8 7 8
$ station : Factor w/ 1 level "EMBALSE CUERDA DEL POZO": 1 1 1 1 1 1 1 1
$ ALTITUD : int 1150 1150 1150 1150 1150 1150 1150 1150
$ NOM_PROV: Factor w/ 1 level "SORIA": 1 1 1 1 1 1 1 1
$ LONGITUD: int 242172 242172 242172 242172 242172 242172 242172 242172
$ LATITUD : int 415235 415235 415235 415235 415235 415235 415235 415235
$ day : Factor w/ 4 levels "P1","P2","P3",..: 1 1 2 2 3 3 4 4
$ value : int 0 0 0 0 0 0 54 0
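The day column above is a factor with labels such as "P1". If a numeric day of the month is wanted, a minimal sketch (the vector day here is made up for illustration and stands in for mCET$day) is to strip the "P" prefix and convert:

```r
# Stand-in for mCET$day: a factor whose labels are "P" plus the day number.
day <- factor(paste0("P", c(1, 1, 2, 2, 10, 31)))

# Go through as.character() first; as.integer() on a factor would
# return the internal level codes rather than the numbers in the labels.
day_num <- as.integer(sub("^P", "", as.character(day)))

print(day_num)
```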
PS. @Joris Mey suggested that there might be some underlying issues with your original data. If this is the case you may still have to fix this, either before or after using melt.
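For comparison, here is a sketch of the same wide-to-long step with base reshape, using only the four-day snippet. The column names "rainfall" and "day" are choices made for this example, and idvar is left at its default, so reshape adds an "id" column numbering the original rows. With a clean integer input, the stacked column should stay integer:

```r
# Recreate the snippet of data from the question (P1..P4 only).
x <- "code year month station ALTITUD NOM_PROV LONGITUD LATITUD P1 P2 P3 P4
2011 1932 7 'EMBALSE CUERDA DEL POZO' 1150 SORIA 242172 415235 0 0 0 54
2011 1932 8 'EMBALSE CUERDA DEL POZO' 1150 SORIA 242172 415235 0 0 0 0"
CET <- read.table(text = x, header = TRUE, quote = "'")

days <- paste0("P", 1:4)
long <- reshape(CET,
                direction = "long",
                varying   = list(days),  # the wide columns to stack into one
                v.names   = "rainfall",  # name of the stacked value column
                timevar   = "day",       # name of the column recording the day
                times     = 1:4)         # values stored in that column

# Check the type of the stacked column.
print(class(long$rainfall))
```

If the class printed here is not "integer" for the real file, the problem is more likely in how the file was read than in reshape itself.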
You might have answered your own question. It is possible that, when the table is first read in, the P1 column becomes a factor and is later converted to character. In your initial call to read.table, you can pass stringsAsFactors = FALSE and then check that the columns you think are numbers really are.
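As a rough illustration of why a factor column is awkward here, the sketch below uses a made-up vector p1 containing a stray text value next to the numbers. Converting through as.character() recovers the numbers, with the non-numeric entry becoming NA:

```r
# Hypothetical column that was read as a factor because one entry is text.
p1 <- factor(c("P1", "0", "54"))

# as.integer() directly on a factor gives level codes, not the values.
codes <- as.integer(p1)

# Converting to character first recovers the numbers; "P1" becomes NA
# (the coercion warning is suppressed here on purpose).
vals <- suppressWarnings(as.integer(as.character(p1)))

print(vals)
```

Running sapply(CET, class) right after read.table is a quick way to see which columns were not read as numbers.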
jim
Try passing header=TRUE in your call to read.table. Looks like the first row gets included in the data, which is then cast as factors. Or pass skip=1 to discard the first row entirely.
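A small sketch of the effect, using a made-up two-row CSV in place of 2011.csv (the real file's contents are not shown in the question, so this is only an assumption about its layout):

```r
# Made-up CSV text with a header line, standing in for the real file.
csv <- "code,year,P1,P2
2011,1932,0,54
2011,1932,0,0"

# header = FALSE: the header line becomes the first data row, so every
# column mixes text and numbers and cannot be read as integer.
bad <- read.table(text = csv, sep = ",", header = FALSE)
print(sapply(bad, class))

# header = TRUE: the first line supplies the column names instead.
good <- read.table(text = csv, sep = ",", header = TRUE)
print(sapply(good, class))

# skip = 1: drop the header line and assign the names by hand.
skipped <- read.table(text = csv, sep = ",", header = FALSE, skip = 1)
colnames(skipped) <- c("code", "year", "P1", "P2")
```

Depending on the R version, the columns of bad come out as factor or character, but in neither case as integer.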