Reshape error - invalid factor
I am somewhat new to R and I have run into a point where I need some help. I figure the reshape package can accomplish what I need to do.
Here is the structure of the original data frame:
> str(bruins)
'data.frame': 10 obs. of 6 variables:
$ gameid : Factor w/ 1 level "20090049": 1 1 1 1 1 1 1 1 1 1
$ team : chr "NYI" "BOS" "NYI" "BOS" ...
$ home_ind: chr "V" "H" "V" "H" ...
$ period : Factor w/ 5 levels "1","2","3","4",..: 1 1 2 2 3 3 4 4 5 5
$ goals : int 0 0 3 0 0 3 0 0 3 3
$ shots : int 16 7 9 7 8 12 5 4 38 30
Here are the first few rows:
> head(bruins)
gameid team home_ind period goals shots
409 20090049 NYI V 1 0 16
410 20090049 BOS H 1 0 7
411 20090049 NYI V 2 3 9
412 20090049 BOS H 2 0 7
413 20090049 NYI V 3 0 8
414 20090049 BOS H 3 3 12
I am looking to create a new data frame that pivots on gameid and period, with the rest of the columns summarizing the data for each home_ind row (10 columns in all).
When I run the following code:
b.melt <- melt(bruins, id=c("gameid", "period"), na.rm=TRUE)
I get the following warnings:
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c(0L, 0L, 3L, 0L, 0L, 3L, 0L, :
invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = c(16L, 7L, 9L, 7L, 8L, 12L, :
invalid factor level, NAs generated
Any help will be very much appreciated!
Edit: This is what I am hoping to get the restructured data to look like
gameid period vis_team vis_goals vis_shots home_team home_goals home_shots
1 20090049 1 NYI 0 16 BOS 0 7
2 20090049 2 NYI 3 9 BOS 0 7
3 20090049 3 NYI 0 8 BOS 3 12
After melting, all measure variables end up in the same column, so they must all be of the same type. In your case, "team" is character while "goals" and "shots" are integer, which is why you get that warning.
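To see where the message comes from, here is a minimal base R sketch (it does not use reshape itself, and the variable names are made up for illustration). It only assumes that the melted value column has ended up as a factor of team names by the time the integer goals and shots are written into it, which is what the `[<-.factor` in the warning suggests.

```r
# A factor only accepts values that are already among its levels.
teams <- factor(c("NYI", "BOS"))

# Capture the warning raised when an integer is assigned into the factor.
msg <- tryCatch(
  {
    teams[3] <- 0L
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# If the warning is ignored, the assigned element silently becomes NA.
bad <- suppressWarnings({
  teams[3] <- 0L
  teams
})
print(is.na(bad[3]))
```

So the goals and shots are not being reshaped incorrectly; they are being dropped as NA because they are not valid levels of the factor they are assigned into.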
Now that I see what you're trying to do, here's an approach using summarise from plyr:
library(plyr)
## visiting-team rows, with columns renamed to vis_*
vis <- summarise(subset(per, home_ind == "V"),
  gameid = gameid, period = period,
  vis_team = team, vis_goals = goals, vis_shots = shots)
## home-team rows, with columns renamed to home_*
home <- summarise(subset(per, home_ind == "H"),
  gameid = gameid, period = period,
  home_team = team, home_goals = goals, home_shots = shots)
join(vis, home, by = c("gameid", "period"))
There are also a number of ways to do it using just base functions (e.g. by subsetting and then modifying names).
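A minimal sketch of the base-only route (subset, rename, merge) might look like the following. The data frame is rebuilt from the six rows shown by head(bruins) in the question and is called per only to match the other snippets in this thread. The helper vectors keys and vals are names introduced here for illustration.

```r
# Sample data copied from the head(bruins) output in the question.
per <- data.frame(
  gameid   = factor(rep("20090049", 6)),
  team     = rep(c("NYI", "BOS"), 3),
  home_ind = rep(c("V", "H"), 3),
  period   = factor(c(1, 1, 2, 2, 3, 3)),
  goals    = c(0L, 0L, 3L, 0L, 0L, 3L),
  shots    = c(16L, 7L, 9L, 7L, 8L, 12L),
  stringsAsFactors = FALSE
)

keys <- c("gameid", "period")        # columns to pivot on
vals <- c("team", "goals", "shots")  # columns to spread per side

# Visiting-team rows, with value columns renamed to vis_*.
vis <- per[per$home_ind == "V", c(keys, vals)]
names(vis) <- c(keys, paste0("vis_", vals))

# Home-team rows, with value columns renamed to home_*.
home <- per[per$home_ind == "H", c(keys, vals)]
names(home) <- c(keys, paste0("home_", vals))

# One row per gameid/period, visitor columns followed by home columns.
wide <- merge(vis, home, by = keys)
print(wide)
```

This keeps team as character and goals/shots as integer, so nothing has to be coerced into a single value column the way melt does.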
I think you'd be better off using ddply from the plyr package for this problem. You didn't say how you wanted to summarise the data, but check out the summarise function if you want to use a different summary function for each variable, or the colwise function if you want to summarise all variables the same way.
Thanks for the help. I ended up going a different route and broke the problem into little pieces. I am sure there is a quicker, more elegant way, but I got to where I needed to be and wanted to share the code in case it helps someone else.
## load libraries
library(sqldf)
## assume that the dataset is loaded
## restructure the data and merge together
sql.1 <- "SELECT gameid, period, team `vis_team`, goals `vis_goals`, shots `vis_shots`"
sql.2 <- "FROM per WHERE home_ind='V' GROUP BY gameid, period "
sql.cmd <- paste(sql.1, sql.2, sep=" ")
vis <- sqldf(sql.cmd)
sql.1 <- "SELECT gameid, period, team `home_team`, goals `home_goals`, shots `home_shots`"
sql.2 <- "FROM per WHERE home_ind='H' GROUP BY gameid, period "
sql.cmd <- paste(sql.1, sql.2, sep=" ")
home <- sqldf(sql.cmd)
my.dataset <- merge(vis, home)
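For comparison, base R's reshape() can do the same pivot in one call without any extra packages. This is only a sketch under the assumption that the data look like the six rows from head(bruins) in the question, rebuilt here as per. The resulting columns are named team.V, goals.V, shots.V, team.H and so on, rather than the vis_ and home_ names above, so they would still need renaming.

```r
# Sample data copied from the head(bruins) output in the question.
per <- data.frame(
  gameid   = factor(rep("20090049", 6)),
  team     = rep(c("NYI", "BOS"), 3),
  home_ind = rep(c("V", "H"), 3),
  period   = factor(c(1, 1, 2, 2, 3, 3)),
  goals    = c(0L, 0L, 3L, 0L, 0L, 3L),
  shots    = c(16L, 7L, 9L, 7L, 8L, 12L),
  stringsAsFactors = FALSE
)

# Spread team/goals/shots into one set of columns per home_ind value,
# keeping one row per gameid/period combination.
wide2 <- reshape(per,
                 idvar     = c("gameid", "period"),
                 timevar   = "home_ind",
                 direction = "wide")
print(wide2)
```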