How to convert time (mm:ss) to decimal form in R

I've imported a CSV file into R using RStudio, and I am trying to plot points per game against minutes per game. However, the minutes per game are in the format mm:ss, and I'm having a hard time finding out how to convert them to decimal form.

Please help!


Given that you start with a character vector, this is relatively easy:

minPerGame <- c("4:30","2:20","34:10")

sapply(strsplit(minPerGame,":"),
  function(x) {
    x <- as.numeric(x)
    x[1]+x[2]/60
    }
)

gives

[1]  4.500000  2.333333 34.166667

Make sure you read the file with read.csv() using the option as.is=TRUE, so that the column comes in as character rather than factor. Otherwise you'll have to convert it first using as.character().
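As a rough sketch of how this could be applied to an imported data frame: the data frame `games` and its column names `MIN` and `PTS` below are hypothetical stand-ins for the CSV in the question, and `mmss_to_decimal` is just an illustrative helper name wrapping the same split-and-divide idea.

```r
# Hypothetical data standing in for the imported CSV; the names
# games, MIN and PTS are assumptions, not taken from the question.
games <- data.frame(MIN = c("4:30", "2:20", "34:10"),
                    PTS = c(2, 0, 21),
                    stringsAsFactors = FALSE)

# Illustrative helper: "mm:ss" strings -> decimal minutes.
# as.character() guards against the column having been read as a factor.
mmss_to_decimal <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60
  }, numeric(1))
}

games$MIN_dec <- mmss_to_decimal(games$MIN)

# Points per game against decimal minutes per game
plot(PTS ~ MIN_dec, data = games)
```

Using vapply() instead of sapply() here is only a design preference: it fails loudly if an element does not reduce to a single number.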


Do you need to decimalise it? If you store the data in an appropriate format, for example as an object of class POSIXlt, one of R's date-time classes, R will take care of treating the times numerically. Here is an example of what I mean:

First we create some dummy data for illustration purposes:

set.seed(1)
DF <- data.frame(Times = seq(as.POSIXlt("10:00", format = "%M:%S"), 
                             length = 100, by = 10),
                 Points = cumsum(rpois(100, lambda = 1)))
head(DF)

Ignore the fact that there are dates here; the date part has no effect on the plot because all observations share the same date. Next we plot this using R's formula interface:

plot(Points ~ Times, data = DF, type = "o")

Which produces this:

[Figure: Points plotted against Times, drawn with type = "o"]
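If a decimal value is still wanted, the same date-time parsing can supply it. The following is a minimal sketch, not part of the answer above: it assumes the strings are "mm:ss" with minutes no larger than 59 (the %M format does not accept larger values, so something like "75:10" would come back as NA), and it pins the date to 1970-01-01 in UTC so that the numeric value is simply seconds since the epoch. The helper name `mmss_via_posix` is made up for illustration.

```r
# Illustrative helper: parse "mm:ss" through POSIXct, return decimal minutes.
# Assumption: minutes are in 0-59; larger values fail to parse and give NA.
mmss_via_posix <- function(x) {
  # Prepend a fixed date so the result is seconds since 1970-01-01 00:00:00 UTC
  stamps <- as.POSIXct(paste("1970-01-01", x),
                       format = "%Y-%m-%d %M:%S", tz = "UTC")
  as.numeric(stamps) / 60
}

minPerGame <- c("4:30", "2:20", "34:10")
mmss_via_posix(minPerGame)
```

For data where minutes can exceed 59, splitting the string on ":" is likely the safer route.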


Some tuning of the first solution:

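minPerGame <- paste(sample(1:89, 100000, TRUE), sample(0:59, 100000, TRUE), sep = ":")

# Original approach: split each string and convert it element by element
f1 <- function() {
  sapply(strsplit(minPerGame, ":"),
    function(x) {
      x <- as.numeric(x)
      x[1] + x[2] / 60
    }
  )
}

# Tuned approach: convert all fields at once, then use a matrix product
f2 <- function() {
  w <- matrix(c(1, 1/60), ncol = 1)
  as.vector(matrix(as.numeric(unlist(strsplit(minPerGame, ":"))), ncol = 2, byrow = TRUE) %*% w)
}

system.time(f1())
system.time(f2())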

system.time(f1())
   user  system elapsed
   0.88    0.00    0.86

system.time(f2())
   user  system elapsed
   0.25    0.00    0.27
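Before trusting the faster version, it may be worth checking that both approaches agree. The sketch below restates them as functions that take the vector as an argument; the names `mmss_apply` and `mmss_matrix` are made up for illustration. Note that the matrix version assumes every string contains exactly two fields, since it reshapes the flattened values into two columns.

```r
# Element-by-element version (same idea as f1)
mmss_apply <- function(x) {
  sapply(strsplit(x, ":", fixed = TRUE), function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60
  })
}

# Matrix version (same idea as f2): one row per time, columns = minutes, seconds
mmss_matrix <- function(x) {
  nums <- as.numeric(unlist(strsplit(x, ":", fixed = TRUE)))
  as.vector(matrix(nums, ncol = 2, byrow = TRUE) %*% c(1, 1 / 60))
}

x <- c("4:30", "2:20", "34:10")

# Compare the two results within floating-point tolerance
all.equal(mmss_apply(x), mmss_matrix(x))
```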


I had data with times like so:

  • 22:49:20+1100
  • 19:29:11+1000
  • 20:01:26+0930

And this seemed to work for me:

library(dplyr)  # %>% and mutate()
library(tidyr)  # separate()

my_df <- my_df %>%
  separate(col = eventTime, into = c("H", "M", "S"), sep = "\\:", remove = FALSE) %>%
  separate(col = S, into = c("S", "Z"), sep = "\\+", remove = TRUE) %>%
  separate(col = Z, into = c("ZH", "ZM"), sep = 2, remove = TRUE) %>%
  mutate(H = as.numeric(H)/24) %>%
  mutate(M = as.numeric(M)/24/60) %>%
  mutate(S = as.numeric(S)/24/60/60) %>%
  mutate(ZH = as.numeric(ZH)/24) %>%
  mutate(ZM = as.numeric(ZM)/24/60) %>%
  mutate(H = H-ZH) %>%
  mutate(M = M-ZM) %>%
  mutate(time_num = H+M+S)

H:hours, M:minutes, S:seconds, Z:zone, ZH:zone hours, ZM:zone minutes

If you don't care about the time zones, then use this:

my_df <- my_df %>%
  separate(col = eventTime, into = c("H", "M", "S"), sep = "\\:", remove = FALSE) %>%
  separate(col = S, into = c("S", "Z"), sep = "\\+", remove = TRUE) %>%
  mutate(H = as.numeric(H)/24) %>%
  mutate(M = as.numeric(M)/24/60) %>%
  mutate(S = as.numeric(S)/24/60/60) %>%
  mutate(time_num = H+M+S)

With the first method you may end up with negative values, because the zone offset is subtracted from the time of day. With the second method you should get values between 0 and 1, with time_num being the fraction of the day.

For example, with the second method:

  • 22:49:20+1100 = 0.950925926

  • 07:26:10+1100 = 0.309837963

It should be noted that all of my time data came from time zones with a positive (+) offset; the split on "+" above would not handle negative offsets.
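For data that may also contain negative offsets, a base R sketch along these lines might work. It is an alternative to the separate() pipeline, not a restatement of it, and it assumes every string has the fixed-width layout hh:mm:ss±hhmm shown above. The function name `to_day_fraction` and its `subtract_offset` argument are made up for illustration.

```r
# Illustrative helper: "hh:mm:ss+hhmm" or "hh:mm:ss-hhmm" -> fraction of a day.
# Assumption: fixed-width input, so fields can be picked out by position.
to_day_fraction <- function(x, subtract_offset = FALSE) {
  h <- as.numeric(substr(x, 1, 2))
  m <- as.numeric(substr(x, 4, 5))
  s <- as.numeric(substr(x, 7, 8))
  frac <- h / 24 + m / (24 * 60) + s / (24 * 60 * 60)

  if (subtract_offset) {
    # The sign character sits at position 9, offset hours and minutes after it
    sgn <- ifelse(substr(x, 9, 9) == "-", -1, 1)
    zh  <- as.numeric(substr(x, 10, 11))
    zm  <- as.numeric(substr(x, 12, 13))
    frac <- frac - sgn * (zh / 24 + zm / (24 * 60))
  }
  frac
}

times <- c("22:49:20+1100", "07:26:10+1100", "10:00:00-0500")
to_day_fraction(times)                          # zone ignored
to_day_fraction(times, subtract_offset = TRUE)  # zone offset subtracted
```

As with the first method above, subtracting the offset can give values below 0 (or above 1 for negative offsets), so the result is no longer guaranteed to be a fraction of a single day.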
