How to transform a dataframe of characters to the respective dates?

I have noticed a couple of times that working with dates doesn't allow for the usual tricks in R. Say I have a data frame Data of date strings (see below), and I want to convert every column to class Date. The only solution I have come up with so far is:

for (i in seq_len(ncol(Data))) {
    Data[, i] <- as.Date(Data[, i], format = "%d %B %Y")
}

This gives a dataframe with the correct structure :

> str(Data)
'data.frame':   6 obs. of  4 variables:
 $ Rep1:Class 'Date'  num [1:6] 12898 12898 13907 13907 13907 ...
 $ Rep2:Class 'Date'  num [1:6] 13278 13278 14217 14217 14217 ...
 $ Rep3:Class 'Date'  num [1:6] 13600 13600 14340 14340 14340 ...
 $ Rep4:Class 'Date'  num [1:6] 13831 13831 14669 14669 14669 ...

Using the classic apply approach gives something completely different. Although all columns start as the same class and are converted to the same class, I can't get a data frame or matrix of the correct class as output:

> str(sapply(Data,as.Date,format="%d %B %Y"))
 num [1:6, 1:4] 12898 12898 13907 13907 13907 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Rep1" "Rep2" "Rep3" "Rep4"
> str(apply(Data,2,as.Date,format="%d %B %Y"))
 num [1:6, 1:4] 12898 12898 13907 13907 13907 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Rep1" "Rep2" "Rep3" "Rep4"

If you want to turn these matrices back into Date objects, you need to supply an origin (R stores a Date as the number of days since 1970-01-01). Calling as.Date or another function after the apply() doesn't help much either: once you supply the origin, the dimensions are dropped and you get a plain vector again.

Does anybody have a clean solution for this kind of data? Below is the data frame I used in the examples.

Data <- structure(list(Rep1 = c(" 25 April 2005 ", " 25 April 2005 ", 
" 29 January 2008 ", " 29 January 2008 ", " 29 January 2008 ", 
" 29 January 2008 "), Rep2 = c(" 10 May 2006 ", " 10 May 2006 ", 
" 4 December 2008 ", " 4 December 2008 ", " 4 December 2008 ", 
" 4 December 2008 "), Rep3 = c(" 28 March 2007 ", " 28 March 2007 ", 
" 6 April 2009 ", " 6 April 2009 ", " 6 April 2009 ", " 6 April 2009 "
), Rep4 = c(" 14 November 2007 ", " 14 November 2007 ", " 1 March 2010 ", 
" 1 March 2010 ", " 1 March 2010 ", " 1 March 2010 ")), .Names = c("Rep1", 
"Rep2", "Rep3", "Rep4"), row.names = c("1", "2", "3", "4", "5", 
"6"), class = "data.frame")


I think the most succinct way to do this is:

Data[] <- lapply(Data, as.Date, format = "%d %B %Y")

This also generalises nicely to the case where not all columns are dates (here date_col is a vector holding the names or positions of the date columns):

Data[date_col] <- lapply(Data[date_col], as.Date,format="%d %B %Y")
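
As a minimal sketch of that mixed-column case, assume a hypothetical data frame Mixed with one non-date column, and a date_col vector naming the columns to convert. Neither object comes from the question; they are made up for illustration. Note that %B matches month names in the current locale, so the sketch assumes an English locale.

```r
# Hypothetical data: one non-date column plus two date columns stored as text.
Mixed <- data.frame(
  id    = 1:2,
  start = c("25 April 2005", "29 January 2008"),
  end   = c("10 May 2006", "4 December 2008"),
  stringsAsFactors = FALSE
)

# date_col is an assumed name: a character vector of the columns to convert.
date_col <- c("start", "end")

# Replace only those columns; the id column is left untouched.
# %B matches full month names in the current locale (English assumed here).
Mixed[date_col] <- lapply(Mixed[date_col], as.Date, format = "%d %B %Y")

# Inspect the first class of each column.
sapply(Mixed, function(x) class(x)[1])
```

Because the assignment targets only Mixed[date_col], the other columns keep their original class.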

You can also simplify the date parsing with a couple of other packages:

library(stringr)
library(lubridate)
Data[] <- lapply(Data, function(x) dmy(str_trim(x)))

which is a little more verbose, but has the advantage that you don't need to figure out the date format yourself.


How about:

str(as.data.frame(lapply(Data,as.Date,format="%d %B %Y")))
# 'data.frame':   6 obs. of  4 variables:
#  $ Rep1:Class 'Date'  num [1:6] 12898 12898 13907 13907 13907 ...
#  $ Rep2:Class 'Date'  num [1:6] 13278 13278 14217 14217 14217 ...
#  $ Rep3:Class 'Date'  num [1:6] 13600 13600 14340 14340 14340 ...
#  $ Rep4:Class 'Date'  num [1:6] 13831 13831 14669 14669 14669 ...
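
A small self-contained check of this approach, using a two-column stand-in for Data (made up here so the snippet runs on its own, and assuming an English locale for %B). Unlike an assignment into Data, as.data.frame(lapply(...)) builds a new data frame and leaves the original untouched:

```r
# Small stand-in for the question's Data (two columns only, illustrative).
Data <- data.frame(
  Rep1 = c("25 April 2005", "29 January 2008"),
  Rep2 = c("10 May 2006", "4 December 2008"),
  stringsAsFactors = FALSE
)

# Build a new data frame of Dates; Data itself is not modified.
Converted <- as.data.frame(lapply(Data, as.Date, format = "%d %B %Y"))

# TRUE if every column of the result inherits from class "Date".
all_dates <- all(vapply(Converted, inherits, logical(1), what = "Date"))
all_dates
```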