Unable to format months with as.Date

I'm missing something obvious with the "format" argument of as.Date. Consider this example:

d1 <- data.frame(d = c("1/Jan/1947", "1/Feb/1947", "1/Mar/1947"), d2 = c("Jan/1947", "Feb/1947", "Mar/1947"))

d1$date1 <- as.Date(x=d1$d, format="%d/%b/%Y")
d1$date2 <- as.Date(x=d1$d2, format="%b/%Y")

           d       d2      date1 date2
1 1/Jan/1947 Jan/1947 1947-01-01  <NA>
2 1/Feb/1947 Feb/1947 1947-02-01  <NA>
3 1/Mar/1947 Mar/1947 1947-03-01  <NA>

So my question is really simple: I don't understand why date1 works but date2 doesn't.


The simplest answer is that a date includes a day, and if one is not specified, as.Date() cannot construct a date. From the ?strptime documentation (which ?as.Date refers to for the format argument):

If the date string does not specify the date completely, the returned answer may be system-specific. The most common behaviour is to assume that a missing year, month or day is the current one. If it specifies a date incorrectly, reliable implementations will give an error and the date is reported as ‘NA’. Unfortunately some common implementations (such as ‘glibc’) are unreliable and guess at the intended meaning.

When you think about it, a term such as "Mar/1947" is not, strictly speaking, a date; it's just a combination of month and year. A date is a specific day in March 1947 (or in any other month and year), and since you don't specify one, you don't have a date.
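To make this concrete, here is a minimal sketch (assuming an English or C locale, since %b matches locale-specific month abbreviations; the variable names are only for illustration). The month-and-year string gives NA, while the same month with a chosen day parses normally.

```r
# A month plus a year does not identify a single day, so as.Date()
# returns NA for it (assumes English month abbreviations for %b).
month_only <- as.Date("Mar/1947", format = "%b/%Y")
print(month_only)   # NA

# Choosing a day (here the 1st) turns it into a real date.
full_date <- as.Date("1/Mar/1947", format = "%d/%b/%Y")
print(full_date)    # "1947-03-01"
```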


It fails because d2 in your data.frame is an incomplete date: it doesn't contain a day. To get around this, paste a day onto the front before converting:

d1$date2 <- as.Date(x=paste("1/",d1$d2, sep=""), format="%d/%b/%Y")
> d1
           d       d2      date1      date2
1 1/Jan/1947 Jan/1947 1947-01-01 1947-01-01
2 1/Feb/1947 Feb/1947 1947-02-01 1947-02-01
3 1/Mar/1947 Mar/1947 1947-03-01 1947-03-01
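If you need this in more than one place, the same trick can be wrapped in a small function. This is only a sketch: month_year_to_date is a hypothetical name, the 1st is used because it exists in every month, and %b again assumes English month abbreviations. format() then turns the stored Date back into a month/year label for display.

```r
# Hypothetical helper: convert "Mon/YYYY" strings to a Date by pasting
# a day in front, the same idea as the paste() call above.
month_year_to_date <- function(x, day = 1) {
  as.Date(paste0(day, "/", x), format = "%d/%b/%Y")
}

d2 <- c("Jan/1947", "Feb/1947", "Mar/1947")
dates <- month_year_to_date(d2)
print(dates)

# Going back: format() reproduces the month/year label from the Date.
labels <- format(dates, "%b/%Y")
print(labels)
```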


I'm not sure why, but %b doesn't seem to work when it's the leading field.

The following all fail (give NA):

> as.Date("Jan/1947", format="%b/%Y")
> as.Date("Jan 1947", format="%b %Y")
> as.Date("jan1947", format="%b%Y")
> as.Date("Jan1947", format="%b%Y")

whereas when you precede %b with %d, it works:

> as.Date("1Jan1947", format="%d%b%Y")
> as.Date("29-Jan-1947", format="%d-%b-%Y")
> as.Date("08/Aug/1947", format="%d/%b/%Y")
> as.Date("22 Dec 1947", format="%d %b %Y")
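One way to check whether the position of %b matters, or whether the missing day is the real cause, is to vary the two independently. This is a small sketch with made-up strings, again assuming English month abbreviations.

```r
# Month is not the leading field, but there is no day: NA.
no_day <- as.Date("1947/Jan", format = "%Y/%b")

# Month is not the leading field, and a day is present: parses.
with_day <- as.Date("1947/Jan/29", format = "%Y/%b/%d")

# Month IS the leading field, and a day is present: parses too.
month_first <- as.Date("Jan/29/1947", format = "%b/%d/%Y")

print(c(no_day, with_day, month_first))
```

If the results follow this pattern, it is the presence of %d, not the position of %b, that decides whether a date comes back.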

It seems neilfws has the right answer about incompleteness. That would also explain why giving only the year returns a full date with the month and day filled in:

> as.Date("1947", format="%Y")
[1] "1947-09-19"
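The same behaviour can be checked directly. In this sketch, the year comes from the input string, while the month and day appear to come from the current date (which would presumably explain the 19 September above). It only compares against Sys.Date(); according to the documentation quoted earlier, the exact filling rules can be system-specific.

```r
# Only the year is supplied; R fills in the month and day.
y_only <- as.Date("1947", format = "%Y")
print(y_only)

# The year comes from the input string...
print(format(y_only, "%Y"))          # "1947"

# ...while the month and day normally match today's date.
print(format(y_only, "%m-%d"))
print(format(Sys.Date(), "%m-%d"))
```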


According to the document "Handling date-times in R" by Cole Beck, a date is stored internally as a single numeric value: the number of days since the reference date 1970-01-01. For example, 1970-01-31 is stored internally as 30.

So, coming back to the problem: when the input string has no day (%d), i.e. it is an incomplete date, there is no single day count that can be stored, so as.Date() returns NA.

Source: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ColeBeck/datestimes.pdf
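The day-count representation is easy to inspect with base R. In this minimal sketch, unclass() and as.numeric() expose the stored number, and an incomplete string such as "Jan/1947" yields NA rather than a number. The origin argument is passed explicitly when converting a number back, since older R versions require it.

```r
# A Date is stored as the number of days since 1970-01-01.
d <- as.Date("1970-01-31")
print(unclass(d))                          # 30

# Dates before the origin are stored as negative counts.
print(as.numeric(as.Date("1947-01-01")))   # -8401

# An incomplete date has no day count, so only NA can be stored.
print(as.numeric(as.Date("Jan/1947", format = "%b/%Y")))   # NA

# Converting a day count back to a Date (origin given explicitly).
print(as.Date(30, origin = "1970-01-01"))  # "1970-01-31"
```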
