How do I get year and month when day is invalid without fixing the day myself?

I have some data that looks a bit like this:

require(zoo)

X <- rbind(c(date='20111001', fmt='%Y%m%d'),
            c('20111031', '%Y%m%d'),
            c('201110', '%Y%m'),
            c('102011', '%m%Y'),
            c('31/10/2011', '%d/%m/%Y'),
            c('20111000', '%Y%m%d'))
print(X)

#      date       fmt     
# [1,] "20111001" "%Y%m%d"
# [2,] "20111031" "%Y%m%d"
# [3,] "201110"   "%Y%m"  
# [4,] "102011"   "%m%Y"  
# [5,] "31/10/2011" "%d/%m/%Y"
# [6,] "20111000" "%Y%m%d"

I only want the year and month. I don't need the day, so I'm not worried that the final day is invalid. R, unfortunately, is:

mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)

# $`20111001`
# [1] "Oct 2011"

# $`20111031`
# [1] "Oct 2011"

# $`201110`
# [1] "Oct 2011"

# $`102011`
# [1] "Oct 2011"

# $`31/10/2011`
# [1] "Oct 2011"

# $`20111000`
# Error in charToDate(x) : 
#   character string is not in a standard unambiguous format

I know that the usual answer is to fix the day part of the date, e.g. using paste(x, '01', sep=''). I don't think that will work here, because I don't know in advance what the date format will be, and therefore I cannot set the day without converting to some sort of date object first.


Assuming the month always follows the year and is always two characters in your date, why not just extract the information with substr? Perhaps something like:

lapply(X[,'date'], 
  function(x) paste(month.abb[as.numeric(substr(x, 5, 6))], substr(x, 1, 4))
  )
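
If the year and month are not always in the same place (as in the '%m%Y' and '%d/%m/%Y' rows), the same substr idea can be driven by the format string instead. This is only a rough sketch: extract_year_month is a made-up helper name, and it assumes every field is zero-padded to a fixed width and that the format uses only %Y, %m, %d and literal separator characters.

```r
# Hypothetical helper (not from any package): cut the year and month out of a
# date string by working out where %Y and %m sit in a fixed-width format.
extract_year_month <- function(date, fmt) {
  widths <- c(Y = 4L, m = 2L, d = 2L)   # assumed field widths
  # Split the format into conversion specs (%Y, %m, %d) and literal characters
  tokens <- regmatches(fmt, gregexpr("%[Ymd]|[^%]", fmt))[[1]]
  # Conversion specs take their field width; literals take one character
  len <- ifelse(startsWith(tokens, "%"), widths[substr(tokens, 2, 2)], 1L)
  # Start position of each token within the date string
  start <- cumsum(c(1L, head(len, -1)))
  pick <- function(tok) {
    i <- match(tok, tokens)
    substr(date, start[i], start[i] + len[i] - 1L)
  }
  paste(pick("%Y"), pick("%m"), sep = "-")
}

extract_year_month("20111000",   "%Y%m%d")    # "2011-10"
extract_year_month("102011",     "%m%Y")      # "2011-10"
extract_year_month("31/10/2011", "%d/%m/%Y")  # "2011-10"
```

Because nothing is ever parsed as a date, the invalid day never gets a chance to cause an error. To run it over the whole matrix, something like mapply(extract_year_month, X[, 'date'], X[, 'fmt']) should do.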


You don't need to specify the day in your format if you don't need it. Read ?strptime carefully. The second paragraph in the Details section says:

Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.

So adjust your format and everything should work.

X <- rbind(c(date='20111001', fmt='%Y%m'),
           c('20111031', '%Y%m'),
           c('201110',   '%Y%m'),
           c('102011',   '%m%Y'),
           c('20111000', '%Y%m'))
mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)
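
To see the "trailing characters are ignored" behaviour without zoo, here is a small base-R sketch. ?strptime also says that when a month is specified the day has to be specified too, so the example prepends a dummy '01' with a matching '%d'; that prefix is my own illustration, not something taken from the question's data.

```r
# Base R only. The format is consumed after "%d%Y%m", so the invalid
# trailing day "00" in "20111000" is never read.
bad_date <- "20111000"
parsed <- as.Date(paste0("01", bad_date), format = "%d%Y%m")

format(parsed, "%Y-%m")  # "2011-10"
```

The result is an ordinary Date for the first of the month, so it can be passed on to as.yearmon or formatted however you like.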


Assuming that I'm always given a date (and never a time), and that any illegal 'day' is less than 61, I can guarantee a legal date as follows, by treating the supplied day as 'seconds' and replacing the supplied day with the 1st.

require(stringr)

safe_date <- str_c('01', X[, 'date'])
safe_fmt <- str_c('%d', str_replace(X[, 'fmt'], '%d', '%S'))

mapply(as.yearmon, safe_date, safe_fmt, SIMPLIFY=FALSE)

# $`0120111001`
# [1] "Oct 2011"

# $`0120111031`
# [1] "Oct 2011"

# $`01201110`
# [1] "Oct 2011"

# $`01102011`
# [1] "Oct 2011"

# $`0131/10/2011`
# [1] "Oct 2011"

# $`0120111000`
# [1] "Oct 2011"