How do I get year and month when day is invalid without fixing the day myself?
I have some data that looks a bit like this:
require(zoo)
X <- rbind(c(date='20111001', fmt='%Y%m%d'),
c('20111031', '%Y%m%d'),
c('201110', '%Y%m'),
c('102011', '%m%Y'),
c('31/10/2011', '%d/%m/%Y'),
c('20111000', '%Y%m%d'))
print(X)
# date fmt
# [1,] "20111001" "%Y%m%d"
# [2,] "20111031" "%Y%m%d"
# [3,] "201110" "%Y%m"
# [4,] "102011" "%m%Y"
# [5,] "31/10/2011" "%d/%m/%Y"
# [6,] "20111000" "%Y%m%d"
I only want the year and month. I don't need the day, so I'm not worried that the final day is invalid. R, unfortunately, is:
mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)
# $`20111001`
# [1] "Oct 2011"
# $`20111031`
# [1] "Oct 2011"
# $`201110`
# [1] "Oct 2011"
# $`102011`
# [1] "Oct 2011"
# $`31/10/2011`
# [1] "Oct 2011"
# $`20111000`
# Error in charToDate(x) :
# character string is not in a standard unambiguous format
I know that the usual answer is to fix the day part of the date, e.g. using paste(x, '01', sep=''). I don't think that will work here, because I don't know in advance what the date format will be, and therefore I cannot set the day without converting to some sort of date object first.
Assuming the month always follows the year and is always two characters in your date, why not just extract the information with substr? Perhaps something like:
lapply(X[,'date'],
function(x) paste(month.abb[as.numeric(substr(x, 5, 6))], substr(x, 1, 4))
)
You don't need to specify the day in your format if you don't need it. Read ?strptime carefully. The second paragraph in the Details section says:
Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
So adjust your format and everything should work.
X <- rbind(c(date='20111001', fmt='%Y%m'),
c('20111031', '%Y%m'),
c('201110', '%Y%m'),
c('102011', '%m%Y'),
c('20111000', '%Y%m'))
mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)
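As a minimal base-R sketch of the trailing-characters rule quoted above (not part of the original answer; the variable names are made up for illustration), a format that stops at the month never reads the invalid '00':

```r
# Hypothetical illustration using base R only (no zoo).
bad_date <- "20111000"                 # the day part "00" is invalid

# as.Date needs a day to build a date, so prepend a valid one and
# parse only day, year and month; the trailing "00" is ignored.
parsed <- as.Date(paste0("01", bad_date), format = "%d%Y%m")

# A numeric year-month avoids locale-dependent month names.
year_month <- format(parsed, "%Y-%m")
print(year_month)
```

Note that this only helps when the day sits at the end of the string. A leading day, as in '31/10/2011', still has to be matched by the format.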
Assuming that I'm always given a date (and never a time), and that any illegal 'day' is less than 61, I can guarantee a legal date as follows: prepend '01' to act as the real day, and parse the supplied day as seconds (%S), so that its value no longer affects the date.
require(stringr)
safe_date <- str_c('01', X[, 'date'])
safe_fmt <- str_c('%d', str_replace(X[, 'fmt'], '%d', '%S'))
mapply(as.yearmon, safe_date, safe_fmt, SIMPLIFY=FALSE)
# $`0120111001`
# [1] "Oct 2011"
# $`0120111031`
# [1] "Oct 2011"
# $`01201110`
# [1] "Oct 2011"
# $`01102011`
# [1] "Oct 2011"
# $`0131/10/2011`
# [1] "Oct 2011"
# $`0120111000`
# [1] "Oct 2011"
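The same idea can be sketched in base R alone, without stringr or zoo. The helper name to_year_month is hypothetical and not from the answer above. It assumes each format contains at most one %d and no time fields.

```r
# Hypothetical helper: base-R version of the "treat the day as seconds"
# trick, vectorised over both the date strings and their formats.
to_year_month <- function(date, fmt) {
  # The real day is always the 1st.
  safe_date <- paste0("01", date)
  # Parse the supplied day (if any) as seconds so it cannot invalidate the date.
  safe_fmt <- paste0("%d", sub("%d", "%S", fmt, fixed = TRUE))
  # Numeric year-month avoids locale-dependent month names.
  format(as.Date(safe_date, format = safe_fmt), "%Y-%m")
}

dates <- c("20111001", "20111031", "201110", "102011", "31/10/2011", "20111000")
fmts  <- c("%Y%m%d", "%Y%m%d", "%Y%m", "%m%Y", "%d/%m/%Y", "%Y%m%d")
result <- to_year_month(dates, fmts)
print(result)
```

Returning a plain "YYYY-MM" string is a choice made for this sketch; the result of as.Date could equally be passed on to as.yearmon.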