How do you store a string in MongoDB as a Date type using Ruby?
I have a string that I'm parsing out from log files that looks like the following:
"[22/May/2011:23:02:21 +0000]"
What's the best way (examples in Ruby would be most appreciated, as I'm using the Mongo Ruby driver) to get that stashed into MongoDB as a native Date type?
```ruby
require 'date' # this is just to get the ABBR_MONTHNAMES list

input = "[22/May/2011:23:02:21 +0000]"

# this regex captures the numbers and the month name
pattern = %r{^\[(\d{2})/(\w+)/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]$}
match = input.match(pattern)
raise ArgumentError, "unrecognized timestamp: #{input}" unless match

# MatchData can be splatted, which is very convenient
_, date, month_name, year, hour, minute, second, tz_offset = *match

# ABBR_MONTHNAMES is [nil, "Jan", "Feb", ...], so the index is the month number
month = Date::ABBR_MONTHNAMES.index(month_name)

# insert a colon in the tz offset ("+0000" -> "+00:00"), because Time.new expects it
tz = tz_offset[0, 3] + ':' + tz_offset[3, 2]

# this is your time object; put it into Mongo and it will be saved as a Date (in UTC)
Time.new(year.to_i, month, date.to_i, hour.to_i, minute.to_i, second.to_i, tz)
```
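If you run this over every line of a log file, it may be handier to wrap the steps in a method that returns nil for lines that don't match the format (or whose month abbreviation isn't in `ABBR_MONTHNAMES`). The sketch below does that; `parse_log_time` and the `'timestamp'` field name are hypothetical. On the MongoDB side, a BSON date is a UTC timestamp with millisecond precision, so the original `+0000` offset is not stored and the driver gives you back a UTC `Time`; calling `#utc` before building the document just makes that explicit.

```ruby
require 'date'

# Same layout as above: [dd/Mon/yyyy:HH:MM:SS +HHMM]
LOG_TIME_PATTERN = %r{\A\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]\z}

# Hypothetical helper: returns a Time, or nil when the string doesn't
# match the layout or the month abbreviation is unknown.
def parse_log_time(input)
  match = LOG_TIME_PATTERN.match(input)
  return nil unless match

  _, day, month_name, year, hour, minute, second, tz_offset = *match
  month = Date::ABBR_MONTHNAMES.index(month_name)
  return nil unless month

  # "+0000" -> "+00:00", a UTC offset format Time.new accepts
  tz = "#{tz_offset[0, 3]}:#{tz_offset[3, 2]}"
  Time.new(year.to_i, month, day.to_i, hour.to_i, minute.to_i, second.to_i, tz)
end

time = parse_log_time("[22/May/2011:23:02:21 +0000]")
# BSON dates are UTC, so this is effectively the value MongoDB keeps
doc = { 'timestamp' => time.utc } # 'timestamp' is a hypothetical field name
puts doc['timestamp']
```

The resulting hash can then be passed to the driver's insert call (`collection.insert_one(doc)` with the 2.x `mongo` gem, `collection.insert(doc)` with the older 1.x driver).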
A few things to note:

- I assumed that the month names are the same as in the `ABBR_MONTHNAMES` list; otherwise, just make your own list.
- Never ever use `Date.parse` to parse dates: it is incredibly slow, and the same goes for `DateTime.parse` and `Time.parse`, which use the same implementation.
- If you parse a lot of different date formats, check out the home_run gem.
- If you do a lot of these (as you often do when parsing log files), consider not using a regex. Use `String#index`, `#[]` and `#split` to extract the parts you need.
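The last point in that list could look something like this: a sketch that pulls the fields out with `String#[]` and `String#split` only, no regex. It assumes the same `[dd/Mon/yyyy:HH:MM:SS +HHMM]` layout and does no validation, so a malformed line will raise or produce nonsense; `parse_log_time_split` is a hypothetical name.

```ruby
require 'date'

# Hypothetical helper: parses "[22/May/2011:23:02:21 +0000]" without a regex.
def parse_log_time_split(input)
  # drop the surrounding brackets, then separate the timestamp from the offset
  stamp, offset = input[1..-2].split(' ')
  # "22/May/2011:23:02:21" -> ["22", "May", "2011:23:02:21"]
  day, month_name, rest = stamp.split('/')
  # "2011:23:02:21" -> ["2011", "23", "02", "21"]
  year, hour, minute, second = rest.split(':')
  month = Date::ABBR_MONTHNAMES.index(month_name)
  # Time.new wants "+HH:MM", so insert a colon into "+HHMM"
  tz = "#{offset[0, 3]}:#{offset[3, 2]}"
  Time.new(year.to_i, month, day.to_i, hour.to_i, minute.to_i, second.to_i, tz)
end

p parse_log_time_split("[22/May/2011:23:02:21 +0000]")
```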
If you want to do this as fast as possible, something like the following is probably more appropriate. It doesn't use regexes (which are useful, but not fast):
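```ruby
require 'date'

input = "[22/May/2011:23:02:21 +0000]"

date = input[1, 2].to_i
month_name = input[4, 3]
month = Date::ABBR_MONTHNAMES.index(month_name)
year = input[8, 4].to_i
hour = input[13, 2].to_i
minute = input[16, 2].to_i
second = input[19, 2].to_i
# offset in seconds; apply the sign to hours and minutes so "-0130" gives -5400
sign = input[22] == '-' ? -1 : 1
tz_offset = sign * (input[23, 2].to_i * 60 * 60 + input[25, 2].to_i * 60)
Time.new(year, month, date, hour, minute, second, tz_offset)
```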
This takes advantage of the fact that every field has a fixed width (at least I assume it does), so all you need to do is extract the substrings. It also calculates the timezone offset as a number of seconds instead of a string.
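If you want to check the speed claim on your own Ruby version, a rough micro-benchmark along these lines compares the regex approach with the fixed-width slicing on the same line. It is only a sketch: `with_regex`, `with_slices` and `time_it` are hypothetical names, the slicing version assumes the fixed-width layout described above, and the timings depend heavily on Ruby version and machine, so no particular result is claimed here.

```ruby
require 'date'

LINE = "[22/May/2011:23:02:21 +0000]"
PATTERN = %r{\A\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]\z}

# regex version: capture groups, then a "+HH:MM" string offset
def with_regex(input)
  _, day, mon, year, hour, min, sec, tz = *PATTERN.match(input)
  Time.new(year.to_i, Date::ABBR_MONTHNAMES.index(mon), day.to_i,
           hour.to_i, min.to_i, sec.to_i, "#{tz[0, 3]}:#{tz[3, 2]}")
end

# fixed-width version: substrings at known positions, offset in seconds
def with_slices(input)
  sign = input[22] == '-' ? -1 : 1
  offset = sign * (input[23, 2].to_i * 3600 + input[25, 2].to_i * 60)
  Time.new(input[8, 4].to_i, Date::ABBR_MONTHNAMES.index(input[4, 3]), input[1, 2].to_i,
           input[13, 2].to_i, input[16, 2].to_i, input[19, 2].to_i, offset)
end

# runs the block n times and returns the elapsed wall-clock seconds
def time_it(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

n = 100_000
printf("regex:  %.3fs\n", time_it(n) { with_regex(LINE) })
printf("slices: %.3fs\n", time_it(n) { with_slices(LINE) })
```

Both methods should return the same instant for any well-formed line, so it is worth asserting that before trusting the numbers.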