Parsing date from text using Ruby
I'm trying to figure out how to extract dates from unstructured text using Ruby.
For example, I'd like to parse the date out of this string "Applications started after 12:00 A.M. Midnight (EST) February 1, 2010 will not be considered."
Any suggestions?
Try Chronic (http://chronic.rubyforge.org/); it might be able to parse that. Otherwise, you're going to have to use Date.strptime.
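For the Date.strptime route, here is a minimal sketch using only the standard library. It assumes the date always appears as "Month D, YYYY"; the `pattern` regex and the variable names are my own illustration, not anything Chronic or the question prescribes. Date.strptime needs the date substring on its own, so the regex isolates it first.

```ruby
require 'date'

text = "Applications started after 12:00 A.M. Midnight (EST) February 1, 2010 will not be considered."

# Assumed shape of the date: capitalized word, day, comma, four-digit year.
pattern = /[A-Z][a-z]+ \d{1,2}, \d{4}/

# String#[] with a regex returns the first matching substring, or nil.
match = text[pattern]

# %B = full month name, %d = day of month, %Y = four-digit year.
date = Date.strptime(match, '%B %d, %Y') if match
puts date
```

If the text uses a different layout (say "1 Feb 2010"), both the regex and the format string would have to change together.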
Assuming you just want dates and not datetimes:
require 'date'
string = "Applications started after 12:00 A.M. Midnight (EST) February 1, 2010 will not be considered."
# Match a full month name, a one- or two-digit day, and a four-digit year.
r = /(January|February|March|April|May|June|July|August|September|October|November|December) (\d{1,2}), (\d{4})/
if string[r]
  date = Date.parse(string[r])
  puts date
end
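If the text might contain more than one date, the same idea extends to String#scan. This is only a sketch under the same "Month D, YYYY" assumption; `extract_dates`, `MONTHS` and `DATE_RE` are hypothetical names I made up for the example.

```ruby
require 'date'

# Date::MONTHNAMES starts with nil, so compact drops it before joining.
MONTHS = Date::MONTHNAMES.compact.join('|')
DATE_RE = /\b(?:#{MONTHS}) \d{1,2}, \d{4}\b/

# Returns every "Month D, YYYY" date found in the text as a Date object.
def extract_dates(text)
  candidates = text.scan(DATE_RE)
  dates = candidates.map do |candidate|
    begin
      Date.parse(candidate)
    rescue ArgumentError
      nil # candidate looked like a date but could not be parsed
    end
  end
  dates.compact
end

text = "Opens January 5, 2010. Applications started after February 1, 2010 will not be considered."
puts extract_dates(text)
```

The non-capturing group `(?:...)` matters here: with capturing groups, scan would return arrays of the groups instead of the whole matched string.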
You can also try the DatesFromString gem, which finds dates in a string.
Example:
input = 'circa 1960 and full date 07 Jun 1941'
dates_from_string = DatesFromString.new
dates_from_string.get_structure(input)
#=> returns
# [{:type=>:year, :value=>"1960", :distance=>4, :key_words=>[]},
# {:type=>:day, :value=>"07", :distance=>1, :key_words=>[]},
# {:type=>:month, :value=>"06", :distance=>1, :key_words=>[]},
# {:type=>:year, :value=>"1941", :distance=>0, :key_words=>[]}]