What's the best way to detect date formats in user submitted data?
I'm reading CSV data uploaded by users in my Ruby on Rails app. When a user specifies that a particular column contains dates (or times), I want to detect the format automatically. The dates can be in American or British formats (dd/mm/yy, mm/dd/yy, yyyy-mm-dd, 12 Feb 2010, etc.).
I have tried parsedate in Ruby, but it doesn't handle both American and British dates unless you specify the format. Is there any way to do this properly, or am I asking for too much? I don't mind calling a script in another language just for this one task. I'm wondering how programs like Excel and Google Docs handle it.
Unless the application knows the user's locale, I don't see how you can determine this accurately.
What you do know, however, is that:
- There are only 12 months.
- Only years can be 4 digits long.
- If a part contains text, it must be the month.
You could write your own parser that applies these rules. Without knowing the locale, however, it can still misread a date like 05/10/2010, which is 5 October 2010 in the UK but 10 May 2010 in the US.
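Here is a minimal sketch of such a rule-based parser, using only Ruby's standard Date library. The helper names (candidate_dates, month_number, ORDERS) are hypothetical, and it assumes only three field orders matter (d/m/y, m/d/y, y/m/d) and that two-digit years mean 20xx. Instead of guessing, it returns every reading that survives the rules, so an ambiguous value comes back with more than one answer.

```ruby
require "date"

# Field orders to try (assumption: these are the only conventions we care about).
ORDERS = {
  "d/m/y" => %i[day month year],
  "m/d/y" => %i[month day year],
  "y/m/d" => %i[year month day]
}.freeze

# Hypothetical helper: month token -> 1..12, or nil if it can't be a month.
def month_number(token)
  if token.match?(/\A\d{1,2}\z/)
    n = token.to_i
    (1..12).cover?(n) ? n : nil # rule: there are only 12 months
  else
    # rule: a text part must be the month ("Feb" or "February")
    Date::MONTHNAMES.index { |m| m&.casecmp?(token) } ||
      Date::ABBR_MONTHNAMES.index { |m| m&.casecmp?(token) }
  end
end

# Hypothetical helper: every reading of +str+ that survives the rules,
# as { "d/m/y" => Date, ... }. An empty hash means no reading fits.
def candidate_dates(str)
  parts = str.strip.split(%r{[/.\-\s]+})
  return {} unless parts.size == 3

  ORDERS.each_with_object({}) do |(label, order), found|
    f = order.zip(parts).to_h
    month = month_number(f[:month])
    next unless month
    # rule: only the year may be 4 digits; the day is never text
    next unless f[:day].match?(/\A\d{1,2}\z/)
    next unless f[:year].match?(/\A(\d{2}|\d{4})\z/)

    year = f[:year].to_i
    year += 2000 if f[:year].length == 2 # assumption: two-digit years are 20xx
    day = f[:day].to_i
    found[label] = Date.new(year, month, day) if Date.valid_date?(year, month, day)
  end
end
```

For example, candidate_dates("25/12/2010") and candidate_dates("12 Feb 2010") each return a single reading, while candidate_dates("05/10/2010") returns both 5 October and 10 May. That second case is exactly where the rules run out and you need the locale.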
There is little a program can do to magically determine which short date format a given value uses.
If you give a program a date like 09/06/08, it could mean any of:
- 9th of June, 2008, or
- 6th of September, 2008, or perhaps even
- 8th of June, 2009.
When Ruby parses a date from a string (for example with Date.parse or DateTime.parse), it guesses the format using built-in heuristics; it does not look at the user's locale. See the Ruby Date and DateTime class documentation for more info.
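If you already know (or have decided) which convention applies, you can take the guessing out of Ruby's hands by calling Date.strptime with an explicit format. This small sketch reads the same string 09/06/08 under the three interpretations listed above; the format names in the hash are just labels chosen for illustration.

```ruby
require "date"

ambiguous = "09/06/08"

# Date.strptime uses exactly the format you give it, so the caller decides
# how an ambiguous string is read. %y maps 00-68 to 20xx and 69-99 to 19xx.
readings = {
  "dd/mm/yy" => "%d/%m/%y",
  "mm/dd/yy" => "%m/%d/%y",
  "yy/mm/dd" => "%y/%m/%d"
}.transform_values { |fmt| Date.strptime(ambiguous, fmt) }

readings.each { |name, date| puts "#{name}: #{date.iso8601}" }
# dd/mm/yy: 2008-06-09
# mm/dd/yy: 2008-09-06
# yy/mm/dd: 2009-06-08
```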
I think the best thing to do in your situation would be to try to arrange all of your records into groups, where each group has one particular date format. If you yourself can't tell the American and British dates apart by some criterion, unfortunately a program won't be able to either.
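Since the user tells you which column holds dates, one way to apply this grouping idea is to decide the format per column rather than per value: try each candidate format against every value in the column and keep only the formats that parse all of them. A single value such as 25/12/2010 is then enough to rule out mm/dd/yyyy for the whole column. This is a sketch; CANDIDATE_FORMATS, parses_as? and matching_formats are hypothetical names, and the format list is an assumption you would adapt to what your users actually upload.

```ruby
require "date"

# Candidate formats, in the order you'd prefer them if several fit.
CANDIDATE_FORMATS = [
  "%Y-%m-%d", # 2010-02-12
  "%d/%m/%Y", # 12/02/2010 (British)
  "%m/%d/%Y", # 02/12/2010 (American)
  "%d %b %Y"  # 12 Feb 2010
].freeze

# True if +value+ parses as a valid date under +fmt+.
def parses_as?(value, fmt)
  Date.strptime(value.strip, fmt)
  true
rescue ArgumentError # Date::Error (Ruby 2.7+) is a subclass of ArgumentError
  false
end

# Every candidate format under which *all* values in the column parse.
def matching_formats(values)
  CANDIDATE_FORMATS.select { |fmt| values.all? { |v| parses_as?(v, fmt) } }
end
```

If matching_formats returns exactly one format, you can parse the column with it. If it returns both "%d/%m/%Y" and "%m/%d/%Y" (for example for a column containing only 05/10/2010 and 06/11/2010), the data genuinely doesn't decide, and you have to ask the user or fall back on their locale.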
However... if each user is from a specific locale, and you can make the (rather large) assumption that every date they upload in a CSV conforms to their country's date format conventions, you could make use of the Rails internationalization (I18n) API. It should be technically possible to look up that user's locale, load the matching I18n data (with the appropriate date format), and parse the file using that format. Read the Rails Internationalization API guide to get an idea of how you can use the I18n API.
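A minimal sketch of the locale-driven approach follows. The LOCALE_DATE_FORMATS hash and parse_for_locale are hypothetical; in a Rails app you might keep these format strings in your locale files instead. As far as I know, the I18n API's localize (I18n.l) only formats dates and has no parsing counterpart, so the actual parsing here is done with Date.strptime.

```ruby
require "date"

# Hypothetical per-locale date formats (keys mimic Rails' symbol locales).
LOCALE_DATE_FORMATS = {
  :"en-US" => "%m/%d/%Y",
  :"en-GB" => "%d/%m/%Y",
  :de      => "%d.%m.%Y"
}.freeze

# Parse one CSV cell using the uploading user's locale.
# Raises KeyError for an unknown locale and ArgumentError for a bad value.
def parse_for_locale(value, locale)
  Date.strptime(value.strip, LOCALE_DATE_FORMATS.fetch(locale))
end
```

With this, parse_for_locale("05/10/2010", :"en-US") gives 10 May 2010 and parse_for_locale("05/10/2010", :"en-GB") gives 5 October 2010, which is the whole point: the same string, resolved by who uploaded it.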
I know this is an old post, but for the archive's sake I recommend the Chronic gem for parsing dates and times in CSV imports.
```ruby
require "chronic"

# Date-only input: Chronic picks the middle of the day (noon) by default.
Chronic.parse("8/15/2020")     # => 2020-08-15 12:00:00 -0000
Chronic.parse("15/8/2020")     # => 2020-08-15 12:00:00 -0000
Chronic.parse("8-15-2020")     # => 2020-08-15 12:00:00 -0000
Chronic.parse("8-15-2020 3PM") # => 2020-08-15 15:00:00 -0000
```
FYI, you'll want Chronic to parse in the client's account time zone (with ActiveSupport you can set Time.zone and then Chronic.time_class = Time.zone). Otherwise it will use the globally configured time zone, which is UTC in my example.