开发者

Parsing date/time string with timezone abbreviated name in Python?

I'm trying to parse timestamp strin开发者_如何学编程gs like "Sat, 11/01/09 8:00PM EST" in Python, but I'm having trouble finding a solution that will handle the abbreviated timezone.

I'm using dateutil's parse() function, but it doesn't parse the timezone. Is there an easy way to do this?


dateutil's parser.parse() accepts as keyword argument tzinfos a dictionary of the kind {'EST': -5*3600} (that is, matching the zone name to GMT offset in seconds). So assuming we have that, we can do:

>>> import dateutil.parser as dp
>>> s = 'Sat, 11/01/09 8:00PM'
>>> for tz_code in ('PST','PDT','MST','MDT','CST','CDT','EST','EDT'):
>>>     dt = s+' '+tz_code
>>>     print dt, '=', dp.parse(dt, tzinfos=tzd)

Sat, 11/01/09 8:00PM PST = 2009-11-01 20:00:00-08:00
Sat, 11/01/09 8:00PM PDT = 2009-11-01 20:00:00-07:00
Sat, 11/01/09 8:00PM MST = 2009-11-01 20:00:00-07:00
Sat, 11/01/09 8:00PM MDT = 2009-11-01 20:00:00-06:00
Sat, 11/01/09 8:00PM CST = 2009-11-01 20:00:00-06:00
Sat, 11/01/09 8:00PM CDT = 2009-11-01 20:00:00-05:00
Sat, 11/01/09 8:00PM EST = 2009-11-01 20:00:00-05:00
Sat, 11/01/09 8:00PM EDT = 2009-11-01 20:00:00-04:00

Regarding the content of tzinfos, here is how i populated mine:

tz_str = '''-12 Y
-11 X NUT SST
-10 W CKT HAST HST TAHT TKT
-9 V AKST GAMT GIT HADT HNY
-8 U AKDT CIST HAY HNP PST PT
-7 T HAP HNR MST PDT
-6 S CST EAST GALT HAR HNC MDT
-5 R CDT COT EASST ECT EST ET HAC HNE PET
-4 Q AST BOT CLT COST EDT FKT GYT HAE HNA PYT
-3 P ADT ART BRT CLST FKST GFT HAA PMST PYST SRT UYT WGT
-2 O BRST FNT PMDT UYST WGST
-1 N AZOT CVT EGT
0 Z EGST GMT UTC WET WT
1 A CET DFT WAT WEDT WEST
2 B CAT CEDT CEST EET SAST WAST
3 C EAT EEDT EEST IDT MSK
4 D AMT AZT GET GST KUYT MSD MUT RET SAMT SCT
5 E AMST AQTT AZST HMT MAWT MVT PKT TFT TJT TMT UZT YEKT
6 F ALMT BIOT BTT IOT KGT NOVT OMST YEKST
7 G CXT DAVT HOVT ICT KRAT NOVST OMSST THA WIB
8 H ACT AWST BDT BNT CAST HKT IRKT KRAST MYT PHT SGT ULAT WITA WST
9 I AWDT IRKST JST KST PWT TLT WDT WIT YAKT
10 K AEST ChST PGT VLAT YAKST YAPT
11 L AEDT LHDT MAGT NCT PONT SBT VLAST VUT
12 M ANAST ANAT FJT GILT MAGST MHT NZST PETST PETT TVT WFT
13 FJST NZDT
11.5 NFT
10.5 ACDT LHST
9.5 ACST
6.5 CCT MMT
5.75 NPT
5.5 SLT
4.5 AFT IRDT
3.5 IRST
-2.5 HAT NDT
-3.5 HNT NST NT
-4.5 HLV VET
-9.5 MART MIT'''

tzd = {}
for tz_descr in map(str.split, tz_str.split('\n')):
    tz_offset = int(float(tz_descr[0]) * 3600)
    for tz_code in tz_descr[1:]:
        tzd[tz_code] = tz_offset

ps. per @Hank Gay time zone naming is not clearly defined. To form my table i used http://www.timeanddate.com/library/abbreviations/timezones/ and http://en.wikipedia.org/wiki/List_of_time_zone_abbreviations . I looked at each conflict and resolved conflicts between obscure and popular names towards the popular (more used ones). There was one - IST - that was not as clear cut (it can mean Indian Standard Time, Iran Standard Time, Irish Standard Time or Israel Standard Time), so i left it out of the table - you may need to chose what to add for it based on your location. Oh - and I left out the Republic of Kiribati with their absurd "look at me i am first to celebrate New Year" GMT+13 and GMT+14 time zones.


That probably won't work because those abbreviations aren't unique. See this page for details. You might wind up just having to manually handle it yourself if you're working with a known set of inputs.


You might try pytz module: http://pytz.sourceforge.net/

pytz brings the Olson tz database into Python. This library allows accurate and cross platform timezone calculations using Python 2.3 or higher. It also solves the issue of ambiguous times at the end of daylight savings, which you can read more about in the Python Library Reference (datetime.tzinfo).

Amost all of the Olson timezones are supported.


The parse() function in dateutil can't handle time zones. The thing I've been using is the %Z formatter and the time.strptime() function. I have no idea how it deals with the ambiguity in time zones, but it seems to tell the difference between CDT and CST, which is all I needed.

Background: I store backup images in directories whose names are timestamps using local time, since I don't have GMT clocks handy at home. So I use time.strptime(d, r"%Y-%m-%dT%H:%M:%S_%Z") to parse the directory names back into an actual time for age analysis.


I realized that dateparser can solve this problem. https://pypi.org/project/dateparser/

Usage:

import dateparser


def time_gmt_format(str_datetime):
    # from string like "29/05/2020, 08:18 WIB" to GMT yyyymmddhhmmss

    date_time_obj = dateparser.parse(str_datetime, date_formats=['%d/%m/%Y, %H:%M %Z'], 
    settings={'TO_TIMEZONE': 'GMT'})  # convert to GMT datetime object

    return date_time_obj.strftime('%Y%m%d%H%M%S')  # Output: 20200529011800

Other timezones supported by this library: https://github.com/scrapinghub/dateparser/blob/e11a18a4d183a14211b28f5927ce01b220335881/dateparser/timezones.py


I used pytz to generate a TZINFOS mapping:

from datetime import datetime as dt

import pytz

from dateutil.tz import gettz
from pytz import utc
from dateutil import parser


def gen_tzinfos():
    for zone in pytz.common_timezones:
        try:
            tzdate = pytz.timezone(zone).localize(dt.utcnow(), is_dst=None)
        except pytz.NonExistentTimeError:
            pass
        else:
            tzinfo = gettz(zone)

            if tzinfo:
                yield tzdate.tzname(), tzinfo

TZINFOS Usage

>>> TZINFOS = dict(gen_tzinfos())
>>> TZINFOS
{'+02': tzfile('/usr/share/zoneinfo/Antarctica/Troll'),
 '+03': tzfile('/usr/share/zoneinfo/Europe/Volgograd'),
 '+04': tzfile('Europe/Ulyanovsk'),
 '+05': tzfile('/usr/share/zoneinfo/Indian/Kerguelen'),              
...
 'WGST': tzfile('/usr/share/zoneinfo/America/Godthab'),
 'WIB': tzfile('/usr/share/zoneinfo/Asia/Pontianak'),
 'WIT': tzfile('/usr/share/zoneinfo/Asia/Jayapura'),
 'WITA': tzfile('/usr/share/zoneinfo/Asia/Makassar'),
 'WSDT': tzfile('/usr/share/zoneinfo/Pacific/Apia'),
 'XJT': tzfile('/usr/share/zoneinfo/Asia/Urumqi')}

parser Usage

>>> date_str = 'Sat, 11/01/09 8:00PM EST'
>>> tzdate = parser.parse(date_str, tzinfos=TZINFOS)
>>> tzdate.astimezone(utc)
datetime.datetime(2009, 11, 2, 1, 0, tzinfo=<UTC>)

The UTC conversion is needed since there are many timezones available for each abbreviation. Since TZINFOS is a dict, it only has the last timezone per abbreviation. And you may not get the one you were expecting pre conversion.

>>> tzdate
datetime.datetime(2009, 11, 1, 20, 0, tzinfo=tzfile('/usr/share/zoneinfo/America/Port-au-Prince'))
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜