
Finding the time difference between dates in Python

Suppose I had 2 lists that looked something like this:

L1=['Smith, John, 2008,  12,  10,  Male', 'Bates, John,  2006,  1,  Male', 'Johnson, John,  2009,  1,  28,  Male', 'James,  John,  2008,  3,  Male']

L2=['Smith,  Joy, 2008,  12,  10,  Female', 'Smith,  Kevin,  2008,  12,  10,  Male', 'Smith,  Matt,  2008,  12,  10,  Male', 'Smith,  Carol,  2000,  12,  11,  Female', 'Smith,  Sue,  2000,  12,  11,  Female', 'Johnson,  Alex,  2008,  3,  Male', 'Johnson,  Emma,  2008,  3,  Female', 'James,  Peter,  2008,  3,  Male', 'James,  Chelsea,  2008,  3,  Female'] 

What I want to do is compare the date for each person in a family (same last name) to the date of the 'John' in that family. The dates vary: some include year, month and day, some only year and month, and some only the year. I want to find the difference between John's date and each of his family members' dates to the most specific level both dates share (if one date has all 3 parts and the other has only month and year, then only find the difference in months and years). This is what I have tried so far. It didn't work: it wasn't using the right names and dates (it only gave one sibling per John), and the way it counts the time between dates is confusing and wrong:

for line in L1:
    type=line.split(',')
    if len(type)>=1:
        family=type[0]
        if len(type)==6:
            yearA=type[2]
            monthA=type[3]
            dayA=type[4]
            sex=type[5]
            print '%s, John Published in %s, %s, %s, %s' %(family, yearA, monthA, dayA, sex)
        elif len(type)==5:
            yearA=type[2]
            monthA=type[3]
            sex=type[4]
            print '%s, John Published in %s, %s, %s' %(family, yearA, monthA, sex)
        elif len(type)==4:
            yearA=type[2]
            sex=type[3]
            print '%s, John Published in %s, %s' %(family, yearA, sex)
    for line in L2:
        if re.search(family, line):
            word=line.split(',')
            name=word[1]
            if len(word)==6:
                yearB=word[2]
                monthB=word[3]
                dayB=word[4]
                sex=word[5]
            elif len(word)==5:
                yearB=word[2]
                monthB=word[3]
                sex=word[4]
            elif len(word)==4:
                yearB=word[2]
                sex=word[3]
    if dayA and dayB:
        yeardiff= int(yearA)-int(yearB)
        monthdiff=int(monthA)-int(monthB)
        daydiff=int(dayA)-int(dayB)
        print'%s, %s Published %s year(s), %s month(s), %s day(s) before/after John, %s' %(family, name, yeardiff, monthdiff, daydiff, sex)
    elif not dayA and not dayB  and monthA and monthB:
        yeardiff= int(yearA)-int(yearB)
        monthdiff=int(monthA)-int(monthB)
        print'%s, %s Published %s year(s), %s month(s), before/after John, %s' %(family, name, yeardiff, monthdiff, sex)
    elif not monthA and not monthB and yearA and yearB:
        yeardiff= int(yearA)-int(yearB)
        print'%s, %s Published %s year(s), before/after John, %s' %(family, name, yeardiff, sex)

I would like to end up with something that looks like this, and if possible, something that allows the program to distinguish between whether the siblings came before or after, and only print the months and days if they are present in both the dates being compared:

Smith, John Published in  2008,  12,  10,  Male 
Smith,  Joy Published _ year(s) _month(s) _day(s) before/after John, Female 
Smith,  Kevin Published _ year(s) _month(s) _day(s) before/after John,  Male
Smith,  Matt Published _ year(s) _month(s) _day(s) before/after John,  Male
Smith,  Carol Published _ year(s) _month(s) _day(s) before/after John,  Female
Smith,  Sue Published _ year(s) _month(s) _day(s) before/after John,  Female
Bates, John Published in  2006,  1,  Male
Johnson, John Published in  2009,  1,  28,  Male
Johnson,  Alex Published _ year(s) _month(s) _day(s) before/after John,  Male
Johnson,  Emma Published _ year(s) _month(s) _day(s) before/after John,  Female
James,  John Published in  2008,  3,  Male
James,  Peter Published _ year(s) _month(s) _day(s) before/after John,  Male
James,  Chelsea Published _ year(s) _month(s) _day(s) before/after John,  Female


As Joe Kington suggested, the dateutil module is useful for this. In particular, it can tell you the difference between two dates in terms of years, months and days. (Doing the calculation yourself would involve taking account of leap years, etc. Much better to use a well-tested module than to reinvent this wheel.)
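
To see why hand-rolled arithmetic is fragile, here is a small sketch using only the standard library (the variable names are mine, not from the question). The same calendar span, 1 February to 1 March, covers a different number of days depending on the year:

```python
from datetime import date

# Same calendar span (one month) in a leap year and in a non-leap year.
leap_span = (date(2008, 3, 1) - date(2008, 2, 1)).days
plain_span = (date(2009, 3, 1) - date(2009, 2, 1)).days

print('%d %d' % (leap_span, plain_span))  # 29 28
```

Any fixed "days per month" constant gets at least one of these wrong, which is the kind of bookkeeping a tested module does for you.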

This problem is amenable to classes.

Let's make a Person class to keep track of a person's name, gender, and publication date:

class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender

The publication dates have potentially missing data, so let's make a special class to handle missing date data:

class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)
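
The rule "only report the parts both dates have" can also be looked at on its own. The following is just a sketch with a hypothetical helper, shared_precision, working on plain (year, month, day) tuples rather than on VagueDate. It stops at the first missing part, which is a slightly stricter reading than the two independent checks in __sub__ above:

```python
def shared_precision(a, b):
    """Count the leading (year, month, day) parts present in both dates.

    Each date is a (year, month, day) tuple; missing parts are None.
    """
    count = 0
    for part_a, part_b in zip(a, b):
        if part_a is None or part_b is None:
            break
        count += 1
    return count

# Full date vs. year-and-month only: compare years and months, not days.
print(shared_precision((2009, 1, 28), (2008, 3, None)))  # 2
```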

The datetime module defines datetime.datetime objects, and uses datetime.timedelta objects to represent differences between two datetime.datetime objects. Analogously, let's define a VagueDateDelta to represent the difference between two VagueDates:

class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)

Now that we've built ourselves some handy tools, it's not hard to solve the problem.

The first step is to parse the list of strings and convert them into Person objects:

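def parse_person(text):
    # Split on commas and trim the whitespace around each field.
    data=[field.strip() for field in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    # Whatever lies between the first name and the gender is year[, month[, day]].
    ymd=[int(part) for part in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)
johns=[parse_person(text) for text in L1]
peeps=[parse_person(text) for text in L2]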

Next we reorganize peeps into a dict of family members:

family=collections.defaultdict(list)
for person in peeps:
    family[person.lastname].append(person)
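
If defaultdict is unfamiliar, here is a stripped-down sketch of the same grouping, with (lastname, firstname) tuples standing in for Person objects (the people list is made up for illustration):

```python
import collections

# (lastname, firstname) pairs standing in for Person objects.
people = [('Smith', 'Joy'), ('Smith', 'Kevin'), ('Johnson', 'Alex')]

family = collections.defaultdict(list)
for lastname, firstname in people:
    family[lastname].append(firstname)

print(family['Smith'])  # ['Joy', 'Kevin']
print(family['Bates'])  # [] (a missing key yields an empty list)
```

The empty list for a missing key is what lets a John with no relatives, such as Bates, pass through the later loop without any special casing.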

Finally, you just loop through the johns and the family members of each john, compare publication dates, and report the results.

The full script might look something like this:

import datetime as dt
import dateutil.relativedelta as relativedelta
import pprint
import collections

class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)

class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)
    def asdate(self):
        # You've got to make some kind of arbitrary decision when comparing
        # vague dates. Here I make the arbitrary decision that missing info
        # will be treated like 1s for the purpose of calculating differences.
        return dt.date(self.year,self.month or 1,self.day or 1)
    def __str__(self):
        if self.day is not None and self.month is not None:
            return '{s.year}, {s.month}, {s.day}'.format(s=self)
        elif self.month is not None:
            return '{s.year}, {s.month}'.format(s=self)
        else:
            return '{s.year}'.format(s=self)

class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender
    def age_diff(self,other):
        return self.ymd-other.ymd
    def __str__(self):
        fmt='{s.lastname}, {s.firstname} ({s.gender}) ({d.year},{d.month},{d.day})'
        return fmt.format(s=self,d=self.ymd)
    __repr__=__str__
    def __lt__(self,other):
        d1=self.ymd.asdate()
        d2=other.ymd.asdate()
        return d1<d2

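def parse_person(text):
    # Split on commas and trim the whitespace around each field.
    data=[field.strip() for field in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    # Whatever lies between the first name and the gender is year[, month[, day]].
    ymd=[int(part) for part in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)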

def main():
    L1=['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male',
        'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']

    L2=['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male',
        'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female',
        'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male',
        'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male',
        'James, Chelsea, 2008, 3, Female']

    johns=[parse_person(text) for text in L1]
    peeps=[parse_person(text) for text in L2]

    print(pprint.pformat(johns))
    print('')
    print(pprint.pformat(peeps))
    print('')

    family=collections.defaultdict(list)
    for person in peeps:
        family[person.lastname].append(person)

    # print(family)
    pub_fmt='{j.lastname}, {j.firstname} Published in {j.ymd}, {j.gender}'
    rel_fmt='  {r.lastname}, {r.firstname} Published {d} {ba} John, {r.gender}'
    for john in johns:
        print(pub_fmt.format(j=john))
        for relative in family[john.lastname]:
            diff=john.ymd-relative.ymd
            ba='before' if relative<john else 'after'
            print(rel_fmt.format(
                r=relative,
                d=diff,
                ba=ba,                
                ))

if __name__=='__main__':
    main()

yields

[Smith, John (Male) (2008,12,10),
 Bates, John (Male) (2006,1,None),
 Johnson, John (Male) (2009,1,28),
 James, John (Male) (2008,3,None)]

[Smith, Joy (Female) (2008,12,10),
 Smith, Kevin (Male) (2008,12,10),
 Smith, Matt (Male) (2008,12,10),
 Smith, Carol (Female) (2000,12,11),
 Smith, Sue (Female) (2000,12,11),
 Johnson, Alex (Male) (2008,3,None),
 Johnson, Emma (Female) (2008,3,None),
 James, Peter (Male) (2008,3,None),
 James, Chelsea (Female) (2008,3,None)]

Smith, John Published in 2008, 12, 10, Male
  Smith, Joy Published 0 years, 0 months, 0 days after John, Female
  Smith, Kevin Published 0 years, 0 months, 0 days after John, Male
  Smith, Matt Published 0 years, 0 months, 0 days after John, Male
  Smith, Carol Published 7 years, 11 months, 29 days before John, Female
  Smith, Sue Published 7 years, 11 months, 29 days before John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
  Johnson, Alex Published 0 years, 10 months before John, Male
  Johnson, Emma Published 0 years, 10 months before John, Female
James, John Published in 2008, 3, Male
  James, Peter Published 0 years, 0 months after John, Male
  James, Chelsea Published 0 years, 0 months after John, Female
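
Note that relatives whose date equals John's are labelled "after" in the output above, because the script only tests relative<john. If that matters, one possible tweak is a three-way comparison. This is a sketch only: the helper name before_after is hypothetical, and it works on plain datetime.date values rather than on Person objects.

```python
import datetime as dt

def before_after(relative_date, john_date):
    """Label relative_date as 'before', 'after' or 'at the same time as'."""
    if relative_date < john_date:
        return 'before'
    if relative_date > john_date:
        return 'after'
    return 'at the same time as'

john = dt.date(2008, 12, 10)
print(before_after(dt.date(2000, 12, 11), john))  # before
print(before_after(dt.date(2008, 12, 10), john))  # at the same time as
```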


As mentioned in comments (in @Matt's answer), you'll need at least "year,month,day" in order to use datetime.date and datetime.timedelta. From the sample data above, it looks like some entries may be missing "day" which makes it a lot trickier.

If you don't mind using default values for missing months/days (say 1st January), then you can quite quickly convert those dates to datetime.date instances.

As a quick example:

from datetime import date

johns = []
for s in L1:
    # NOTE: not the most robust parsing method.
    v = [x.strip() for x in s.split(",")]
    data = {
        "gender": v[-1],
        "last_name": v[0],
        "first_name": v[1],
    }

    # build keyword args for datetime.date()
    v = v[2:-1] # remove parsed data
    kwargs = { "year": int(v.pop(0)), "month": 1, "day":1 }
    try:
        kwargs["month"] = int(v.pop(0))
        kwargs["day"] = int(v.pop(0))
    except IndexError:
        # month and/or day missing: keep the defaults
        pass

    data["date"] = date(**kwargs)
    johns.append(data)

That gives you a list of dicts containing names, gender, and date. You can do the same for L2, then work out the date difference by subtracting one date from another (which produces a timedelta object).

>>> a = date(2008, 12,12)
>>> b = date(2010, 1, 13)
>>> delta = b - a
>>> print delta.days
397
>>> print "%d years, %d days" % divmod(delta.days, 365)
1 years, 32 days

I intentionally left out month since it wouldn't be as simple as equating 30 days to a month. Arguably, assuming 365 days a year is just as inaccurate if you take into account leap years.
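
As a concrete illustration of the leap-year caveat (a sketch, with dates picked purely for the example): exactly one calendar year that happens to contain 29 February comes out as one year plus a stray day.

```python
from datetime import date

# 2008 is a leap year, so this one-calendar-year span covers 366 days.
delta = date(2009, 1, 1) - date(2008, 1, 1)
years, days = divmod(delta.days, 365)

print('%d years, %d days' % (years, days))  # 1 years, 1 days
```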

Update: showing timedelta in terms of years, months, days

If you need to show deltas in terms of years, months and days, doing a divmod on days returned by timedelta may not be accurate as that does not take into account leap years and different days in months. You would have to manually compare each year, month and then day of each date.

Here's my stab at such a function. (only lightly tested, so use with caution)

import calendar
from datetime import date

def my_time_delta(d1,d2):
    """
    Returns time delta as the following tuple:
        ("before|after|same", years, months, days)
    """
    if d1 == d2:
        return ("same",0,0,0)

    # d1 before or after d2?
    if d1 > d2:
        ba = "after"
        d1,d2 = d2,d1 # swap so d2 > d1
    else:
        ba = "before"

    years  = d2.year - d1.year
    months = d2.month - d1.month
    days   = d2.day - d1.day

    # adjust for -ve days/months
    if days < 0:
        # borrow a month: count the days left in d1's month, then d2.day
        days = days + calendar.monthrange(d1.year, d1.month)[1]
        months = months - 1

    if months < 0:
        months = months + 12
        years  = years - 1

    return (ba, years, months, days)

Example usage:

>>> my_time_delta(date(2003,12,1), date(2003,11,2))
('after', 0, 0, 29)
>>> my_time_delta(date(2003,12,1), date(2004,11,2))
('before', 0, 11, 1)
>>> my_time_delta(date(2003,2,1), date(1992,3,10))
('after', 10, 10, 22)
>>> p,y,m,d = my_time_delta(date(2003,2,1), date(1992,3,10))
>>> print "%d years, %d months, %d days %s" % (y,m,d,p)
10 years, 10 months, 22 days after


There might be existing modules for this type of thing, but I would first convert the dates into a common unit of time (i.e. days since January 1st, 19XX in your examples). Then you can easily compare them, subtract them, etc., and convert them back as you see fit for display. This should be fairly easy if days are as specific as you want.
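
In Python, one way to get such a common unit is date.toordinal() from the standard library, which counts days with 1 January of year 1 as day 1 (a different reference point from the one suggested above, but the idea is the same). A minimal sketch, using two dates from the question's data:

```python
from datetime import date

# date.toordinal() counts days, with 1 January of year 1 as day 1.
a = date(2008, 12, 10).toordinal()
b = date(2000, 12, 11).toordinal()

# Plain integers can be compared and subtracted directly.
diff_days = a - b
print(diff_days)  # 2921

# Converting an ordinal back to a calendar date for display.
print(date.fromordinal(b))  # 2000-12-11
```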
