How to convert Python datetime dates to decimal/float years

I am looking for a way to convert datetime objects to decimal(/float) year, including fractional part. Example:

>>> obj = SomeObjet()
>>> obj.DATE_OBS
datetime.datetime(2007, 4, 14, 11, 42, 50)

How do I convert datetime.datetime(2007, 4, 14, 11, 42, 50) to decimal years? By decimal format I mean a float value such as 2007.4523, where the fractional part is the number of seconds elapsed since the beginning of the year (from 2007-01-01 to the date in question), divided by the total number of seconds in that year (2007-01-01 till 2008-01-01).

(NOTE: in statistical modeling (e.g. for linear regression), this is called "time index")


from datetime import datetime as dt
import time

def toYearFraction(date):
    def sinceEpoch(date): # returns seconds since epoch
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year = date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction

Demo:

>>> toYearFraction(dt.today())
2011.47447514

This method is probably accurate to within a second (or to within an hour if daylight saving time or other regional oddities are in effect, because time.mktime interprets the date in local time). It also works correctly in leap years. If you need extreme resolution (for example, to account for changes in the Earth's rotation), you are better off querying a net service.


This is a little simpler way than the other solutions:

import datetime
def year_fraction(date):
    start = datetime.date(date.year, 1, 1).toordinal()
    year_length = datetime.date(date.year+1, 1, 1).toordinal() - start
    return date.year + float(date.toordinal() - start) / year_length

>>> print(round(year_fraction(datetime.datetime(2016, 4, 29)), 8))
2016.32513661

Note that this calculates the fraction based on the start of the day, so December 31 will be 0.997, not 1.0.
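
If the time of day matters, the same ordinal arithmetic can be extended with the part of the current day that has already elapsed. This is a minimal sketch, assuming naive datetime objects; the name year_fraction_with_time is made up for this illustration.

```python
import datetime

def year_fraction_with_time(moment):
    """Decimal year including time of day (hypothetical helper, naive datetimes)."""
    start = datetime.date(moment.year, 1, 1).toordinal()
    year_length = datetime.date(moment.year + 1, 1, 1).toordinal() - start
    # Seconds elapsed since midnight of the current day.
    seconds_today = (moment.hour * 3600 + moment.minute * 60
                     + moment.second + moment.microsecond / 1e6)
    # Whole days since January 1, plus the elapsed part of the current day.
    days = (moment.toordinal() - start) + seconds_today / 86400
    return moment.year + days / year_length

# The date from the question comes out at roughly 2007.2835.
print(year_fraction_with_time(datetime.datetime(2007, 4, 14, 11, 42, 50)))
```

With this variant, the last second of December 31 lands just below the next whole year instead of at 0.997.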


After implementing the accepted solution, I realized that this modern pandas version gives nearly the same result and is much simpler:

dat['decimal_date'] = dat.index.year + (dat.index.dayofyear - 1) / 365

It must be used on a pandas DataFrame with a DatetimeIndex. Note that it works at day resolution and divides by 365 even in leap years. I am adding it because this post comes up at the top of my Google search for this issue.
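
One thing to watch with a fixed divisor of 365 is the last day of a leap year. The sketch below reproduces the same arithmetic in plain Python (not pandas) to show the effect; both function names are made up for this illustration.

```python
from datetime import date

def naive_decimal_year(d):
    # Same arithmetic as the pandas one-liner: fixed divisor of 365.
    return d.year + (d.timetuple().tm_yday - 1) / 365

def leap_aware_decimal_year(d):
    # Divide by the real number of days in the year instead.
    days_in_year = (date(d.year + 1, 1, 1) - date(d.year, 1, 1)).days
    return d.year + (d.timetuple().tm_yday - 1) / days_in_year

last_day = date(2016, 12, 31)             # day 366 of a leap year
print(naive_decimal_year(last_day))       # 2017.0, already in the next year
print(leap_aware_decimal_year(last_day))  # stays below 2017
```

In pandas itself, the is_leap_year attribute of a DatetimeIndex should allow the same correction, for example by dividing by 365 + dat.index.is_leap_year.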


Seems no one has mentioned this, but since the datetime.timedelta objects that result from subtracting datetime.datetime objects have a division method, you could use the simple function

from datetime import datetime
def datetime2year(dt): 
    year_part = dt - datetime(year=dt.year, month=1, day=1)
    year_length = (
        datetime(year=dt.year + 1, month=1, day=1)
        - datetime(year=dt.year, month=1, day=1)
    )
    return dt.year + year_part / year_length

where the division is between datetime.timedelta objects.
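
The same timedelta arithmetic can be run in reverse, because a timedelta can also be multiplied by a float. The sketch below is an assumption-laden illustration rather than part of the answer: year2datetime is a made-up name, and it assumes naive datetimes and positive years.

```python
from datetime import datetime

def datetime2year(dt):
    year_start = datetime(dt.year, 1, 1)
    year_length = datetime(dt.year + 1, 1, 1) - year_start
    return dt.year + (dt - year_start) / year_length

def year2datetime(decimal_year):
    """Inverse mapping (hypothetical helper, not part of the answer above)."""
    year = int(decimal_year)
    year_start = datetime(year, 1, 1)
    year_length = datetime(year + 1, 1, 1) - year_start
    # timedelta * float scales the duration; the result is rounded to microseconds.
    return year_start + year_length * (decimal_year - year)

original = datetime(2007, 4, 14, 11, 42, 50)
recovered = year2datetime(datetime2year(original))
print(original, recovered)
```

The round trip is limited by float precision: near the year 2000 a double resolves a few microseconds, so the recovered value may differ from the original in its last digits.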


I'm assuming that you are using this to compare datetime values. To do that, please use timedelta objects instead of reinventing the wheel.

Example:

>>> from datetime import timedelta
>>> from datetime import datetime as dt
>>> d = dt.now()
>>> year = timedelta(days=365)
>>> tomorrow = d + timedelta(days=1)
>>> tomorrow + year > d + year
True

If for some reason you truly need decimal years, the strftime() method of datetime objects returns the day of the year as a zero-padded number when asked for %j. If that is what you are looking for, see the simple sample below (day resolution only):

>>> from datetime import datetime
>>> d = datetime(2007, 4, 14, 11, 42, 50)
>>> days_in_year = 365  # 2007 is not a leap year; use 366 for leap years
>>> round((float(d.strftime("%j")) - 1) / days_in_year + float(d.strftime("%Y")), 4)
2007.2822


Short answer

The date to decimal year conversion is ambiguously defined beyond .002 years (~1 day) precision. For cases where high decimal accuracy isn't important, this will work:

# No library needed, one-liner that's probably good enough
# (m and s are accepted for a uniform signature but are ignored)
def decyear4(year, month, day, h=0, m=0, s=0):
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0

If you need accuracy better than .005 years (~2 days), you should be using something else (e.g. seconds since epoch, or some such). If you are forced to (or just really, really want to do it this way) use decimal years, read on.
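
To get a feel for the size of that error, the one-liner can be compared with a straight interpolation between two consecutive January 1sts for the date from the question. This is only a rough check for a single date; exact_fraction is a made-up helper name.

```python
from datetime import datetime

def decyear4(year, month, day, h=0, m=0, s=0):
    # One-liner from the answer: average month of 30.4375 days, m and s ignored.
    return year + ((30.4375 * (month - 1) + day - 1) * 24 + h) * 3600 / 31557600.0

def exact_fraction(moment):
    # Interpolate between January 1 of this year and January 1 of the next.
    start = datetime(moment.year, 1, 1)
    end = datetime(moment.year + 1, 1, 1)
    return moment.year + (moment - start) / (end - start)

moment = datetime(2007, 4, 14, 11, 42, 50)
approx = decyear4(moment.year, moment.month, moment.day,
                  moment.hour, moment.minute, moment.second)
# Express the disagreement in days (taking a year as 365 days).
error_days = (approx - exact_fraction(moment)) * 365
print(round(approx, 4), round(exact_fraction(moment), 4), round(error_days, 1))
```

For this date the two values disagree by a bit more than one day, which is in line with the precision quoted above.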

Long Answer

Contrary to some of the answers and comments previously posted, a 'decimal year' date/timestamp is not an unambiguously defined quantity. When you consider the idea of a decimal year, there are two properties that you probably expect to be true:

  1. Perfect interpolation between beginning of year and end of year:
    2020, Jan 1, 12:00:00am would correspond to 2020.000
    2020, Dec 31 11:59:59.999... pm would correspond to 2020.999...

  2. Constant units (i.e. linear mapping):
    2020.03-2020.02 == 2021.03-2021.02

Unfortunately you can't satisfy both of these simultaneously, because the length of one year differs between leap years and non-leap years. The first requirement is what most previous answers are trying to fulfill. But in many (most?) cases where a decimal year might actually be used (e.g. in a regression or a model of some sort), the second property is just as important, if not more so.
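
The conflict is easy to see numerically. In the sketch below (interpolated_year is a made-up name that implements property 1), the same 24-hour interval maps to a different decimal-year step depending on whether it falls in a leap year.

```python
from datetime import datetime

def interpolated_year(moment):
    # Property 1: interpolate linearly within each calendar year.
    start = datetime(moment.year, 1, 1)
    end = datetime(moment.year + 1, 1, 1)
    return moment.year + (moment - start) / (end - start)

# The same 24-hour interval, once in a leap year and once in a common year.
step_2020 = interpolated_year(datetime(2020, 3, 2)) - interpolated_year(datetime(2020, 3, 1))
step_2021 = interpolated_year(datetime(2021, 3, 2)) - interpolated_year(datetime(2021, 3, 1))

print(step_2020)  # close to 1/366
print(step_2021)  # close to 1/365
print(step_2020 < step_2021)
```

So a mapping that satisfies property 1 cannot have constant units: one day is worth slightly less decimal year in 2020 than in 2021.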

Here are some options. I did these in vectorized form for numpy, so some of them can be simplified a bit if numpy is not needed.

import numpy as np 
# Datetime based 
# Non-linear time mapping! (Bad for regressions, models, etc.,
# e.g. 2020.2-2020.1 != 2021.2-2021.1) 
def decyear1(year, month, day, h=0, m=0, s=0) :
    import datetime
    year_seconds = (datetime.datetime(year,12,31,23,59,59,999999)-datetime.datetime(year,1,1,0,0,0)).total_seconds()
    second_of_year = (datetime.datetime(year,month,day,h,m,s) - datetime.datetime(year,1,1,0,0,0)).total_seconds()
    return year + second_of_year / year_seconds

# Basically the same as decyear1 but without datetime library
def decyear2(year, month, day, h=0, m=0, s=0) :
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    year_seconds = ( (day_of_year[-1]+leapyr )*24*3600)
    extraday = np.r_[month>2].astype(int)*leapyr 
    second_of_year = (((( day_of_year[month-1]+extraday + day-1)*24 + h)*60+m)*60+s)
    return year + second_of_year / year_seconds   

# No library needed
# Linear mapping, some deviation from some conceptual expectations 
# e.g. 2019.0000 != exactly midnight, January 1, 2019
def decyear3(year, month, day, h=0, m=0, s=0) :
    refyear = 2015
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    extraday = np.r_[month>2].astype(int)*leapyr 
    year_seconds = 31557600.0 # Weighted average of leap and non-leap years
    seconds_from_ref = ((year-refyear)*year_seconds + (((( day_of_year[month-1]+extraday + day-1)*24+h)*60 + m)*60 +s))
    return refyear + seconds_from_ref/year_seconds

# No library needed, one-liner that's probably good enough
# (m and s are accepted for a uniform signature but are ignored)
def decyear4(year, month, day, h=0, m=0, s=0) :
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0

# Just for fun - empirically determined one-liner (e.g. with a linear fit)
# Uses its own parameters h, m, s rather than the globals hr, mn, sec.
def decyear5(year, month, day, h=0, m=0, s=0) :
    return -8.789580e-02 + year + 8.331180e-02*month + 2.737750e-03*day + 1.142047e-04*h + 2.079919e-06*m + -1.731524e-07*s

#
# Code to compare conversions
#
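N = 500000
# Note: np.random.randint excludes the upper bound, so month 12, day 28, hour 23, etc. are never sampled.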
year = np.random.randint(1600,2050,(N))
month = np.random.randint(1,12,(N))
day = np.random.randint(1,28,(N))
hr = np.random.randint(0,23,(N))
mn = np.random.randint(0,59,(N))
sec = np.random.randint(0,59,(N))
s = ('decyear1','decyear2','decyear3','decyear4','decyear5')
decyears = np.zeros((N,len(s)))
for f, i in zip( (np.vectorize(decyear1), decyear2, decyear3, decyear4, decyear5), range(len(s)) ) : 
    decyears[:,i] = f(year,month,day,hr,mn,sec)

avg, std, mx = np.zeros((len(s),len(s)), 'float64'),np.zeros((len(s),len(s)), 'float64'),np.zeros((len(s),len(s)), 'float64')
for i in range(len(s)) : 
    for j in range(len(s)) :
        avg[i,j] = np.abs(decyears[:,i]-decyears[:,j]).mean()*365*24
        std[i,j] = (decyears[:,i]-decyears[:,j]).std()*365*24
        mx[i,j] = np.abs(decyears[:,i]-decyears[:,j]).max()*365*24

import pandas as pd 
unit = " (hours, 1 hour ~= .0001 year)"
labels = ("Average magnitude of difference", "Standard dev of difference", "Max difference")
for a, b in zip((avg, std, mx), labels) :
    print(b+unit)
    print(pd.DataFrame(a, columns=s, index=s).round(3))
    print()

And here is how they all compare on a pseudo-random collection of dates:

Average magnitude of difference (hours, 1 hour ~= .0001 year) 
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     4.035    19.258    14.051
decyear2     0.000     0.000     4.035    19.258    14.051
decyear3     4.035     4.035     0.000    20.609    15.872
decyear4    19.258    19.258    20.609     0.000    16.631
decyear5    14.051    14.051    15.872    16.631     0.000

Standard dev of difference (hours, 1 hour ~= .0001 year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     5.402    16.550    16.537
decyear2     0.000     0.000     5.402    16.550    16.537
decyear3     5.402     5.402     0.000    18.382    18.369
decyear4    16.550    16.550    18.382     0.000     0.673
decyear5    16.537    16.537    18.369     0.673     0.000

Max difference (hours, 1 hour ~= .0001 year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000    16.315    43.998    30.911
decyear2     0.000     0.000    16.315    43.998    30.911
decyear3    16.315    16.315     0.000    44.969    33.171
decyear4    43.998    43.998    44.969     0.000    18.166
decyear5    30.911    30.911    33.171    18.166     0.000

Note that none of these is necessarily more 'correct' than the others. It depends on your definition and your use case. decyear1 and decyear2 are probably what most people are thinking of, even though (as noted above) they are probably not the best versions to use where decimal years are likely to be needed, because of the non-linearity problem. That said, all versions are consistent with each other to within a hundredth of a year, so any one will do in many situations (such as my case, where I needed it as input to the World Magnetic Model 2020).

Gotchas:

Hopefully it's apparent by now that precision better than an hour is probably not necessary, but if it is, then you might need to compensate your data for time zones and daylight saving time. Edit: And don't forget about leap seconds if you need another 3 digits of precision after sorting out the hours.
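
One way to take time zones out of the picture is to normalize every timestamp to UTC before converting. The following is a minimal sketch under that assumption (utc_decimal_year is a made-up name, and leap seconds are not handled).

```python
from datetime import datetime, timedelta, timezone

def utc_decimal_year(moment):
    """Decimal year measured on the UTC calendar (hypothetical helper)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    moment = moment.astimezone(timezone.utc)
    start = datetime(moment.year, 1, 1, tzinfo=timezone.utc)
    end = datetime(moment.year + 1, 1, 1, tzinfo=timezone.utc)
    return moment.year + (moment - start) / (end - start)

# One instant written with two different UTC offsets.
in_utc = datetime(2007, 4, 14, 11, 42, 50, tzinfo=timezone.utc)
in_plus_two = in_utc.astimezone(timezone(timedelta(hours=2)))

print(utc_decimal_year(in_utc) == utc_decimal_year(in_plus_two))  # True
```

Because both inputs describe the same instant, they map to the same decimal year regardless of the offset they were written in.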

Note on precision:

All of the variants given above are well behaved and reversible - meaning the mappings themselves have unlimited precision. Accuracy, on the other hand, assumes a particular standard. If, for example, you are given decimal years without explanation then the accuracy of the reverse mapping you do would only be guaranteed to within half a day or so.


It's possible to calculate decimal date by using Pandas's julian date and the following formulas.

In the case where your pandas dataframe has an index that is date-time:

JD=dat.index.to_julian_date() #create julian date
L= JD+68569
N= 4*L/146097
L= L-(146097*N+3)/4
I= 4000*(L+1)/1461001
L= L-1461*I/4+31
J= 80*L/2447
K= L-2447*J/80
L= J/11
J= J+2-12*L
decimal_date= 100*(N-49)+I+L

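decimal_date is then a series of your dates (in the same time zone as the dataframe index) with values like 2007.123452. Be aware that these formulas come from an integer-arithmetic conversion of a Julian date to a calendar date; with the floating-point division used here the result is only an approximately linear function of the Julian date, so check it against a date whose decimal year you already know.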

Adapted from this post.
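
For comparison, a different and simpler Julian-date convention is the Julian epoch, which is linear by construction: J2000.0 corresponds to JD 2451545.0 and one year is 365.25 days. The sketch below is not the formula set above; it uses only the standard library, takes naive datetimes at face value, ignores time-scale subtleties, and uses made-up function names.

```python
from datetime import datetime

def julian_date(moment):
    # toordinal() counts days from 0001-01-01 (= 1); that midnight is JD 1721425.5.
    day_fraction = (moment.hour * 3600 + moment.minute * 60 + moment.second) / 86400
    return moment.toordinal() + 1721424.5 + day_fraction

def julian_epoch_year(moment):
    # Julian epoch convention: J2000.0 is JD 2451545.0 and a year is 365.25 days.
    return 2000.0 + (julian_date(moment) - 2451545.0) / 365.25

print(julian_epoch_year(datetime(2000, 1, 1, 12)))  # 2000.0
print(julian_epoch_year(datetime(2007, 4, 14, 11, 42, 50)))
```

For the date in the question this lands within about a thousandth of a year of the interpolated value, which is the kind of disagreement between conventions discussed in the long answer above.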


Ten years down the line, let me add my two cents, using the astropy library.

    import datetime
    from astropy.time import Time

    input_date =  datetime.datetime(2007, 4, 14, 11, 42, 50)
    astropy_time_object = Time(input_date,format='datetime')

    decimal_year = astropy_time_object.decimalyear

    print(decimal_year)
    #2007.2835289827499


If you want to include the minutes and seconds, use this (DateArray is your own sequence of datetime objects):

import calendar
# calendar.isleap applies the full Gregorian rule, including century years
YearF = [x.year + (x.timetuple().tm_yday - 1 + x.hour/24 + x.minute/(60*24) + x.second/(24*3600))
         / (365 + calendar.isleap(x.year)) for x in DateArray]