How to convert Python datetime dates to decimal/float years
I am looking for a way to convert datetime objects to decimal (float) years, including the fractional part. Example:
>>> obj = SomeObject()
>>> obj.DATE_OBS
datetime.datetime(2007, 4, 14, 11, 42, 50)
How do I convert datetime.datetime(2007, 4, 14, 11, 42, 50) to decimal years? By decimal format I mean a float value such as 2007.4523, where the fractional part is the number of seconds from the beginning of the year (2007-01-01 to 2007-04-14 11:42:50), divided by the total number of seconds in that year (2007-01-01 to 2008-01-01).
(NOTE: in statistical modeling (e.g. for linear regression), this is called "time index")
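Under this definition, the arithmetic for the example date works out as follows. 2007 is not a leap year, and 103 whole days (Jan 1 through Apr 13) have elapsed before April 14:

$$
\text{fraction}
= \frac{t - t_{\text{2007-01-01}}}{t_{\text{2008-01-01}} - t_{\text{2007-01-01}}}
= \frac{103 \times 86400 + (11 \times 3600 + 42 \times 60 + 50)}{365 \times 86400}
= \frac{8941370}{31536000} \approx 0.28353
$$

So the example datetime corresponds to roughly 2007.2835.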
from datetime import datetime as dt
import time

def toYearFraction(date):
    def sinceEpoch(date):  # returns seconds since epoch
        # time.mktime interprets the time tuple as local time
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year = date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction
Demo:
>>> toYearFraction(dt.today())
2011.47447514
This method is probably accurate to within a second (or an hour if daylight saving time or other regional quirks are in effect, since time.mktime() works in local time). It also works correctly in leap years. If you need extreme precision (for example, to account for changes in the Earth's rotation), you are better off querying an online time service.
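If the DST caveat matters, one option is to do the same elapsed/duration division in UTC instead of local time. The sketch below is a variant, not the answer's code. It assumes timezone-aware inputs; a naive datetime would be interpreted as local time by astimezone() (Python 3.6+). The function name year_fraction_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def year_fraction_utc(date):
    """Decimal year computed in UTC, so DST shifts cannot affect the result.

    Assumes `date` is timezone-aware; a naive datetime is treated as
    local time by astimezone() (Python 3.6+).
    """
    d = date.astimezone(timezone.utc)
    start = datetime(d.year, 1, 1, tzinfo=timezone.utc)
    end = datetime(d.year + 1, 1, 1, tzinfo=timezone.utc)
    # timedelta / timedelta returns a float (Python 3)
    return d.year + (d - start) / (end - start)

# The same instant expressed in two different UTC offsets
utc = datetime(2007, 4, 14, 11, 42, 50, tzinfo=timezone.utc)
plus_two = datetime(2007, 4, 14, 13, 42, 50, tzinfo=timezone(timedelta(hours=2)))
print(year_fraction_utc(utc), year_fraction_utc(plus_two))
```

Both calls describe the same instant, so they print the same value.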
This is a slightly simpler approach than the other solutions:
import datetime

def year_fraction(date):
    start = datetime.date(date.year, 1, 1).toordinal()
    year_length = datetime.date(date.year+1, 1, 1).toordinal() - start
    return date.year + float(date.toordinal() - start) / year_length

>>> print(year_fraction(datetime.datetime.today()))
2016.32513661
Note that this ignores the time of day and calculates the fraction from the start of the day, so the fractional part for December 31 is about 0.997 (364/365 in a non-leap year), not 1.0.
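If the time of day matters, the same ordinal approach can be extended by adding the elapsed fraction of the current day. This is a sketch under the assumption that the input is a datetime.datetime (it needs .hour, .minute, .second, .microsecond). The function name year_fraction_with_time is hypothetical.

```python
import datetime

def year_fraction_with_time(date):
    """Like year_fraction above, but also counts the time of day.

    `date` must be a datetime.datetime, not a plain datetime.date.
    """
    start = datetime.date(date.year, 1, 1).toordinal()
    year_length = datetime.date(date.year + 1, 1, 1).toordinal() - start
    # Seconds since midnight, expressed as a fraction of a 24-hour day
    seconds = date.hour * 3600 + date.minute * 60 + date.second + date.microsecond / 1e6
    day_fraction = seconds / 86400
    return date.year + (date.toordinal() - start + day_fraction) / year_length

print(year_fraction_with_time(datetime.datetime(2007, 4, 14, 11, 42, 50)))
```

It ignores DST and leap seconds, treating every day as exactly 86400 seconds.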
After implementing the accepted solution, I realized that this modern pandas version gives nearly the same result (at 1-day resolution, always dividing by 365) and is much simpler:
dat['decimal_date'] = dat.index.year + (dat.index.dayofyear - 1) / 365
This must be used on a pandas DataFrame with a datetime index. I'm adding it because this post comes up at the top of Google searches for this issue.
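Because the one-liner always divides by 365, the last day of a leap year lands on the next integer year. Below is a plain-Python mirror of that arithmetic. It needs no pandas; the helper names are hypothetical, and tm_yday plays the role of dayofyear. It shows the edge case and a leap-aware denominator. In pandas, DatetimeIndex.is_leap_year could serve the same purpose.

```python
import calendar
from datetime import date

def decimal_date_fixed_365(d):
    # Same arithmetic as the pandas one-liner: always divides by 365
    return d.year + (d.timetuple().tm_yday - 1) / 365

def decimal_date_leap_aware(d):
    # Divide by the actual number of days in d's year
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return d.year + (d.timetuple().tm_yday - 1) / days_in_year

print(decimal_date_fixed_365(date(2020, 12, 31)))   # 2021.0, same as Jan 1, 2021
print(decimal_date_fixed_365(date(2021, 1, 1)))     # 2021.0
print(decimal_date_leap_aware(date(2020, 12, 31)))  # just below 2021
```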
It seems no one has mentioned this, but since the datetime.timedelta objects that result from subtracting datetime.datetime objects support division, you could use this simple function:
from datetime import datetime

def datetime2year(dt):
    year_part = dt - datetime(year=dt.year, month=1, day=1)
    year_length = (
        datetime(year=dt.year + 1, month=1, day=1)
        - datetime(year=dt.year, month=1, day=1)
    )
    return dt.year + year_part / year_length
Here the division is between two datetime.timedelta objects, which returns a float in Python 3.
I'm assuming that you are using this to compare datetime values. To do that, please use timedelta objects instead of reinventing the wheel.
Example:
>>> from datetime import timedelta
>>> from datetime import datetime as dt
>>> d = dt.now()
>>> year = timedelta(days=365)
>>> tomorrow = d + timedelta(days=1)
>>> tomorrow + year > d + year
True
If for some reason you truly need decimal years, the strftime() method of datetime objects can give you the day of the year as a zero-padded number (001 to 366) when asked for %j. If this is what you are looking for, here is a simple example (only at 1-day resolution, and note that it always divides by 366, even in non-leap years):
>>> from datetime import datetime
>>> d = datetime(2007, 4, 14, 11, 42, 50)
>>> (float(d.strftime("%j"))-1) / 366 + float(d.strftime("%Y"))
2007.2814207650274
Short answer
The date to decimal year conversion is ambiguously defined beyond .002 years (~1 day) precision. For cases where high decimal accuracy isn't important, this will work:
# No library needed, one-liner that's probably good enough
# (note: the m and s arguments are not used)
def decyear4(year, month, day, h=0, m=0, s=0):
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0
If you need accuracy better than .005 years (~2 days), you should be using something else (e.g. seconds since epoch). If you are forced to use decimal years (or just really, really want to), read on.
Long Answer
Contrary to some of the answers and comments previously posted, a 'decimal year' date/timestamp is not an unambiguously defined quantity. When you consider the idea of a decimal year, there are two properties that you probably expect to be true:
1. Perfect interpolation between the beginning and end of the year:
   - 2020, Jan 1, 12:00:00 am corresponds to 2020.000
   - 2020, Dec 31, 11:59:59.999... pm corresponds to 2020.999...
2. Constant units (i.e. linear mapping):
   - 2020.03 - 2020.02 == 2021.03 - 2021.02

Unfortunately you can't satisfy both of these simultaneously, because a leap year is longer than a non-leap year. The first requirement is what most previous answers are trying to fulfill. But in many (most?) cases where a decimal year might actually be used (e.g. in a regression or model of some sort), the second property is just as important, if not more so.
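A small sketch makes the conflict concrete. It compares an interpolating mapping (property 1) with a linear mapping that uses a fixed 365.25-day year (property 2, the same 31557600-second constant as decyear3). The reference epoch of 2000-01-01 and the function names are assumptions chosen for illustration.

```python
from datetime import datetime

def interpolating_decyear(dt):
    # Property 1: Jan 1 00:00 maps exactly to the integer year
    start = datetime(dt.year, 1, 1)
    end = datetime(dt.year + 1, 1, 1)
    return dt.year + (dt - start) / (end - start)

REF = datetime(2000, 1, 1)      # arbitrary reference epoch (assumption)
YEAR_SECONDS = 365.25 * 86400   # fixed-length "year" (assumption)

def linear_decyear(dt):
    # Property 2: every second is worth the same fraction of a year
    return 2000 + (dt - REF).total_seconds() / YEAR_SECONDS

# One day of elapsed time, measured in a leap year and in a non-leap year
day_2020 = interpolating_decyear(datetime(2020, 3, 2)) - interpolating_decyear(datetime(2020, 3, 1))
day_2021 = interpolating_decyear(datetime(2021, 3, 2)) - interpolating_decyear(datetime(2021, 3, 1))
print(day_2020, day_2021)                    # ~1/366 vs ~1/365: units differ
print(linear_decyear(datetime(2021, 1, 1)))  # ~2021.002, not exactly 2021.0
```

The interpolating version breaks property 2. The linear version keeps constant units but lets year boundaries drift, so it breaks property 1.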
Here are some options. I did these in vectorized form for numpy, so some of them can be simplified a bit if numpy is not needed.
import numpy as np

# Datetime based
# Non-linear time mapping! (Bad for regressions, models, etc.
# e.g. 2020.2-2020.1 != 2021.2-2021.1)
def decyear1(year, month, day, h=0, m=0, s=0):
    import datetime
    year_seconds = (datetime.datetime(year,12,31,23,59,59,999999)-datetime.datetime(year,1,1,0,0,0)).total_seconds()
    second_of_year = (datetime.datetime(year,month,day,h,m,s) - datetime.datetime(year,1,1,0,0,0)).total_seconds()
    return year + second_of_year / year_seconds

# Basically the same as decyear1 but without datetime library
def decyear2(year, month, day, h=0, m=0, s=0):
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    year_seconds = ((day_of_year[-1]+leapyr)*24*3600)
    extraday = np.r_[month>2].astype(int)*leapyr
    second_of_year = ((((day_of_year[month-1]+extraday + day-1)*24 + h)*60+m)*60+s)
    return year + second_of_year / year_seconds

# No library needed
# Linear mapping, some deviation from some conceptual expectations
# e.g. 2019.0000 != exactly midnight, January 1, 2019
def decyear3(year, month, day, h=0, m=0, s=0):
    refyear = 2015
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    extraday = np.r_[month>2].astype(int)*leapyr
    year_seconds = 31557600.0  # Weighted average of leap and non-leap years
    seconds_from_ref = ((year-refyear)*year_seconds + ((((day_of_year[month-1]+extraday + day-1)*24+h)*60 + m)*60 +s))
    return refyear + seconds_from_ref/year_seconds

# No library needed, one-liner that's probably good enough
# (note: the m and s arguments are not used)
def decyear4(year, month, day, h=0, m=0, s=0):
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0

# Just for fun - empirically determined one-liner (e.g. with a linear fit)
def decyear5(year, month, day, h=0, m=0, s=0):
    return -8.789580e-02 + year + 8.331180e-02*month + 2.737750e-03*day + 1.142047e-04*h + 2.079919e-06*m + -1.731524e-07*s
#
# Code to compare conversions
#
N = 500000
# note: randint's upper bound is exclusive (e.g. month 12 is never sampled)
year = np.random.randint(1600,2050,(N))
month = np.random.randint(1,12,(N))
day = np.random.randint(1,28,(N))
hr = np.random.randint(0,23,(N))
mn = np.random.randint(0,59,(N))
sec = np.random.randint(0,59,(N))

s = ('decyear1','decyear2','decyear3','decyear4','decyear5')
decyears = np.zeros((N,len(s)))
for f, i in zip((np.vectorize(decyear1), decyear2, decyear3, decyear4, decyear5), range(len(s))):
    decyears[:,i] = f(year,month,day,hr,mn,sec)

avg, std, mx = np.zeros((len(s),len(s)), 'float64'),np.zeros((len(s),len(s)), 'float64'),np.zeros((len(s),len(s)), 'float64')
for i in range(len(s)):
    for j in range(len(s)):
        avg[i,j] = np.abs(decyears[:,i]-decyears[:,j]).mean()*365*24
        std[i,j] = (decyears[:,i]-decyears[:,j]).std()*365*24
        mx[i,j] = np.abs(decyears[:,i]-decyears[:,j]).max()*365*24

import pandas as pd
unit = " (hours, 1 hour ~= .0001 year)"
for a,b in zip((avg, std, mx),("Average difference", "Standard dev.", "Max difference")):
    print(b+unit)
    print(pd.DataFrame(a, columns=s, index=s).round(3))
    print()
And here is how they all compare on a pseudo-random collection of dates:
Average magnitude of difference (hours, 1 hour ~= .0001 year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     4.035    19.258    14.051
decyear2     0.000     0.000     4.035    19.258    14.051
decyear3     4.035     4.035     0.000    20.609    15.872
decyear4    19.258    19.258    20.609     0.000    16.631
decyear5    14.051    14.051    15.872    16.631     0.000

Standard deviation of difference (hours, 1 hour ~= .0001 year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     5.402    16.550    16.537
decyear2     0.000     0.000     5.402    16.550    16.537
decyear3     5.402     5.402     0.000    18.382    18.369
decyear4    16.550    16.550    18.382     0.000     0.673
decyear5    16.537    16.537    18.369     0.673     0.000

Max difference (hours, 1 hour ~= .0001 year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000    16.315    43.998    30.911
decyear2     0.000     0.000    16.315    43.998    30.911
decyear3    16.315    16.315     0.000    44.969    33.171
decyear4    43.998    43.998    44.969     0.000    18.166
decyear5    30.911    30.911    33.171    18.166     0.000
Note that none of these is necessarily more 'correct' than the others. It depends on your definition and your use case. But decyear1 and decyear2 are probably what most people are thinking of, even though (as noted above) they are probably not the best versions to use in cases where decimal years are likely to be used, because of the non-linearity problem. That said, all versions agree with each other to within a hundredth of a year, so any one will do in many situations (such as my case, where I needed it as input to the World Magnetic Model 2020).
Gotchas:
Hopefully it's apparent now that precision better than an hour is probably not really necessary, but if it is, then you might need to compensate your data for time zones and daylight saving time. Edit: And don't forget about leap seconds if you need another 3 digits of precision after sorting out the hours.
Note on precision:
All of the variants given above are well behaved and reversible, meaning the mappings themselves do not limit precision. Accuracy, on the other hand, assumes a particular standard. If, for example, you are given decimal years without explanation, then the accuracy of any reverse mapping you do is only guaranteed to within half a day or so.
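To illustrate reversibility, here is a sketch of the inverse of the interpolating definition. The integer part selects the year, and the fraction scales that year's length back into a timedelta. The function names are hypothetical, and it assumes naive datetimes with no time zone or DST handling. The round trip is limited only by float rounding and timedelta's microsecond resolution.

```python
import math
from datetime import datetime

def datetime_to_decyear(dt):
    # Interpolating definition (same idea as decyear1 / datetime2year above)
    start = datetime(dt.year, 1, 1)
    end = datetime(dt.year + 1, 1, 1)
    return dt.year + (dt - start) / (end - start)

def decyear_to_datetime(decyear):
    # Inverse: the integer part picks the year, the fraction scales its length
    year = math.floor(decyear)
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    # timedelta * float is rounded to the nearest microsecond (Python 3)
    return start + (decyear - year) * (end - start)

d = datetime(2007, 4, 14, 11, 42, 50)
back = decyear_to_datetime(datetime_to_decyear(d))
print(back, abs((back - d).total_seconds()))
```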
It's possible to calculate a decimal date using pandas' Julian date and the following formulas.
If your pandas DataFrame has a datetime index:
JD=dat.index.to_julian_date() #create julian date
L= JD+68569
N= 4*L/146097
L= L-(146097*N+3)/4
I= 4000*(L+1)/1461001
L= L-1461*I/4+31
J= 80*L/2447
K= L-2447*J/80
L= J/11
J= J+2-12*L
decimal_date= 100*(N-49)+I+L
decimal_date is a Series with your dates (in the same time zone as the DataFrame index) in a form like 2007.123452.
Adapted from this post.
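A quick plain-Python check suggests some caution with these formulas. It uses no pandas, and the helper names are hypothetical. It assumes that to_julian_date() returns the standard Julian date, which is 2451544.5 at midnight on 2000-01-01. The constants match the Fliegel and Van Flandern Julian-day-to-calendar algorithm. That algorithm is written for integer division and returns a (year, month, day) triple. With pandas, `/` is true division, so the third line makes L collapse to -0.75. The result then becomes a linear function of JD (one unit per 365.2425 days) whose zero point is not January 1.

```python
from datetime import date

def jd_midnight(d):
    # Julian date at 00:00 of a date (assumption: proleptic Gregorian ordinal + 1721424.5)
    return d.toordinal() + 1721424.5

def fliegel_van_flandern(jdn):
    # Integer-division version: Julian day number -> (year, month, day)
    l = jdn + 68569
    n = 4 * l // 146097
    l = l - (146097 * n + 3) // 4
    i = 4000 * (l + 1) // 1461001
    l = l - 1461 * i // 4 + 31
    j = 80 * l // 2447
    k = l - 2447 * j // 80
    l = j // 11
    j = j + 2 - 12 * l
    return 100 * (n - 49) + i + l, j, k

def formulas_with_float_division(jd):
    # The answer's formulas with true division, as pandas evaluates them
    # (K and the final J update do not affect decimal_date, so they are omitted)
    L = jd + 68569
    N = 4 * L / 146097
    L = L - (146097 * N + 3) / 4   # algebraically always -0.75
    I = 4000 * (L + 1) / 1461001
    L = L - 1461 * I / 4 + 31
    J = 80 * L / 2447
    L = J / 11
    return 100 * (N - 49) + I + L

print(fliegel_van_flandern(2451545))                                # (2000, 1, 1)
print(formulas_with_float_division(jd_midnight(date(2000, 1, 1))))  # about 1999.926
```

If an exact decimal year is needed, the integer-division result (year, month, day) can be combined with one of the fraction-of-year approaches above.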
Ten years down the line, let me add my two cents, using the astropy library.
import datetime
from astropy.time import Time
input_date = datetime.datetime(2007, 4, 14, 11, 42, 50)
astropy_time_object = Time(input_date,format='datetime')
decimal_year = astropy_time_object.decimalyear
print(decimal_year)
#2007.2835289827499
If you want to include the hours, minutes and seconds, use this (where DateArray is an iterable of datetime objects):
import calendar
YearF = [x.year + (x.timetuple().tm_yday - 1 + x.hour/24 + x.minute/(60*24) + x.second/(24*3600)) / (366 if calendar.isleap(x.year) else 365) for x in DateArray]