python: c# binary datetime encoding
I need to extract financial price data from a binary file. This price data is normally extracted by a piece of C# code. The biggest problem I'm having is getting a meaningful datetime.
The binary data looks like this:
'\x14\x11\x00\x00{\x14\xaeG\xe1z(@\x9a\x99\x99\x99\x99\x99(@q=\n\xd7\xa3p(@\x9a\x99\x99\x99\x99\x99(@\xac\x00\x19\x00\x00\x00\x00\x00\x08\x01\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'
The C# code that extracts it correctly is:
StockID = reader.ReadInt32();
Open = reader.ReadDouble();
High = reader.ReadDouble();
Low = reader.ReadDouble();
Close = reader.ReadDouble();
Volume = reader.ReadInt64();
TotalTrades = reader.ReadInt32();
Timestamp = reader.ReadDateTime();
This is where I've gotten in python. I have a couple concerns about it.
In [1]: barlength = 56; barformat = 'i4dqiq'
In [2]: pricebar = f.read(barlength)
In [3]: pricebar
Out[3]: '\x95L\x00\x00)\\\x8f\xc2\xf5\xc8N@D\x1c\xeb\xe26\xcaN@\x7fj\xbct\x93\xb0N@\xd7\xa3p=\n\xb7N@\xf6\xdb\x02\x00\x00\x00\x00\x00J\x03\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'
In [4]: struct.unpack(barformat, pricebar)
Out[4]:
(19605, # stock id
61.57, # open
61.579800000000006, # high
61.3795, # low
61.43, # close
187382, # volume -- seems reasonable
842, # TotalTrades -- seems reasonable
634124502600000000L # datetime -- no idea what this means!
)
I used Python's built-in struct module but have some concerns about it.
I'm not sure what format characters correspond to Int32 vs Int64 in the C# code, though several different tries returned the same Python tuple.
I'm concerned, though, since the output for some of the fields doesn't seem to be very sensitive to the format I specify. For example, the TotalTrades field returns the same amount if I specify it as either signed or unsigned int OR signed or unsigned long (l, L, i, or I).
I can't make any sense of the date return field. This is actually my biggest problem.
As far as I know, .NET timestamps are ticks since 0001-01-01T00:00:00, where a tick is 100 nanoseconds. So:
>>> x = 634124502600000000
>>> secs = x / 10.0 ** 7
>>> secs
63412450260.0
>>> import datetime
>>> delta = datetime.timedelta(seconds=secs)
>>> delta
datetime.timedelta(733940, 34260)
>>> ts = datetime.datetime(1,1,1) + delta
>>> ts
datetime.datetime(2010, 6, 18, 9, 31)
>>>
The date part is 2010-06-18. Are you in a timezone that's 9.5 hours away from UTC? It would be rather useful in verifying this calculation if you were to supply TWO timestamp values together with the expected answers.
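The same conversion can also be done with integer arithmetic, which avoids the floating-point division. This is a minimal sketch, assuming the stored value is a plain DateTime.Ticks count; ticks_to_datetime is a hypothetical helper name, not part of any library.

```python
import datetime

# Assumption: the 8-byte field is a raw DateTime.Ticks count,
# i.e. 100-nanosecond intervals since 0001-01-01 00:00:00.
DOTNET_EPOCH = datetime.datetime(1, 1, 1)
TICKS_PER_MICROSECOND = 10


def ticks_to_datetime(ticks):
    """Convert a .NET tick count to a naive datetime.

    Anything finer than a microsecond is dropped.
    """
    return DOTNET_EPOCH + datetime.timedelta(
        microseconds=ticks // TICKS_PER_MICROSECOND
    )


print(ticks_to_datetime(634124502600000000))  # 2010-06-18 09:31:00
```

The result is a naive datetime; whether it is local time or UTC depends on how the C# side produced the value.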
Addressing your concern that "the output for some of the fields doesn't seem to be very sensitive to the format I specify" (TotalTrades comes out the same with l, L, i, or I): this is expected, for two reasons. (1) In struct's standard sizes (a format string starting with <, >, = or !) "int" and "long" are both 4 bytes. In native mode (no prefix) they are also the same size on platforms where the C long is 32 bits, such as Windows, but on platforms with a 64-bit C long, l and L read 8 bytes instead, so prefixing the format with < is the safer choice for a file written by C#. (2) The smaller half of all possible unsigned numbers have the same representation as signed numbers. For example, in 8-bit numbers, the numbers 0 to 127 inclusive have the same bit pattern whether signed or unsigned.
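A quick check with struct.calcsize and struct.pack illustrates both points. It is only a sketch; the < prefix (little-endian, standard sizes, no padding) is an assumption about how the C# writer laid out the file.

```python
import struct

# With the "<" prefix struct uses standard sizes: i, I, l and L are all 4 bytes.
for code in "iIlL":
    print(code, struct.calcsize("<" + code))

# Without a prefix, l and L follow the platform's C long (4 or 8 bytes).
print("native l:", struct.calcsize("l"))

# Values from 0 to 2**31 - 1 have the same bytes whether signed or unsigned.
same_bytes = struct.pack("<i", 842) == struct.pack("<I", 842)
print(same_bytes)  # True

# Above that range the two interpretations diverge.
raw = struct.pack("<I", 2**31)
as_signed = struct.unpack("<i", raw)[0]
as_unsigned = struct.unpack("<I", raw)[0]
print(as_signed, as_unsigned)  # -2147483648 2147483648
```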
Without seeing the C# source containing the ReadInt32, ReadDouble, ReadDateTime etc. methods it will be impossible to give a definitive answer, but...
I'm not really sure what the difference is between the i and l format characters, but I think you're correct in using i/l for Int32 and q for Int64.
Again, I don't know the difference between the i/l or I/L format characters, but since they all represent 32-bit integers then their binary representation should be the same for all values between 0 and 2147483647 inclusive. If it's possible for TotalTrades to be negative, or exceed 2147483647, then you should investigate further. If not then don't worry about it.
It looks to me like your serialized date field is probably equivalent to DateTime.Ticks. If that's the case then the serialized value will be the number of ticks (that is, the number of 100 nanosecond intervals) since 00:00:00 on 1 January 0001. By that reckoning, the value shown in your question, 634124502600000000, would represent 09:31:00 on 18 June 2010.
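Putting the format characters together, one record could be decoded along these lines. This is a sketch rather than a definitive reader: the little-endian < prefix, the PriceBar tuple and the parse_bar helper are assumptions made for illustration, and the sample bytes are the ones posted in the question.

```python
import struct
from collections import namedtuple

# "<" = little-endian, standard sizes, no padding (assumed to match the C# writer).
# i = Int32, 4d = four Doubles, q = Int64, i = Int32, q = raw DateTime ticks.
BAR_FORMAT = struct.Struct("<i4dqiq")

# Hypothetical container; field names follow the C# snippet in the question.
PriceBar = namedtuple(
    "PriceBar", "stock_id open high low close volume total_trades ticks"
)


def parse_bar(raw):
    """Decode one 56-byte price bar into a PriceBar."""
    return PriceBar._make(BAR_FORMAT.unpack(raw))


# The 56 bytes shown in the question's IPython session.
sample = (
    b'\x95L\x00\x00)\\\x8f\xc2\xf5\xc8N@D\x1c\xeb\xe26\xcaN@'
    b'\x7fj\xbct\x93\xb0N@\xd7\xa3p=\n\xb7N@'
    b'\xf6\xdb\x02\x00\x00\x00\x00\x00J\x03\x00\x00'
    b'\x00"\xd8\x18\xe0\xdc\xcc\x08'
)

bar = parse_bar(sample)
print(BAR_FORMAT.size)
print(bar.stock_id, bar.volume, bar.total_trades, bar.ticks)
```

Using an explicit < prefix keeps the record size at 56 bytes on every platform, whereas a native-mode format may insert alignment padding or change the width of some fields.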