python: c# binary datetime encoding

I need to extract financial price data from a binary file. This price data is normally extracted by a piece of C# code. The biggest problem I'm having is getting a meaningful datetime.

The binary data looks like this:

'\x14\x11\x00\x00{\x14\xaeG\xe1z(@\x9a\x99\x99\x99\x99\x99(@q=\n\xd7\xa3p(@\x9a\x99\x99\x99\x99\x99(@\xac\x00\x19\x00\x00\x00\x00\x00\x08\x01\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'

The C# code that extracts it correctly is:

StockID = reader.ReadInt32();
Open = reader.ReadDouble();
High = reader.ReadDouble();
Low = reader.ReadDouble();
Close = reader.ReadDouble();
Volume = reader.ReadInt64();
TotalTrades = reader.ReadInt32();
Timestamp = reader.ReadDateTime();

This is where I've gotten in Python. I have a couple of concerns about it.

In [1]: barlength = 56; barformat = '<i4dqiq'  # '<' = little-endian, standard sizes, no padding
In [2]: pricebar = f.read(barlength)
In [3]: pricebar
Out[3]: '\x95L\x00\x00)\\\x8f\xc2\xf5\xc8N@D\x1c\xeb\xe26\xcaN@\x7fj\xbct\x93\xb0N@\xd7\xa3p=\n\xb7N@\xf6\xdb\x02\x00\x00\x00\x00\x00J\x03\x00\x00\x00"\xd8\x18\xe0\xdc\xcc\x08'
In [4]: struct.unpack(barformat, pricebar)
Out[4]: 
(19605,                # stock id
 61.57,                # open
 61.579800000000006,   # high
 61.3795,              # low
 61.43,                # close
 187382,               # volume -- seems reasonable
 842,                  # TotalTrades -- seems reasonable
 634124502600000000L   # datetime -- no idea what this means!
)

I used Python's built-in struct module, but I have some concerns about it:

  1. I'm not sure what format characters correspond to Int32 vs Int64 in the C# code, though several different tries returned the same Python tuple.

  2. I'm also concerned that the output for some of the fields doesn't seem to be very sensitive to the format I specify. For example, the TotalTrades field returns the same value whether I specify it as a signed or unsigned int or a signed or unsigned long (l, L, i, or I).

  3. I can't make any sense of the date return field. This is actually my biggest problem.


As far as I know, .NET timestamps are ticks since 0001-01-01T00:00:00Z, where a tick is 100 nanoseconds. So:

>>> x = 634124502600000000
>>> secs = x / 10.0 ** 7
>>> secs
63412450260.0
>>> import datetime
>>> delta = datetime.timedelta(seconds=secs)
>>> delta
datetime.timedelta(733940, 34260)
>>> ts = datetime.datetime(1,1,1) + delta
>>> ts
datetime.datetime(2010, 6, 18, 9, 31)
>>>

The date part is 2010-06-18. Are you in a timezone that's 9.5 hours away from UTC? It would be rather useful in verifying this calculation if you were to supply TWO timestamp values together with the expected answers.
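The same conversion can be done with integer arithmetic only, which avoids going through a float. This is a minimal Python 3 sketch; the helper name ticks_to_datetime is made up for illustration, and it assumes the stored value is a plain tick count. If the writer used DateTime.ToBinary() instead, the two high bits would carry the Kind flag and would have to be handled first, which this sketch does not do.

```python
from datetime import datetime, timedelta

TICKS_PER_MICROSECOND = 10  # one .NET tick is 100 ns


def ticks_to_datetime(ticks):
    """Convert 100 ns ticks since 0001-01-01 00:00:00 to a naive datetime.

    Hypothetical helper: assumes `ticks` is a plain tick count.
    """
    # Integer division keeps the arithmetic exact; anything finer than
    # one microsecond is dropped because datetime cannot represent it.
    return datetime(1, 1, 1) + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


print(ticks_to_datetime(634124502600000000))  # 2010-06-18 09:31:00
```

The result is a naive datetime: no timezone is attached, so whether 09:31 is local or UTC still has to be settled from known-good values.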

Addressing your concern that the output "doesn't seem to be very sensitive to the format I specify" (TotalTrades comes out the same with l, L, i, or I): the results agree because (1) when the format string starts with a byte-order prefix such as '<', struct uses standard sizes, so "long" and "int" are both 32 bits (without a prefix, l and L follow the platform's C long, which is 32 bits on some platforms and 64 bits on others), and (2) the lower half of the unsigned range has the same representation as the non-negative signed numbers. For example, in 8-bit numbers, the values 0 to 127 inclusive have the same bit pattern whether signed or unsigned.
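Both points can be checked directly with the struct module. This is a small Python 3 sketch: the value 842 is the TotalTrades figure from the question, and -1 is an arbitrary value picked to show where signed and unsigned readings diverge.

```python
import struct

# Standard sizes (any byte-order prefix such as '<') are fixed on every platform.
for code in "iIlLqQ":
    print(code, struct.calcsize("<" + code))  # i, I, l, L -> 4; q, Q -> 8

# Native mode (no prefix): 'l' follows the platform's C long, so 4 or 8.
print("native l:", struct.calcsize("l"))

raw = struct.pack("<i", 842)  # TotalTrades from the question
signed, = struct.unpack("<i", raw)
unsigned, = struct.unpack("<I", raw)
print(signed, unsigned)  # 842 842: same bits, same value

raw = struct.pack("<i", -1)  # a negative value is read differently
neg_signed, = struct.unpack("<i", raw)
neg_unsigned, = struct.unpack("<I", raw)
print(neg_signed, neg_unsigned)  # -1 4294967295
```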


Without seeing the C# source containing the ReadInt32, ReadDouble, ReadDateTime etc methods it will be impossible to give a definitive answer, but...

  1. I'm not really sure what the difference is between the i and l format characters, but I think you're correct in using i/l for Int32 and q for Int64.

  2. Again, I don't know the difference between the i/l or I/L format characters, but provided they all describe 32-bit integers (true for struct's standard sizes), their binary representation is the same for all values between 0 and 2147483647 inclusive. If it's possible for TotalTrades to be negative, or to exceed 2147483647, then you should investigate further. If not, don't worry about it.

  3. It looks to me like your serialized date field is probably equivalent to DateTime.Ticks.

    If that's the case then the serialized value will be the number of ticks -- that is, the number of 100 nanosecond intervals -- since 00:00:00 on 1 January 0001.

    By that reckoning, the value shown in your question -- 634124502600000000 -- would represent 09:31:00 on 18 June 2010.
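Putting the pieces together, one record can be decoded as in the Python 3 sketch below. The names PriceBar and parse_bar are made up for illustration. The sketch assumes the file is little-endian with no padding between fields (which is how .NET's BinaryReader reads primitives) and that the custom ReadDateTime stores a plain tick count. The bytes are the 56-byte record shown in the question.

```python
import struct
from collections import namedtuple
from datetime import datetime, timedelta

# '<' selects little-endian, standard sizes, and no alignment padding.
BAR_FORMAT = "<i4dqiq"
BAR_SIZE = struct.calcsize(BAR_FORMAT)  # 4 + 4*8 + 8 + 4 + 8 = 56 bytes

# Hypothetical record type; field order follows the C# reads in the question.
PriceBar = namedtuple(
    "PriceBar", "stock_id open high low close volume total_trades timestamp"
)


def parse_bar(raw):
    """Unpack one record; assumes the last field is ticks since 0001-01-01."""
    *fields, ticks = struct.unpack(BAR_FORMAT, raw)
    # 1 tick = 100 ns, so 10 ticks = 1 microsecond.
    when = datetime(1, 1, 1) + timedelta(microseconds=ticks // 10)
    return PriceBar(*fields, when)


# The record from the question, written as a Python 3 bytes literal.
raw = (b'\x95L\x00\x00)\\\x8f\xc2\xf5\xc8N@D\x1c\xeb\xe26\xcaN@'
       b'\x7fj\xbct\x93\xb0N@\xd7\xa3p=\n\xb7N@'
       b'\xf6\xdb\x02\x00\x00\x00\x00\x00J\x03\x00\x00'
       b'\x00"\xd8\x18\xe0\xdc\xcc\x08')

bar = parse_bar(raw)
print(bar.stock_id, bar.volume, bar.total_trades, bar.timestamp)
# 19605 187382 842 2010-06-18 09:31:00
```

To read a whole file, the same function could be applied to successive f.read(BAR_SIZE) chunks, with the file opened in binary mode.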
