Algorithm for fixed-point multiplication
I'm trying to rescale a timestamp (fractional part of seconds only) from nanoseconds (units of 10^-9 seconds) to the lower half of an NTP timestamp (units of 2^-32 seconds). Effectively this means multiplying by 4.2949673. But I need to do it without floating-point math, and without using integers larger than 32 bits (in fact, I'm actually writing this for an 8-bit microcontroller, so even 32-bit math is expensive, especially divisions).
I've come up with a few algorithms that work reasonably well, but I don't have any real grounding in numerical methods so I'd appreciate any suggestions as to how to improve them, or any other algorithms that would be more accurate and/or faster.
Algorithm 1
uint32_t intts = (ns >> 16) * 281474 + (ns << 16) / 15259 + ns / 67078;
The first two constants were chosen to slightly undershoot, rather than overshoot, the correct figure, and the final factor o开发者_如何学JAVAf 67078 was determined empirically to correct for this. Produces results within +/- 4 NTP units of the correct value, which is +/- 1 ns -- acceptable, but the residual changes with ns
. I guess I could add another term.
Algorithm 2
uint32_t ns2 = (2 * ns) + 1;
uint32_t intts = (ns2 << 1)
+ (ns2 >> 3) + (ns2 >> 6) + (ns2 >> 8) + (ns2 >> 9) + (ns2 >> 10)
+ (ns2 >> 16) + (ns2 >> 18) + (ns2 >> 19) + (ns2 >> 20) + (ns2 >> 21)
+ (ns2 >> 22) + (ns2 >> 24) + (ns2 >> 30) + 3;
Based on the binary expansion of 4.2949673 (actually based on the binary expansion of 2.14748365, since I start by doubling and adding one to accomplish rounding). Possibly faster than algorithm 1 (I haven't gotten out the benchmarks yet). The +3 was determined empirically to cancel out undershoot from truncating all those low-order bits, but it doesn't do the best possible job.
uint32_t convert(uint32_t x) {
const uint32_t chi = 0x4b82;
const uint32_t clo = 0xfa09;
const uint32_t round = 0x9525;
const uint32_t xhi = x >> 16;
const uint32_t xlo = x & 0xffff;
uint32_t lowTerm = xlo*clo;
uint32_t crossTerms = xhi*clo + xlo*chi;
uint32_t rounded = crossTerms + (lowTerm >> 16) + round >> 16;
uint32_t highTerm = xhi*chi;
return (x << 2) + highTerm + rounded;
}
Basic fixed-point multiplication, simulating a 32x32 -> 64 product using four 16x16 -> 32 products. The constant round
was chosen to minimize the error using a simple binary search. This expression is good to +/-0.6 NTP over the entire valid range.
The leading 4
in the scale factor is handled in the shift. Compilers typically can generate pretty decent code for this type of thing, but it can often be streamlined with hand-written assembly if you need to.
If you don't need so much accuracy, you can get rid of the lowTerm
and round
and get an answer that's good to +/-1.15 NTP:
uint32_t convert(uint32_t x) {
const uint32_t chi = 0x4b82;
const uint32_t clo = 0xfa09;
const uint32_t xhi = x >> 16;
const uint32_t xlo = x & 0xffff;
uint32_t crossTerms = xhi*clo + xlo*chi;
uint32_t highTerm = xhi*chi;
return (x << 2) + highTerm + (crossTerms >> 16) + 1;
}
I might be stating the obvious… but have you googled the interwebz for fixed point math libraries? There's plenty of them. Here's a good one with C++ and x86 implementations on Flipcode's archives:
http://www.flipcode.com/archives/Fixed_Point_Routines.shtml
精彩评论