
Lexicographical ordering of multiple doubles

Consider a class holding two doubles

class path_cost {
   double length;
   double time;
};

If I want to lexicographically order a list of path_costs, I have a problem. Read on :)

If I use exact equality for the test, like so

bool operator<(const path_cost& rhs) const {
   if (length == rhs.length) return time < rhs.time;
   return length < rhs.length;
}

the resulting order is likely to be wrong, because a small deviation (e.g. due to numerical inaccuracies in the calculation of the length) may cause the length test to fail, so that e.g.

{ 231.00000000000001, 40 } < { 231.00000000000002, 10 }

erroneously holds.

If I alternatively use a tolerance like so

bool operator<(const path_cost& rhs) const {
   if (std::fabs(length - rhs.length) < 1e-6) return time < rhs.time;
   return length < rhs.length;
}

then the sorting algorithm may fail horribly, since the <-operator is no longer transitive (that is, if a < b and b < c, then a < c may not hold).

Any ideas? Solutions? I have thought about partitioning the real line, so that numbers within each partition are considered equal, but that still leaves too many cases where the equality test fails but should not.

(UPDATE by James Curran, hopefully explaining the problem): Given the numbers:

  • A = {231.0000012000, 10}
  • B = {231.0000005000, 40}
  • C = {231.0000001000, 60}

    • A.Length & B.Length differ by 7e-7, so we use time, and A < B.
    • B.Length & C.Length differ by 4e-7, so we use time, and B < C.
    • A.Length & C.Length differ by 1.1e-6, so we use length, and A > C.

(Update by Esben Mose Hansen) This is not purely theoretical. The standard sort algorithms tend to crash, or worse, when given a non-transitive sort operator. And this is exactly what I have been contending with (and boy, was that fun to debug ;) )
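
To see the non-transitivity concretely, here is a minimal, self-contained check; it is only a sketch, hard-coding the values from James Curran's example above and the 1e-6 tolerance from the comparison operator:

#include <cmath>
#include <iostream>

struct path_cost {
   double length;
   double time;

   // Tolerance-based comparison from above: lengths closer than 1e-6 are
   // treated as equal and the tie is broken on time.
   bool operator<(const path_cost& rhs) const {
      if (std::fabs(length - rhs.length) < 1e-6) return time < rhs.time;
      return length < rhs.length;
   }
};

int main() {
   const path_cost a{231.0000012000, 10};
   const path_cost b{231.0000005000, 40};
   const path_cost c{231.0000001000, 60};

   std::cout << std::boolalpha
             << "a < b: " << (a < b) << '\n'   // true:  lengths within tolerance, 10 < 40
             << "b < c: " << (b < c) << '\n'   // true:  lengths within tolerance, 40 < 60
             << "a < c: " << (a < c) << '\n';  // false: lengths differ by 1.1e-6, so the length comparison decides
}

Handing such a comparison to std::sort is undefined behaviour, because it is not a strict weak ordering; that is what makes the crashes possible.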


Do you really want just a compare function?

Why don't you sort by length first, then group the pairs into what you think are the same length and then sort within each group by time?

Once sorted by length, you can apply whatever heuristic you need, to determine 'equality' of lengths, to do the grouping.
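
A minimal sketch of that approach, assuming the data lives in a std::vector<path_cost> with public members and using a hypothetical tolerance eps for the grouping:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct path_cost {
   double length;
   double time;
};

// Sort by length only, then split the sorted range into groups whose
// consecutive lengths lie within eps of each other, and order each group by time.
void sort_grouped(std::vector<path_cost>& v, double eps = 1e-6) {
   std::sort(v.begin(), v.end(),
             [](const path_cost& a, const path_cost& b) { return a.length < b.length; });

   std::size_t begin = 0;
   while (begin < v.size()) {
      std::size_t end = begin + 1;
      // Grow the group while the gap to the previous element stays below eps.
      while (end < v.size() && std::fabs(v[end].length - v[end - 1].length) < eps)
         ++end;
      std::sort(v.begin() + begin, v.begin() + end,
                [](const path_cost& a, const path_cost& b) { return a.time < b.time; });
      begin = end;
   }
}

Because neighbours are chained, a group can end up wider than eps in total; that is the same equivalence-class behaviour the union-find suggestion further down arrives at.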


I don't think you are going to be able to do what you want. Essentially you seem to be saying that in certain cases you want to ignore the fact that a > b and pretend that a = b. I'm pretty sure you can construct a proof that says that if a and b are equivalent whenever their difference is smaller than a certain value, then a and b are equivalent for all values of a and b. Something along the lines of:

For a tolerance of C and two numbers A and B where, without loss of generality, A > B, there exist D(n) = B + n*(C/10) for 0 <= n <= 10*(A-B)/C, such that D(n) is trivially within the tolerance of D(n-1) and D(n+1), and therefore equivalent to them. Also, D(0) is B and D(10*(A-B)/C) = A, so A and B can be said to be equivalent.
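
The same argument written out as a chain, with the step taken as (A-B)/N rather than exactly C/10 so that the endpoints land exactly on B and A:

\[
  N = \left\lceil \frac{10\,(A-B)}{C} \right\rceil, \qquad
  D_n = B + n \cdot \frac{A-B}{N}, \quad 0 \le n \le N .
\]
\[
  |D_{n+1} - D_n| = \frac{A-B}{N} \le \frac{C}{10} < C
  \quad\Longrightarrow\quad
  B = D_0 \sim D_1 \sim \cdots \sim D_N = A .
\]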

I think the only way you can solve that problem is using a partitioning method. Something like multiplying by 10^6 and then converting to an int should partition pretty well, but it will mean that if you have 1.00001*10^-6 and 0.999999*10^-6 then they will come out in different partitions, which may not be desired.

The problem then becomes looking at your data to work out how to best partition it which I can't help with since I don't know anything about your data. :)
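
As a sketch of that multiply-and-truncate partitioning (the 10^6 scale factor is just the one suggested above, and it assumes the lengths are small enough that length * 1e6 fits in a 64-bit integer):

#include <cstdint>

struct path_cost {
   double length;
   double time;

   // Everything that truncates to the same bucket counts as equal, so the
   // comparison below is really an integer comparison and therefore transitive.
   static std::int64_t bucket(double x) {
      return static_cast<std::int64_t>(x * 1e6);  // truncates toward zero
   }

   bool operator<(const path_cost& rhs) const {
      const std::int64_t a = bucket(length);
      const std::int64_t b = bucket(rhs.length);
      if (a == b) return time < rhs.time;
      return a < b;
   }
};

The boundary effect mentioned above shows up immediately: 1.00001*10^-6 and 0.999999*10^-6 land in buckets 1 and 0, even though they differ by far less than the bucket width.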

P.S. Do the algorithms actually crash when given the non-transitive comparison, or just when they encounter specific unsolvable cases?


I can think of two solutions.

You could carefully choose a sorting algorithm that does not fail when the comparisons are intransitive. For example, quicksort shouldn't fail, at least if you implement it yourself. (If you are worried about the worst case behavior of quicksort, you can first randomize the list, then sort it.)

Or you could extend your tolerance patch so that it becomes an equivalence relation and you restore transitivity. There are standard union-find algorithms to complete any relation to an equivalence relation. After applying union-find, you can replace the length in each equivalence class with a consensus value (such as the average, say) and then do the sort that you wanted to do. It feels a bit strange to doctor floating point numbers to prevent spurious reordering, but it should work.
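
A minimal sketch of that idea; the eps threshold, the quadratic pairwise pass, and the public-member struct are assumptions, and the averaging is the "consensus value" mentioned above:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct path_cost {
   double length;
   double time;
};

// Plain union-find over element indices, with path compression.
struct union_find {
   std::vector<std::size_t> parent;
   explicit union_find(std::size_t n) : parent(n) { std::iota(parent.begin(), parent.end(), std::size_t{0}); }
   std::size_t find(std::size_t x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
   void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

void sort_with_consensus_lengths(std::vector<path_cost>& v, double eps = 1e-6) {
   // Complete the "within tolerance" relation to an equivalence relation.
   // This pass is O(n^2); the follow-up below avoids that by sorting on length first.
   union_find uf(v.size());
   for (std::size_t i = 0; i < v.size(); ++i)
      for (std::size_t j = i + 1; j < v.size(); ++j)
         if (std::fabs(v[i].length - v[j].length) < eps) uf.unite(i, j);

   // Replace each length by the average of its equivalence class.
   std::vector<double> sum(v.size(), 0.0);
   std::vector<std::size_t> count(v.size(), 0);
   for (std::size_t i = 0; i < v.size(); ++i) {
      sum[uf.find(i)] += v[i].length;
      ++count[uf.find(i)];
   }
   for (std::size_t i = 0; i < v.size(); ++i)
      v[i].length = sum[uf.find(i)] / count[uf.find(i)];

   // With identical lengths inside each class, the exact comparison is a
   // genuine strict weak ordering again.
   std::sort(v.begin(), v.end(), [](const path_cost& a, const path_cost& b) {
      if (a.length == b.length) return a.time < b.time;
      return a.length < b.length;
   });
}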


Actually, Moron makes a good point. Instead of union and find, you can sort by length first, then link together neighbors that are within tolerance, then do a subsort within each group on the second key. That has the same outcome as my second suggestion, but it is a simpler implementation.


I'm not familiar with your application, but I'd be willing to bet that the differences in distance between points in your graph are many orders of magnitude larger than the rounding errors on floating point numbers. Therefore, if two entries differ by only the round-off error, they are essentially the same, and it makes no difference in which order they appear in your list. From a common-sense perspective, I see no reason to worry.


You will never get 100% precision with ordinary doubles. You say that you are afraid that using tolerances will affect the correctness of your program. Have you actually tested this? What level of precision does your program actually need?

In most common applications I find a tolerance of something like 1e-9 suffices. Of course it all depends on your application. You can estimate the level of accuracy you need and just set the tolerance to an acceptable value.

If even that fails, it means that double is simply inadequate for your purposes. This scenario is highly unlikely, but can arise if you need very high precision calculations. In that case you have to use an arbitrary precision package (e.g. BigDecimal in Java or something like GMP for C). Again, only choose this option when there is no other way.
