
Lexicographical ordering of multiple doubles

Consider a class holding two doubles

class path_cost {
   double length;
   double time;
};

If I want to lexicographically order a list of path_costs, I have a problem. Read on :)

If I use exact equality for the test, like so

bool operator<(const path_cost& rhs) const {
   if (length == rhs.length) return time < rhs.time;
   return length < rhs.length;
}

the resulting order is likely to be wrong, because a small deviation (e.g. due to numerical inaccuracies in the calculation of the length) may cause the length test to fail, so that e.g.

{ 231.00000000000001, 40 } < { 231.00000000000002, 10 }

erroneously holds.

If I alternatively use a tolerance like so

bool operator<(const path_cost& rhs) const {
   if (std::fabs(length - rhs.length) < 1e-6) return time < rhs.time;
   return length < rhs.length;
}

then the sorting algorithm may fail horribly, since the <-operator is no longer transitive (that is, if a < b and b < c, then a < c may not hold).

Any ideas? Solutions? I have thought about partitioning the real line, so that numbers within each partition are considered equal, but that still leaves too many cases where the equality test fails but should not.

(UPDATE by James Curran, hopefully explaining the problem): Given the numbers:

  • A = {231.0000012000, 10}
  • B = {231.0000005000, 40}
  • C = {231.0000001000, 60}

    • A.Length & B.Length differ by 7e-7, so we use time, and A < B.
    • B.Length & C.Length differ by 4e-7, so we use time, and B < C.
    • A.Length & C.Length differ by 1.1e-6, so we use length, and A > C.

(Update by Esben Mose Hansen) This is not purely theoretical. The standard sort algorithms tend to crash, or worse, when given a non-transitive sort operator. And this is exactly what I have been contending with (and boy, was that fun to debug ;) )
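
To see the non-transitivity concretely, here is a minimal, self-contained check; it is only a sketch, hard-coding the values from James Curran's example above and the 1e-6 tolerance from the comparison operator:

#include <cmath>
#include <iostream>

struct path_cost {
   double length;
   double time;

   // Tolerance-based comparison from above: lengths closer than 1e-6 are
   // treated as equal and the tie is broken on time.
   bool operator<(const path_cost& rhs) const {
      if (std::fabs(length - rhs.length) < 1e-6) return time < rhs.time;
      return length < rhs.length;
   }
};

int main() {
   const path_cost a{231.0000012000, 10};
   const path_cost b{231.0000005000, 40};
   const path_cost c{231.0000001000, 60};

   std::cout << std::boolalpha
             << "a < b: " << (a < b) << '\n'   // true:  lengths within tolerance, 10 < 40
             << "b < c: " << (b < c) << '\n'   // true:  lengths within tolerance, 40 < 60
             << "a < c: " << (a < c) << '\n';  // false: lengths differ by 1.1e-6, so the length comparison decides
}

Handing such a comparison to std::sort is undefined behaviour, because it is not a strict weak ordering; that is what makes the crashes possible.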


Do you really want just a compare function?

Why don't you sort by length first, then group the pairs into what you think are the same length and then sort within each group by time?

Once sorted by length, you can apply whatever heuristic you need, to determine 'equality' of lengths, to do the grouping.
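
A minimal sketch of that approach, assuming the data lives in a std::vector<path_cost> with public members and using a hypothetical tolerance eps for the grouping:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct path_cost {
   double length;
   double time;
};

// Sort by length only, then split the sorted range into groups whose
// consecutive lengths lie within eps of each other, and order each group by time.
void sort_grouped(std::vector<path_cost>& v, double eps = 1e-6) {
   std::sort(v.begin(), v.end(),
             [](const path_cost& a, const path_cost& b) { return a.length < b.length; });

   std::size_t begin = 0;
   while (begin < v.size()) {
      std::size_t end = begin + 1;
      // Grow the group while the gap to the previous element stays below eps.
      while (end < v.size() && std::fabs(v[end].length - v[end - 1].length) < eps)
         ++end;
      std::sort(v.begin() + begin, v.begin() + end,
                [](const path_cost& a, const path_cost& b) { return a.time < b.time; });
      begin = end;
   }
}

Because neighbours are chained, a group can end up wider than eps in total; that is the same equivalence-class behaviour the union-find suggestion further down arrives at.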


I don't think you are going to be able to do what you want. Essentially you seem to be saying that in certain cases you want to ignore the fact that a > b and pretend that a = b. I'm pretty sure you can construct a proof that says that if a and b are equivalent whenever their difference is smaller than a certain value, then a and b are equivalent for all values of a and b. Something along the lines of:

For a tolerance of C and two numbers A and B where, without loss of generality, A > B, there exist D(n) = B + n*(C/10) for 0 <= n <= 10*(A-B)/C, such that D(n) is trivially within the tolerance of D(n-1) and D(n+1), and therefore equivalent to them. Also, D(0) is B and D(10*(A-B)/C) = A, so A and B can be said to be equivalent.
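
The same argument written out as a chain, with the step taken as (A-B)/N rather than exactly C/10 so that the endpoints land exactly on B and A:

\[
  N = \left\lceil \frac{10\,(A-B)}{C} \right\rceil, \qquad
  D_n = B + n \cdot \frac{A-B}{N}, \quad 0 \le n \le N .
\]
\[
  |D_{n+1} - D_n| = \frac{A-B}{N} \le \frac{C}{10} < C
  \quad\Longrightarrow\quad
  B = D_0 \sim D_1 \sim \cdots \sim D_N = A .
\]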

I think the only way you can solve that problem is using a partitioning method. Something like multiplying by 10^6 and then converting to an int should partition pretty well, but it will mean that if you have 1.00001*10^-6 and 0.999999*10^-6 then they will come out in different partitions, which may not be desired.

The problem then becomes looking at your data to work out how to best partition it which I can't help with since I don't know anything about your data. :)
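
As a sketch of that multiply-and-truncate partitioning (the 10^6 scale factor is just the one suggested above, and it assumes the lengths are small enough that length * 1e6 fits in a 64-bit integer):

#include <cstdint>

struct path_cost {
   double length;
   double time;

   // Everything that truncates to the same bucket counts as equal, so the
   // comparison below is really an integer comparison and therefore transitive.
   static std::int64_t bucket(double x) {
      return static_cast<std::int64_t>(x * 1e6);  // truncates toward zero
   }

   bool operator<(const path_cost& rhs) const {
      const std::int64_t a = bucket(length);
      const std::int64_t b = bucket(rhs.length);
      if (a == b) return time < rhs.time;
      return a < b;
   }
};

The boundary effect mentioned above shows up immediately: 1.00001*10^-6 and 0.999999*10^-6 land in buckets 1 and 0, even though they differ by far less than the bucket width.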

P.S. Do the algorithms actually crash when given the non-transitive comparison, or just when they encounter specific unsolvable cases?


I can think of two solutions.

You could carefully choose a sorting algorithm that does not fail when the comparisons are intransitive. For example, quicksort shouldn't fail, at least if you implement it yourself. (If you are worried about the worst case behavior of quicksort, you can first randomize the list, then sort it.)

Or you could extend your tolerance patch so that it becomes an equivalence relation and you restore transitivity. There are standard union-find algorithms to complete any relation to an equivalence relation. After applying union-find, you can replace the length in each equivalence class with a consensus value (such as the average, say) and then do the sort that you wanted to do. It feels a bit strange to doctor floating point numbers to prevent spurious reordering, but it should work.
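
A minimal sketch of that idea; the eps threshold, the quadratic pairwise pass, and the public-member struct are assumptions, and the averaging is the "consensus value" mentioned above:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct path_cost {
   double length;
   double time;
};

// Plain union-find over element indices, with path compression.
struct union_find {
   std::vector<std::size_t> parent;
   explicit union_find(std::size_t n) : parent(n) { std::iota(parent.begin(), parent.end(), std::size_t{0}); }
   std::size_t find(std::size_t x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
   void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

void sort_with_consensus_lengths(std::vector<path_cost>& v, double eps = 1e-6) {
   // Complete the "within tolerance" relation to an equivalence relation.
   // This pass is O(n^2); the follow-up below avoids that by sorting on length first.
   union_find uf(v.size());
   for (std::size_t i = 0; i < v.size(); ++i)
      for (std::size_t j = i + 1; j < v.size(); ++j)
         if (std::fabs(v[i].length - v[j].length) < eps) uf.unite(i, j);

   // Replace each length by the average of its equivalence class.
   std::vector<double> sum(v.size(), 0.0);
   std::vector<std::size_t> count(v.size(), 0);
   for (std::size_t i = 0; i < v.size(); ++i) {
      sum[uf.find(i)] += v[i].length;
      ++count[uf.find(i)];
   }
   for (std::size_t i = 0; i < v.size(); ++i)
      v[i].length = sum[uf.find(i)] / count[uf.find(i)];

   // With identical lengths inside each class, the exact comparison is a
   // genuine strict weak ordering again.
   std::sort(v.begin(), v.end(), [](const path_cost& a, const path_cost& b) {
      if (a.length == b.length) return a.time < b.time;
      return a.length < b.length;
   });
}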


Actually, Moron makes a good point. Instead of union and find, you can sort by length first, then link together neighbors that are within tolerance, then do a subsort within each group on the second key. That has the same outcome as my second suggestion, but it is a simpler implementation.


I'm not familiar with your application, but I'd be willing to bet that the differences in distance between points in your graph are many orders of magnitude larger than the rounding errors on floating point numbers. Therefore, if two entries differ by only the round-off error, they are essentially the same, and it makes no difference in which order they appear in your list. From a common-sense perspective, I see no reason to worry.


You will never get 100% precision with ordinary doubles. You say that you are afraid that using tolerances will affect the correctness of your program. Have you actually tested this? What level of precision does your program actually need?

In most common applications I find a tolerance of something like 1e-9 suffices. Of course it all depends on your application. You can estimate the level of accuracy you need and just set the tolerance to an acceptable value.

If even that fails, it means that double is simply inadequate for your purposes. This scenario is highly unlikely, but can arise if you need very high precision calculations. In that case you have to use an arbitrary precision package (e.g. BigDecimal in Java or something like GMP for C). Again, only choose this option when there is no other way.
