C++ STL Vector Sorting - corrupting & zeroing out
The programme that I am developing processes very large amounts of data and generates at least 2^34 boolean values. These data are statically generated and cleared throughout the programme run (only a portion is sorted at each instance), and finally a vector of at least 2^21 rows of statistical data is passed to the final stage for further processing.
However, the STL sort fails for some input data: after sorting finishes, some of the vector rows are zeroed out or corrupted. It seems the only option I have is to hand-code a hybrid Quicksort/Insertion sort algorithm.
I would appreciate your thoughts. Cheers.
Data structure passed to the final stage:
struct statisticalValues{
    unsigned long long id;        // index id
    unsigned int col_Sum;         // Sum: total number of 1s for each combination
    unsigned int col_Relevancy;   // Relevancy: total number of 1s produced by (Comb AND Rel)
    float col_Sensitivity;        // Sensitivity = Relevancy / X
    float col_Precision;          // Precision = Relevancy / Sum
};
extern vector<statisticalValues> statistics;
Calling STL Sort:
sort(statistics.begin(), statistics.end(), BySensitivity());
The comparison criteria:
#define EPSILON 0.0001 // user-defined tolerance for equality of floating-point numbers
struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;
        if((sensitivityDif < EPSILON) && (sensitivityDif > -EPSILON)){
            return ((b.col_Precision - a.col_Precision) < EPSILON);
        }else{
            return (sensitivityDif < -EPSILON);
        }
    }
};
The rows of the sample data that will be corrupted (in no particular order):
id,col_Sum,col_Relevancy,col_Sensitivity,col_Precision
1568676,5353,3696,94.166,69.045
1770228,5353,3696,94.166,69.045
2040533,5353,3696,94.166,69.045
2053376,5353,3696,94.166,69.045
1231712,4668,3425,87.261,73.372
1946656,4668,3425,87.261,73.372
1948021,4668,3425,87.261,73.372
The same rows after being corrupted and zeroed out by the STL sort:
id,col_Sensitivity,col_Precision
10540996614775448722,5.8399e-34,5.8399e-34
8589934369,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000
0,0.0000,0.0000
After implementing the suggested modifications:
The modified comparison criteria:
struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;
        if((sensitivityDif <= EPSILON) && (sensitivityDif >= -EPSILON)){
            return ((b.col_Precision - a.col_Precision) < -EPSILON);
        }else{
            return (sensitivityDif < -EPSILON);
        }
    }
};
Thanks to @Mark-B, @btilly, @David-Thornley, @sth & @Daniel-Gallagher
Your comparator doesn't implement a strict weak ordering. For example, for two items A and B with equal col_Sensitivity and col_Precision, both A < B and B < A are true. As you can imagine, trying to sort with a comparison function that doesn't actually provide an ordering can produce undefined behavior.
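A minimal, self-contained sketch demonstrating the violation; the two rows use literal values from the sample data above, and everything else mirrors the posted definitions:
#include <iostream>

#define EPSILON 0.0001

struct statisticalValues {
    unsigned long long id;
    unsigned int col_Sum;
    unsigned int col_Relevancy;
    float col_Sensitivity;
    float col_Precision;
};

// The original comparator, reproduced from the question.
struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;
        if((sensitivityDif < EPSILON) && (sensitivityDif > -EPSILON)){
            return ((b.col_Precision - a.col_Precision) < EPSILON);
        }else{
            return (sensitivityDif < -EPSILON);
        }
    }
};

int main() {
    // Two sample rows with identical col_Sensitivity and col_Precision.
    statisticalValues a = {1568676ULL, 5353, 3696, 94.166f, 69.045f};
    statisticalValues b = {1770228ULL, 5353, 3696, 94.166f, 69.045f};
    BySensitivity comp;
    // A strict weak ordering allows at most one of these to be true;
    // here both are, so std::sort is free to do anything, including
    // corrupting the container.
    std::cout << comp(a, b) << " " << comp(b, a) << std::endl; // prints "1 1"
    return 0;
}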
Thanks to (and quoting) @David Thornley for the standard reference:
Standard, part of 25.3/3: "For the algorithms to work correctly, comp has to induce a strict weak ordering on the values." This means that the behaviour without a strict weak ordering is undefined (the Standard says nothing about it).
I think in this case you just want to remove all the epsilon checks completely:
struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        float sensitivityDif = b.col_Sensitivity - a.col_Sensitivity;
        if(sensitivityDif == 0.0){
            return ((b.col_Precision - a.col_Precision) < 0.0);
        }else{
            return (sensitivityDif < 0.0);
        }
    }
};
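If C++11 is available, the same exact (epsilon-free) comparison can also be written with std::tie, which makes the intended lexicographic ordering explicit and is a strict weak ordering by construction; this is only a sketch assuming the statisticalValues struct above:
#include <tuple>

struct BySensitivity {
    bool operator()(statisticalValues const &a, statisticalValues const &b) const {
        // Descending by sensitivity, then descending by precision,
        // using exact comparisons only.
        return std::tie(b.col_Sensitivity, b.col_Precision)
             < std::tie(a.col_Sensitivity, a.col_Precision);
    }
};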
The STL sort can corrupt data if the comparison operator can produce inconsistent results, such as x < y < z < x.
Your comparison operator can produce inconsistent results.
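For illustration, a toy reproduction of the original comparator's logic produces exactly such a cycle; the sensitivity values below are hypothetical, chosen so that neighbouring pairs fall within EPSILON of each other while the extremes do not:
#include <iostream>

#define EPSILON 0.0001

struct Row { float sens; float prec; };

// Same branching as the original BySensitivity, applied to the toy rows.
bool comp(Row const &a, Row const &b) {
    float dif = b.sens - a.sens;
    if((dif < EPSILON) && (dif > -EPSILON)){
        return ((b.prec - a.prec) < EPSILON);
    }
    return (dif < -EPSILON);
}

int main() {
    Row x = {0.00000f, 50.0f};
    Row y = {0.00006f, 50.0f};
    Row z = {0.00012f, 50.0f};
    // x < y and y < z via the "equal within EPSILON" branch, but z < x via
    // the strict branch: a cycle, so no consistent ordering exists.
    std::cout << comp(x, y) << comp(y, z) << comp(z, x) << std::endl; // prints "111"
    return 0;
}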