Integer arithmetic errors in modern CPUs [closed]
Do I need to plan for possible miscalculations in modern CPUs, where for example an addition of the two integers `1` and `1` results in `3` once?
- (How often) Do such errors in the ALU occur?
- Is there any built-in protection against this nowadays?
Is there a realistic chance that arithmetic errors like the one in the example above are the reason behind most "heisenbugs" out there?
CPU feature sizes have gotten small enough that errors like this in data can happen, but they're (much) more likely to happen on data being stored in memory than for an actual miscalculation to happen.
In some radiation-rich environments (e.g., on satellites) it's fairly common to have (for example) multiple CPUs that "vote" on an outcome, or repeat calculations when/if there's a disagreement. Other than that, about the only time it might be reasonable would be in something that was likely to affect human lives.
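As a minimal sketch of that voting idea (triple modular redundancy) - the function `vote3` and its retry sentinel are purely illustrative and not taken from any real flight-software library:

```c
#include <stdio.h>

/* Majority vote over three independently computed results.
 * If at least two agree, the minority value (e.g. one corrupted by a
 * single-event upset) is simply outvoted. */
static int vote3(int a, int b, int c)
{
    if (a == b || a == c) return a;
    if (b == c)           return b;
    return -1;  /* no majority: toy sentinel meaning "recompute" */
}

int main(void)
{
    /* Three redundant computations of 1 + 1; pretend one was corrupted. */
    int r1 = 1 + 1;
    int r2 = 1 + 1;
    int r3 = 3;                       /* simulated faulty result */

    printf("voted result: %d\n", vote3(r1, r2, r3));  /* prints 2 */
    return 0;
}
```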
While it's possible that there's a Heisenbug that's really the result of something like a single-bit upset, it's extremely unlikely, at least IMO. I've seen quite a few bugs, some of which were hard to track down -- but when they were tracked down, they were really mistakes in the code.
You should never see errors with integer math. Even with floating-point arithmetic it's exceedingly rare, unless someone is using a much older processor, or you're trying to do something involving irrational numbers or extreme precision without a specialized math library.
Are you doing something where you're seeing integer errors? I'd be interested if you were.
Do I need to plan for possible miscalculations in modern CPUs
Yes. You also need to plan for spontaneous formation of black holes which could suddenly absorb all nearby matter, including you.
Do such errors in the ALU occur?
Well, if the engineers use error-correcting codes, the odds are very, very small. What would have to happen is that a combination of error bits that happened to look valid would have to arise spontaneously in the circuitry. The odds aren't zero, but they're small.
Is there any built-in protection against this nowadays?
Yes: error-correcting codes, when they aren't left out. Remember, "Parity is for farmers".
http://en.wikipedia.org/wiki/Error_detection_and_correction
http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction
http://en.wikipedia.org/wiki/SECDED#Hamming_codes_with_additional_parity_.28SECDED.29
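As a concrete illustration of the single-error correction behind the SECDED link above, here is a minimal Hamming(7,4) sketch (just the error-correcting core, without the extra overall parity bit that full SECDED adds); the bit layout and helper names are illustrative only:

```c
#include <stdio.h>

/* Minimal Hamming(7,4) sketch: 4 data bits protected by 3 parity bits.
 * Any single flipped bit in the 7-bit codeword can be located and fixed. */

static unsigned encode(unsigned nibble)          /* nibble: 4 data bits */
{
    unsigned d1 = (nibble >> 0) & 1, d2 = (nibble >> 1) & 1;
    unsigned d3 = (nibble >> 2) & 1, d4 = (nibble >> 3) & 1;
    unsigned p1 = d1 ^ d2 ^ d4;                  /* covers positions 1,3,5,7 */
    unsigned p2 = d1 ^ d3 ^ d4;                  /* covers positions 2,3,6,7 */
    unsigned p3 = d2 ^ d3 ^ d4;                  /* covers positions 4,5,6,7 */
    /* codeword bit at 1-based position k is stored at (1u << (k-1)) */
    return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3)
              | (d2 << 4) | (d3 << 5) | (d4 << 6);
}

static unsigned decode(unsigned cw)              /* returns corrected nibble */
{
    unsigned b[8];
    for (int k = 1; k <= 7; k++) b[k] = (cw >> (k - 1)) & 1;
    unsigned s1 = b[1] ^ b[3] ^ b[5] ^ b[7];
    unsigned s2 = b[2] ^ b[3] ^ b[6] ^ b[7];
    unsigned s3 = b[4] ^ b[5] ^ b[6] ^ b[7];
    unsigned syndrome = s1 | (s2 << 1) | (s3 << 2);  /* 0 means no error seen */
    if (syndrome) b[syndrome] ^= 1;              /* flip the suspect bit back */
    return b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3);
}

int main(void)
{
    unsigned data = 0xB;                         /* binary 1011 */
    unsigned cw   = encode(data);
    unsigned hit  = cw ^ (1u << 4);              /* simulate a single-bit upset */
    printf("sent %X, corrupted codeword decodes back to %X\n", data, decode(hit));
    return 0;
}
```

The three syndrome bits point directly at the flipped position, so the corrupted codeword still decodes to the original nibble.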
Is there a realistic chance that arithmetic errors like the one in the example above are the reason behind most "heisenbugs" out there?
Yes. If you define "realistic" as non-zero, but really, really small.
Recent tests give widely varying error rates, with over 7 orders of magnitude difference, ranging from 10^−10 to 10^−17 error/bit·h: roughly one bit error per hour per gigabyte of memory at one end, to one bit error per century per gigabyte of memory at the other.
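As a back-of-the-envelope check of how a per-bit-hour rate translates into per-gigabyte figures like those quoted above (assuming 1 GiB = 2^30 bytes, about 8.6 × 10^9 bits; the numbers are order-of-magnitude only):

```c
#include <stdio.h>

int main(void)
{
    /* Upper and lower per-bit error rates quoted above (errors per bit-hour). */
    const double rates[] = { 1e-10, 1e-17 };
    const double bits_per_gib = 8.0 * 1024 * 1024 * 1024;    /* ~8.6e9 bits */

    for (int i = 0; i < 2; i++) {
        double errors_per_hour = rates[i] * bits_per_gib;     /* per GiB */
        printf("rate %.0e per bit-hour -> %.2e errors per GiB-hour"
               " (one every %.2e hours)\n",
               rates[i], errors_per_hour, 1.0 / errors_per_hour);
    }
    return 0;
}
```

At the high end this works out to roughly one bit error per hour per GiB, matching the quoted figure.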