Overflow in exp in scipy/numpy in Python?
What does the following warning:
Warning: overflow encountered in exp
in scipy/numpy using Python generally mean? I'm computing a ratio in log form, i.e. log(a) + log(b), then taking the exponent of the result using exp, and normalizing with a sum computed by logsumexp, as follows:
c = log(a) + log(b)
c = c - logsumexp(c)
Some values in the array b are intentionally set to 0; their log will be -Inf.
What could be the cause of this warning? Thanks.
In your case, it means that b is very small somewhere in your array, and you're getting a number (a/b, or exp(log(a) - log(b))) that is too large for whatever dtype (float32, float64, etc.) the array you're using to store the output is.
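To make the limit concrete (a quick sketch, not part of the original answer): float64 tops out near 1.8e308, so np.exp overflows for any argument above roughly 709.78:

import numpy as np

print(np.log(np.finfo(np.float64).max))  # ≈ 709.78, the largest argument exp can handle
print(np.exp(709.0))                     # ≈ 8.2e307, still representable as float64
print(np.exp(710.0))                     # inf, with "RuntimeWarning: overflow encountered in exp"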
Numpy can be configured to
- Ignore these sorts of errors,
- Print the error but continue execution, without raising a warning (the default),
- Log the error,
- Raise a warning,
- Raise an error, or
- Call a user-defined function.
See numpy.seterr to control how it handles under/overflows, etc. in floating point arrays.
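For instance (a minimal sketch of the seterr machinery; np.errstate is the context-manager variant of the same settings):

import numpy as np

old = np.seterr(over='raise')        # turn overflow into a FloatingPointError
try:
    np.exp(np.array([1000.0]))
except FloatingPointError as err:
    print(err)                       # "overflow encountered in exp"
np.seterr(**old)                     # restore the previous settings

with np.errstate(over='ignore'):     # or scope the change to one block
    y = np.exp(np.array([1000.0]))   # silently yields inf, no warning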
When you need to deal with exponentials, you quickly run into under/overflow, since the function grows so quickly. A typical case is statistics, where summing exponentials of various amplitudes is quite common. Since the numbers are very big/small, one generally takes the log to stay in a "reasonable" range, the so-called log domain:
exp(-a) + exp(-b) -> log(exp(-a) + exp(-b))
Problems still arise because exp(-a) will still underflow: exp(-1000), for example, is already below the smallest positive number you can represent as a double. So, for example:
log(exp(-1000) + exp(-1000))
gives -inf (log(0 + 0)), even though by hand you would expect something like -1000 (exactly -1000 + log(2)). The function logsumexp does it better, by extracting the maximum m of the number set and taking it out of the log:
log(exp(a) + exp(b)) = m + log(exp(a-m) + exp(b-m)), where m = max(a, b)
It does not avoid underflow totally (if a and b are vastly different, for example), but it avoids most precision issues in the final result.
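A short demonstration of the difference, assuming scipy.special.logsumexp (older scipy versions shipped it as scipy.misc.logsumexp):

import numpy as np
from scipy.special import logsumexp

x = np.array([-1000.0, -1000.0])
print(np.log(np.sum(np.exp(x))))  # -inf: exp(-1000) underflows to 0.0
print(logsumexp(x))               # ≈ -999.3069, i.e. -1000 + log(2)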
I think you can use this method to solve the problem: normalization. I overcame the problem this way: before normalizing, the accuracy of my classifier was 86%; after normalizing, it was 96%! It's great!
First: Min-Max scaling
Second: Z-score standardization
These are common methods to implement normalization.
I use the first method, and I alter it: each value is divided by one tenth of the maximum, so the maximum of the result is 10. Then exp(-10) (about 4.5e-5) stays well inside floating-point range, and nothing overflows!
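A minimal sketch of that altered scaling (the function and variable names here are illustrative, not from the answer):

import numpy as np

def scale_max_to_10(x):
    # Min-Max-style scaling, altered as described: divide each value by
    # max/10 so the largest value becomes 10 (assumes non-negative data).
    x = np.asarray(x, dtype=np.float64)
    return x / (x.max() / 10.0)

scores = np.array([120.0, 800.0, 1500.0])  # made-up feature values
scaled = scale_max_to_10(scores)           # maximum is now exactly 10.0
print(np.exp(-scaled))                     # worst case exp(-10) ≈ 4.5e-5: safe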
I hope my answer will help you! (^_^)
Isn't exp(log(a) - log(b)) the same as exp(log(a/b)), which is the same as a/b?
>>> from math import exp, log
>>> exp(log(100) - log(10))
10.000000000000002
>>> exp(log(1000) - log(10))
99.999999999999957
2010-12-07: If it is really the case that "some values in the array b are intentionally set to 0", then you are essentially dividing by 0. That sounds like a problem.
In my case, it was due to large values in the data. I had to normalize (divide by 255, because my data consisted of images) to scale the values down.
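For 8-bit image data that normalization is a one-liner (a sketch; img is a hypothetical stand-in array):

import numpy as np

img = np.arange(256, dtype=np.uint8).reshape(16, 16)  # stand-in for real image data
img = img.astype(np.float64) / 255.0                  # values now lie in [0, 1]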