开发者

Matplotlib histogram with log Laplacian PDF

(be sure to check out the EDIT at the end of the post 开发者_C百科before reading too deeply into the source)

I'm plotting a histogram of a population that seems to be of log Laplacian distribution:

Matplotlib histogram with log Laplacian PDF

I'm trying to draw a line of best fit for it to verify my hypothesis, but I'm having problems getting meaningful results.

I'm using the Laplacian PDF definition from Wikipedia and taking 10 to the power of the PDF (to "reverse" the effects of the log histogram).

What am I doing wrong?

Here is my code. I pipe things through standard input (cat pop.txt | python hist.py) -- here's a sample population.

from pylab import *
import numpy    
def laplace(x, mu, b):
    return 10**(1.0/(2*b) * numpy.exp(-abs(x - mu)/b))    
def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=True, align='left')
    loc, scale = 0., 1.
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, 0., 1.)
    plot(x, pdf)
    width = max(-min(num), max(num))
    xlim((-width, width))
    ylim((1.0, 10**7))
    show()
if __name__ == '__main__':
    main()

EDIT

OK, here is the attempt to match it to a regular Laplacian distribution (as opposed to a log Laplacian). Differences from above attempt:

  • histogram is normed
  • histogram is linear (not log)
  • laplace function defined exactly as specified in the Wikipedia article

Output:

Matplotlib histogram with log Laplacian PDF

As you can see, it isn't the best match, but the figures (the histogram and the Laplace PDF) are at least now in the same ballpark. I think the log Laplace will match a lot better. My approach (source above) didn't work. Can anybody suggest a way to do this what works?

Source:

from pylab import *
import numpy   
def laplace(x, mu, b):
    return 1.0/(2*b) * numpy.exp(-abs(x - mu)/b)
def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=False, align='left', normed=True)
    loc, scale = 0., 0.54
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, loc, scale)
    plot(x, pdf)
    width = max(-min(num), max(num))
    xlim((-width, width))
        show()
if __name__ == '__main__':
    main()


  1. Your laplace() function does not seem to be a Laplace distribution. Besides, numpy.log() is a natural logarithm (base e), not decimal.

  2. Your histogram does not seem to be normalized, while the distribution is.

EDIT:

  1. Don't use blanket imports from pyplot import *, it'll bite you.

  2. If you're checking consistency with Laplace distribution (or its log), use the fact that the latter is symmetric around mu: fix mu at a maximum of your histogram, and you have a single-parameter problem. And you can only use either half of the histogram as well.

  3. Use numpy's histogram function -- this way you can get the histogram itself, which you can then fit with a Laplace distribution (and/or its log). The chi-square will tell you how good (or bad) the consistency is. For fitting you can use, e.g. scipy.optimize.leastsq routine (http://www.scipy.org/Cookbook/FittingData)


I've found a work-around for the problem I was having. Instead of using matplotlib.hist, I use numpy.histogram in combination with matplotlib.bar to calculate the histogram and draw it in two separate steps.

I'm not sure if there's a way to do this using matplotlib.hist -- it would definitely be more convenient, though.

Matplotlib histogram with log Laplacian PDF

You can see that it's a much better match.

My problem now is I need to estimate the scale parameter of the PDF.

Source:

from pylab import *
import numpy

def laplace(x, mu, b):
    """http://en.wikipedia.org/wiki/Laplace_distribution"""
    return 1.0/(2*b) * numpy.exp(-abs(x - mu)/b)

def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    count, bins = numpy.histogram(num, nbins)
    bins = bins[:-1]
    assert len(bins) == nbins
    #
    # FIRST we take the log of the histogram, THEN we normalize it.
    # Clean up after divide by zero
    #
    count = numpy.log(count)
    for i in range(nbins):
        if count[i] == -numpy.inf:
            count[i] = 0
    count = count/max(count)

    loc = 0.
    scale = 4.
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, loc, scale)
    pdf = pdf/max(pdf)

    width=1.0
    bar(bins-width/2, count, width=width)
    plot(x, pdf, color='r')
    xlim(min(num), max(num))
    show()

if __name__ == '__main__':
    main()
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜