Matplotlib histogram with log Laplacian PDF

2023-03-07 00:11 问答作者：

(be sure to check out the EDIT at the end of the post 开发者_C百科before reading too deeply into the source)

I'm plotting a histogram of a population that seems to be of log Laplacian distribution:

I'm trying to draw a line of best fit for it to verify my hypothesis, but I'm having problems getting meaningful results.

I'm using the Laplacian PDF definition from Wikipedia and taking 10 to the power of the PDF (to "reverse" the effects of the log histogram).

What am I doing wrong?

Here is my code. I pipe things through standard input (cat pop.txt | python hist.py) -- here's a sample population.

from pylab import *
import numpy    
def laplace(x, mu, b):
    return 10**(1.0/(2*b) * numpy.exp(-abs(x - mu)/b))    
def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=True, align='left')
    loc, scale = 0., 1.
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, 0., 1.)
    plot(x, pdf)
    width = max(-min(num), max(num))
    xlim((-width, width))
    ylim((1.0, 10**7))
    show()
if __name__ == '__main__':
    main()

EDIT

OK, here is the attempt to match it to a regular Laplacian distribution (as opposed to a log Laplacian). Differences from above attempt:

histogram is normed
histogram is linear (not log)
laplace function defined exactly as specified in the Wikipedia article

Output:

Matplotlib histogram with log Laplacian PDF

As you can see, it isn't the best match, but the figures (the histogram and the Laplace PDF) are at least now in the same ballpark. I think the log Laplace will match a lot better. My approach (source above) didn't work. Can anybody suggest a way to do this what works?

Source:

from pylab import *
import numpy   
def laplace(x, mu, b):
    return 1.0/(2*b) * numpy.exp(-abs(x - mu)/b)
def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    n, bins, patches = hist(num, nbins, range=(min(num), max(num)), log=False, align='left', normed=True)
    loc, scale = 0., 0.54
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, loc, scale)
    plot(x, pdf)
    width = max(-min(num), max(num))
    xlim((-width, width))
        show()
if __name__ == '__main__':
    main()

Your laplace() function does not seem to be a Laplace distribution. Besides, numpy.log() is a natural logarithm (base e), not decimal.
Your histogram does not seem to be normalized, while the distribution is.

EDIT:

Don't use blanket imports from pyplot import *, it'll bite you.
If you're checking consistency with Laplace distribution (or its log), use the fact that the latter is symmetric around mu: fix mu at a maximum of your histogram, and you have a single-parameter problem. And you can only use either half of the histogram as well.
Use numpy's histogram function -- this way you can get the histogram itself, which you can then fit with a Laplace distribution (and/or its log). The chi-square will tell you how good (or bad) the consistency is. For fitting you can use, e.g. scipy.optimize.leastsq routine (http://www.scipy.org/Cookbook/FittingData)

I've found a work-around for the problem I was having. Instead of using matplotlib.hist, I use numpy.histogram in combination with matplotlib.bar to calculate the histogram and draw it in two separate steps.

I'm not sure if there's a way to do this using matplotlib.hist -- it would definitely be more convenient, though.

Matplotlib histogram with log Laplacian PDF

You can see that it's a much better match.

My problem now is I need to estimate the scale parameter of the PDF.

Source:

from pylab import *
import numpy

def laplace(x, mu, b):
    """http://en.wikipedia.org/wiki/Laplace_distribution"""
    return 1.0/(2*b) * numpy.exp(-abs(x - mu)/b)

def main():
    import sys
    num = map(int, sys.stdin.read().strip().split(' '))
    nbins = max(num) - min(num)
    count, bins = numpy.histogram(num, nbins)
    bins = bins[:-1]
    assert len(bins) == nbins
    #
    # FIRST we take the log of the histogram, THEN we normalize it.
    # Clean up after divide by zero
    #
    count = numpy.log(count)
    for i in range(nbins):
        if count[i] == -numpy.inf:
            count[i] = 0
    count = count/max(count)

    loc = 0.
    scale = 4.
    x = numpy.arange(bins[0], bins[-1], 1.)
    pdf = laplace(x, loc, scale)
    pdf = pdf/max(pdf)

    width=1.0
    bar(bins-width/2, count, width=width)
    plot(x, pdf, color='r')
    xlim(min(num), max(num))
    show()

if __name__ == '__main__':
    main()

继续阅读：histogram python

Matplotlib histogram with log Laplacian PDF

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？