Strange Python error

I am trying to write a Python program that calculates a histogram from a list of numbers such as:

1
3
2
3
4
5
3.2
4
2
2

The input parameters are the filename and the number of intervals.

The program code is:

#!/usr/bin/env python
import os, sys, re, string, array, math
import numpy

Lista = []

db = sys.argv[1] 
db_file = open(db,"r")
ic=0
nintervals= int(sys.argv[2])

while 1:
    line = db_file.readline()
    if not line:
        break
    ll=string.split(line)
    #print ll[6]
    Lista.insert(ic,float(ll[0]))
    ic=ic+1

lmin=min(Lista)
print "min= ",lmin
lmax=max(Lista)
print "max= ",lmax

width=666.666
width=(lmax-lmin)/nintervals
print "width= ",width

nelements=len(Lista)
print "nelements= ",nelements
print " "
Histogram = numpy.zeros(shape=(nintervals))

for item in Lista:
    #print item
    int_number = 1 + int((item-lmin)/width)
    print " "
    print "item,lmin= ",item,lmin
    print "(item-lmin)/width= ",(item-lmin)," / ",width," ====== ",(float(item)-float(lmin))/float(width)
    print "int((item-lmin)/width)= ",int((item-lmin)/width) 
    print item , " belongs to interval ", int_number, " which is from ", lmin+width*(int_number-1), " to ",lmin+width*int_number
    Histogram[int_number] = Histogram[int_number] + 1

But somehow I am completely lost. I get strange errors. Can anybody help?

Thanks

P.S. This is the output:

item,lmin=  1.0 1.0
(item-lmin)/width=  0.0  /  0.666666666667  ======  0.0
int((item-lmin)/width)=  0
1.0  belongs to interval  1  which is from  1.0  to  1.66666666667

item,lmin=  2.0 1.0
(item-lmin)/width=  1.0  /  0.666666666667  ======  1.5
int((item-lmin)/width)=  1
2.0  belongs to interval  2  which is from  1.66666666667  to  2.33333333333

item,lmin=  3.0 1.0
(item-lmin)/width=  2.0  /  0.666666666667  ======  3.0
int((item-lmin)/width)=  3
3.0  belongs to interval  4  which is from  3.0  to  3.66666666667
Traceback (most recent call last):
  File "from_list_to_histogram.py", line 43, in <module>
    Histogram[int_number] = Histogram[int_number] + 1
IndexError: index out of bounds

The most important errors are:

(item-lmin)/width= 1.0 / 0.666666666667 ====== 1.5

and

IndexError: index out of bounds


I believe the problem may be a peculiar off-by-one error in the line:

int_number = 1 + int((item-lmin)/width)

Why the 1 +? Python indices on an array of length N run from 0 to N-1 inclusive. The 1 + makes int_number run from 1 to 1 + (lmax-lmin)/width, that is, to 1 + nintervals given the formula for width, while you have sized Histogram to nintervals items. So it is actually an off-by-two: the 1 + makes it worse, but the error would still be there (for lmax only) even without it. Make the intervals an epsilon wider, so that lmax falls inside the last one rather than just beyond it, and drop the 1 +, and things should work better.
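The suggestion above (zero-based indices plus slightly wider intervals) could look roughly like the sketch below. It is written for Python 3, unlike the Python 2 code in the question. The function name histogram_counts and the eps value are my own choices for illustration, and it assumes the values are not all equal (otherwise width would be zero).

```python
def histogram_counts(values, nintervals, eps=1e-9):
    """Count values per interval using zero-based indices."""
    lmin = min(values)
    lmax = max(values)
    # Widen each interval by a tiny relative amount (eps is an assumed value)
    # so that lmax falls inside the last interval instead of just beyond it.
    width = (lmax - lmin) / nintervals * (1.0 + eps)
    counts = [0] * nintervals
    for item in values:
        # No "1 +": valid indices run from 0 to nintervals - 1.
        index = int((item - lmin) / width)
        counts[index] += 1
    return counts


data = [1, 3, 2, 3, 4, 5, 3.2, 4, 2, 2]
counts = histogram_counts(data, 3)
print(counts)  # prints [4, 3, 3]
```

One side effect to be aware of: a value that sits exactly on an interior boundary may drop into the lower interval, because every boundary moves up by a tiny amount.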


Here is a more Pythonic approach.

from itertools import groupby
from math import floor

data = [1,3,2,3,4,5,3.2,4,2,2,3.6]
data.sort()

nintervals = 3
lmax = max(data)
lmin = min(data)

width = 1.0*(lmax-lmin)/nintervals

def grouper(item):
    # Clamp so that lmax joins the last interval instead of starting an extra one.
    return min(floor(1.0*(item-lmin)/width), nintervals-1)

for i, b in groupby(data, grouper):
    print '%.3f <= i < %.3f ' %(lmin + i * width, lmin + (i+1) * width), list(b)


I just removed the code that loads from the file and rewrote the rest into something more readable:

from math import floor

Lista = [1,3,2,3,4,5,3.2,4,2,2]
ic=0
nintervals= 3

lmin=min(Lista)
print "min= ",lmin
lmax=max(Lista)
print "max= ",lmax

width=1.0*(lmax-lmin)/nintervals
print "width= ",width

nelements=len(Lista)
print "nelements= ",nelements
print " "
histogram =[0]*nintervals

for item in Lista:
    ind = int(floor(1.0*(item-lmin)/width))
    if ind==nintervals:
        ind=ind-1
    histogram[ind]+=1

for i,v in enumerate(histogram):
    print "from", lmin+i*width, "to", lmin+(i+1)*width, "are",v,"values"

# Scale the bars so that the fullest interval is 40 characters wide.
for i,v in enumerate(histogram):
    print "Visual presentation:","="*int(round(v*40.0/max(histogram)))


On the last line you access Histogram with an index that is too large. You should make sure that int_number is at most

len(Histogram) - 1

There is probably a bug that causes this problem.
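One way to enforce that upper bound is to clamp the index before using it. The following is only a sketch in Python 3 (the question's code is Python 2). The helper name clamped_index and the variable nbins are hypothetical, and it assumes width is greater than zero.

```python
def clamped_index(item, lmin, width, nbins):
    """Return a zero-based bin index that never exceeds nbins - 1."""
    raw = int((item - lmin) / width)
    # The largest value lands exactly on the upper edge, so raw would be
    # nbins for it; clamp it back into the last bin.
    return min(raw, nbins - 1)


data = [1, 3, 2, 3, 4, 5, 3.2, 4, 2, 2]
nbins = 3
lmin, lmax = min(data), max(data)
width = (lmax - lmin) / nbins

histogram = [0] * nbins
for item in data:
    histogram[clamped_index(item, lmin, width, nbins)] += 1

print(histogram)  # prints [4, 3, 3]
```

Clamping treats the last interval as closed on the right, so the maximum is counted there, while every other interval stays half-open.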
