Strange python error
I am trying to write a Python program that calculates a histogram from a list of numbers read from a file, one per line, like:
1
3
2
3
4
5
3.2
4
2
2
So the input parameters are the filename and the number of intervals.
The program code is:
#!/usr/bin/env python
import os, sys, re, string, array, math
import numpy
Lista = []
db = sys.argv[1]
db_file = open(db,"r")
ic=0
nintervals= int(sys.argv[2])
while 1:
    line = db_file.readline()
    if not line:
        break
    ll=string.split(line)
    #print ll[6]
    Lista.insert(ic,float(ll[0]))
    ic=ic+1
lmin=min(Lista)
print "min= ",lmin
lmax=max(Lista)
print "max= ",lmax
width=666.666
width=(lmax-lmin)/nintervals
print "width= ",width
nelements=len(Lista)
print "nelements= ",nelements
print " "
Histogram = numpy.zeros(shape=(nintervals))
for item in Lista:
    #print item
    int_number = 1 + int((item-lmin)/width)
    print " "
    print "item,lmin= ",item,lmin
    print "(item-lmin)/width= ",(item-lmin)," / ",width," ====== ",(float(item)-float(lmin))/float(width)
    print "int((item-lmin)/width)= ",int((item-lmin)/width)
    print item , " belongs to interval ", int_number, " which is from ", lmin+width*(int_number-1), " to ",lmin+width*int_number
    Histogram[int_number] = Histogram[int_number] + 1
But somehow I am completely lost: I get strange errors. Can anybody help?
Thanks
P.S. This is the output I get:
item,lmin= 1.0 1.0
(item-lmin)/width= 0.0 / 0.666666666667 ====== 0.0
int((item-lmin)/width)= 0
1.0 belongs to interval 1 which is from 1.0 to 1.66666666667
item,lmin= 2.0 1.0
(item-lmin)/width= 1.0 / 0.666666666667 ====== 1.5
int((item-lmin)/width)= 1
2.0 belongs to interval 2 which is from 1.66666666667 to 2.33333333333
item,lmin= 3.0 1.0
(item-lmin)/width= 2.0 / 0.666666666667 ====== 3.0
int((item-lmin)/width)= 3
3.0 belongs to interval 4 which is from 3.0 to 3.66666666667
Traceback (most recent call last):
  File "from_list_to_histogram.py", line 43, in <module>
    Histogram[int_number] = Histogram[int_number] + 1
IndexError: index out of bounds
The most important errors are:
(item-lmin)/width= 1.0 / 0.666666666667 ====== 1.5
and
IndexError: index out of bounds
I believe the problem may be a peculiar off-by-one in the line:
int_number = 1 + int((item-lmin)/width)
Why that 1 +? Python indices on an array of length N run from 0 to N-1 inclusive. The 1 + here makes int_number go from 1 to 1 + (lmax-lmin)/width, i.e. to 1 + nintervals given the formula for width, while you've sized Histogram to nintervals items. So it's actually an off-by-two: it is worsened by the 1 +, but it would be there (for lmax only) even without it. Make the intervals an epsilon wider, so that lmax falls inside the last one and not just beyond it, and lose the 1 +, and things might work better.
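The suggestion above can be sketched roughly as follows. This is only an illustration, not the asker's final program: the helper name bin_index and the eps value are assumptions made for the example, the data is the sample list from the question, and it assumes lmax > lmin so the width is never zero. It is written so that it also runs on Python 3.

```python
def bin_index(item, lmin, lmax, nintervals, eps=1e-9):
    """Hypothetical helper: zero-based interval index for item.

    Each interval is made a tiny bit wider (relative factor eps, an
    arbitrary choice for this sketch) so that lmax falls inside the
    last interval instead of just beyond it. Assumes lmax > lmin.
    """
    width = (lmax - lmin) * (1.0 + eps) / nintervals
    return int((item - lmin) / width)  # no "1 +": indices start at 0


data = [1, 3, 2, 3, 4, 5, 3.2, 4, 2, 2]  # sample values from the question
nintervals = 3
lmin, lmax = min(data), max(data)

histogram = [0] * nintervals
for item in data:
    histogram[bin_index(item, lmin, lmax, nintervals)] += 1

print(histogram)
```

With the widened intervals, the largest value maps to index nintervals - 1, so every index stays within the bounds of the histogram.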
Here is a more Pythonic approach.
from itertools import groupby
from math import floor
data = [1,3,2,3,4,5,3.2,4,2,2,3.6]
data.sort()
nintervals = 3
lmax = max(data)
lmin = min(data)
width = 1.0*(lmax-lmin)/nintervals
def grouper(item):
    return floor(1.0*(item-lmin)/width)
for i, b in groupby(data, grouper):
    print '%.3f <= i < %.3f ' %(lmin + i * width, lmin + (i+1) * width), list(b)
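One thing to watch with the groupby approach: floor((lmax-lmin)/width) comes out as nintervals for the maximum value, so lmax ends up in an extra group of its own. A possible variation, sketched below under the assumption that lmax should be counted in the last interval, clamps the key. The name bin_of is made up for this example, and the code uses the print() form so it also runs on Python 3.

```python
from itertools import groupby

data = sorted([1, 3, 2, 3, 4, 5, 3.2, 4, 2, 2, 3.6])  # groupby needs sorted input
nintervals = 3
lmin, lmax = min(data), max(data)
width = float(lmax - lmin) / nintervals


def bin_of(item):
    # Hypothetical key function: clamp so that lmax joins the last interval
    return min(int((item - lmin) / width), nintervals - 1)


# Collect each group into a dict keyed by interval index
groups = {i: list(b) for i, b in groupby(data, bin_of)}

for i in sorted(groups):
    print("%.3f to %.3f: %s" % (lmin + i * width, lmin + (i + 1) * width, groups[i]))
```

Because the data is sorted and the key never decreases, each interval index shows up exactly once.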
I removed the code that loads the data from a file and rewrote the rest into something more readable:
from math import floor
Lista = [1,3,2,3,4,5,3.2,4,2,2]
ic=0
nintervals= 3
lmin=min(Lista)
print "min= ",lmin
lmax=max(Lista)
print "max= ",lmax
width=1.0*(lmax-lmin)/nintervals
print "width= ",width
nelements=len(Lista)
print "nelements= ",nelements
print " "
histogram =[0]*nintervals
for item in Lista:
    ind = int(floor(1.0*(item-lmin)/width))
    if ind==nintervals:
        ind=ind-1
    histogram[ind]+=1
for i,v in enumerate(histogram):
    print "from", lmin+i*width, "to", lmin+(i+1)*width, "are",v,"values"
for i,v in enumerate(histogram):
    print "Visual presentation:","="*int(round(v*40.0/lmax))
On the last line you access Histogram with an index that is too large. You should make sure that int_number is at most len(Histogram) - 1. There's probably a bug in how int_number is computed that causes this problem.
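To see the bound being violated, here is a small sketch that compares the indices produced by the question's formula with the valid range, and then clamps them. It uses the sample data from the question with nintervals = 3 (an assumption, since the question does not say which value was passed), and the variable names original and clamped are invented for this example.

```python
data = [1, 3, 2, 3, 4, 5, 3.2, 4, 2, 2]  # sample values from the question
nintervals = 3                            # assumed value for this sketch
lmin, lmax = min(data), max(data)
width = float(lmax - lmin) / nintervals

# Indices as computed in the question: shifted up by the extra "1 +"
original = [1 + int((x - lmin) / width) for x in data]

# Zero-based indices, clamped so none exceeds len(Histogram) - 1
clamped = [min(int((x - lmin) / width), nintervals - 1) for x in data]

print("valid indices: 0 ..", nintervals - 1)
print("question's formula:", original)
print("clamped:", clamped)
```

Clamping with min() is only one way to enforce the bound; the important part is that every index used on Histogram lies between 0 and nintervals - 1.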