
Entropy of each line in a text file

I have a text file with numbers in it as follows:

1231313123123123
1432423432535345
3532523452345345
1231423432453455
3434535345345345
3452353453253453

All the lines are the same length. I want to calculate the entropy of each line and get output like this:

2.64234234
2.65464564
2.35355435
etc.

Right now the code below gives me the same entropy for every line. What am I doing wrong?

Thanks.

#!/usr/bin/env python

import math

def H(data):
  if not data:
    return 0
  entropy = 0
  for x in range(256):
    p_x = float(data.count(chr(x)))/len(data)
    if p_x > 0:
      entropy += - p_x*math.log(p_x, 2)
  return entropy

failas = open('text.txt', 'r')
for row in failas:
        print H('failas')


failas = open('text.txt', 'r')
for row in failas:
    print H(row)


Perhaps you meant print H(row).


All of the above, plus you probably don't want to include the \n at the end of each line in the entropy calculation. Use H(row.rstrip('\n')).
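
For anyone on Python 3, here is a minimal sketch of the same idea with the newline stripped before the calculation. The names shannon_entropy and line_entropies are made up for this example and are not from the question. It also counts only the characters that actually occur in the line (via collections.Counter) rather than looping over all 256 byte values, which is a swapped-in technique that should give the same result for this kind of input.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)  # occurrences of each distinct character
    # p * log2(1/p) is the same as -p * log2(p), summed over all characters
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def line_entropies(lines):
    """Compute the entropy of each line, ignoring the trailing newline."""
    return [shannon_entropy(line.rstrip("\n")) for line in lines]
```

Since a file object iterates over its lines, something like line_entropies(open('text.txt')) should give one value per line, assuming the file layout shown in the question.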

You can answer a lot of your own questions by examining the data that is being tossed around by your code. In this case, inserting print repr(data) after the line def H(data): would have shown you what the problem was straight away.
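
To illustrate the repr tip, here is a small Python 3 sketch. The helper show_argument is hypothetical and only stands in for the debugging line inside H, and io.StringIO stands in for the opened text.txt so the snippet runs without the file.

```python
import io


def show_argument(data):
    """Return repr(data), i.e. what a debugging print inside H would display."""
    return repr(data)


# io.StringIO is a stand-in for open('text.txt', 'r') (assumption for the demo)
fake_file = io.StringIO("1231313123123123\n1432423432535345\n")

for row in fake_file:
    # The original loop passes the literal string 'failas' every time,
    # so H always sees the same six characters.
    print(show_argument('failas'))
    # Passing row shows the real line, including its trailing newline.
    print(show_argument(row))
```

The first print shows 'failas' on every iteration, which is why the entropy never changes, and the second shows each line with its \n still attached.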

