Python - Match Words in Text File to Dictionary and Manipulate Value

2023-01-31 21:19 Q&A author:

I have a dictionary where the keys are simple words and the values are scores. I want to calculate a total score based on how often each dictionary word occurs in a file (or string), multiplied by the score (value) stored for that word in the dictionary. For example, suppose my text was:

"Dogs are great pets and hamsters are bad pets. That is why I want a dog"

My dictionary is:

Dict = {'dogs' : 5, 'hampsters' : -2}

Then I would want to calculate a score of 8 ((2x5)-2 = 8). I can find occurrences in the dictionary with

    for key in Dict:
        m = re.findall(key, READ, re.IGNORECASE)

but I haven't been able to access the value of the key in a useful manner.

Any help is greatly appreciated.

Thanks, Scott

EDIT: Steve V inspired the following, which is rather nicer:

sentence = "...".split()
score = sum(sentence.count(word) * score for word, score in scores.items())

The obligatory one-liner:

>>> s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
>>> scores = {'dogs': 5, 'hamsters': -2}
>>> import collections
>>> sum(scores.get(word.lower(), 0) * freq for word, freq in collections.Counter(s.split()).items())
3

and split up:

>>> total = 0  # avoid shadowing the built-in sum()
>>> counts = collections.Counter(s.split())
>>> for word, freq in counts.items():
...     total += scores.get(word.lower(), 0) * freq
...
>>> total
3

Notable features:

The score isn't 8 (as you claimed above) but 3, because the word dogs only appears once in the string you gave. If you want to count the word dog twice, you will need a (much) more complicated algorithm, probably interfacing with a pluralisation library to handle cases like child -> children and man -> men. This will not be easy or necessarily correct.
I've included .lower() to ignore capitalisation in the string you gave. If you don't want that, just remove the call.
You misspelt "hamster" :p.
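
To illustrate the pluralisation point above, here is a minimal sketch that normalises words before counting. It assumes a deliberately crude rule (strip punctuation, lowercase, drop one trailing "s"), which is not a real pluralisation library and will get irregular plurals such as children or men wrong. The names naive_singular and score_text are made up for this example.

```python
import collections
import re

def naive_singular(word):
    # Assumption: a crude stand-in for real plural handling.
    # Drops a single trailing "s" from words longer than three letters.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def score_text(text, scores):
    # Lowercase the text and keep only runs of letters, which drops punctuation.
    words = re.findall(r"[a-z]+", text.lower())
    counts = collections.Counter(naive_singular(w) for w in words)
    return sum(scores.get(word, 0) * freq for word, freq in counts.items())

s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
scores = {'dog': 5, 'hamster': -2}  # keys stored in singular form
print(score_text(s, scores))  # 8
```

With the keys stored in singular form, "Dogs" and "dog." both count towards dog, which gives the 8 the question expected.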

Use katrielalex's answer if possible; it's cleaner than mine. If, like me, you don't have Python 2.7, this may work for you:

import re

sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"

scores = {'dog': 5, 'hamster': -2}

# Count case-insensitive matches of each key in the sentence.
occurrences = {}

for key in scores:
    m = re.findall(key, sentence, re.IGNORECASE)
    occurrences[key] = len(m)

# Weight each count by the word's score.
totalScore = 0

for word in occurrences:
    totalScore += scores[word] * occurrences[word]

print(totalScore)

I changed "dogs" to "dog" in your scores dictionary, on the assumption that it was a typo. If you change it back, your result will be 3, because the singular "dog" at the end of the sentence no longer matches.
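
One caveat with the re.findall loop above: it matches substrings, so the key 'dog' would also be counted inside a word like "dogma". A possible refinement, sketched here under the assumption that only a plain trailing "s" plural needs to be accepted, anchors the pattern to word boundaries. The helper name count_word is made up for this example.

```python
import re

def count_word(word, text):
    # \b anchors the match to word boundaries; "s?" accepts a simple plural.
    # re.escape guards against keys that contain regex metacharacters.
    pattern = r"\b" + re.escape(word) + r"s?\b"
    return len(re.findall(pattern, text, re.IGNORECASE))

sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
scores = {'dog': 5, 'hamster': -2}

total_score = sum(count_word(word, sentence) * value
                  for word, value in scores.items())
print(total_score)  # 8
```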

This should work:

import re

mtext = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
for key in Dict:
    p = re.compile(key, re.IGNORECASE)
    count = len(p.findall(mtext))  # number of occurrences of key in mtext

Another variation of katrielalex's answer, for people stuck with Python 2.6.

Put this snippet in a file (counter.py, for instance): http://code.activestate.com/recipes/576611/

Then you can use the following code:

from counter import Counter

# text and scores are the string and dictionary from the question
total = 0
counts = Counter(text.split())
for word, freq in counts.items():
    total += scores.get(word.lower(), 0) * freq

Pretty much the same, except it works with older versions of Python.
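
If copying the recipe into a separate file is not convenient, the same counting can probably be done with collections.defaultdict, which has been in the standard library since Python 2.5. This is only a sketch of that alternative, not the recipe linked above; the sample text and scores are taken from the earlier answer.

```python
from collections import defaultdict

text = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
scores = {'dogs': 5, 'hamsters': -2}

# Tally each whitespace-separated word, ignoring capitalisation.
counts = defaultdict(int)
for word in text.split():
    counts[word.lower()] += 1

# Weight each tally by the word's score (0 for words not in the dictionary).
total = 0
for word, freq in counts.items():
    total += scores.get(word, 0) * freq

print(total)  # 3
```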
