Python - Match Words in Text File to Dictionary and Manipulate Value
I have a dictionary where the keys are simple words and the values are scores. I want to calculate a total score based on the frequency of each word and its score (value) in the dictionary, by matching the words against a file (or string). For example, suppose my text was:
"Dogs are great pets and hamsters are bad pets. That is why I want a dog"
My dictionary is:
Dict = {'dogs' : 5, 'hampsters' : -2}
Then I would want to calculate a score of 8 ((2x5)-2 = 8). I can find occurrences in the dictionary with
for key in Dict:
    m = re.findall(key, READ, re.IGNORECASE)
but I haven't been able to access the value of the key in a useful manner.
Any help is greatly appreciated.
Thanks, Scott
EDIT: Steve V inspired the following, which is rather nicer:
sentence = "...".split()
score = sum(sentence.count(word) * score for word, score in scores.items())
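A self-contained version of that expression, as a sketch. The sample sentence and the lowercasing step are additions here: list.count compares strings exactly, so "Dogs" would not be counted as "dogs" without it.

```python
scores = {'dogs': 5, 'hamsters': -2}
text = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"

# Lowercase before splitting: list.count compares strings exactly,
# so "Dogs" would not be counted as "dogs" otherwise.
words = text.lower().split()
total = sum(words.count(word) * value for word, value in scores.items())
print(total)  # 3
```

Note that each count() call scans the whole list, so for a large dictionary the Counter approach is likely the better fit.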
The obligatory one-liner:
>>> s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
>>> scores = {'dogs': 5, 'hamsters': -2}
>>> import collections
>>> sum(scores.get(word.lower(), 0) * freq for word, freq in collections.Counter(s.split()).items())
3
and split up:
>>> total = 0
>>> counts = collections.Counter(s.split())
>>> for word, freq in counts.items():
...     total += scores.get(word.lower(), 0) * freq
...
>>> total
3
Notable features:
- The score isn't 8 (as you claimed above) but 3, because the word "dogs" only appears once in the string you gave. If you want to count the word "dog" twice, you will need a (much) more complicated algorithm, probably interfacing with a pluralisation library to handle cases like child -> children and man -> men. This will not be easy or necessarily correct.
- I've included .lower() to ignore capitalisation in the string you gave. If you don't want that, just remove the call.
- You misspelt "hamster" :p.
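To make the pluralisation point concrete, here is a deliberately naive sketch. It assumes that stripping a trailing "s" is good enough, which is wrong for irregular plurals such as "children" or "men" and for words like "glass". The helper names naive_singular and score_text are hypothetical, not part of any library, and the dictionary keys are assumed to be singular.

```python
import collections
import re

def naive_singular(word):
    # Assumption: a plural is just the singular plus "s".
    # This fails for "children", "men", "glass" and many others.
    return word[:-1] if word.endswith('s') and len(word) > 3 else word

def score_text(text, scores):
    # Lowercase and keep only runs of letters, so "dog." and "Dogs"
    # both become comparable tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = collections.Counter(naive_singular(t) for t in tokens)
    return sum(scores.get(word, 0) * freq for word, freq in counts.items())

s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
print(score_text(s, {'dog': 5, 'hamster': -2}))  # 8
```

With this normalisation "Dogs" and "dog." both count as "dog", which gives the 8 the question expected.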
Use katrielalex's answer if possible; it's cleaner than mine. If you don't have Python 2.7 (like me), this may work for you:
import re

sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
scores = {'dog' : 5, 'hamster' : -2}
occurrences = {}
for key in scores:
    # count case-insensitive matches of each key in the sentence
    m = re.findall(key, sentence, re.IGNORECASE)
    occurrences[key] = len(m)
totalScore = 0
for word in occurrences:
    totalScore += scores.get(word.lower(), 0) * occurrences[word]
print(totalScore)
I changed "dogs" to "dog" in your scores dictionary, on the assumption that it was a typo. If you change it back, the result will be 3, because "dogs" then matches only once.
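One caveat with passing a bare key to re.findall: the pattern matches anywhere inside a longer word, so 'dog' would also be counted in "dogma". A hedged sketch of a stricter variant follows, assuming an optional trailing "s" is the only plural form you want to accept. The function name count_word is hypothetical.

```python
import re

def count_word(word, text):
    # \b anchors the match at word boundaries; s? allows a simple plural.
    # re.escape guards against keys that contain regex metacharacters.
    pattern = r'\b' + re.escape(word) + r's?\b'
    return len(re.findall(pattern, text, re.IGNORECASE))

sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
scores = {'dog': 5, 'hamster': -2}
total = sum(value * count_word(word, sentence) for word, value in scores.items())
print(total)  # 8
```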
This should work:
import re

mtext = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
for key in Dict:
    p = re.compile(key, re.IGNORECASE)
    NuOfDogs = len(p.findall(mtext))  # number of occurrences of key
Another variation of katrielalex's answer for people stuck with Python 2.6,
put this snippet in a file (counter.py for instance): http://code.activestate.com/recipes/576611/
then you can use the following code:
from counter import Counter

# text and scores are the string and dictionary from the answers above
total = 0
counts = Counter(text.split())
for word, freq in counts.items():
    total += scores.get(word.lower(), 0) * freq
Pretty much the same, except it works with older Python versions.
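If the same script has to run on both newer and older interpreters, one option is to try the standard-library Counter first and fall back to the recipe module. This is only a sketch, and it assumes the recipe was saved as counter.py as described above; the sample text and scores are repeated so the snippet stands alone.

```python
try:
    from collections import Counter  # available from Python 2.7 onwards
except ImportError:
    from counter import Counter  # assumed: the recipe saved as counter.py

scores = {'dogs': 5, 'hamsters': -2}
text = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"

total = 0
counts = Counter(text.split())
for word, freq in counts.items():
    # .lower() makes "Dogs" match the key "dogs"
    total += scores.get(word.lower(), 0) * freq
print(total)  # 3
```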