word count problem
I want to count words in text files that contain data like the following:
ROK :
ROK/(NN)
New :
New/(SV)
releases, :
releases/(NN) + ,/(SY)
week :
week/(EP)
last :
last/(JO)
compared :
compare/(VV) + -ed/(EM)
year :
year/(DT)
releases :
releases/(NN)
Expressions such as /(NN), /(SV), and /(EP) are categories. I want to extract the word just before each category and count how many times each word occurs in the whole text, grouped by category.
I want to write the result to a new text file like this:
(NN)
releases 2
ROK 1
(SY)
New 1
, 1
(EP)
week 1
(JO)
last 1
......
Please help me out!
Here is my garbage code; it doesn't work.
import os, sys
import re
wordset = {}
for line in open('E:\\mach.txt', 'r'):
    if '/(' in line:
        word = re.findall(r'(\w)/\(', line)
        print word
        if word not in wordset: wordset[word]=1
        else: wordset[word]+=1
f = open('result.txt', 'w')
for word in wordset:
    print>> f, word, wordset[word]
f.close()
from __future__ import print_function
import re

# Capture the word before the slash and the parenthesised category after it.
REGEXP = re.compile(r'(\w+)/(\(.*?\))')

def main():
    words = {}  # {category: {word: count}}
    with open('E:\\mach.txt', 'r') as fp:
        for line in fp:
            for item, category in REGEXP.findall(line):
                words.setdefault(category, {}).setdefault(item, 0)
                words[category][item] += 1
    with open('result.txt', 'w') as fp:
        for category, counts in sorted(words.items()):
            print(category, file=fp)
            for word, count in counts.items():
                print(word, count, sep=' ', file=fp)
            print(file=fp)  # blank line between categories
    return 0

if __name__ == '__main__':
    raise SystemExit(main())
You're welcome (= If you also want to count that odd "-ed" or ",", tune the regexp to match any character except whitespace:
REGEXP = re.compile(r'([^\s]+)/(\(.*?\))')
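To see what the two patterns actually pick up, here is a small sketch that runs both on a few lines copied from the question instead of the file. The names `SAMPLE`, `WORD_ONLY`, `NON_SPACE` and `count_by_category` are made up for illustration, and `collections.Counter` is a swapped-in convenience, not what the code above uses.

```python
import re
from collections import Counter, defaultdict

# A few lines copied from the question's sample data.
SAMPLE = """\
ROK/(NN)
releases/(NN) + ,/(SY)
compare/(VV) + -ed/(EM)
releases/(NN)
"""

WORD_ONLY = re.compile(r'(\w+)/(\(.*?\))')     # word characters only
NON_SPACE = re.compile(r'([^\s]+)/(\(.*?\))')  # anything except whitespace


def count_by_category(pattern, text):
    """Return {category: Counter({item: count})} for every match in text."""
    counts = defaultdict(Counter)
    for item, category in pattern.findall(text):
        counts[category][item] += 1
    return counts


strict = count_by_category(WORD_ONLY, SAMPLE)
loose = count_by_category(NON_SPACE, SAMPLE)

print(sorted(strict))                # categories found by the \w+ pattern
print(sorted(loose))                 # categories found by the [^\s]+ pattern
print(loose['(NN)'].most_common())   # items in (NN), most frequent first
```

On this sample the `\w+` pattern skips `,` entirely and records `-ed` as `ed`, while the looser pattern keeps both as written. `most_common()` also gives the "highest count first" ordering shown in the question's desired output.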
You're trying to use a list (yes, word is a list) as a dictionary key. Here is what you should do:
import re
wordset = {}
for line in open('testdata.txt', 'r'):
    if '/(' in line:
        # \w+ captures the whole word; a bare \w would keep only its last letter
        words = re.findall(r'(\w+)/\(', line)
        print words
        # findall returns a list, so count each match separately
        for word in words:
            if word not in wordset:
                wordset[word]=1
            else:
                wordset[word]+=1
f = open('result.txt', 'w')
for word in wordset:
    print>> f, word, wordset[word]
f.close()
You're lucky I want to learn Python, otherwise I wouldn't have tried your code. Next time, post the error you're getting! I bet it was
TypeError: unhashable type: 'list'
It's important to help us help you if you want good answers!
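The point about the list can be checked in isolation. Below is a minimal sketch, with a sample line from the question and made-up variable names (`found`, `error_message`), showing that `re.findall` returns a list and that a list cannot be used as a dictionary key. It is written with `print()` calls so it is not tied to the Python 2 print statement used above.

```python
import re

line = 'releases/(NN) + ,/(SY)'  # sample line from the question

# findall returns a list of matches, even when there is only one.
found = re.findall(r'(\w+)/\(', line)
print(type(found).__name__, found)

wordset = {}
error_message = None
try:
    # Dictionary keys must be hashable, and a list is not.
    wordset[found] = 1
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)

# Looping over the list yields strings, which are valid keys.
for word in found:
    wordset[word] = wordset.get(word, 0) + 1
print(wordset)
```

The exact wording of the error text can differ between Python versions, so the sketch prints whatever message it gets rather than assuming one.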