word count problem

Posted 2023-04-08 02:53

I want to count words in text files that contain data like this:

ROK :
    ROK/(NN)
New :
    New/(SV)
releases, :
    releases/(NN) + ,/(SY)
week :
    week/(EP)
last :
    last/(JO)
compared :
    compare/(VV) + -ed/(EM)
year :
    year/(DT)
releases :
    releases/(NN)

Expressions such as /(NN), /(SV), and /(EP) mark a category. I want to extract the word just before each category marker and count how many times each word occurs in each category across the whole text.

I want to write the result to a new text file like this:

(NN)
releases 2
ROK 1

(SV)
New 1

(SY)
, 1

(EP)
week 1

(JO)
last 1

......

Please help me out!

Here is my garbage code ;_; It doesn't work.

import os, sys
import re

wordset = {}
for line in open('E:\\mach.txt', 'r'):
    if '/(' in line:
        word = re.findall(r'(\w)/\(', line)
        print word
        if word not in wordset: wordset[word]=1
        else: wordset[word]+=1

f = open('result.txt', 'w')
for word in wordset:
    print>> f, word, wordset[word]
f.close()

from __future__ import print_function
import re


REGEXP = re.compile(r'(\w+)/(\(.*?\))')


def main():
    words = {}

    with open('E:\\mach.txt', 'r') as fp:
        for line in fp:
            for item, category in REGEXP.findall(line):
                words.setdefault(category, {}).setdefault(item, 0)
                words[category][item] += 1

    with open('result.txt', 'w') as fp:
        for category, items in sorted(words.items()):
            print(category, file=fp)
            for word, count in items.items():
                print(word, count, sep=' ', file=fp)
            print(file=fp)
    return 0

if __name__ == '__main__':
    raise SystemExit(main())

You're welcome (= If you also want to count that weird "-ed" or ",", tune the regexp to match any character except whitespace:

REGEXP = re.compile(r'([^\s]+)/(\(.*?\))')
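To make the difference concrete, here is a small sketch that runs both regexps over two lines taken from the question's sample data. The names WORD_ONLY, ANY_NON_SPACE and extract are made up for this illustration and are not part of the code above.

```python
import re

# The two patterns discussed above (names are hypothetical).
WORD_ONLY = re.compile(r'(\w+)/(\(.*?\))')
ANY_NON_SPACE = re.compile(r'([^\s]+)/(\(.*?\))')


def extract(pattern, line):
    """Return a list of (item, category) tuples found in one line."""
    return pattern.findall(line)


# Two lines copied from the sample input in the question.
comma_line = "    releases/(NN) + ,/(SY)"
suffix_line = "    compare/(VV) + -ed/(EM)"

for line in (comma_line, suffix_line):
    print("word-only:    ", extract(WORD_ONLY, line))
    print("any-non-space:", extract(ANY_NON_SPACE, line))
```

With the word-only pattern the "," item is skipped entirely and "-ed" is picked up as "ed", because \w does not match punctuation. The non-whitespace pattern keeps both as written.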

You're trying to use a list as a dictionary key (yes, word is a list, because re.findall returns a list of matches). Here is what you should do:

from __future__ import print_function
import re

wordset = {}
with open('testdata.txt', 'r') as infile:
    for line in infile:
        if '/(' in line:
            # findall returns a list of matches; \w+ captures the whole word
            words = re.findall(r'(\w+)/\(', line)
            print(words)
            for word in words:
                if word not in wordset:
                    wordset[word] = 1
                else:
                    wordset[word] += 1

with open('result.txt', 'w') as f:
    for word in wordset:
        print(word, wordset[word], file=f)
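That loop counts words regardless of category. If you also want the per-category grouping from the question, with the most frequent word first, something along these lines might work. It is only a sketch: count_by_category and format_report are names I made up, and it assumes the non-whitespace pattern so that "," and "-ed" are counted too.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern: any run of non-whitespace before "/(CATEGORY)".
PATTERN = re.compile(r'(\S+)/(\(.*?\))')


def count_by_category(lines):
    """Map each category, e.g. '(NN)', to a Counter of its words."""
    counts = defaultdict(Counter)
    for line in lines:
        for word, category in PATTERN.findall(line):
            counts[category][word] += 1
    return counts


def format_report(counts):
    """Build the report text: one block per category, highest count first."""
    blocks = []
    for category in sorted(counts):
        rows = ["%s %d" % (word, n) for word, n in counts[category].most_common()]
        blocks.append("\n".join([category] + rows))
    return "\n\n".join(blocks) + "\n"


if __name__ == '__main__':
    sample = [
        "ROK :",
        "    ROK/(NN)",
        "releases, :",
        "    releases/(NN) + ,/(SY)",
        "releases :",
        "    releases/(NN)",
    ]
    print(format_report(count_by_category(sample)), end="")
```

To use it on a real file, pass the open file object to count_by_category and write the string from format_report to result.txt.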

You're lucky I want to learn Python; otherwise I wouldn't have tried your code. Next time, post the error you're getting! I bet it was:

TypeError: unhashable type: 'list'

It's important to help us help you if you want good answers!
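For anyone who wants to reproduce that error in isolation, here is a minimal sketch. The sample line comes from the question; the variable names are only for illustration.

```python
import re

line = "    releases/(NN) + ,/(SY)"

# re.findall always returns a list, even when there is only one match.
found = re.findall(r'(\w+)/\(', line)

counts = {}
error_message = None
try:
    counts[found] = 1  # a list is mutable, so it cannot be a dict key
except TypeError as exc:
    error_message = str(exc)

# Looping over the list and using each string as the key works fine.
for word in found:
    counts[word] = counts.get(word, 0) + 1

print(type(found).__name__, error_message, counts)
```

The strings inside the list are hashable, which is why iterating over the matches fixes the problem.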
