Issue in string matching in Python

2023-01-23 08:44

I am trying to read from a file and match a certain combination of strings. Please find my program below:

def negative_verbs_features(filename):

    # Open and read the file content
    file = open (filename, "r")
    text = file.read()
    file.close()

    # Create a list of negative verbs from the MPQA lexicon
    file_negative_mpqa = open("../data/PolarLexicons/negative_mpqa.txt", "r")
    negative_verbs = []
    for line in file_negative_mpqa:
        #print line,
        pos, word = line.split(",")
        #print line.split(",")      
        if pos == "verb":
            negative_verbs.append(word)
    return negative_verbs

if __name__ == "__main__":
    print negative_verbs_features("../data/test.txt")

The file negative_mpqa.txt consists of word, part-of-speech tag pairs separated by a comma (,). Here's a snippet of the file:

abandoned,adj
abandonment,noun
abandon,verb
abasement,anypos
abase,verb
abash,verb
abate,verb
abdicate,verb
aberration,adj
aberration,noun

I would like to create a list of all words in the file that have verb as their part-of-speech. However, when I run my program, the returned list (negative_verbs) is always empty. The body of the if statement never executes. I tried printing each word, pos pair by uncommenting the line print line.split(","). Here is a snippet of the output:

['wrongful', 'adj\r\n']
['wrongly', 'anypos\r\n']
['wrought', 'adj\r\n']
['wrought', 'noun\r\n']
['yawn', 'noun\r\n']
['yawn', 'verb\r\n']
['yelp', 'verb\r\n']
['zealot', 'noun\r\n']
['zealous', 'adj\r\n']
['zealously', 'anypos\r\n']

I understand my file may have some special characters, such as a carriage return and a newline, at the end of every line. I just want to ignore them and build my list. Kindly let me know how to proceed.

PS: I am a newbie in Python.

You said the file has lines like abandoned,adj, so those are word, pos pairs. But you wrote pos, word = line.split(","), which means that pos == 'abandoned' and word == 'adj\r\n'. The test pos == "verb" can therefore never be true, so I think it's clear now why the list is always empty :-)
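To make the mix-up concrete, here is a small sketch using one sample line from the lexicon snippet. It is written for Python 3 (the question uses Python 2 print statements), and the variable names are only for illustration.

```python
line = "abandon,verb\r\n"

# Order used in the question: the names end up attached to the wrong fields.
pos, word = line.split(",")
print(repr(pos), repr(word))

# Order that matches the file layout: word first, then the tag.
word, pos = line.split(",")
print(repr(word), repr(pos))

# Even with the right order, the tag still carries the line ending,
# so a direct comparison with "verb" fails until it is stripped.
matches_raw = (pos == "verb")
matches_stripped = (pos.rstrip() == "verb")
print(matches_raw, matches_stripped)
```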

Replace the line pos, word = line.split(",") by

word, pos = line.rstrip().split(",")

rstrip() removes the whitespace characters (spaces, newlines, carriage returns...) at the right end of your string. Note that lstrip() and strip() also exist. You had also swapped word and pos!
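As a quick illustration of how the three methods differ, here is a minimal Python 3 sketch on a made-up string with whitespace on both sides (the string itself is just an example, not taken from the lexicon):

```python
raw = "  verb\r\n"

right = raw.rstrip()  # trailing whitespace removed, leading spaces kept
left = raw.lstrip()   # leading whitespace removed, line ending kept
both = raw.strip()    # whitespace removed on both sides

print(repr(right), repr(left), repr(both))
```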

You could also call rstrip() only on the part-of-speech value instead, just before comparing it with "verb", since that is the field that carries the trailing '\r\n'.
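Putting both points together, one possible version of the loop looks like the sketch below. It is written for Python 3, and collect_verbs is a hypothetical helper name, not something from the question; it takes any iterable of lines so it can be tried without the lexicon file.

```python
def collect_verbs(lines):
    """Return the words whose part-of-speech tag is exactly "verb"."""
    verbs = []
    for line in lines:
        line = line.strip()  # drop "\r\n" and any surrounding spaces
        if not line:
            continue  # skip blank lines
        word, pos = line.split(",")  # file layout is word,pos
        if pos == "verb":
            verbs.append(word)
    return verbs


# Sample lines copied from the lexicon snippet, with Windows line endings.
sample = ["abandoned,adj\r\n", "abandon,verb\r\n", "abase,verb\r\n", "aberration,noun\r\n"]
result = collect_verbs(sample)
print(result)
```

With the real file, the same function could be called as collect_verbs(f) inside a with open("../data/PolarLexicons/negative_mpqa.txt") as f: block, since iterating over a file object yields its lines and the with statement closes the file afterwards.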
