
How to print the line numbers of incorrect words located in a txt file?

I have this piece of code, which only prints the position of each incorrect word within my list of incorrect words. I want it to print the line numbers where the incorrect words occur in the txt file. Can I modify this code to do that?

# text1 is my list of incorrect words
# words is my text file, which contains the incorrect words

from collections import defaultdict
d = defaultdict(list)
for lineno, word in enumerate(text1):
    d[word].append(lineno)
print(d)

I've now done this, but it prints the position of the word in the text rather than the line it is on. This is the code:

import sys
import string

text = []
infile = open(sys.argv[1], 'r').read()
for punct in string.punctuation:
    infile = infile.replace(punct, "")
    text = infile.split()

dict = open(sys.argv[2], 'r').read()
dictset = []
dictset = dict.split()

words = []
words = list(set(text) - set(dictset))
words = [text.lower() for text in words]
words.sort()

def allwords(line):
    return line.split()
def iswrong(word):
    return word in words
for i, line in enumerate(text):
    for word in allwords(line):
        if iswrong(word):
            print(word, i)

The output of that code is:

millwal    342

This is printing the position of the word, not the line it is on.

I want it to print the line number, so what do I change in my code?


You could completely rewrite this code to do what you mention -- this code's structure has no relation whatsoever to what you require.

Since you need "line numbers from a text file", you'll need an object representing the text file (either as a list of lines in memory, or as an open file object). You say you have one called words (it's not clear if that's a filename or a Python variable identifier): having the text in a file called (say, as a variable) words and the (incorrect) words in a (collection of some kind) named text1 is a truly horrible choice of names, possibly the worst I've seen in many decades -- positively misleading. Use variable names that are a better match for the variables' meaning, unless you're trying to confuse yourself and everybody else.

Given a sensibly named variable for the input text, e.g. text = open('thefile.txt'), and a decent way to determine whether a word is incorrect, say a function def iswrong(word):..., the way to code what you require becomes clear:

for i, line in enumerate(text):
    for word in allwords(line):
        if iswrong(word):
            print(word, i)

The allwords function could be just:

def allwords(line):
    return line.split()

if you have no punctuation (words just separated by whitespace), or

import re

def allwords(line):
    return re.findall(r'\w+', line)

using regular expressions.
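
To make the difference between the two versions concrete, here is a small sketch. The names allwords_split and allwords_regex and the sample line are made up for illustration only:

```python
import re

def allwords_split(line):
    # Whitespace split: punctuation stays attached to the words.
    return line.split()

def allwords_regex(line):
    # \w+ matches runs of letters, digits and underscores,
    # so surrounding punctuation is dropped.
    return re.findall(r'\w+', line)

sample = "Millwal won, again!"
print(allwords_split(sample))  # ['Millwal', 'won,', 'again!']
print(allwords_regex(sample))  # ['Millwal', 'won', 'again']
```

Note that the \w+ pattern also splits on apostrophes, so "don't" comes back as 'don' and 't'; if that matters for your dictionary, you would need a different pattern.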

If e.g. badwords is a set of incorrect words,

def iswrong(word):
    return word in badwords

or vice versa, if goodwords is the set of all correct words,

def iswrong(word):
    return word not in goodwords

The details of iswrong and allwords are secondary -- as is the choice of whether to keep them as functions or just embed their code inline in the main stream of control.
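
Putting the pieces together, a minimal end-to-end sketch might look like the following. The function name find_bad_words and the sample data are hypothetical, and I am assuming here that the dictionary words are all lowercase and that you want line numbers counted from 1 rather than 0:

```python
import re

def allwords(line):
    # Split a line into words, ignoring punctuation.
    return re.findall(r'\w+', line)

def find_bad_words(lines, goodwords):
    """Return (word, line_number) pairs for words not in goodwords."""
    hits = []
    # start=1 so the first line is reported as line 1, not 0.
    for lineno, line in enumerate(lines, start=1):
        for word in allwords(line):
            # Assumption: goodwords holds lowercase words only.
            if word.lower() not in goodwords:
                hits.append((word, lineno))
    return hits

# Hypothetical sample data standing in for the two files.
sample_lines = ["the team won", "millwal lost again", "the end"]
goodwords = {"the", "team", "won", "lost", "again", "end"}

for word, lineno in find_bad_words(sample_lines, goodwords):
    print(word, lineno)  # millwal 2
```

The key point is that lines must be an iterable of lines, not of words. With real files you can pass the open file object itself, since iterating over a file yields one line at a time, and build goodwords as a set from the dictionary file's contents, lowercased and split on whitespace.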
