Once again: how do I get nested loops to work in Python?

Can someone help me with this nested loop? It has the same problem as "Loops not working - Strings (Python)", but this time I am using a csv reader, which doesn't have a readline() function.

import csv
import sys, re
import codecs
reload(sys)
sys.setdefaultencoding('utf-8')

reader = csv.reader(open("reference.txt"), delimiter = "\t")
reader2 = csv.reader(open("current.txt"), delimiter = "\t")

for line in reader:
    for line2 in reader2:
        if line[0] == line2[1]:
            print line2[0] + '\t' + line[0]
            print line[1]
        else:
            print line[0]
            print line[1]

The purpose of this code is to check which lines in the current text file (i.e. reader2) coincide with lines in the reference text (i.e. reader), and then print the serial number from reference.txt for each match.

reference.txt looks like this (the space between the serial no. and sentence is a tab):

S00001LP    this is a nested problem
S00002LP    that cannot be solved
S00003LP    and it's pissing me off
S00004LP    badly

current.txt looks like this (the space between the 1st and 2nd sentence is a tab):

this is a nested problem    wakaraa pii ney bay tam
and i really can't solve it    shuu ipp faa luiip
so i come to seek help from stackoverflow    lakjsd sdiiije
seriously it is crazy because such    foo bar bar foo
problems don't happen in other languages    whaloemver ahjd
and it's pissing me off    gaga ooo mama
badly    wahahahah

The required output looks something like this:

S00001LP    this is a nested problem    wakaraa pii ney bay tam
and i really can't solve it    shuu ipp faa luiip
so i come to seek help from stackoverflow    lakjsd sdiiije
seriously it is crazy because such    foo bar bar foo
problems don't happen in other languages    whaloemver ahjd
S00003LP    and it's pissing me off    gaga ooo mama
S00004LP    badly    wahahahah


You can only read from a stream once. Your inner loop consumes the whole of the second file during the first iteration of the outer loop, so on every later iteration of the outer loop there is nothing left to read.
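
To see the exhaustion in isolation, here is a minimal sketch. It is written for Python 3 (the question uses Python 2 print statements), and it uses io.StringIO with two made-up rows as a stand-in for open("current.txt"), so both of those are assumptions rather than part of the original code.

```python
import csv
import io

# Stand-in for open("current.txt"); an in-memory file with two sample rows.
current = io.StringIO("badly\twahahahah\nand it's pissing me off\tgaga ooo mama\n")
reader2 = csv.reader(current, delimiter="\t")

first_pass = [row for row in reader2]   # reads every row up to end of file
second_pass = [row for row in reader2]  # the reader is already at end of file

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

The second loop runs zero times, which is what happens to the inner loop in the question on every outer iteration after the first.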

Try changing this:

reader = csv.reader(open("reference.txt"), delimiter = "\t")
reader2 = csv.reader(open("current.txt"), delimiter = "\t")

to this:

reader = list(csv.reader(open("reference.txt"), delimiter = "\t"))
reader2 = list(csv.reader(open("current.txt"), delimiter = "\t"))

The list() call reads the file in its entirety and builds an in-memory list from it, which you can then iterate over as many times as you like.
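
As a rough sketch of how the nested loop could look once both readers are lists: the version below is Python 3, uses io.StringIO with a shortened copy of the sample data instead of the real files, and loops over current.txt on the outside so that each current line is printed exactly once. The loop order and the for/else are my own choices, not something taken from the question.

```python
import csv
import io

# Stand-ins for the two files, using a subset of the sample data.
reference_file = io.StringIO(
    "S00001LP\tthis is a nested problem\n"
    "S00004LP\tbadly\n"
)
current_file = io.StringIO(
    "this is a nested problem\twakaraa pii ney bay tam\n"
    "and i really can't solve it\tshuu ipp faa luiip\n"
    "badly\twahahahah\n"
)

reference = list(csv.reader(reference_file, delimiter="\t"))
current = list(csv.reader(current_file, delimiter="\t"))

output = []
for line2 in current:
    for line in reference:  # a list can be scanned again on every outer pass
        if line[1] == line2[0]:  # reference sentence equals first column
            output.append("\t".join([line[0]] + line2))
            break
    else:  # runs only if no reference sentence matched
        output.append("\t".join(line2))

print("\n".join(output))
```

This still compares every current line against every reference line, so it gets slow as the files grow.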

A better solution would be to store your reference data in a dictionary so that you don't have to loop over it for every line in your data.


One approach is to create a dictionary mapping your keys to serial numbers:

serials = dict(map(reversed, reader))
for line in reader2:
    serial = serials.get(line[0])
    if serial is not None:
        print serial

This will be much faster than a nested loop.

The first line creates the dictionary mapping keys to serial numbers. Since the dictionary constructor expects an iterable of (key, value) pairs while your file actually contains (value, key) pairs, we have to swap the two entries in each record, which is what map(reversed, ...) does.
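
Putting the dictionary idea together with the sample data from the question, a sketch that produces the required output might look like this. It is Python 3, reads from io.StringIO stand-ins rather than the real files, and uses a dict comprehension in place of dict(map(reversed, reader)); those substitutions are assumptions made for the sake of a self-contained example.

```python
import csv
import io

# Stand-ins for open("reference.txt") and open("current.txt").
reference_file = io.StringIO("\n".join([
    "S00001LP\tthis is a nested problem",
    "S00002LP\tthat cannot be solved",
    "S00003LP\tand it's pissing me off",
    "S00004LP\tbadly",
]) + "\n")
current_file = io.StringIO("\n".join([
    "this is a nested problem\twakaraa pii ney bay tam",
    "and i really can't solve it\tshuu ipp faa luiip",
    "so i come to seek help from stackoverflow\tlakjsd sdiiije",
    "seriously it is crazy because such\tfoo bar bar foo",
    "problems don't happen in other languages\twhaloemver ahjd",
    "and it's pissing me off\tgaga ooo mama",
    "badly\twahahahah",
]) + "\n")

# Reference rows are (serial, sentence); the lookup needs sentence -> serial.
serials = {sentence: serial
           for serial, sentence in csv.reader(reference_file, delimiter="\t")}

output = []
for row in csv.reader(current_file, delimiter="\t"):
    serial = serials.get(row[0])  # None when the sentence has no serial
    prefix = [serial] if serial is not None else []
    output.append("\t".join(prefix + row))

print("\n".join(output))
```

Each file is read exactly once here, so neither reader needs to be wrapped in list().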
