How do I advance to the next item in a nested list? (Python)

Working with a couple of lists, iterating over each. Here's a code segment:

self.links = []
self.iter=iter(self.links)
for tgt in self.links:
    for link in self.mal_list:
        print(link)
        if tgt == link:
           print("Found Suspicious Link: {0}".format(tgt))
           self.count += 1

        else:
           self.count += 1
           self.crawl(self.iter.next())

It's advancing to the next item in the link list just fine. For the malware signature list I tried using a similar iterator, but I'm not sure that's even the best way, and if so, where to place it in my code so that each link that is urlopened from the list is compared to every item in the malware list BEFORE the loop opens the next item in the link list. Any suggestions?


I'm not sure what you are trying to ask, but you could simplify your code, although this is not strictly necessary.

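self.links = []
# keep only the links that do not appear in the malware list
self.non_malware_link = [link for link in self.links if link not in self.mal_list]
# list() forces the calls; in Python 3 a bare map() is lazy and would crawl nothing
results = list(map(self.crawl, self.non_malware_link))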

On some issues with your code:

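  1. self.count is incremented in both branches, so it counts comparisons rather than suspicious links. With one check per link it would be exactly len(self.links).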

Apart from the meaning of self.count, everything else looks like it does what it needs to do.


The basic approach is fine, but it will be slow.

Try this instead:

for tgt in links:
    if tgt in mal_links:
        # you know that it's a bad link
        print("Found Suspicious Link: {0}".format(tgt))
    else:
        crawl(tgt)

I don't see why you keep two iterators going over the same list. This introduces a bug, because you don't call next on self.iter when you detect a malware link. The next time tgt isn't a bad link, calling next advances to the previously detected bad link, and you crawl that. Is there some reason you need to step over two copies of the iterator instead of just one?
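To make the iterator bug concrete, here is a small standalone sketch. It is not the asker's class: `crawl_order` is a hypothetical helper that replays the two-iterator pattern with one comparison per link and records what would be crawled instead of opening any URL. The example links are made up.

```python
def crawl_order(links, mal_list):
    """Replay the two-iterator pattern and return what would get crawled."""
    it = iter(links)              # second, independent iterator over the same list
    crawled = []
    for tgt in links:
        if tgt in mal_list:
            continue              # bad link detected: `it` is NOT advanced here
        crawled.append(next(it))  # crawls whatever `it` points at, not `tgt`
    return crawled


links = ["a.example", "bad.example", "c.example"]
print(crawl_order(links, {"bad.example"}))
# ['a.example', 'bad.example']
```

The bad link ends up crawled and "c.example" never is, because the two iterators drift apart after the first detection. Passing `tgt` directly to the crawl call avoids the problem.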

Also, your initial code crawls a page once for every malware entry it does not match. Depending on how big your list is, this might lead to some angry webmasters.
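A rough way to see how quickly the requests add up is to count them. This sketch assumes a simplified version of the nested loop from the question, where a counter stands in for the crawl call; `count_crawls` and the sample data are hypothetical.

```python
def count_crawls(links, mal_list):
    """Count how many crawl calls the nested-loop version would make."""
    calls = 0
    for tgt in links:
        for link in mal_list:
            if tgt != link:
                calls += 1        # stands in for one self.crawl(...) call
    return calls


links = ["a.example", "b.example"]
mal_list = ["x.example", "y.example", "z.example"]
print(count_crawls(links, mal_list))
# 6
```

Two clean links checked against three signatures already trigger six crawls, so the number of requests grows with the product of the two list sizes.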


Searching for an item in a list is slow. If that is what you're trying to do, use a dict or a set instead of a list for self.mal_list:

mal_list = set(self.mal_list)  # build the set once for fast membership tests
for tgt in self.links:
    if tgt in mal_list:
        print("Found Suspicious Link: {0}".format(tgt))
        self.count += 1
    else:
        self.count += 1
        self.crawl(tgt)  # crawl the link just checked, not a second iterator's item

Or, if self.links can be a set as well:

mal_list = set(self.mal_list)
links = set(self.links)
detected = links.intersection(mal_list)  # links that are also malware signatures
for malware in detected:
    print("Found Suspicious Link: {0}".format(malware))
    self.count += 1
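For reference, here is a minimal standalone sketch of the set-based check, assuming plain functions rather than the class from the question. `scan_links` and the `crawl` callback are hypothetical names, and the example links are invented; `crawled.append` only records the link where a real crawler would fetch it.

```python
def scan_links(links, mal_list, crawl):
    """Check each link once against the signature set; crawl only clean links."""
    signatures = set(mal_list)    # set lookups take constant time on average
    suspicious = []
    for tgt in links:
        if tgt in signatures:
            print("Found Suspicious Link: {0}".format(tgt))
            suspicious.append(tgt)
        else:
            crawl(tgt)            # crawl the link being examined, exactly once
    return suspicious


crawled = []
found = scan_links(
    ["a.example", "bad.example", "c.example"],
    ["bad.example", "worse.example"],
    crawled.append,               # stand-in for a real crawl function
)
print(found)    # ['bad.example']
print(crawled)  # ['a.example', 'c.example']
```

Unlike the intersection version, this loop keeps the original order of the links and handles duplicates, at the cost of an explicit loop.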