
recursive nested loops

Example scenario (note that this can be as deep or as shallow as the website requires): the spider scans the first page for links and stores them as array1.

The spider follows the first link and is now on the second page. It finds links there and stores them as array2.

The spider follows the first link on the second page and is now on the third page. It finds links there too and stores them as array3.

Please note that this is a generic scenario. I want to highlight the need for many loops within loops.

rootArray[array1,array2,array3....]

How can I write recursive nested loops? array2 holds the children of each value of array1 (we assume the structure is very uniform: each value of array1 has similar links in array2). array3 holds the children of each value of array2, and so on.


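module Scratch
  # Walks a nested array, calling the block with each non-array element
  # and the depth at which it was found.
  def self.recur(arr, depth, &fn)
    arr.each do |a|
      # Descend into sub-arrays; hand leaf values to the block.
      a.is_a?(Array) ? recur(a, depth + 1, &fn) : fn.call(a, depth)
    end
  end

  arr = [[1, 2, 3], 4, 5, [6, 7, [8, 9]]]
  recur(arr, 0) { |x, d| puts "#{d}: #{x}" }
end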


You'll want to store these results in a tree, not a collection of arrays. Page1 would have a child node for each link, each of those would have child nodes for its own links, and so on. An alternative approach would be to store all of the links in one array, recursing through the site to find them. Do you really need them in a structure analogous to that of the site?

You'll also want to check for duplicate links when adding any new link to the list/tree/whatever that you've already got. Otherwise, loops like page_1 -> page_2 -> page_1... will break your app.
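As a rough sketch of the tree-plus-duplicate-check idea, something like the following could work. `SAMPLE_LINKS`, `Node`, `build_tree` and `print_tree` are made-up names for illustration, and the link table stands in for actually fetching pages, which is not shown here.

```ruby
require "set"

# Hypothetical stand-in for fetching a page and extracting its links.
# A real spider would download the page here; this table is made up.
SAMPLE_LINKS = {
  "page_1" => ["page_2", "page_3"],
  "page_2" => ["page_1", "page_4"], # links back to page_1 (a cycle)
  "page_3" => ["page_4"],
  "page_4" => []
}.freeze

# One node per page: its URL plus the child pages discovered from it.
Node = Struct.new(:url, :children)

# Builds a tree of pages depth-first. `seen` records every URL already
# placed in the tree, so a cycle such as page_1 -> page_2 -> page_1 is
# skipped instead of recursing forever.
def build_tree(url, seen = Set.new)
  seen << url
  node = Node.new(url, [])
  SAMPLE_LINKS.fetch(url, []).each do |link|
    next if seen.include?(link)
    node.children << build_tree(link, seen)
  end
  node
end

# Prints the tree with two spaces of indentation per level.
def print_tree(node, depth = 0)
  puts "#{'  ' * depth}#{node.url}"
  node.children.each { |child| print_tree(child, depth + 1) }
end

tree = build_tree("page_1")
print_tree(tree)
```

Note that with this approach each page appears only once in the tree, under whichever parent reached it first, so page_4 shows up under page_2 but not under page_3.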

What's your real goal here? Page crawlers aren't exactly new technology.


It all depends on what you are trying to do.

If you are harvesting links then a hash or set will work well. An array can be used too but can lead to some gotchas.
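To illustrate the difference, here is a small sketch using invented sample links. The Set drops repeats on its own, while the Array version needs an explicit check, which is one of the gotchas.

```ruby
require "set"

# Made-up links, as if harvested from several pages; note the repeats.
found = ["/about", "/contact", "/about", "/blog", "/contact"]

# A Set ignores repeated additions, so no explicit check is needed.
link_set = Set.new
found.each { |link| link_set << link }

# An Array keeps every push, so duplicates have to be filtered by hand.
# `include?` scans the whole array, which gets slow as the list grows.
link_array = []
found.each { |link| link_array << link unless link_array.include?(link) }

puts link_set.size
puts link_array.inspect
```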

If you need to show the structure of the site, you'll want a tree or arrays of arrays, along with some way of flagging which URLs you've visited.

In any case, you need to avoid redundant links to keep from getting into a loop. It's also very common to limit how deep you'll descend and to decide whether you'll remember and/or follow links outside of the site.
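A depth limit and a same-site check might look roughly like this. `SITE_LINKS` and `crawl` are hypothetical names, the URLs are placeholders, and the table again replaces real page fetching.

```ruby
require "set"
require "uri"

# Hypothetical link table standing in for real page fetches.
SITE_LINKS = {
  "http://example.com/" => ["http://example.com/a", "http://other.example.org/x"],
  "http://example.com/a" => ["http://example.com/b", "http://example.com/"],
  "http://example.com/b" => ["http://example.com/c"],
  "http://example.com/c" => []
}.freeze

# Visits pages depth-first, stopping once depth exceeds max_depth and
# skipping links whose host differs from the starting page's host.
# Returns the set of URLs visited.
def crawl(url, max_depth, host = URI.parse(url).host, depth = 0, seen = Set.new)
  return seen if depth > max_depth || seen.include?(url)
  return seen unless URI.parse(url).host == host

  seen << url
  SITE_LINKS.fetch(url, []).each do |link|
    crawl(link, max_depth, host, depth + 1, seen)
  end
  seen
end

visited = crawl("http://example.com/", 2)
puts visited.to_a
```

Because this sketch is depth-first, a page first reached by a long path is marked as seen at that depth. A breadth-first queue would instead visit each page at its shortest distance from the start, which may matter when the depth limit is tight.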


Gweg, I just answered this on your other post.

How do I create nested FOR loops with varying depths, for a varying number of arrays?
