
Using Scrapy with Javascript and iFrames and alternatives [closed]

Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 3 years ago.


I'm trying to use Scrapy to scrape the U.S. government regulations website (www.regulations.gov). It has a ton of information on it, but it's a terrible website that is chock-full of JavaScript and iframes. I tried running some simple Scrapy spiders, but I can't parse anything out because everything loads through JavaScript and iframes.

For instance, on the main search page, this block of code actually loads the results table:

<script type="text/javascript" src="Regs/Regs.nocache.js?REGS211-b3"></script>

<title>Regulations.gov</title>
<link rel="stylesheet" type="text/css" href="css/print.css" media="print" />
</head>

<body class="bodyLoading">
<!-- this is required for GWT history support -->
<iframe src="javascript:''" id="__gwt_historyFrame" tabIndex='-1' style="position:absolute;width:0;height:0;border:0"></iframe>
<!-- For printing window contents  -->
<iframe id="__printingFrame" style="width:0;height:0;border:0;" ></iframe>

Individual results pages have the same problem. For instance, this page has the same source as above.
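To see why Scrapy comes back empty-handed, it helps to look at the raw source the way a non-JavaScript client does. The minimal sketch below uses only the Python standard library; `ShellInspector` and `inspect_shell` are hypothetical names chosen for illustration. It feeds in the `<body>` part of the excerpt quoted above and records the external scripts, the iframe sources, and any tables. On that excerpt it finds one bootstrap script (`Regs/Regs.nocache.js`), two iframes with no real document behind them (`javascript:''` and no `src` at all), and zero tables, so the results table must be injected later by the script.

```python
from html.parser import HTMLParser

# The page-source excerpt quoted in the question (the <body> part plus the bootstrap script).
SHELL = """
<script type="text/javascript" src="Regs/Regs.nocache.js?REGS211-b3"></script>
<body class="bodyLoading">
<iframe src="javascript:''" id="__gwt_historyFrame" tabIndex='-1' style="position:absolute;width:0;height:0;border:0"></iframe>
<iframe id="__printingFrame" style="width:0;height:0;border:0;" ></iframe>
"""


class ShellInspector(HTMLParser):
    """Records what a client that does not run JavaScript actually receives."""

    def __init__(self):
        super().__init__()
        self.scripts = []      # src of every external <script>
        self.iframe_srcs = []  # src of every <iframe> (None when it has no src)
        self.tables = 0        # number of <table> elements in the raw HTML

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "src" in attrs:
            self.scripts.append(attrs["src"])
        elif tag == "iframe":
            self.iframe_srcs.append(attrs.get("src"))
        elif tag == "table":
            self.tables += 1


def inspect_shell(html):
    """Parse raw HTML and return the populated inspector."""
    parser = ShellInspector()
    parser.feed(html)
    parser.close()
    return parser


report = inspect_shell(SHELL)
print("scripts:", report.scripts)
print("iframe srcs:", report.iframe_srcs)
print("tables in raw HTML:", report.tables)
```

The practical consequence is that following these iframes will not help: the data only appears after the script runs, so you need either a client that executes JavaScript or a way to replay the HTTP requests the script makes (they are visible in the browser's network tab).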

Can Scrapy handle this problem at all? Are there any alternatives that might be able to?


Alternatives to try:

1) Selenium

2) iMacros

3) PhantomJS with CasperJS
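Of these, Selenium fits most directly into a Python/Scrapy workflow: it drives a real browser, lets the GWT script run, and hands back the rendered DOM. Below is a minimal sketch, assuming a recent Selenium release with Firefox and geckodriver installed. The function names, the `table` wait selector and the 30-second timeout are illustrative assumptions, not anything regulations.gov documents. The Selenium imports sit inside `render_page` so that the parsing helper can be used (and tested) without a browser.

```python
from html.parser import HTMLParser


def render_page(url, wait_css="table", timeout=30):
    """Load `url` in headless Firefox and return the HTML after JavaScript has run.

    `wait_css` is an assumption: use a CSS selector for an element that only
    exists once the results have been inserted by the page's script.
    """
    # Imported lazily so the rest of this module works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Block until the script has added the element we are waiting for.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class TableRows(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the non-empty table rows of `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]
```

Usage would look like `rows = table_rows(render_page(url))` for a results-page `url`. If you prefer to keep Scrapy's selectors, `scrapy.Selector(text=html)` accepts the rendered HTML just as well. Starting a browser per page is slow, so for a real crawl you would reuse one driver across many pages.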

