
Why is urllib2 missing table fields that I can see in the Firefox source?

The HTML I receive from urllib2 is missing dozens of fields of data that I can see when I view the source of the URL in Firefox. Any advice would be much appreciated. Here is what it looks like:

From Firefox view source:

# ...<td class=td6>as</td></tr></thead>|ManyFields|<br></div><div id="c1">...

From the HTML returned by urllib2:

# ...<td class=td6>as</td></tr></thead>|</table>|<br></div><div id="c1">...
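One way to confirm the difference is to count the table rows that sit outside `<thead>` in each version of the HTML. This is a minimal sketch using only the standard library's `html.parser`. `BodyRowCounter` and `count_body_rows` are hypothetical names, and it assumes the data rows are ordinary `<tr>` elements (the real markup is elided above as `|ManyFields|`).

```python
from html.parser import HTMLParser


class BodyRowCounter(HTMLParser):
    """Count <tr> rows that appear outside any <thead> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_thead = False
        self.body_rows = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag == "thead":
            self.in_thead = True
        elif tag == "tr" and not self.in_thead:
            self.body_rows += 1

    def handle_endtag(self, tag):
        if tag == "thead":
            self.in_thead = False


def count_body_rows(html):
    """Return the number of data rows (non-header <tr>) found in `html`."""
    parser = BodyRowCounter()
    parser.feed(html)
    parser.close()
    return parser.body_rows
```

Run it on the urllib2 output (in Python 3, `urllib.request.urlopen(url).read().decode(...)`) and on the HTML Firefox shows you; for the empty `</thead>|</table>` case above the count is 0.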


A cursory check suggests that the page you're getting contains a lot of JavaScript, and at least some of it actively alters the page's contents; that JavaScript probably helps build the information you end up seeing in Firefox. If you need to scrape JavaScript-heavy pages, your best bet is to automate a real browser with Selenium.
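A minimal sketch of the Selenium route, assuming Selenium 4 with Firefox and geckodriver installed. `fetch_rendered_html` and its `wait_selector` argument are hypothetical: you would pass a CSS selector for an element that only exists once the page's JavaScript has filled in the table.

```python
def fetch_rendered_html(url, wait_selector, timeout=10):
    """Load `url` in headless Firefox, wait for `wait_selector`, return the DOM as HTML.

    Hypothetical helper; assumes Selenium 4 and geckodriver are available.
    """
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-generated element appears, or time out.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        # page_source serializes the current DOM, including JS-added nodes.
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

Waiting on a specific element, rather than sleeping for a fixed time, returns as soon as the data is present and raises `TimeoutException` if it never appears. The returned string can then be parsed like any other HTML.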


The extra content you're seeing is generated by JavaScript. It is not part of the raw HTML document, so a plain HTTP fetcher such as urllib2, which does not execute JavaScript, will never see it.
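To see this for yourself, here is a sketch that fetches the page the way urllib2 does and counts the `<script>` elements in the response. It uses Python 3's `urllib.request` (urllib2's successor). `fetch_raw_html`, `count_scripts` and the 10-second timeout are illustrative choices, not part of either answer.

```python
import urllib.request
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Count <script> elements; a plain HTTP fetch returns them as inert text."""

    def __init__(self):
        super().__init__()
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1


def count_scripts(html):
    """Return how many <script> tags appear in `html` (hypothetical helper)."""
    counter = ScriptCounter()
    counter.feed(html)
    counter.close()
    return counter.scripts


def fetch_raw_html(url, timeout=10):
    """Python 3 analogue of urllib2.urlopen(url).read(); no JavaScript is run."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Call `count_scripts(fetch_raw_html(url))` with your page's URL. A large count fits the explanation above. Only a comparison with the rendered page shows that the missing rows are actually added client-side.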
