
Problem parsing HTML with Python's win32com

I'm new to Python. I want to extract some text from the CNN website.

I want to use Python's win32com module.

EDIT (on why win32com):

Because of the JavaScript on the website, I thought of using win32com. I have looked for other solutions, but none met my requirements. In fact, I wanted to use mechanize or a similar solution, but this didn't work [for me].

Is it possible to use BeautifulSoup or lxml with win32com?

If anyone knows how to extract some text from the CNN website, please help me! Specifically, I want to extract the text under 'Sponsored links' and 'Money' on the CNN website.

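import win32com.client
from time import sleep
from win32com.client import Dispatch
import urllib, urllib2
from BeautifulSoup import BeautifulSoup

# Start Internet Explorer through COM and make its window visible.
ie = Dispatch("InternetExplorer.Application")
ie.Visible = 1
ie.Navigate("http://www.cnn.com")
# Give the page a fixed 15 seconds to load, then close the browser.
sleep(15)
ie.Quit()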


Are you trying to parse some text on CNN's website?

You can get the page with

import urllib
f = urllib.urlopen('http://www.cnn.com')
page = f.read()
f.close()

You can then use BeautifulSoup to find whatever you are looking for in the page.
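As a rough illustration of that parsing step, here is a minimal sketch. It assumes Python 3 and the bs4 package (the question imports the older BeautifulSoup 3 module), and it runs on made-up markup: SAMPLE_HTML and the helper name links_under_heading are hypothetical, and the real CNN page is almost certainly structured differently, so the lookup would need adjusting after inspecting the actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not CNN's real page structure.
SAMPLE_HTML = """
<html><body>
  <h3>Money</h3>
  <ul>
    <li><a href="/m1">Markets rally</a></li>
    <li><a href="/m2">Oil slips</a></li>
  </ul>
  <h3>Sponsored links</h3>
  <ul>
    <li><a href="/s1">Example ad</a></li>
  </ul>
</body></html>
"""


def links_under_heading(html, heading):
    """Return the link texts in the first <ul> that follows the given heading text."""
    soup = BeautifulSoup(html, "html.parser")
    wanted = heading.strip().lower()
    # Find the first tag whose text content is exactly the heading (case-insensitive).
    tag = soup.find(
        lambda t: t.string is not None and t.string.strip().lower() == wanted
    )
    if tag is None:
        return []
    # Take the next <ul> in document order and collect the text of its links.
    block = tag.find_next("ul")
    if block is None:
        return []
    return [a.get_text(strip=True) for a in block.find_all("a")]


if __name__ == "__main__":
    print(links_under_heading(SAMPLE_HTML, "Money"))
    print(links_under_heading(SAMPLE_HTML, "Sponsored links"))
```

In practice the html argument would be the page string read from the site rather than SAMPLE_HTML.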

Why win32com, dispatch, etc.?
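If the text really is only added by JavaScript, which is the reason the question gives for win32com, the two approaches can probably be combined: let Internet Explorer render the page, read the HTML back from its DOM, and pass that string to BeautifulSoup. The following is a sketch under several assumptions: Python 3, pywin32 installed, a Windows machine where the InternetExplorer.Application COM object is still available, and helper names (wait_until_loaded, rendered_html) that are invented here. Scripts that keep running after the load event may still change the page afterwards, so an extra delay could be needed.

```python
import time

# Value Internet Explorer reports in ReadyState once a document has finished loading.
READYSTATE_COMPLETE = 4


def wait_until_loaded(browser, timeout=30.0, poll=0.5):
    """Poll browser.Busy and browser.ReadyState until the page has loaded.

    Returns True once loading is complete, or False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not browser.Busy and browser.ReadyState == READYSTATE_COMPLETE:
            return True
        time.sleep(poll)
    return False


def rendered_html(url, timeout=30.0):
    """Load url in Internet Explorer via COM and return the HTML of the rendered DOM."""
    # Imported lazily because pywin32 is only available on Windows.
    from win32com.client import Dispatch

    ie = Dispatch("InternetExplorer.Application")
    ie.Visible = 0  # keep the browser window hidden
    try:
        ie.Navigate(url)
        if not wait_until_loaded(ie, timeout):
            raise RuntimeError("page did not finish loading within %s seconds" % timeout)
        # Serialize the current DOM, including nodes inserted by scripts.
        return ie.Document.documentElement.outerHTML
    finally:
        ie.Quit()


if __name__ == "__main__":
    html = rendered_html("http://www.cnn.com")
    print(len(html))
```

Polling Busy and ReadyState replaces the fixed sleep(15) from the question, so the script waits only as long as the page actually takes to load.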
