Parsing HTML with Python's win32com
I'm new to Python. I want to extract some text from the CNN website.
I want to use Python's win32com module. EDIT, on why win32com: because of the JavaScript on the website, I thought of using win32com. I have looked for other solutions, but none met my requirements. In fact, I wanted to use mechanize or a similar solution, but that didn't work for me. Is it possible to use BeautifulSoup or lxml with win32com?
If anyone knows how to extract some text from the CNN website, please help! Specifically, I want to extract the text under 'Sponsored links' and 'Money' on the CNN website.
import win32com.client
from time import sleep
from win32com.client import Dispatch
import urllib,urllib2
from BeautifulSoup import BeautifulSoup
ie = Dispatch("InternetExplorer.Application")
ie.Visible = 1
ie.Navigate("http://www.cnn.com")
sleep(15)
ie.Quit()
Are you trying to parse some text on CNN's website?
You can get the page with:
import urllib
f = urllib.urlopen('http://www.cnn.com')
page = f.read()
f.close()
You can then use BeautifulSoup to find whatever it is you are looking for on page.
Why win32com, dispatch, etc.?
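As a rough sketch of the "find whatever you are looking for" step, the snippet below pulls the text out of one section of a page. It is written for Python 3 and uses the standard library's html.parser as a stand-in for BeautifulSoup, so it runs without extra installs. The sample HTML, the class names "sponsored" and "money", and the names SectionTextParser and extract_section_text are all made up for illustration; CNN's real markup will differ, so inspect the page source first.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; CNN's real page structure is not known here.
SAMPLE_PAGE = """
<html><body>
<div class="headline">Top story</div>
<div class="sponsored"><h4>Sponsored links</h4>
<a href="http://example.com/a">First offer</a>
<a href="http://example.com/b">Second offer</a>
</div>
<div class="money"><h4>Money</h4><p>Markets rise</p></div>
</body></html>
"""


class SectionTextParser(HTMLParser):
    """Collect the text found inside <div> elements with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # 0 = outside the target div, >0 = nesting level inside it
        self.chunks = []  # non-empty text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A nested <div> inside the target section.
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            # Entering the section we are interested in.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.chunks.append(text)


def extract_section_text(page, target_class):
    """Return the list of text fragments inside divs of the given class."""
    parser = SectionTextParser(target_class)
    parser.feed(page)
    parser.close()
    return parser.chunks


sponsored = extract_section_text(SAMPLE_PAGE, "sponsored")
money = extract_section_text(SAMPLE_PAGE, "money")
print(sponsored)
print(money)
```

With a real download you would pass the decoded page text instead of SAMPLE_PAGE. Note that this only sees the HTML the server sends; anything the page adds later through JavaScript will not be in it.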