Extract text from a file object using .read()
I'm trying to read the source of a website with this code:
import urllib2
z=urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search')
z.read()
print z
txt = open('music.txt','w')
txt.write(str(z))
txt.close()
for i in open('music.txt','r'):
    if '''onclick="javascript:pageTracker._trackPageview('/clicks/''' in i:
        print i
And all I get for the source code is:
<addinfourl at 51561608L whose fp = <socket._fileobject object at 0x0000000002CCA480>>
Could this be an error? I'm not sure.
Does anyone know of a better way to do the job above without writing it to a text file first?
z is a file object. In fact, your code prints the object's description. You need to put the result of z.read() into a variable (or print it directly).
You should do
import urllib2
z=urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search')
i = z.read()
print i
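To see the difference without hitting the network, here is a minimal sketch that uses io.BytesIO as a stand-in for the object returned by urlopen (an assumption: both are file-like objects with a read() method). It is written for Python 3, unlike the Python 2 code in the thread, and the HTML contents are made up.

```python
import io

# Stand-in for the urlopen() response; the contents are invented for illustration.
z = io.BytesIO(b"<html><body>hello</body></html>")

# str(z) only describes the object, much like the <addinfourl ...> line in the question.
description = str(z)

# z.read() returns the actual contents, which must be kept in a variable.
page = z.read()

print(description)  # something like <_io.BytesIO object at 0x...>
print(page.decode("utf-8"))
```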
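.read() does not turn z into the page contents; it returns them as a string (and consumes the stream, so calling z.read() a second time returns an empty string). Use z = z.read() instead.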
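A minimal Python 3 sketch of that behaviour, again using io.BytesIO as an assumed stand-in for the response object (contents invented): the first read() returns everything, a second one returns nothing, and rebinding the name is what keeps the data.

```python
import io

# Stand-in for the urlopen() response (made-up contents).
z = io.BytesIO(b"line one\nline two\n")

first = z.read()   # the whole contents, as bytes
second = z.read()  # the stream is already exhausted, so this is empty

# Rebinding the name keeps the contents; z is now a bytes object, not a file.
z = first

print(repr(second))  # b''
```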
z is the file-like object; str(z) just gives you the representation you're seeing.
You need to keep the string (the contents of the file) that's returned by z.read().
Better yet, just iterate over it directly:
import urllib2
z=urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search')
for i in z:
    if '''onclick="javascript:pageTracker._trackPageview('/clicks/''' in i:
        print i
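Here is a Python 3 sketch of the same line-by-line filtering. It assumes io.BytesIO as a stand-in for the response, and the two lines of HTML are invented. In Python 3 the response yields bytes lines, so the marker is a bytes literal too.

```python
import io

# The substring the question searches for, as bytes (Python 3 responses yield bytes).
MARKER = b"onclick=\"javascript:pageTracker._trackPageview('/clicks/"

# Stand-in for the response; these two lines of HTML are invented for the example.
z = io.BytesIO(
    b"<p>no match here</p>\n"
    b"<a onclick=\"javascript:pageTracker._trackPageview('/clicks/x');\">x</a>\n"
)

# Iterating a file-like object yields one line at a time, so no temp file is needed.
matches = [line for line in z if MARKER in line]

for line in matches:
    print(line.decode("utf-8").rstrip())
```

Iterating keeps only one line in memory at a time, which is the main advantage over reading the whole page and then splitting it.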
I think you're missing what read does. Try:
data = z.read()
print data
with open('music.txt','w') as txt:
    txt.write(data)
Or, in a single statement:
with open('music.txt','w') as out:
    out.write(urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search').read())
But this is just the HTML for the page; you will need to parse it yourself, using Beautiful Soup or lxml.
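Beautiful Soup and lxml are third-party packages. As a dependency-free alternative (a swap, not what the answer above uses), Python 3's standard html.parser module can pull out the onclick attributes the question is looking for. The class name OnclickCollector and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collects the onclick attribute of every tag that calls pageTracker."""

    def __init__(self):
        super().__init__()
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "onclick" and value and "pageTracker._trackPageview" in value:
                self.onclicks.append(value)


# Invented sample HTML; in practice this would be the decoded result of read().
html = (
    "<p>intro</p>"
    "<a href='#' onclick=\"javascript:pageTracker._trackPageview('/clicks/one');\">one</a>"
    "<a href='#'>plain link</a>"
)

parser = OnclickCollector()
parser.feed(html)
parser.close()
print(parser.onclicks)
```

Unlike a substring test on raw lines, the parser still finds the attribute when a tag spans several lines or the attributes appear in a different order.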