Need help with a Python scraper
I am trying to use urllib with Python to make a scraper. I can download the images, but they are thumbnails, 250x250 or smaller. (I am trying this on 4chan, because I like some of the picture threads.) How can I get the full image? Here is my code:
import urllib2, urllib
from BeautifulSoup import BeautifulSoup
import re
import urlparse
i = 0
ext = "'src' : re.compile(r'(jpe?g)|(png)|$'"
url = raw_input("Enter URL here:")
ender = raw_input("Enter File Type Here(For Images enter 'img'):")
if ender == "img":
    # Match <img> tags whose src ends in a known image extension
    ender = 'img', {'src' : re.compile(r'\.(jpe?g|png|gif)$')}
else:
    if "." in ender:
        end = ender
    else:
        end = ".%s" % ender
raw = urllib.urlopen(url)
soup = BeautifulSoup(raw)
parse = list(urlparse.urlparse(url))
for tag in soup.findAll(ender):
    # The src attribute of an <img> is the thumbnail URL
    links = "%(src)s" % tag
    print links
    if ".jpg" in links:
        end = ".jpg"
    if ".jpeg" in links:
        end = ".jpeg"
    if ".gif" in links:
        end = ".gif"
    if ".png" in links:
        end = ".png"
    i += 1
    urllib.urlretrieve(links, "%s%s" % (i, end))
Because you can click the thumbnail to see the larger image, the URL in the <a href="url">
tag that wraps the image tag points to the full image.
So just read the value of the href attribute of that link, and download that instead of the src
attribute of the image.
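Something along these lines might illustrate the idea. It is only a sketch: it uses Python 3's standard-library HTMLParser rather than the BeautifulSoup setup from the question, and the class name, function name, and sample markup are all made up for the example. It assumes each thumbnail <img> sits directly inside the <a> that links to the full image.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FullImageLinkParser(HTMLParser):
    """Collect the href of every <a> that wraps an <img> (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.current_href = None      # href of the <a> we are currently inside
        self.full_image_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.current_href = attrs.get("href")
        elif tag == "img" and self.current_href:
            # Take the link target, not the img src (which is the thumbnail).
            # urljoin resolves relative and protocol-relative links.
            self.full_image_urls.append(urljoin(self.base_url, self.current_href))

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


def find_full_image_urls(html, base_url):
    parser = FullImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.full_image_urls


# Made-up markup for illustration only; a real page will differ.
sample_html = '''
<a href="//example.com/full/123.jpg"><img src="//example.com/thumb/123s.jpg"></a>
<a href="/about">About</a>
<img src="/static/logo.png">
'''
print(find_full_image_urls(sample_html, "http://example.com/thread/1"))
```

Each URL it returns could then be downloaded the same way the thumbnails were, for example with urllib.request.urlretrieve in Python 3. Links that do not wrap an image, and images that are not wrapped in a link, are skipped.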