wget on links without extensions
I'm testing wget on one of my sites. It's structured like this:
<a href="/stuff/fancy-stub-url">Fancy Stub</a>
<a href="/stuff/more-fancy-seo-link">Seo Link</a>
<a href="/stuff/somethingIdontwant/#blah">Don't Download me</a>
Inside each of those links, there's a .png I want.
wget http://example.com/landing-page \
--recursive \
--level=2 \
--accept '[a-zA-Z-]+',*.png \
--force-html \
--base=http://example.com
I thought I needed --level=2 with --recursive because the .png files are on the /more-fancy-seo-link pages, so wget would have to fetch those pages first and then the .png files they contain. This doesn't work: the /more-fancy-seo-link pages are downloaded but not followed, because they have no extension. How do I get wget to follow my SEO links and then download the .png files in them?
--force-html and --base only work with the -i option.
The *.png in your command sits outside the quotes, so the shell may expand it before wget ever sees it. Try quoting the whole accept list:
wget http://example.com/landing-page \
--recursive \
--level=2 \
--accept '[a-zA-Z-]+,*.png'
If this fails, you could try:
wget http://example.com/landing-page -O - | \
wget -i - \
--recursive \
--level=2 \
--accept '*.png' \
--force-html \
--base=http://example.com
This gets the HTML file and pipes it to a second wget instance to get the PNGs.
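The quoting point can be checked without wget at all. The following is only a small sketch, assuming a POSIX-style shell with mktemp available; the scratch directory and the file names a.png and b.png are made up for the demonstration.

```shell
# Scratch directory so the demo does not touch real files (hypothetical setup).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.png b.png

# Unquoted: the shell replaces the pattern with the matching file names.
unquoted=$(echo *.png)
# Quoted: the pattern is passed through literally, which is what wget needs.
quoted=$(echo '*.png')

echo "unquoted: $unquoted"   # prints "unquoted: a.png b.png"
echo "quoted:   $quoted"     # prints "quoted:   *.png"
```

If the directory you run wget from happens to contain files matching the unquoted pattern, wget receives those file names in place of the pattern, which is why the accept list should be quoted.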