wget on links without extensions

I'm testing wget on one of my sites. It's structured like this:

<a href="/stuff/fancy-stub-url">Fancy Stub</a>
<a href="/stuff/more-fancy-seo-link">Seo Link</a>
<a href="/stuff/somethingIdontwant/#blah">Don't Download me</a>

Each of those linked pages contains a .png I want.

wget http://example.com/landing-page \
    --recursive \
    --level=2 \
    --accept '[a-zA-Z-]+',*.png \
    --force-html \
    --base=http://example.com

I thought I needed --level=2 with --recursive because the .png files live on pages like /stuff/more-fancy-seo-link, so wget would have to fetch those pages first and then the .png files they contain. That doesn't work: the /stuff/more-fancy-seo-link pages are downloaded but not followed, because they don't have an extension. How do I get wget to follow my SEO links and then download the .png files in them?


--force-html and --base only work with the -i option.

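In your --accept argument, the *.png part sits outside the quotes, so the shell treats it as a glob and may expand it against file names in the current directory before wget ever sees it. You could try quoting the whole value: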

wget http://example.com/landing-page \
    --recursive \
    --level=2 \
    --accept '[a-zA-Z-]+,*.png'
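
To see why the quoting matters, here is a small offline sketch. It does not call wget at all: printf stands in for the command so the final argument is visible, and the file name stub,logo.png and the stub prefix are made up purely for illustration.

```shell
# Offline demo of shell globbing on a partially quoted argument.
# printf stands in for wget; 'stub,logo.png' is a hypothetical file name.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'stub,logo.png'                  # a file that happens to match the glob

# The * is outside the quotes, so the shell expands the word to the file name.
unquoted=$(printf '%s\n' 'stub',*.png)

# The whole value is quoted, so the * reaches the command untouched.
quoted=$(printf '%s\n' 'stub,*.png')

echo "unquoted: $unquoted"             # prints: unquoted: stub,logo.png
echo "quoted:   $quoted"               # prints: quoted:   stub,*.png

cd / && rm -rf "$demo_dir"             # clean up the scratch directory
```

If no file in the current directory matches, most shells pass the unquoted pattern through unchanged, which is why the original command can appear to work in one directory and misbehave in another.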

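If the quoted command still fails (wget's --accept takes file name suffixes and shell-style wildcard patterns, not regular expressions, so '[a-zA-Z-]+' may not match the extensionless pages), you could try: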

wget http://example.com/landing-page -O - | \
    wget -i - \
        --recursive \
        --level=2 \
        --accept '*.png' \
        --force-html \
        --base=http://example.com

The first wget writes the landing page's HTML to standard output. The second reads it through -i -, which is where --force-html and --base take effect, and then fetches the PNGs.
