wget: don't follow redirects

How do I prevent wget from following redirects?


--max-redirect 0

I haven't tried this; a value of 0 will either allow no redirects or be treated as unlimited.
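For what it's worth, the wget manual describes --max-redirect as the maximum number of redirections to follow, so 0 should mean "follow none". A minimal sketch of wrapping that in a shell function; the name fetch_no_redirect is made up for illustration, and the URL in the usage note is a placeholder:

```shell
# Hypothetical helper: run wget but refuse to follow any redirect.
# Extra arguments (options and the URL) are passed through unchanged,
# and the function returns wget's own exit status.
fetch_no_redirect() {
  wget --max-redirect=0 --no-verbose "$@"
}
```

Usage would look like fetch_no_redirect --server-response http://example.com/ where --server-response asks wget to print the response headers, so the Location header of the redirect is visible.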


Use curl without -L instead of wget. Omitting that option when using curl prevents the redirect from being followed.

If you use curl -I <URL> then you'll get the headers instead of the redirect HTML.

If you use curl -IL <URL> then you'll get the headers for the URL, plus those for the URL you're redirected to.
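If you only want to know where the redirect points, the headers from curl -I can be filtered for the Location line. A rough sketch, assuming the response actually carries a Location header; the function name location_from_headers is hypothetical:

```shell
# Hypothetical helper: read HTTP response headers on stdin and print
# the value of the first Location header (header names are case-insensitive).
location_from_headers() {
  # Strip the carriage returns that end HTTP header lines, then match the header name.
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}
```

It is meant to be used as curl -sI <URL> | location_from_headers. Because -L is omitted, curl stops at the first response and the redirect itself is never followed.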


Some versions of wget have a --max-redirect option; check man wget to see whether yours does.


wget follows up to 20 redirects by default. However, it does not span hosts. If you have asked wget to download example.com, it will not touch any resources at www.example.com. wget will detect this as a request to span to another host and decide against it.

In short, you should probably be executing:

wget --mirror www.example.com

Rather than

wget --mirror example.com

Now let's say the owner of www.example.com has several subdomains at example.com and we are interested in all of them. How to proceed?

Try this (--domains only takes effect together with --span-hosts):

wget --mirror --span-hosts --domains=example.com example.com

wget will now visit all subdomains of example.com, including m.example.com and www.example.com.


In general, it is not a good idea to depend on a specific number of redirects.

For example, to download IntelliJ IDEA, the URL that is promised to always resolve to the latest version of Community Edition for Linux is something like https://download.jetbrains.com/product?code=IIC&latest&distribution=linux, but if you visit that URL today, you are redirected twice before you reach the actual downloadable file. In the future you might be redirected three times, or not at all.

The way to solve this problem is to use the HTTP HEAD method. Here is how I solved it for IntelliJ IDEA:

# This is the starting URL.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
echo "URL: $URL"

# Issue HEAD requests until the actual target is found.
# The result contains the target location, among some irrelevant stuff.
LOC=$(wget --no-verbose --method=HEAD --output-file - "$URL")
echo "LOC: $LOC"

# Extract the URL from the result, stripping the irrelevant stuff.
URL=$(cut "--delimiter= " --fields=4 <<< "$LOC")
echo "URL: $URL"

# Optional: download the actual file.
wget "$URL"
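The extraction step above trusts that field 4 of wget's log line is always the URL. A slightly more defensive sketch, assuming the same log layout as the script relies on; the function name url_from_wget_log is made up, and the sample line in the usage note is illustrative rather than captured output:

```shell
# Hypothetical helper: take wget's --no-verbose log text as the first
# argument and print the 4th space-separated field of its last line,
# but only if that field looks like an HTTP(S) URL.
url_from_wget_log() {
  _url=$(printf '%s\n' "$1" | tail -n 1 | cut -d ' ' -f 4)
  case $_url in
    http://*|https://*) printf '%s\n' "$_url" ;;
    *) printf 'unexpected wget output: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

With this in place, the extraction line could become URL=$(url_from_wget_log "$LOC") || exit 1, so a change in wget's log format stops the script instead of passing garbage to the final download. For instance, url_from_wget_log '2024-01-01 12:00:00 URL: https://download.example.com/file.tar.gz 200 OK' prints the https URL.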