wget: don't follow redirects

Posted: 2022-12-27 02:27

How do I prevent wget from following redirects?

--max-redirect 0

I haven't tried this; a value of 0 will either allow no redirects or allow infinitely many.
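As a rough sketch of how this option could be used from a script, the helper below wraps wget with --max-redirect=0. The function name fetch_no_redirect is hypothetical, and --server-response is added only as an assumption about what is useful to see: it prints the response headers, so the 3xx status and Location header of a redirect stay visible. With a limit of 0, wget should stop at the first redirect and exit with a non-zero status instead of fetching the target.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is illustrative): fetch a URL without
# following any redirect. All arguments are passed through to wget.
fetch_no_redirect() {
    # --max-redirect=0 sets the redirect limit to zero.
    # --server-response prints the response headers, so the Location
    # header of a redirect is still visible in wget's output.
    wget --max-redirect=0 --server-response "$@"
}
```

It would be called as fetch_no_redirect "http://example.com/some/page", where the URL is a placeholder.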

Use curl without -L instead of wget. Omitting that option when using curl prevents the redirect from being followed.

If you use curl -I <URL> then you'll get the headers instead of the redirect HTML.

If you use curl -IL <URL> then you'll get the headers for the URL, plus those for the URL you're redirected to.
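If only the redirect target is of interest, the headers printed by curl -I can be filtered for the Location header. The following is a minimal sketch, not part of the original answers: the helper name extract_location is hypothetical, and it assumes the headers arrive on standard input in the usual "Name: value" form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the first Location header, if there is one.
extract_location() {
    # Header names are case-insensitive, so compare in lower case.
    # HTTP header lines end in CR LF; strip the trailing carriage return.
    awk 'tolower($1) == "location:" { sub(/\r$/, "", $2); print $2; exit }'
}

# Usage (needs network access, so it is left commented out):
# curl -sI "http://example.com/old-path" | extract_location
```

Because curl is run without -L, this prints the target of the first redirect only; nothing is printed when the response is not a redirect.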

Some versions of wget have a --max-redirect option.

wget follows up to 20 redirects by default. However, during recursive retrieval it does not span hosts by default. If you have asked wget to download example.com, it will not touch any resources at www.example.com. wget will detect this as a request to span to another host and decide against it.

In short, you should probably be executing:

wget --mirror www.example.com

Rather than

wget --mirror example.com

Now let's say the owner of www.example.com has several subdomains at example.com and we are interested in all of them. How to proceed?

Try this:

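wget --mirror --span-hosts --domains=example.com example.com

The --domains option does not turn on host spanning by itself, so --span-hosts is needed as well. wget will now visit all subdomains of example.com, including m.example.com and www.example.com.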

In general, it is not a good idea to depend on a specific number of redirects.

For example, to download IntelliJ IDEA, the URL that is promised to always resolve to the latest version of Community Edition for Linux is something like https://download.jetbrains.com/product?code=IIC&latest&distribution=linux, but if you visit that URL nowadays, you are redirected twice before you reach the actual downloadable file. In the future you might be redirected three times, or not at all.

The way to solve this problem is to use the HTTP HEAD method. Here is how I solved it in the case of IntelliJ IDEA:

# This is the starting URL.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
echo "URL: $URL"

# Issue HEAD requests until the actual target is found.
# The result contains the target location, among some irrelevant stuff.
LOC=$(wget --no-verbose --method=HEAD --output-file - "$URL")
echo "LOC: $LOC"

# Extract the URL from the result, stripping the irrelevant stuff.
URL=$(cut "--delimiter= " --fields=4 <<< "$LOC")
echo "URL: $URL"

# Optional: download the actual file.
wget "$URL"
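The extraction step above depends on the URL being the fourth space-separated field of the log line. A slightly more defensive variant, sketched below, looks for the field that follows the literal "URL:" token instead. The helper name final_url_from_log is hypothetical, and the assumed line shape ("<date> <time> URL: <url> <status>") is inferred from the script above rather than from any documented format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the URL out of `wget --no-verbose` log output
# supplied on stdin. Assumed line shape (inferred from the script above):
#   <date> <time> URL: <url> <status code> <status text>
final_url_from_log() {
    # Take the field after the "URL:" token rather than a fixed column,
    # and keep the last match in case several lines are logged.
    awk '{ for (i = 1; i < NF; i++) if ($i == "URL:") url = $(i + 1) }
         END { if (url != "") print url }'
}

# Usage with the variables from the script above (left commented out):
# URL=$(final_url_from_log <<< "$LOC")
```

If no "URL:" token is found, the function prints nothing, which lets the caller detect the failure with a simple emptiness check before downloading.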
