Get final URL after curl is redirected

2023-01-03 20:23 问答作者：

I need to get the final URL after a page redirect preferably with curl or wget.

For example http://google.com may redirect to http://www.goo开发者_StackOverflow社区gle.com.

The contents are easy to get(ex. curl --max-redirs 10 http://google.com -L), but I'm only interested in the final url (in the former case http://www.google.com).

Is there any way of doing this by using only Linux built-in tools? (command line only)

curl's -w option and the sub variable url_effective is what you are looking for.

Something like

curl -Ls -o /dev/null -w %{url_effective} http://google.com

More info

-L         Follow redirects
-s         Silent mode. Don't output anything
-o FILE    Write output to <file> instead of stdout
-w FORMAT  What to output after completion

More

You might want to add -I (that is an uppercase i) as well, which will make the command not download any "body", but it then also uses the HEAD method, which is not what the question included and risk changing what the server does. Sometimes servers don't respond well to HEAD even when they respond fine to GET.

Thanks, that helped me. I made some improvements and wrapped that in a helper script "finalurl":

#!/bin/bash
curl $1 -s -L -I -o /dev/null -w '%{url_effective}'

-o output to /dev/null
-I don't actually download, just discover the final URL
-s silent mode, no progressbars

This made it possible to call the command from other scripts like this:

echo `finalurl http://someurl/`

as another option:

$ curl -i http://google.com
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sat, 19 Jun 2010 04:15:10 GMT
Expires: Mon, 19 Jul 2010 04:15:10 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

But it doesn't go past the first one.

Thank you. I ended up implementing your suggestions: curl -i + grep

curl -i http://google.com -L | egrep -A 10 '301 Moved Permanently|302 Found' | grep 'Location' | awk -F': ' '{print $2}' | tail -1

Returns blank if the website doesn't redirect, but that's good enough for me as it works on consecutive redirections.

Could be buggy, but at a glance it works ok.

You can do this with wget usually. wget --content-disposition "url" additionally if you add -O /dev/null you will not be actually saving the file.

wget -O /dev/null --content-disposition example.com

The parameters -L (--location) and -I (--head) still doing unnecessary HEAD-request to the location-url.

If you are sure that you will have no more than one redirect, it is better to disable follow location and use a curl-variable %{redirect_url}.

This code do only one HEAD-request to the specified URL and takes redirect_url from location-header:

curl --head --silent --write-out "%{redirect_url}\n" --output /dev/null "https://""goo.gl/QeJeQ4"

Speed test

all_videos_link.txt - 50 links of goo.gl+bit.ly which redirect to youtube

1. With follow location

time while read -r line; do
    curl -kIsL -w "%{url_effective}\n" -o /dev/null  $line
done < all_videos_link.txt

Results:

real    1m40.832s
user    0m9.266s
sys     0m15.375s

2. Without follow location

time while read -r line; do
    curl -kIs -w "%{redirect_url}\n" -o /dev/null  $line
done < all_videos_link.txt

Results:

real    0m51.037s
user    0m5.297s
sys     0m8.094s

curl can only follow http redirects. To also follow meta refresh directives and javascript redirects, you need a full-blown browser like headless chrome:

#!/bin/bash
real_url () {
    printf 'location.href\nquit\n' | \
    chromium-browser --headless --disable-gpu --disable-software-rasterizer \
    --disable-dev-shm-usage --no-sandbox --repl "$@" 2> /dev/null \
    | tr -d '>>> ' | jq -r '.result.value'
}

If you don't have chrome installed, you can use it from a docker container:

#!/bin/bash
real_url () {
    printf 'location.href\nquit\n' | \
    docker run -i --rm --user "$(id -u "$USER")" --volume "$(pwd)":/usr/src/app \
    zenika/alpine-chrome --no-sandbox --repl "$@" 2> /dev/null \
    | tr -d '>>> ' | jq -r '.result.value'
}

Like so:

$ real_url http://dx.doi.org/10.1016/j.pgeola.2020.06.005 
https://www.sciencedirect.com/science/article/abs/pii/S0016787820300638?via%3Dihub

This would work:

 curl -I somesite.com | perl -n -e '/^Location: (.*)$/ && print "$1\n"'

I'm not sure how to do it with curl, but libwww-perl installs the GET alias.

$ GET -S -d -e http://google.com
GET http://google.com --> 301 Moved Permanently
GET http://www.google.com/ --> 302 Found
GET http://www.google.ca/ --> 200 OK
Cache-Control: private, max-age=0
Connection: close
Date: Sat, 19 Jun 2010 04:11:01 GMT
Server: gws
Content-Type: text/html; charset=ISO-8859-1
Expires: -1
Client-Date: Sat, 19 Jun 2010 04:11:01 GMT
Client-Peer: 74.125.155.105:80
Client-Response-Num: 1
Set-Cookie: PREF=ID=a1925ca9f8af11b9:TM=1276920661:LM=1276920661:S=ULFrHqOiFDDzDVFB; expires=Mon, 18-Jun-2012 04:11:01 GMT; path=/; domain=.google.ca
Title: Google
X-XSS-Protection: 1; mode=block

Can you try with it?

#!/bin/bash 
LOCATION=`curl -I 'http://your-domain.com/url/redirect?r=something&a=values-VALUES_FILES&e=zip' | perl -n -e '/^Location: (.*)$/ && print "$1\n"'` 
echo "$LOCATION"

Note: when you execute the command curl -I http://your-domain.com have to use single quotes in the command like curl -I 'http://your-domain.com'

You could use grep. doesn't wget tell you where it's redirecting too? Just grep that out.

继续阅读：curl redirect wget

Get final URL after curl is redirected

Speed test

1. With follow location

2. Without follow location

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Speed test

1. With follow location

2. Without follow location

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？