Get final URL after curl is redirected
I need to get the final URL after a page redirect preferably with curl or wget.
For example http://google.com may redirect to http://www.goo开发者_StackOverflow社区gle.com.
The contents are easy to get(ex. curl --max-redirs 10 http://google.com -L
), but I'm only interested in the final url (in the former case http://www.google.com).
Is there any way of doing this by using only Linux built-in tools? (command line only)
curl
's -w
option and the sub variable url_effective
is what you are
looking for.
Something like
curl -Ls -o /dev/null -w %{url_effective} http://google.com
More info
-L Follow redirects -s Silent mode. Don't output anything -o FILE Write output to <file> instead of stdout -w FORMAT What to output after completion
More
You might want to add -I
(that is an uppercase i
) as well, which will make the command not download any "body", but it then also uses the HEAD method, which is not what the question included and risk changing what the server does. Sometimes servers don't respond well to HEAD even when they respond fine to GET.
Thanks, that helped me. I made some improvements and wrapped that in a helper script "finalurl":
#!/bin/bash
curl $1 -s -L -I -o /dev/null -w '%{url_effective}'
-o
output to/dev/null
-I
don't actually download, just discover the final URL-s
silent mode, no progressbars
This made it possible to call the command from other scripts like this:
echo `finalurl http://someurl/`
as another option:
$ curl -i http://google.com
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sat, 19 Jun 2010 04:15:10 GMT
Expires: Mon, 19 Jul 2010 04:15:10 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
But it doesn't go past the first one.
Thank you. I ended up implementing your suggestions: curl -i + grep
curl -i http://google.com -L | egrep -A 10 '301 Moved Permanently|302 Found' | grep 'Location' | awk -F': ' '{print $2}' | tail -1
Returns blank if the website doesn't redirect, but that's good enough for me as it works on consecutive redirections.
Could be buggy, but at a glance it works ok.
You can do this with wget usually. wget --content-disposition
"url" additionally if you add -O /dev/null
you will not be actually saving the file.
wget -O /dev/null --content-disposition example.com
The parameters -L (--location)
and -I (--head)
still doing unnecessary HEAD-request to the location-url.
If you are sure that you will have no more than one redirect, it is better to disable follow location and use a curl-variable %{redirect_url}.
This code do only one HEAD-request to the specified URL and takes redirect_url from location-header:
curl --head --silent --write-out "%{redirect_url}\n" --output /dev/null "https://""goo.gl/QeJeQ4"
Speed test
all_videos_link.txt
- 50 links of goo.gl+bit.ly which redirect to youtube
1. With follow location
time while read -r line; do
curl -kIsL -w "%{url_effective}\n" -o /dev/null $line
done < all_videos_link.txt
Results:
real 1m40.832s
user 0m9.266s
sys 0m15.375s
2. Without follow location
time while read -r line; do
curl -kIs -w "%{redirect_url}\n" -o /dev/null $line
done < all_videos_link.txt
Results:
real 0m51.037s
user 0m5.297s
sys 0m8.094s
curl
can only follow http redirects. To also follow meta refresh directives and javascript redirects, you need a full-blown browser like headless chrome:
#!/bin/bash
real_url () {
printf 'location.href\nquit\n' | \
chromium-browser --headless --disable-gpu --disable-software-rasterizer \
--disable-dev-shm-usage --no-sandbox --repl "$@" 2> /dev/null \
| tr -d '>>> ' | jq -r '.result.value'
}
If you don't have chrome installed, you can use it from a docker container:
#!/bin/bash
real_url () {
printf 'location.href\nquit\n' | \
docker run -i --rm --user "$(id -u "$USER")" --volume "$(pwd)":/usr/src/app \
zenika/alpine-chrome --no-sandbox --repl "$@" 2> /dev/null \
| tr -d '>>> ' | jq -r '.result.value'
}
Like so:
$ real_url http://dx.doi.org/10.1016/j.pgeola.2020.06.005
https://www.sciencedirect.com/science/article/abs/pii/S0016787820300638?via%3Dihub
This would work:
curl -I somesite.com | perl -n -e '/^Location: (.*)$/ && print "$1\n"'
I'm not sure how to do it with curl, but libwww-perl installs the GET alias.
$ GET -S -d -e http://google.com
GET http://google.com --> 301 Moved Permanently
GET http://www.google.com/ --> 302 Found
GET http://www.google.ca/ --> 200 OK
Cache-Control: private, max-age=0
Connection: close
Date: Sat, 19 Jun 2010 04:11:01 GMT
Server: gws
Content-Type: text/html; charset=ISO-8859-1
Expires: -1
Client-Date: Sat, 19 Jun 2010 04:11:01 GMT
Client-Peer: 74.125.155.105:80
Client-Response-Num: 1
Set-Cookie: PREF=ID=a1925ca9f8af11b9:TM=1276920661:LM=1276920661:S=ULFrHqOiFDDzDVFB; expires=Mon, 18-Jun-2012 04:11:01 GMT; path=/; domain=.google.ca
Title: Google
X-XSS-Protection: 1; mode=block
Can you try with it?
#!/bin/bash
LOCATION=`curl -I 'http://your-domain.com/url/redirect?r=something&a=values-VALUES_FILES&e=zip' | perl -n -e '/^Location: (.*)$/ && print "$1\n"'`
echo "$LOCATION"
Note: when you execute the command curl -I http://your-domain.com have to use single quotes in the command like curl -I 'http://your-domain.com'
You could use grep. doesn't wget tell you where it's redirecting too? Just grep that out.
精彩评论