Capturing part of a URL
I'm having some difficulty writing a regular expression. My input will be a URL that looks like this:
http://www.a.com/farms/important-stuff-here#ignorable-stuff
I want to capture important-stuff-here, which is everything between the last forward slash and the first # sign (or up to the end of the URL, if the # sign and its trailing content are absent). I thought this might do it:
(http://www.a.com/farms/)
([anything but a # character]*)
(.*)
I'm not sure how to express the 2nd group, ([anything but a # character]*).
Thanks
"Anything but" is called a negated character class, and, in your case, is spelled
[^#]
Your regex would be
http://www.a.com/farms/([^#]+)
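As a minimal sketch of how that pattern could be used, here it is with Python's re module. The choice of Python, the escaped dots, and the helper name capture_after_farms are assumptions made for illustration and are not part of the answer above.

```python
import re

# Dots are escaped so they only match literal "." characters.
# [^#]+ is the negated character class: one or more characters that are not "#".
PATTERN = re.compile(r"http://www\.a\.com/farms/([^#]+)")

def capture_after_farms(url):
    """Return the text after /farms/ up to the first '#', or None if there is no match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None

print(capture_after_farms("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
print(capture_after_farms("http://www.a.com/farms/important-stuff-here"))
```

Both calls print important-stuff-here. Because match anchors at the start of the string, a URL with a different prefix returns None rather than a partial capture.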
For most regex engines you probably want [^#] (the ^ negates a character class).
Depending on your language, you might want to use a module or library that can parse URLs for you. For example, in PHP you can use parse_url:
$url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff";
$parsed = parse_url($url);
print $parsed['path'];
With Python, use urlparse(), e.g.:
>>> from urllib.parse import urlparse  # Python 3; in Python 2 use "from urlparse import urlparse"
>>> s = "http://www.a.com/farms/important-stuff-here#ignorable-stuff"
>>> urlparse(s).path
'/farms/important-stuff-here'
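The parsed path still includes the leading /farms/ part. One possible way to reduce it to the last segment is sketched below, assuming Python 3's urllib.parse and posixpath from the standard library; the function name last_path_segment is hypothetical.

```python
import posixpath
from urllib.parse import urlparse

def last_path_segment(url):
    """Return the final segment of the URL path, ignoring any #fragment."""
    path = urlparse(url).path  # the fragment is parsed into a separate field
    return posixpath.basename(path)  # text after the last "/" in the path

print(last_path_segment("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
```

This prints important-stuff-here. Note that a path ending in a slash yields an empty string, since nothing follows the last "/".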
If you really want to do it by hand, first delete everything from "#" onwards, then delete everything from the start up to the last "/":
$ echo "http://www.a.com/farms/important-stuff-here#ignorable-stuff" | sed 's/#.*//;s|.*\/||'
important-stuff-here
Or use plain string splits:
$url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff";
$s = explode("#",$url,2);
$t = explode("/",$s[0]);
print end($t);
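The same two-step split could be sketched in Python roughly as follows. This is only an illustrative equivalent of the PHP above, not something from the original answer, and the function name last_segment_by_split is made up for the example.

```python
def last_segment_by_split(url):
    """Drop everything from the first '#', then keep what follows the last '/'."""
    before_fragment = url.split("#", 1)[0]     # like explode("#", $url, 2)[0]
    return before_fragment.rsplit("/", 1)[-1]  # like end(explode("/", ...))

print(last_segment_by_split("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
```

This prints important-stuff-here. Splitting on "#" first matters: a fragment that itself contains a "/" would otherwise be mistaken for part of the path.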