Capturing part of a url

2022-12-20 07:55 Q&A author:

I'm having some difficulty writing a regular expression. My input will be a url, looking like this:

http://www.a.com/farms/important-stuff-here#ignorable-stuff

I want to capture (important-stuff-here), which is everything between the last forward slash and the first # sign (or the end of the string, if the # sign and the content after it are absent). I thought this might do it:

(http://www.a.com/farms/)

([anything but a # character]*)

(.*)

I'm not sure how to express the 2nd group ([anything but a # character]*).


Thanks

"Anything but" is called a negated character class, and, in your case, is spelled

[^#]

Your regex would be

http://www\.a\.com/farms/([^#]+)

For most regex engines you probably want [^#] (the ^ at the start negates the character class).
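As a minimal sketch of the negated character class in use, here is the pattern applied with Python's re module. The helper name capture_after_farms is made up for illustration, and the dots in the host name are escaped so they match literal dots only.

```python
import re

# Pattern from the answer above; the dots are escaped so that "." matches
# a literal dot. [^#]+ means "one or more characters that are not #".
PATTERN = re.compile(r"^http://www\.a\.com/farms/([^#]+)")

def capture_after_farms(url):
    """Hypothetical helper: return the text after /farms/ up to the first #."""
    match = PATTERN.match(url)
    return match.group(1) if match else None

print(capture_after_farms("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
print(capture_after_farms("http://www.a.com/farms/important-stuff-here"))
```

Both calls print important-stuff-here, since [^#]+ simply runs to the end of the string when there is no # sign. If the path could contain further slashes, [^/#]+ anchored at the last slash would be needed instead.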

Depending on your language, you might want to use a module or library that can parse the URL for you. For example, in PHP you can use parse_url:

$url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff";
$parsed = parse_url($url);
print $parsed['path'];

With Python, use urlparse() (in urllib.parse on Python 3; in the urlparse module on Python 2), e.g.:

>>> from urllib.parse import urlparse
>>> s = "http://www.a.com/farms/important-stuff-here#ignorable-stuff"
>>> urlparse(s).path
'/farms/important-stuff-here'
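The parsed path still includes /farms/, so one more step is needed to get only the last segment. A possible sketch, using the standard library's urlsplit and posixpath.basename (the function name last_path_segment is an assumption for illustration):

```python
import posixpath
from urllib.parse import urlsplit

def last_path_segment(url):
    """Hypothetical helper: return the final path segment of a URL."""
    # urlsplit separates the fragment (the part after "#") from the path.
    path = urlsplit(url).path
    # posixpath.basename returns whatever follows the final "/".
    return posixpath.basename(path)

print(last_path_segment("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
```

This prints important-stuff-here. Note that a path ending in a slash yields an empty string, which may or may not be what you want.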

If you really want to do it by hand, first remove everything from "#" onwards, then remove everything from the start through the last "/":

$ echo "http://www.a.com/farms/important-stuff-here#ignorable-stuff" | sed 's/#.*//;s|.*\/||'
important-stuff-here

Or, using just plain string splits in PHP:

$url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff";
$s = explode("#",$url,2);
$t = explode("/",$s[0]);
print end($t);
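The same two-step idea (cut at the first "#", then keep what follows the last "/") can be sketched in Python with plain string methods. The function name last_segment_by_split is hypothetical.

```python
def last_segment_by_split(url):
    """Hypothetical helper: last path segment using only string methods."""
    # Keep the text before the first "#". If there is no "#",
    # partition returns the whole string as the first element.
    before_hash, _, _ = url.partition("#")
    # Keep the text after the last "/". If there is no "/",
    # rpartition returns the whole string as the last element.
    return before_hash.rpartition("/")[2]

print(last_segment_by_split("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
```

This prints important-stuff-here and needs no regex or URL parser, at the cost of not handling query strings or other URL parts specially.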
