Very Simple Regex Question

2023-01-02 12:53 Q&A author:

I have a very simple regex question. Suppose I have 2 conditions:

url =http://www.abc.com/cde/def
url =https://www.abc.com/sadfl/dsaf

How can I extract the baseUrl using regex?

Sample output:

http://www.abc.com
https://www.abc.com

Like this:

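import java.util.regex.Matcher;
import java.util.regex.Pattern;
String baseUrl = null;
// Group 1 is the optional scheme, the host and the optional port; \d is written "\\d" in a Java string.
Pattern p = Pattern.compile("^(([a-zA-Z]+://)?[a-zA-Z0-9.-]+\\.[a-zA-Z]+(:\\d+)?)/");
Matcher m = p.matcher(str);
// find() rather than matches(): the pattern only covers the start of the URL, not the whole string.
if (m.find())
    baseUrl = m.group(1);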

However, you should use the URI class instead, like this:

URI uri = new URI(str);
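
To spell out how the URI class gets you the base URL, here is a minimal sketch. The class name UriBaseUrl and the helper baseUrl are made up for illustration, and it assumes "base URL" means scheme plus authority (host and optional port):

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriBaseUrl {
    // Hypothetical helper: returns scheme + "://" + authority,
    // or null if the input is not a parsable URI or lacks either part.
    static String baseUrl(String str) {
        try {
            URI uri = new URI(str);
            if (uri.getScheme() == null || uri.getAuthority() == null) {
                return null;
            }
            return uri.getScheme() + "://" + uri.getAuthority();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://www.abc.com/cde/def"));
        System.out.println(baseUrl("https://www.abc.com/sadfl/dsaf"));
    }
}
```

Note that getAuthority() also includes any user-info and port, so use getHost() instead if you only want the host name.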

A one liner without regexp:

String baseUrl = url.substring(0, url.indexOf('/', url.indexOf("//")+2));
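
One caveat with the one-liner: if the URL has no path (for example http://www.abc.com), the inner indexOf returns -1 and substring throws a StringIndexOutOfBoundsException. A guarded version might look like the sketch below; the class and method names are hypothetical, and returning null for input without "//" is just one possible choice:

```java
class SubstringBaseUrl {
    // Hypothetical helper: same idea as the one-liner, with the -1 cases handled.
    static String baseUrl(String url) {
        int slashes = url.indexOf("//");
        if (slashes < 0) {
            return null; // no "//" at all, so nothing to cut at
        }
        int pathStart = url.indexOf('/', slashes + 2);
        // No slash after the host means the URL is already a base URL.
        return pathStart < 0 ? url : url.substring(0, pathStart);
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://www.abc.com/cde/def"));
        System.out.println(baseUrl("http://www.abc.com"));
    }
}
```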

/^(https?\:\/\/[^\/]+).*/$1/

This will match ANYTHING that starts with http or https, and $1 will contain everything from the beginning up to (but not including) the first / after the //.
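
In Java the same substitution can be written with String.replaceFirst, which also understands $1 in the replacement. This is only a sketch with made-up names; the slashes need no escaping in a Java regex:

```java
class ReplaceBaseUrl {
    // Hypothetical helper: keeps group 1 and drops the rest of the line.
    // Input that does not start with http:// or https:// comes back unchanged.
    static String baseUrl(String url) {
        return url.replaceFirst("^(https?://[^/]+).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://www.abc.com/cde/def"));
        System.out.println(baseUrl("https://www.abc.com/sadfl/dsaf"));
    }
}
```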

Except for write-and-throw-away scripts, you should always refrain from parsing complex syntaxes (e-mail addresses, URLs, HTML pages, etc.) using regexes.

Believe me, you will get bitten eventually.

I'm pretty sure that there is a Java class that will allow path manipulations, but if it has to be a regex,

https?://[^/]+

would work. (s? included to also handle https:)
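
For completeness, here is roughly how that pattern could be applied in Java. The class and method names are hypothetical, and using lookingAt() so that the match has to begin at the start of the string is my assumption about what you want:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBaseUrl {
    private static final Pattern BASE = Pattern.compile("https?://[^/]+");

    // Hypothetical helper: returns the matched prefix, or null if the
    // input does not start with http:// or https:// followed by a host.
    static String baseUrl(String url) {
        Matcher m = BASE.matcher(url);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("http://www.abc.com/cde/def"));
        System.out.println(baseUrl("https://www.abc.com/sadfl/dsaf"));
    }
}
```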

Looks like the simplest solution to your two specific examples would be the pattern:

[^/]*//[^/]+

i.e.: non-slash (0 or more times), two slashes, non-slash (1 or more times). You can be stricter than that if you wish, as the two existing answers are doing in different ways -- one will reject e.g. URLs starting with ftp:, the other will reject domains with underscores (but accept URLs without a leading protocol://, thereby being even broader than mine in that respect). This variety of answers (all correct wrt your scant specs;-) should suggest to you that your specs are too vague and should be tightened.
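
To make the difference between the patterns concrete, a small sketch that runs the three of them over two edge cases. The class name, the constant names and the sample inputs are all made up, and STRICT_HOST is a variant of the host-based pattern from this thread written with balanced parentheses:

```java
import java.util.regex.Pattern;

class PatternComparison {
    static final Pattern HTTP_ONLY = Pattern.compile("https?://[^/]+");
    static final Pattern STRICT_HOST =
        Pattern.compile("(([a-zA-Z]+://)?[a-zA-Z0-9.-]+\\.[a-zA-Z]+(:\\d+)?)/");
    static final Pattern LOOSE = Pattern.compile("[^/]*//[^/]+");

    // True if the pattern matches a prefix of the input.
    static boolean accepts(Pattern p, String url) {
        return p.matcher(url).lookingAt();
    }

    public static void main(String[] args) {
        String[] inputs = { "ftp://files.abc.com/x", "http://my_host.abc.com/x" };
        for (String in : inputs) {
            System.out.println(in
                + " http-only=" + accepts(HTTP_ONLY, in)
                + " strict-host=" + accepts(STRICT_HOST, in)
                + " loose=" + accepts(LOOSE, in));
        }
    }
}
```

The http-only pattern rejects the ftp: URL, the host-based pattern rejects the underscore in the host name, and the loose pattern accepts both.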

Here's a regex that should satisfy the problem as given.

https?://[^/]*

I'm assuming you're asking this partly to gain more knowledge of regexes. If, however, you're trying to pull the host from a URL, it's arguably much more correct to use Java's more robust parsing methods:

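String urlStr = "https://www.abc.com/stuff";
URL url = new URL(urlStr);  // throws MalformedURLException on bad input
String host = url.getHost();
String protocol = url.getProtocol();
// URL has no (protocol, host) constructor; pass an empty file part instead.
URL baseUrl = new URL(protocol, host, "");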

This is better, as it should catch more cases if your input URL isn't as strict as described above.

Old post.. thought I might as well put a simple answer to a simple regex Q:

(http|https):\/\/(www.)?(\w+)?\.(\w+)?
