
Get URL from string [duplicate]

Closed 11 years ago. This question already has answers here:

Possible Duplicate: Get a URL from a String

Hi, I'm trying to extract a URL from a string using a regex. The string is something like "lorem ipsum baby www.test.com lorem", "lorem ipsum http://www.test.com foo bar", or "lorem www.test.com" with no trailing whitespace.

using

MatchCollection ms = Regex.Matches(adress, @"(www.+|http.+)([\s]|$)");

returns the entire string. Could any regex guru help me out on this one?

Edit:

Solved it this way:

MatchCollection mc = Regex.Matches(adress, @"(www\S+|http\S+)(\s|$)", RegexOptions.IgnoreCase);

// Group 1 is the URL itself; mc[0].Value would also include the trailing whitespace
adress = mc[0].Groups[1].Value;

WebBrowserTask task = new WebBrowserTask();

task.URL = adress;

task.Show();

Thank you all for your help! :)


I think we are missing the obvious here: there is actually nothing wrong with this code.

Perhaps the OP is not reading the match's Value correctly.

string adress = "hello www.google.ca";
// Run the pattern from the question against a string that ends with the URL
MatchCollection ms = Regex.Matches(adress, @"(www.+|http.+)([\s]|$)");
string testMatch = ms[0].Value;

testMatch contains only "www.google.ca" (here the URL is the last thing in the string, so the greedy ".+" has nothing extra to swallow).

Isn't this your intention, newa?
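
To compare that result with the strings from the question, here is a minimal sketch that runs the unchanged pattern through a small helper. The class name GreedyDemo and the method FirstMatch are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // The pattern from the question, unchanged.
    public const string Original = @"(www.+|http.+)([\s]|$)";

    // Returns the first full match, or null when nothing matches.
    // (Hypothetical helper, only here to make the behavior easy to inspect.)
    public static string FirstMatch(string input)
    {
        Match m = Regex.Match(input, Original);
        return m.Success ? m.Value : null;
    }
}
```

With "lorem www.test.com" the helper returns just the URL, but with "lorem ipsum baby www.test.com lorem" it returns "www.test.com lorem": the greedy ".+" runs to the end of the string and "$" then satisfies the second group.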


Try something like this:

string txt = "lorem ipsum baby http://www.google.com/";
Regex regx = new Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", 
RegexOptions.IgnoreCase);
MatchCollection ms = regx.Matches(txt);


I think the problem is that "." matches any character, including the spaces where you want the capture to end, and ".+" is greedy, so it runs on to the end of the string. If you change the ".+" to "[^ ]+", or make the quantifier lazy by writing ".+?", you should get the answer you want. Read the URL from the first group (Groups[1]), because the full match also includes the trailing whitespace character.
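
The two fixes can be sketched side by side as follows. This is only an illustration under a few assumptions: the class and method names are hypothetical, "\S+" is used as a slightly broader stand-in for "[^ ]+" (it stops at any whitespace, not just a space), and the URL is read from group 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // \S+ stops at the first whitespace, so the match cannot run past the URL.
    private static readonly Regex NegatedClass =
        new Regex(@"(www\S+|http\S+)(\s|$)", RegexOptions.IgnoreCase);

    // .+? is lazy: it grows one character at a time until (\s|$) can match.
    private static readonly Regex Lazy =
        new Regex(@"(www.+?|http.+?)(\s|$)", RegexOptions.IgnoreCase);

    // Group 1 holds the URL; the full match would also include the
    // whitespace character consumed by the second group.
    public static string WithNegatedClass(string input)
    {
        Match m = NegatedClass.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static string WithLazyQuantifier(string input)
    {
        Match m = Lazy.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For the three sample strings in the question, both methods should return "www.test.com" or "http://www.test.com" without the surrounding text. Neither pattern validates the URL; they only cut the candidate off at the first whitespace.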
