C# search for strings in a website

In C#, once I have converted a webpage's contents into a string, what is the best way to search it for URLs by extension? I just want to extract the URLs within the page that end in .html, .xhtml, or .edu. I don't care what the beginning of the URL looks like. Which is better for finding these: EndsWith or Regex?

So if my input looked like this:

string str = {var a,b=window.location.href.match(//webhp\?[^#]tune=[^#]/);if(a=b&&b.length>0?"http://www.google.com/logos/2011/lespaul.html"+b[

I want to pull out http://www.google.com/logos/2011/lespaul.html and store that into an array.


You should use an HTML parser such as sharp-query or HTML Agility Pack, and never use regular expressions for parsing HTML; otherwise, as the author of this post says, some things might happen.
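As a rough sketch of the parser approach, the following uses HTML Agility Pack (assumed to be installed from NuGet) to collect `href` values and filter them with `EndsWith`. The class and method names (`LinkExtractor`, `ExtractLinks`) are made up for illustration. Note that a parser only sees links in the markup itself, so a URL that sits inside an inline script string, like the one in the question, would not be found this way.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class LinkExtractor
{
    // Hypothetical helper: returns href values ending in .html, .xhtml or .edu.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return results;
        }

        foreach (HtmlNode anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", "");
            if (href.EndsWith(".html", StringComparison.OrdinalIgnoreCase) ||
                href.EndsWith(".xhtml", StringComparison.OrdinalIgnoreCase) ||
                href.EndsWith(".edu", StringComparison.OrdinalIgnoreCase))
            {
                results.Add(href);
            }
        }

        return results;
    }
}
```

Call `ExtractLinks(pageText).ToArray()` if you need an array rather than a list.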


I could come up with this Regex: http:\/\/(.*?)(.html|.xhtml|.edu)
Edit (thanks to @Kakashi): http:\/\/.*?\.(?:x?html|edu)
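To get every match into an array, as the question asks, something like the following might work. It is only a sketch: the pattern is my own variation on the one above (it also accepts https, stops at whitespace, quotes and angle brackets so a match cannot run across unrelated text, and requires a word boundary after the extension), and the names `UrlExtractor` and `Extract` are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Assumed pattern, not taken from the answers: scheme, then a lazy run of
    // characters that cannot contain whitespace, quotes or angle brackets,
    // then one of the wanted extensions followed by a word boundary.
    private static readonly Regex UrlPattern = new Regex(
        @"https?://[^\s""'<>]+?\.(?:x?html|edu)\b",
        RegexOptions.IgnoreCase);

    // Returns each distinct matching URL found in the page text.
    public static string[] Extract(string pageText)
    {
        return UrlPattern.Matches(pageText)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToArray();
    }

    public static void Main()
    {
        string input = "if(a=b&&b.length>0?\"http://www.google.com/logos/2011/lespaul.html\"+b[";
        foreach (string url in Extract(input))
        {
            Console.WriteLine(url);
        }
    }
}
```

Because the match is lazy, a URL such as http://example.edu/page.xhtml would be cut off at .edu; if that matters, the pattern needs adjusting for your data.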


Try this:

using System;
using System.Text.RegularExpressions;

var input = "string str = {var a,b=window.location.href.match(//webhp\\?[^#]tune=[^#]/);if(a=b&&b.length>0?\"http://www.google.com/logos/2011/lespaul.html";
var match = Regex.Match(input, @"https?:\/{2}[^\n]+\.(?:x?html|edu)");
Console.Write(match.Success ? match.Groups[0].Value : "Not found"); // http://www.google.com/logos/2011/lespaul.html