Built-in Regex class or a parser: how do I extract the text between tags in an HTML file?

In my C#/.NET application I have an HTML file that contains a table along with other information.

I want to parse the table contents for only some of the columns. Should I use an HTML parser or the Replace method of the .NET Regex class?

If I use a parser, how do I use it? Will it extract the information between the tags? If so, please show an example if possible, because I am new to parsers.

If I use the Replace method of the Regex class, how do I pass it the name of the file I want to extract the information from?

Edit: I want to extract information from the table in the HTML file. How can I use the HTML Agility Pack for that? What kind of code should I write to use it?


You just asked an almost identical question and deleted it. Here was the answer I gave before:


Try the HTML Agility Pack.

Here's an example:

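 // Requires the HtmlAgilityPack library (using HtmlAgilityPack;).
 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // Select every <a> element that has an href attribute.
 foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
 {
     HtmlAttribute att = link.Attributes["href"];
     att.Value = FixLink(att); // FixLink is your own helper that returns the corrected URL
 }
 doc.Save("file.htm");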

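As for your extra question about regex: do not use Regex to parse HTML. It is not a robust solution, because real-world markup (nested tags, attributes in any order, unclosed elements) quickly breaks a pattern. The library above does a much better job.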
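For the table case in the question, here is a minimal sketch of how selected columns might be pulled out with the HTML Agility Pack. The class name `TableExtractor`, the method `ExtractColumns`, and the 0-based column indexes are illustrative assumptions, not part of the library; the sketch also assumes every row of interest sits inside a `<table>` and that nested tables do not matter.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class TableExtractor
{
    // Hypothetical helper: for every <tr> that has cells, returns the text
    // of the requested columns (0-based). Missing cells become "".
    public static List<string[]> ExtractColumns(string html, params int[] columns)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // use doc.Load("file.htm") to read from disk instead

        var rows = new List<string[]>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var trNodes = doc.DocumentNode.SelectNodes("//table//tr");
        if (trNodes == null) return rows;

        foreach (HtmlNode tr in trNodes)
        {
            // Direct child cells only, header or data.
            var cells = tr.SelectNodes("./td|./th");
            if (cells == null) continue;

            var picked = new string[columns.Length];
            for (int i = 0; i < columns.Length; i++)
            {
                int c = columns[i];
                picked[i] = c < cells.Count
                    ? HtmlEntity.DeEntitize(cells[c].InnerText).Trim() // decode &amp; etc.
                    : string.Empty;
            }
            rows.Add(picked);
        }
        return rows;
    }
}
```

To read from a file, something like `TableExtractor.ExtractColumns(System.IO.File.ReadAllText("file.htm"), 0, 2)` would return the first and third column of each row. Note that the file name is given to the parser (or to the file-reading call), never to Regex: `Regex.Replace` operates on strings, not on files.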


HtmlAgilityPack....

Next time, search for an existing answer before asking. This is a duplicate for sure.

Little tutorial.
