How to read a line of HTML using C#

I know how to read a line from a txt file, but for some reason C# is not detecting the end of line in HTML files. This code opens the HTML file and reads it line by line in search of the specified string. Even when I just try to print the first line of the HTML file, nothing is displayed.

using (StreamReader sr = new StreamReader("\\\\server\\myFile.html"))
        {
            String line;
            while ((line = sr.ReadLine()) != null)
            {
                if(line == ("<td><strong>String I wantstrong></td>"))
                {
                    Label1.Text = "Text Found";
                    break;
                }
            }
        }

I have tried this with a plain txt file and it works perfectly, just not when trying to parse an HTML file.

Thanks.


The best way by far is to use the HTML Agility Pack.

More about this can be found in a previous Stack Overflow question:

Looking for C# HTML parser


You don't need to reinvent the wheel. A much better way to parse HTML is to use an HTML parser:

http://htmlagilitypack.codeplex.com/ or http://www.justagile.com/linq-to-html.aspx

A similar question is here: What is the best way to parse html in C#?

Hope it helps.


If you know the HTML you are parsing is XHTML, why not parse it as XML using System.Xml?
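As a rough sketch of that idea, the snippet below uses XDocument from System.Xml.Linq (the LINQ-flavoured part of the System.Xml family) to look for a strong element inside a td. It assumes the markup is well-formed XML with no DOCTYPE and no HTML-only entities such as &nbsp;. The class name XhtmlSearch, the method HasStrongCell, and the sample markup are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlSearch
{
    // Hypothetical helper: true if some <td><strong>...</strong></td> has the given text.
    // XDocument.Parse throws XmlException if the input is not well-formed XML.
    public static bool HasStrongCell(string xhtml, string text)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
                  // LocalName ignores the XHTML namespace, if one is declared.
                  .Where(e => e.Name.LocalName == "strong"
                           && e.Parent != null
                           && e.Parent.Name.LocalName == "td")
                  .Any(e => e.Value.Trim() == text);
    }

    public static void Main()
    {
        // Sample markup standing in for the real file (an assumption).
        string xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table>\n" +
            "  <tr>\n" +
            "    <td><strong>String I want</strong></td>\n" +
            "  </tr>\n" +
            "</table></body></html>";

        Console.WriteLine(HasStrongCell(xhtml, "String I want") ? "Text Found" : "Text Not Found");
    }
}
```

Because the match is made on the element tree rather than on raw lines, indentation and line breaks in the file no longer matter.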


Your outer loop that reads lines works fine. My guess is that one of the following is taking place:

  • The HTML file is empty
  • The first line in the HTML file is empty

In either case, you won't see anything printed.

Now, to your loop:

You likely don't see what you expect, because

 if(line == ("<td><strong>String I wantstrong></td>"))
 {
    Label1.Text = "Text Found";
    break;
 }

looks for an EXACT match. If this is your actual code, you're missing the opening </ on </strong>, and you're likely forgetting that there is white space (indentation) in your HTML content.
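One possible way around both problems, sketched below, is to correct the closing tag and test whether the line contains the markup instead of testing for equality. The helper name ContainsMarkup and the in-memory sample HTML are assumptions for illustration; with the real file you would pass the StreamReader from the question instead of a StringReader.

```csharp
using System;
using System.IO;

public static class LineSearch
{
    // Hypothetical helper: true if any line contains the target markup.
    // A substring test tolerates indentation and other markup on the same line.
    public static bool ContainsMarkup(TextReader reader, string target)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.IndexOf(target, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        // Sample markup standing in for the real file (an assumption).
        string html =
            "<table>\n" +
            "  <tr>\n" +
            "    <td><strong>String I want</strong></td>\n" +
            "  </tr>\n" +
            "</table>";

        using (var reader = new StringReader(html))
        {
            // Note the corrected closing tag: </strong>
            bool found = ContainsMarkup(reader, "<td><strong>String I want</strong></td>");
            Console.WriteLine(found ? "Text Found" : "Text Not Found"); // prints "Text Found"
        }
    }
}
```

This still only works when the whole fragment sits on a single line; if the markup can wrap across lines, a real HTML parser is the safer choice.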
