Remove all HTML tags and do a carriage return on <BR> in C#

2023-04-09 04:14

I am creating an HTML-to-text parser. I need to remove all HTML elements, inserting a carriage return wherever there is a <BR> and then removing the <BR> itself, so that no HTML tags are left. I then want to search the resulting text for a certain string that is selected in the combobox. Thank you in advance for your help.

private void navigateWeb_Click(object sender, EventArgs e)
{
    openFD.Title = "Select your configuration file";
    openFD.InitialDirectory = "C:";
    openFD.FileName = "";
    openFD.Filter = "Config File (*.cfg)|*.cfg|Text File (*.txt)|*.txt|All Files (*.*)|*.*";

    // Stop if the user cancels the dialog; FileName would be empty otherwise
    if (openFD.ShowDialog() != DialogResult.OK)
        return;
    MyURL = openFD.FileName;

    // Open and read the whole file into the rich text box
    richTextBox1.Text = File.ReadAllText(MyURL);

    // Keep only the lines that start with the text selected in the combobox
    var lines = File.ReadAllLines(MyURL)
        .Select(l => l.Trim())
        .Where(l => l.StartsWith(comboBox1.Text));
    textBox1.Text = String.Join(Environment.NewLine, lines);
}

UPDATE: Here is the solution that got the job done:

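public static string RemoveHTML(string text)
{
    // Turn <br>, <BR>, <br/> and <br /> into newlines before stripping the other tags
    text = System.Text.RegularExpressions.Regex.Replace(
        text, @"<br\s*/?>", "\n", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    text = text.Replace("&nbsp;", " ");
    // Remove every remaining tag
    var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
    return oRegEx.Replace(text, string.Empty);
}
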
private void navigateWeb_Click(object sender, EventArgs e)
{

    openFD.Title = "Enter URL in the box below";
    openFD.InitialDirectory = "C:";
    openFD.FileName = "http://msnconf/configtc.aspx?IP=10.6.64.200&m=c";
    openFD.Filter = "HTTP://|*.*|Config File (*.cfg)|*.cfg|Text File (*.txt)|*.txt|All Files (*.*)|*.*";



    //openFD.ShowDialog();
    if (openFD.ShowDialog() == DialogResult.Cancel)
    {
        //MessageBox.Show("cancel button clicked");
    }
    else
    {
        MyURL = openFD.FileName;

        webBrowser1.Visible = true;
        richTextBox1.Visible = false;
        permitACL.Enabled = true;


        //webBrowser1.Navigate(new Uri(MyURL.SelectedItem.ToString()));
        webBrowser1.Navigate(MyURL);
        //Open and read file; the using block closes the reader when done
        using (var objReader = new System.IO.StreamReader(MyURL))
        {
            richTextBox1.Text = objReader.ReadToEnd();
        }




        //Read all lines of file
        //            String lines = objReader.ReadToEnd();
        String[] crString = { "<BR>&nbsp;" };
        String[] aLines = richTextBox1.Text.Split(crString, StringSplitOptions.RemoveEmptyEntries);
        //            String[] lines = File.ReadAllLines(MyURL);
        String noHtml = String.Empty;

        for (int x = 0; x < aLines.Length; x++)
        {
            if (permitACL.Checked)
            {
                if (aLines[x].Contains("permit"))
                {
                    noHtml += (RemoveHTML(aLines[x]) + "\r\n");
                }
            }

            if (aLines[x].Contains(comboBox1.Text))
            {
                noHtml += (RemoveHTML(aLines[x]) + "\r\n");
            }
        }

        //Find lines that match our text in the combobox
        //lines.Select(l => l.Trim());
        //.Where(l => l.StartsWith(comboBox1.Text));



        //Print results to textbox
        textBox1.Text = noHtml;
    }

}

I suggest you use the HTML Agility Pack. It is an HTML parser that you can query using XPath syntax.
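
As a rough illustration of that suggestion, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced in the project. The names `HapText` and `ToPlainText` are made up for this example. Swapping each `br` node for a newline text node before reading `InnerText` is just one possible way to do it, not necessarily what the answer had in mind.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HapText // hypothetical helper class, not part of the library
{
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty list) when nothing matches
        var breaks = doc.DocumentNode.SelectNodes("//br");
        if (breaks != null)
        {
            // Copy to a list so the tree can be modified while iterating
            foreach (var br in breaks.ToList())
            {
                br.ParentNode.ReplaceChild(doc.CreateTextNode("\n"), br);
            }
        }

        // InnerText drops the tags; DeEntitize decodes entities such as &nbsp;
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // &nbsp; decodes to U+00A0, so map it to an ordinary space
        return text.Replace('\u00A0', ' ');
    }
}
```

The resulting string can then be split on '\n' and filtered against comboBox1.Text, much like the LINQ query in the question.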

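public static string RemoveHTML(string text)
{
    // Turn <br>, <BR>, <br/> and <br /> into newlines before stripping the other tags
    text = System.Text.RegularExpressions.Regex.Replace(
        text, @"<br\s*/?>", "\n", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    text = text.Replace("&nbsp;", " ");
    // Remove every remaining tag
    var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
    return oRegEx.Replace(text, string.Empty);
}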
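
To tie the regex approach back to the original goal of finding lines that contain a given string, the following is a minimal sketch using only the base class library. `HtmlLineFilter`, `ToPlainText` and `LinesContaining` are hypothetical names. Decoding entities with `WebUtility.HtmlDecode` is an addition that assumes the input may contain entities other than `&nbsp;`.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlLineFilter // hypothetical helper, for illustration only
{
    public static string ToPlainText(string html)
    {
        // <br>, <BR>, <br/> and <br /> become newlines
        string text = Regex.Replace(html, @"<br\s*/?>", "\n", RegexOptions.IgnoreCase);

        // Every remaining tag is dropped
        text = Regex.Replace(text, "<[^>]+>", string.Empty);

        // HtmlDecode turns &nbsp; into U+00A0, so map that to a plain space
        return WebUtility.HtmlDecode(text).Replace('\u00A0', ' ');
    }

    public static string[] LinesContaining(string html, string search)
    {
        return ToPlainText(html)
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Contains(search))
            .ToArray();
    }
}
```

In the click handler, something like textBox1.Text = String.Join(Environment.NewLine, HtmlLineFilter.LinesContaining(richTextBox1.Text, comboBox1.Text)); would then show only the matching lines.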
