Reading only HTML Content from a Web site page

2023-03-11 07:03 Q&A author:

I'm using C#, and I'd like to scrape all the content on a site (but not the images, scripts, or files that may be attached to the page). How do I do that with C# and ASP.NET?

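Hi, you can use the following code snippet from HERE to do that. It downloads only the page's HTML markup; images, external scripts, and other linked files are not fetched: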

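using System;
using System.IO;
using System.Net;
using System.Text;

StringBuilder sb  = new StringBuilder();
char[]        buf = new char[8192];

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.your-url.com");

// Dispose the response, stream, and reader so the connection is released.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream resStream = response.GetResponseStream())
// Assumes the page is UTF-8 (a byte-order mark, if present, takes precedence).
// ASCII would garble non-English text, and the reader also copes with
// multi-byte characters that are split across two buffer reads.
using (StreamReader reader = new StreamReader(resStream, Encoding.UTF8, true))
{
    int count;
    while ((count = reader.Read(buf, 0, buf.Length)) > 0)
    {
        sb.Append(buf, 0, count);
    }
}

Console.WriteLine(sb.ToString());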
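Since .NET 6, WebRequest and HttpWebRequest are marked obsolete (warning SYSLIB0014) in favor of HttpClient. The following is a minimal sketch, not part of the original answer, of the same download with HttpClient. It also checks the response's Content-Type so that a URL pointing at an image or other file is skipped rather than read as text. The helper names FetchHtmlAsync and IsHtmlMediaType are hypothetical, and treating only text/html and application/xhtml+xml as HTML is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Offline check of the content-type filter; needs no network access.
Console.WriteLine(IsHtmlMediaType("text/html"));  // True
Console.WriteLine(IsHtmlMediaType("image/png"));  // False

// Hypothetical helper: true only for HTML media types (case-insensitive).
static bool IsHtmlMediaType(string? mediaType) =>
    string.Equals(mediaType, "text/html", StringComparison.OrdinalIgnoreCase) ||
    string.Equals(mediaType, "application/xhtml+xml", StringComparison.OrdinalIgnoreCase);

// Hypothetical helper: downloads the document at url and returns its markup,
// or null when the server reports a non-HTML type (an image, a PDF, ...).
static async Task<string?> FetchHtmlAsync(HttpClient http, string url)
{
    // ResponseHeadersRead returns once the headers arrive, so the body of a
    // large non-HTML file is not downloaded before its type is checked.
    using HttpResponseMessage response =
        await http.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    string? mediaType = response.Content.Headers.ContentType?.MediaType;
    if (!IsHtmlMediaType(mediaType))
        return null;

    // Decodes using the charset named in the Content-Type header, if any.
    return await response.Content.ReadAsStringAsync();
}
```

Usage, inside an async method: `string? html = await FetchHtmlAsync(client, "http://www.your-url.com");`, where `client` is one HttpClient instance reused across requests; creating a new HttpClient per request can exhaust sockets under load.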

If the page is one of your own ASP.NET Web Forms pages, you can also capture its HTML in the page's Render method, as follows:

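// Requires: using System.IO; using System.Text; using System.Web.UI;
protected override void Render(System.Web.UI.HtmlTextWriter writer)
{
    StringBuilder sb = new StringBuilder();

    // Render the page into an in-memory buffer instead of the response.
    // Use a distinct name: a local called "writer" would clash with the parameter.
    using (StringWriter sw = new StringWriter(sb))
    using (HtmlTextWriter bufferWriter = new HtmlTextWriter(sw))
    {
        base.Render(bufferWriter);
    }

    // markupText will contain the HTML of the Page
    string markupText = sb.ToString();

    // Forward the captured markup to the real response writer.
    writer.Write(markupText);
}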
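Both snippets return the full markup, which still contains inline script and style blocks as well as every tag. If "content" means only the readable text, a rough sketch like the one below can strip those out. It is an approximation, not a parser: regular expressions do not truly understand HTML, so CDATA sections or a ">" inside an attribute value can confuse it, and for real pages an HTML parser such as HtmlAgilityPack is the more robust choice. ExtractText is a hypothetical helper name.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

string sample = "<html><head><style>p{color:red}</style>" +
                "<script>var x = 1 < 2;</script></head>" +
                "<body><p>Hello &amp; welcome</p><img src=\"a.png\"></body></html>";

Console.WriteLine(ExtractText(sample));  // Hello & welcome

// Hypothetical helper: crude HTML-to-text conversion using regular expressions.
static string ExtractText(string html)
{
    // 1. Drop <script> and <style> elements together with their contents.
    string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // 2. Drop HTML comments.
    text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

    // 3. Drop all remaining tags, so <img>, <a>, etc. leave no text behind.
    text = Regex.Replace(text, @"<[^>]+>", " ");

    // 4. Decode character entities such as &amp; and &lt;.
    text = WebUtility.HtmlDecode(text);

    // 5. Collapse runs of whitespace into single spaces.
    return Regex.Replace(text, @"\s+", " ").Trim();
}
```

Replacing each removed tag with a space (rather than nothing) keeps words from adjacent elements, such as two paragraphs, from running together; step 5 then tidies the extra spaces.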
