Reading data from a website using C#

2023-02-06 03:45 问答作者：

I have a webpage which has nothing on it except some string(s). No images, no background color or anything, just some plain text which is not really that long in length.

I am just wondering, what is the best (by that, I mean fastest and most efficient) way to pass the string in the webpage so that I can use it for something else (e.g. display in a text box)? I know of WebClient, but I'm not sure if it'll do what I want it do and plus I don't want to even try that out even if it did work because the last time I did it took approximately 开发者_运维知识库30 seconds for a simple operation.

Any ideas would be appreciated.

The WebClient class should be more than capable of handling the functionality you describe, for example:

System.Net.WebClient wc = new System.Net.WebClient();
byte[] raw = wc.DownloadData("http://www.yoursite.com/resource/file.htm");

string webData = System.Text.Encoding.UTF8.GetString(raw);

or (further to suggestion from Fredrick in comments)

System.Net.WebClient wc = new System.Net.WebClient();
string webData = wc.DownloadString("http://www.yoursite.com/resource/file.htm");

When you say it took 30 seconds, can you expand on that a little more? There are many reasons as to why that could have happened. Slow servers, internet connections, dodgy implementation etc etc.

You could go a level lower and implement something like this:

HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.yoursite.com/resource/file.htm");

using (StreamWriter streamWriter = new StreamWriter(webRequest.GetRequestStream(), Encoding.UTF8))
{
    streamWriter.Write(requestData);
}

string responseData = string.Empty;
HttpWebResponse httpResponse = (HttpWebResponse)webRequest.GetResponse();
using (StreamReader responseReader = new StreamReader(httpResponse.GetResponseStream()))
{
    responseData = responseReader.ReadToEnd();
}

However, at the end of the day the WebClient class wraps up this functionality for you. So I would suggest that you use WebClient and investigate the causes of the 30 second delay.

If you're downloading text then I'd recommend using the WebClient and get a streamreader to the text:

        WebClient web = new WebClient();
        System.IO.Stream stream = web.OpenRead("http://www.yoursite.com/resource.txt");
        using (System.IO.StreamReader reader = new System.IO.StreamReader(stream))
        {
            String text = reader.ReadToEnd();
        }

If this is taking a long time then it is probably a network issue or a problem on the web server. Try opening the resource in a browser and see how long that takes. If the webpage is very large, you may want to look at streaming it in chunks rather than reading all the way to the end as in that example. Look at http://msdn.microsoft.com/en-us/library/system.io.stream.read.aspx to see how to read from a stream.

Regarding the suggestion So I would suggest that you use WebClient and investigate the causes of the 30 second delay.

From the answers for the question System.Net.WebClient unreasonably slow

Try setting Proxy = null;

WebClient wc = new WebClient(); wc.Proxy = null;

Credit to Alex Burtsev

If you use the WebClient to read the contents of the page, it will include HTML tags.

string webURL = "https://yoursite.com";
WebClient wc = new WebClient();
wc.Headers.Add("user-agent", "Only a Header!");
byte[] rawByteArray = wc.DownloadData(webURL);
string webContent = Encoding.UTF8.GetString(rawByteArray);

After getting the content, the html tags should be removed. Regex can be used for this:

var result= Regex.Replace(webContent, "<.*?>", String.Empty);

But this method is not very accurate, the better way is to install HtmlAgilityPack and use the following code:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webData);
string result = doc.DocumentNode.InnerText;

You say it takes 30 seconds, It has nothing to do with using WebClient (The main factor is internet connections or proxy). WebClient has worked very well for me. example

 WebClient client = new WebClient();
            using (Stream data = client.OpenRead(Text))
            {
                using (StreamReader reader = new StreamReader(data))
                {
                    string content = reader.ReadToEnd();
                    string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
                    MatchCollection matches = Regex.Matches(content,pattern);
                    List<string> urls = new List<string>();
                    foreach (Match match in matches)
                    {
                            urls.Add(match.Value);
                    }

              }

XmlDocument document = new XmlDocument();
document.Load("www.yourwebsite.com");
string allText = document.InnerText;

继续阅读：webpage

Reading data from a website using C#

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？