Asp.net Crawler Webresponse Operation Timed out

2022-12-31 15:08 问答作者：

Hi I have built a simple threadpool based web crawler within my web application.开发者_JS百科 Its job is to crawl its own application space and build a Lucene index of every valid web page and their meta content. Here's the problem. When I run the crawler from a debug server instance of Visual Studio Express, and provide the starting instance as the IIS url, it works fine. However, when I do not provide the IIS instance and it takes its own url to start the crawl process(ie. crawling its own domain space), I get hit by operation timed out exception on the Webresponse statement. Could someone please guide me into what I should or should not be doing here? Here is my code for fetching the page. It is executed in the multithreaded environment.

private static string GetWebText(string url)
    {
        string htmlText = "";        

        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
        request.UserAgent = "My Crawler";

        using (WebResponse response = request.GetResponse())
        {
            using (Stream stream = response.GetResponseStream())
            {
                using (StreamReader reader = new StreamReader(stream))
                {
                    htmlText = reader.ReadToEnd();
                }
            }
        }
        return htmlText;
    }

And the following is my stacktrace:

at System.Net.HttpWebRequest.GetResponse()
   at CSharpCrawler.Crawler.GetWebText(String url) in c:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 366
   at CSharpCrawler.Crawler.CrawlPage(String url, List`1 threadCityList) in c:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 105
   at CSharpCrawler.Crawler.CrawlSiteBuildIndex(String hostUrl, String urlToBeginSearchFrom, List`1 threadCityList) in c:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 89
   at crawler_Default.threadedCrawlSiteBuildIndex(Object threadedCrawlerObj) in c:\myAppDev\myApp\site\crawler\Default.aspx.cs:line 108
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
   at System.Threading.ExecutionContext.runTryCode(Object userData)
   at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Thanks and cheers, Leon.

How many concurrent requests are being made by your crawler? You could easily be starving the threadpool - particularly as the crawler is running within the website code.

Each Request that your calling like this will use 2 threads from the pool - one to process the request and another to wait for the response.

继续阅读：asp.net httpwebresponse web-crawler

Asp.net Crawler Webresponse Operation Timed out

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？