C# AJAX or Java response HTML scraping

2023-03-11 02:41 Q&A author:

Is there a way in C# to get the output of AJAX or Java? What I'm trying to do is grab the specifics of items on a webpage, but the webpage does not load them into the original source. Does anybody have a good tutorial or a good place to start?

For example, I would want to get all the car listings from http://www.madisonhonda.com/Preowned-Inventory.aspx#layout=layout1

If the DOM is being modified by JavaScript through AJAX calls, and this modified data is what you are trying to capture, then a standard .NET WebClient won't work. You need to use a WebBrowser control so that the script actually executes; otherwise you will just be downloading the original source.

If you just need to "load" the data, then you'll need to understand how the page works and make the AJAX call yourself. Firebug and similar tools let you see what requests the browser makes.

There is no reason you cannot make the same web request from C# that the original page is making from JavaScript. Depending on the architecture of the website, this could range in difficulty from constructing the proper URL with query string arguments (easy) to simulating a POST with lots of page state (hard). The response would most likely be XML or JSON rather than HTML, which is a plus if you're scraping for data.
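As a rough illustration of the "easy" case, here is a minimal sketch that replays a GET request from C# and reads a JSON response. The endpoint, the `page` and `pageSize` parameters, and the `title` property are all hypothetical; the real names have to be read off the request you see in the browser's network panel.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class InventoryClient
{
    // Builds the request URL. "page" and "pageSize" are made-up parameter
    // names; substitute whatever the real AJAX request uses.
    public static string BuildUrl(string baseUrl, int page, int pageSize)
    {
        return $"{baseUrl}?page={page}&pageSize={pageSize}";
    }

    // Assumes the response is a JSON array of objects that each carry a
    // string "title" property (an assumption, not the site's real schema).
    // Items without a string "title" are skipped.
    public static List<string> ParseTitles(string json)
    {
        var titles = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.EnumerateArray())
            {
                if (item.TryGetProperty("title", out JsonElement title)
                    && title.ValueKind == JsonValueKind.String)
                {
                    titles.Add(title.GetString());
                }
            }
        }
        return titles;
    }

    // Sends the GET request and parses the body. Some sites check the
    // X-Requested-With header that browser AJAX libraries commonly send.
    public static async Task<List<string>> FetchTitlesAsync(HttpClient client, string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            request.Headers.Add("X-Requested-With", "XMLHttpRequest");
            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                string body = await response.Content.ReadAsStringAsync();
                return ParseTitles(body);
            }
        }
    }
}
```

A caller would do something like `await InventoryClient.FetchTitlesAsync(new HttpClient(), InventoryClient.BuildUrl(endpoint, 1, 25))`, where `endpoint` is the URL observed in the network panel. The "hard" case (a POST carrying page state) follows the same shape but needs the form fields copied from the real request.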

A long time ago I wrote a VB app to screen scrape financial sites and made it so that you could fire up several of these "harvester" screen scrapers at once. Running them in parallel can cut the total time spent loading data. We could do thousands of scrapes a day with several of these running on multiple boxes. Each harvester got its marching orders from information stored in the database, like which customer to get next and what needed to be scraped (balances, transaction history, etc.).

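Like Michael said above, make a simple WinForms app with a WebBrowser control in it. You have to trap the DocumentCompleted event (DocumentComplete on the underlying ActiveX control). It fires when a document has finished loading, although it can fire more than once on pages with frames, and content fetched by script afterwards may arrive later. Then check out this post which gives an overview of how to do it.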
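A minimal sketch of that WinForms approach might look like the following. It assumes a Windows project that references System.Windows.Forms; the `IsMainDocument` helper is a hypothetical name introduced here to ignore completion events raised for frames.

```csharp
using System;
using System.Windows.Forms;

public static class RenderedPageGrabber
{
    // True when the completed document is the top-level page rather than a
    // frame. Uri.Equals ignores the fragment, so "#layout=layout1" does not
    // affect the comparison.
    public static bool IsMainDocument(Uri eventUrl, Uri browserUrl)
    {
        return eventUrl != null && browserUrl != null && eventUrl.Equals(browserUrl);
    }

    [STAThread] // the WebBrowser control requires a single-threaded apartment
    public static void Main()
    {
        var form = new Form { Width = 1024, Height = 768 };
        var browser = new WebBrowser
        {
            Dock = DockStyle.Fill,
            ScriptErrorsSuppressed = true // avoid script error dialogs
        };
        form.Controls.Add(browser);

        browser.DocumentCompleted += (sender, e) =>
        {
            // Skip events raised for frames inside the page.
            if (!IsMainDocument(e.Url, browser.Url)) return;

            // Read the DOM as the browser currently holds it, i.e. after
            // scripts that have already run modified it.
            string html = browser.Document?.Body?.OuterHtml ?? string.Empty;
            Console.WriteLine("Rendered body length: " + html.Length);
        };

        browser.Navigate("http://www.madisonhonda.com/Preowned-Inventory.aspx#layout=layout1");
        Application.Run(form);
    }
}
```

If the listings are fetched by script after the event fires, the body read here may still be incomplete. One workaround is to re-read `browser.Document` from a Timer a few seconds later, though the right delay is guesswork and depends on the page.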

Use the Html Agility Pack. It can download an HTML page and lets you scrape it with XPath.

See How to use HTML Agility pack
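For a sense of what XPath scraping with the Html Agility Pack looks like, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is installed; the `listing` class name and the `h2` title element are hypothetical and would need to match the real markup. Note that the library only parses HTML and does not run scripts, so for an AJAX-driven page the HTML has to come from one of the approaches above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ListingScraper
{
    // Returns the text of every <h2> found inside <div class="listing">.
    // Both the class name and the tag are assumptions for illustration.
    public static List<string> ExtractListingTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[@class='listing']//h2");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null) return results;

        foreach (HtmlNode node in nodes)
        {
            // Decode entities such as &amp; and trim surrounding whitespace.
            results.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return results;
    }
}
```

To fetch a static page directly, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be queried the same way.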
