How to fetch particular HTML contents from remote URL?

I want to fetch particular HTML contents from remote websites url.

The website URL is as follow,

http://www.realtor.com/realestateandhomes-detail/10216-Montwood-Drive_El-Paso_TX_79925_M78337-06548

I want to fetch some specific information from the page at the above URL. I have attached an image that highlights the areas I want: the title, the image, and the descriptions.

How can I fetch these contents using jQuery, JavaScript, or a JSON call? Is there any other way to get them?


You might be interested in checking out pjscrape (disclaimer: this is my project). It's a command-line tool using PhantomJS to allow scraping using JavaScript and jQuery in a full browser context.

  • Scrapers can be written in plain JavaScript, executed in the context of the site you're scraping, with a very simple, jQuery-friendly syntax.
  • It can scrape a single page or an array of pages, or you can define a function that looks for more URLs to spider on each page.
  • It supports JSON and CSV output, either to a file or to STDOUT.

If the site is static and the structure is uniform, it should be very fast to scrape all the content you need into a structured data format.


This will help you out:

http://papermashup.com/use-jquery-and-php-to-scrape-page-content/


When scraping content, it is vital to consider the following:
Is the content static HTML, or will part of its content be rendered by AJAX calls?

In the first case, simple HTTP GET routines like the one used in the link from JNDPNT's comment will be sufficient.
In the second case, you may want to look at automating Selenium via its WebDriver.

In any case, it might be better to ask your colleague whether he can provide you with an interface to the raw data, e.g. over a web service.
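
For the static case, the "simple HTTP GET" approach could look roughly like the following Node.js sketch. The function names (`fetchHtml`, `extractFields`) and the choice of fields (`<title>`, the description meta tag, the first `<img>`) are assumptions made for illustration; the real page in the question will need its own selectors, and a proper HTML parser is usually more robust than the regular expressions used here.

```javascript
// Sketch only: fetch a static page with a plain GET and pull out a few fields.
// The fields chosen below are illustrative assumptions, not the real page layout.
const http = require('http');

// Download the raw HTML of a page over plain HTTP (as in the question's URL).
function fetchHtml(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status ' + res.statusCode));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Return the first capture group of `pattern` in `html`, trimmed, or null.
function firstMatch(html, pattern) {
  const m = pattern.exec(html);
  return m ? m[1].trim() : null;
}

// Naive extraction: only handles double-quoted attributes, and expects
// name="description" to come directly before content="...".
function extractFields(html) {
  return {
    title: firstMatch(html, /<title[^>]*>([\s\S]*?)<\/title>/i),
    image: firstMatch(html, /<img[^>]*\ssrc="([^"]+)"/i),
    description: firstMatch(html, /<meta\s+name="description"\s+content="([^"]*)"/i),
  };
}
```

It could be used as `fetchHtml(url).then(extractFields).then(console.log)`. If the interesting parts of the page are filled in by AJAX calls, this will only see the empty shell, which is where a real browser driven by Selenium comes in.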


If I'm getting you right, you want the user's browser to scrape the content of another domain on the fly, right?

That will not be possible without proxying the request through some script on the same domain (or via a JSONP request to a service that returns the HTML to you), due to the same-origin policy.

Sorry to disappoint.
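
To make the proxying idea concrete, here is a minimal Node.js sketch of a script that runs on your own domain and fetches the remote HTML on the browser's behalf. The `/proxy` path, the `url` query parameter, the `ALLOWED_HOSTS` list and the function names are all assumptions for illustration, not part of any existing service.

```javascript
// Sketch of a same-domain proxy: the browser asks this server,
// and the server asks the remote site. Names and paths are hypothetical.
const http = require('http');

// Hosts this proxy is willing to fetch from (assumed list).
const ALLOWED_HOSTS = ['www.realtor.com'];

// Only forward plain-HTTP requests to explicitly trusted hosts,
// so the endpoint cannot be abused as an open proxy.
function isAllowedTarget(target) {
  try {
    const u = new URL(target);
    return u.protocol === 'http:' && ALLOWED_HOSTS.includes(u.hostname);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
}

// Build (but do not start) the proxy server.
function createProxyServer() {
  return http.createServer((req, res) => {
    const reqUrl = new URL(req.url, 'http://localhost');
    const target = reqUrl.searchParams.get('url');
    if (reqUrl.pathname !== '/proxy' || !isAllowedTarget(target)) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Bad request');
      return;
    }
    // Fetch the remote page and stream it back to the browser.
    http.get(target, (upstream) => {
      res.writeHead(upstream.statusCode, {
        'Content-Type': upstream.headers['content-type'] || 'text/html',
      });
      upstream.pipe(res);
    }).on('error', () => {
      if (!res.headersSent) {
        res.writeHead(502, { 'Content-Type': 'text/plain' });
      }
      res.end('Upstream fetch failed');
    });
  });
}
```

After starting it with something like `createProxyServer().listen(8080)`, a page served from the same origin could call `$.get('/proxy?url=' + encodeURIComponent(remoteUrl), function (html) { ... })` and pick the title, image, and description out of the returned HTML with jQuery.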


Use the Yahoo Pipes service (http://pipes.yahoo.com/pipes/). It can be used to grab and manipulate the page HTML, extracting the bits you want. The data can then be posted server-side using the Web Service module, or sent directly to the client's browser using an ordinary JavaScript callback.
