开发者

How to extract data from a web site and format to raw text - iPhone Dev

I have been looking around for a while and not found anything useful, also not sure if I have worded the question in the clearest fashion so apologies

I have a section of an app I am building called 'Company News'. The company in question has a news page on their website which displays a title, an excerpt of text and a read more option.

At the minute in the iPhone application I just have 开发者_JAVA百科a UIWebView which links to that URL, displays an error if no connection is available. However, if my user clicks a story to read the news obviously it opens up a new page, I want to avoid having to build in 'back' and 'forward' buttons and stay away from it looking like a browser within the app.

With that said, I am looking for a way to just extract that data from the website and just display it in my app as raw text. I am not particularly bothered about rich text formatting or anything fancy. I would just like the title and body of text.

Is this possible?


In essence, then, you are looking for an HTML parser.

Assuming the HTML you wish to parse has a predictable format, the approach I would take is to load the HTML via whatever URL loading system you want - e.g. NSURLConnection, ASIHTTPRequest, etc.

Then you will need to parse the raw HTML. I use XPath. It requires that you learn the syntax, but it should work.

For more details about how you might use XPath for parsing HTML, see the second response to this question. You will need to link to libxml2 in your project then use XPath to extract the nodes of interest.

Scraping web pages in this way is fragile, though, because it depends on the structure of a page you don't control and which could be changed unpredictably.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜