WebKit & Objective-C: how to parse a HTML string into a DOMDocument?

How do you get a DOMDocument from a given HTML string using WebKit? In other words, how would DOMDocumentFromHTML: be implemented so that code like the following works:

NSString * htmlString = @"<html><body><p>Test</body></html>";
DOMDocument * document = [self DOMDocumentFromHTML: htmlString];

DOMNode * bodyNode = [[document getElementsByTagName: @"body"] item: 0];
// ... etc.

This seems like it should be straightforward to do, yet I'm still having trouble figuring out how :( ...


This is not an actual answer to the question, but I've since concluded that WebKit and DOMDocument are probably not the most appropriate tools for what I want to do, which is to process an HTML document that is never shown to the user. The class NSXMLDocument supports turning an HTML document into a manipulable object structure, straightforwardly and synchronously:

NSError * error = nil;
NSString * htmlString = @"<html><body><p>Test</body></html>";

NSXMLDocument * doc =
  [[NSXMLDocument alloc]
     initWithXMLString: htmlString
     options: NSXMLDocumentTidyHTML
     error: &error];
NSLog(@"Error is: %@", error);
NSLog(@"Doc is: %@", doc);
NSLog(@"Root element is: %@", [doc rootElement]);
NSLog(@"Root element's children are: %@", [[doc rootElement] children]);
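A sketch of how the tidied document might then be queried with XPath, assuming manual reference counting like the rest of this question (compile with -fno-objc-arc) and a macOS target, since NSXMLDocument is not available on iOS. ParagraphTextsFromHTML is a hypothetical helper, not an Apple API. It treats a nil document, rather than a non-nil error, as failure, because in my understanding tidying can report warnings through the error parameter even when it produces a document. The XPath uses local-name() so that it matches <p> whether or not the tidied tree ends up in the XHTML namespace.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not an Apple API): tidies an HTML string into an
// NSXMLDocument and returns the trimmed text of every <p> element,
// or nil if no document could be produced. Manual reference counting.
static NSArray *ParagraphTextsFromHTML(NSString *htmlString, NSError **outError)
{
    NSError *error = nil;
    NSXMLDocument *doc =
        [[NSXMLDocument alloc] initWithXMLString: htmlString
                                         options: NSXMLDocumentTidyHTML
                                           error: &error];
    if (outError != NULL) {
        *outError = error;   // may hold tidy warnings even when doc != nil
    }
    if (doc == nil) {
        return nil;          // only a nil document signals real failure
    }

    // local-name() matches <p> with or without an XHTML default namespace.
    NSArray *nodes = [doc nodesForXPath: @"//*[local-name()='p']" error: &error];
    NSMutableArray *texts = [NSMutableArray array];
    for (NSXMLNode *node in nodes) {
        NSString *text = [[node stringValue] stringByTrimmingCharactersInSet:
                             [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [texts addObject: text];
    }
    [doc release];           // balances the alloc above (MRC only)
    return texts;
}
```

For the htmlString used above, this would be expected to return a one-element array containing @"Test", since tidying closes the unterminated <p>.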


Based on another answer on this site, WebKit apparently offers no synchronous method like the DOMDocumentFromHTML: I asked for.

So far, the best I've been able to do is the following asynchronous combination of giveDOMDocumentFromHTML:usingBaseURL: and takeDOMDocument:.

- (void) giveDOMDocumentFromHTML: (NSString *) htmlString
         usingBaseURL: (NSURL *) baseURL
{
    WebView * webView = [[WebView alloc] init];
    [webView setFrameLoadDelegate: self];
    [[webView mainFrame] loadHTMLString: htmlString
                         baseURL: baseURL];
}

- (void) takeDOMDocument: (DOMDocument *) document
{
    DOMHTMLElement * bodyNode =
        (DOMHTMLElement *) [[document getElementsByTagName: @"body"] item: 0];
    NSLog(@"Body is: %@", [bodyNode innerHTML]);
}

They are hooked together through the following delegate method:

- (void) webView: (WebView *) webView
         didFinishLoadForFrame: (WebFrame *) frame
{
    if (frame == [webView mainFrame]) {
        [self takeDOMDocument: [frame DOMDocument]];
    }
}

The above works, but has at least the following remaining issues:

  • I'm not sure where the allocated WebView should be sent a release or autorelease message.
  • I would prefer (in fact, need) the application to stay blocked until the HTML page has been processed. With the scheme above, the application keeps processing user input while the WebView loads and parses the HTML. (Note that the WebView is never shown on screen.)

So this still leaves plenty of room for improvement. Can anyone provide a synchronous implementation of DOMDocumentFromHTML: as outlined in the original question?
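A minimal sketch of one common workaround, under assumptions: WebKit still loads asynchronously, but the calling method can block by spinning the current run loop until the frame load delegate reports that the main frame finished (or failed) loading. SyncHTMLLoader is a hypothetical class name, not a WebKit API, and the 30-second limit is an arbitrary choice. It also settles where the WebView gets released: the loader keeps it in an instance variable, because the returned DOMDocument comes from the WebView's main frame and I would not rely on it after the WebView is gone, and releases it on the next call or in dealloc. Further assumptions: manual reference counting as in the question (compile with -fno-objc-arc), calls made on the main thread, and an SDK that declares WebFrameLoadDelegate as a formal protocol (on older SDKs, drop the <WebFrameLoadDelegate> conformance). The legacy WebView class is deprecated on recent macOS versions, so expect deprecation warnings. This blocks the calling code, not the run loop itself: timers and other run-loop sources can still fire while it waits.

```objc
#import <Cocoa/Cocoa.h>
#import <WebKit/WebKit.h>

// Hypothetical helper class (not part of WebKit). Manual reference counting;
// must be used from the main thread.
@interface SyncHTMLLoader : NSObject <WebFrameLoadDelegate>
{
    WebView *webView;   // owns the frame whose DOMDocument we hand out
    BOOL finished;
    BOOL failed;
}
- (DOMDocument *) DOMDocumentFromHTML: (NSString *) htmlString;
@end

@implementation SyncHTMLLoader

- (void) discardWebView
{
    [webView setFrameLoadDelegate: nil];
    [webView release];   // balances the alloc in DOMDocumentFromHTML:
    webView = nil;
}

- (DOMDocument *) DOMDocumentFromHTML: (NSString *) htmlString
{
    // A document returned by an earlier call should not be used after this.
    [self discardWebView];
    webView = [[WebView alloc] initWithFrame: NSZeroRect
                                   frameName: nil
                                   groupName: nil];
    [webView setFrameLoadDelegate: self];
    finished = NO;
    failed = NO;

    // No base URL is assumed here, so relative links stay unresolved.
    [[webView mainFrame] loadHTMLString: htmlString baseURL: nil];

    // Block this method until a delegate callback below sets `finished`,
    // giving up after an arbitrary 30-second limit.
    NSDate *deadline = [NSDate dateWithTimeIntervalSinceNow: 30.0];
    while (!finished && [deadline timeIntervalSinceNow] > 0) {
        [[NSRunLoop currentRunLoop] runMode: NSDefaultRunLoopMode
                                 beforeDate: deadline];
    }
    if (!finished) {
        [[webView mainFrame] stopLoading];
        return nil;
    }
    return failed ? nil : [[webView mainFrame] DOMDocument];
}

- (void) webView: (WebView *) sender didFinishLoadForFrame: (WebFrame *) frame
{
    if (frame == [sender mainFrame]) {
        finished = YES;
    }
}

- (void) webView: (WebView *) sender
    didFailLoadWithError: (NSError *) error
                forFrame: (WebFrame *) frame
{
    if (frame == [sender mainFrame]) {
        failed = YES;
        finished = YES;
    }
}

- (void) webView: (WebView *) sender
    didFailProvisionalLoadWithError: (NSError *) error
                           forFrame: (WebFrame *) frame
{
    [self webView: sender didFailLoadWithError: error forFrame: frame];
}

- (void) dealloc
{
    [self discardWebView];
    [super dealloc];
}

@end
```

With this in place, the question's snippet would become DOMDocument * document = [loader DOMDocumentFromHTML: htmlString]; where loader is an instance of this class that stays alive, and is not asked to load anything else, for as long as document is in use. Keeping the WebView in the loader, rather than autoreleasing it inside the delegate callback, is what ties its lifetime to the returned document.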
