Parse HTML in Adobe AIR
I am trying to load and parse HTML in Adobe AIR. The main purpose is to extract the title, meta tags and links. I have been trying HTMLLoader, but I get all sorts of errors, mainly uncaught JavaScript exceptions.
I also tried loading the HTML content directly (using URLLoader) and pushing the text into HTMLLoader (using loadString(...)), but got the same errors. My last resort was to load the text into XML and then use E4X queries or XPath, but no luck there because the HTML is not well formed.
My questions are:
- Is there a simple and reliable (AIR/ActionScript) DOM component out there? (I do not need to display the page, so headless mode will do.)
- Is there any library that converts (crappy) HTML into well-formed XML so I can use XPath/E4X?
- Any other suggestions on how to do this?
thx
ActionScript and JavaScript are both based on ECMAScript, so pure JavaScript code can usually be ported with few changes, and thankfully, there's...
Pure JavaScript/ActionScript HTML Parser
created by JavaScript guru and jQuery creator John Resig :-)
One approach is to run the HTML through HTMLtoXML() then use E4X as you please :)
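A minimal sketch of the E4X half of that approach, in ActionScript 3. It assumes the markup has already been turned into well-formed XML, for example by an ActionScript port of HTMLtoXML() (the port is not shown and is an assumption here). The function name extractFromXhtml and the shape of the result object are made up for illustration.

```actionscript
// Assumes `xhtml` is already well formed, e.g. the output of a
// (hypothetical) ActionScript port of HTMLtoXML().
function extractFromXhtml(xhtml:String):Object {
    var result:Object = { title: null, metas: [], links: [] };
    var doc:XML;
    try {
        // new XML() throws if the input is still not well formed
        doc = new XML(xhtml);
    } catch (e:Error) {
        trace("Still not well formed: " + e.message);
        return result;
    }

    // E4X is case sensitive: this assumes lowercase tag names
    // and no default XML namespace on the root element.
    var titles:XMLList = doc..title;
    if (titles.length() > 0) {
        result.title = titles[0].text().toString();
    }

    // Every <meta> element anywhere in the document
    for each (var meta:XML in doc..meta) {
        result.metas.push({ name: String(meta.@name), content: String(meta.@content) });
    }

    // Every <a> element that actually carries an href attribute
    for each (var a:XML in doc..a) {
        if (a.@href.length() > 0) {
            result.links.push(String(a.@href));
        }
    }
    return result;
}

// Usage: the string would normally come from the converter
var info:Object = extractFromXhtml(
    '<html><head><title>Hi</title></head><body><a href="/x">x</a></body></html>');
trace(info.title, info.links);
```

If the converted markup declares the XHTML namespace on its root element, the `doc..title` style queries will come back empty unless a matching default XML namespace is set first, so it may be worth checking for that.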
AFAIK:
- No :-(
- No :-(
- I think the easiest way to grab the title and meta tags is to write some regular expressions. You can load the page's HTML code into a string and then read out whatever you need like this:
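var str:String = ""; // put the page's HTML source in here
// [^>]* allows attributes on <title>; [\s\S]*? is non-greedy and also matches across line breaks
var pattern:RegExp = /<title[^>]*>([\s\S]*?)<\/title>/i;
var match:Object = pattern.exec(str);
trace(match != null ? match[1] : "no title found"); // match[1] is the captured title text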
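The same idea can be stretched to the meta tags and links from the question. This is only a rough sketch: the function name extractMetaAndLinks and the result layout are invented for illustration, and it assumes quoted attribute values and no ">" character inside an attribute. Regular expressions are not a real HTML parser, so odd markup will slip through.

```actionscript
function extractMetaAndLinks(html:String):Object {
    var result:Object = { metas: [], links: [] };
    var m:Object;

    // Find each <meta ...> tag first, then read its attributes one by one
    // so that the attribute order inside the tag does not matter.
    var metaTag:RegExp = /<meta\b[^>]*>/gi;
    while ((m = metaTag.exec(html)) != null) {
        var tag:String = m[0];
        // (["']) captures the opening quote, \1 requires the same closing quote
        var nameMatch:Object = /\sname\s*=\s*(["'])([\s\S]*?)\1/i.exec(tag);
        var contentMatch:Object = /\scontent\s*=\s*(["'])([\s\S]*?)\1/i.exec(tag);
        // Tags without both name and content (e.g. http-equiv metas) are skipped
        if (nameMatch != null && contentMatch != null) {
            result.metas.push({ name: nameMatch[2], content: contentMatch[2] });
        }
    }

    // href value of each <a> tag; group 2 is the text between the quotes
    var link:RegExp = /<a\s(?:[^>]*?\s)?href\s*=\s*(["'])([\s\S]*?)\1/gi;
    while ((m = link.exec(html)) != null) {
        result.links.push(m[2]);
    }
    return result;
}

// Usage, with the HTML string loaded via URLLoader as described in the question
var found:Object = extractMetaAndLinks(str);
for each (var meta:Object in found.metas) {
    trace(meta.name + " = " + meta.content);
}
trace(found.links.join("\n"));
```

The extracted href values are returned exactly as written in the page, so relative links still need to be resolved against the page URL.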