Convert HTML to XML with WP7

Simple situation: I want to search through an HTML string and pull out a few pieces of information. It gets annoying after writing masses of .Substring and .IndexOf calls for each element I want to find and cut out of the HTML file.

AFAIK I'm unable to load a DLL such as HTML Tidy or HTML Agility Pack into my WP7 project, so is there a more efficient and reliable way to search through my HTML string than building substrings with IndexOf?

    void client_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
    {
       // Read the whole downloaded page into a string.
       string document = string.Empty;
       using (var reader = new StreamReader(e.Result))
          document = reader.ReadToEnd();

       // Cut out the text between the "Games Played" and "League Games" labels.
       string temp = document.Substring(document.IndexOf("Games Played"), (document.IndexOf("League Games") - document.IndexOf("Games Played")));
       // Keep the contents of the first <span> in that slice; Remove(0, 6) drops the "<span>" tag itself.
       temp = (temp.Substring(temp.IndexOf("<span>"), (temp.IndexOf("</span>") - temp.IndexOf("<span>")))).Remove(0, 6);
       Int32.TryParse(temp, out leaugeGamesPlayed);
    }

Thanks for your help

Gpx


You can use the HTML Agility Pack, but you need the version that has been converted for the phone. It's only available from the SVN repository, but it works great; I use it in my app.

http://htmlagilitypack.codeplex.com/SourceControl/changeset/view/77494#

You can find two projects under trunk named HAPPhone and HAPPhoneTest. Use the download button on the right to get the code. It uses LINQ instead of XPath.


You could use LINQ to parse the HTML and locate the elements that you're interested in. For example:

    XDocument parsed = XDocument.Parse(document);
    var spans = parsed.Descendants("span");

Beth Massi has a great blog post: Querying HTML with LINQ to XML
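
As a rough sketch of how the LINQ approach could replace the Substring/IndexOf chain from the question: the helper below walks the nodes in document order, skips ahead to the first text node containing a label, and parses the first span that follows it. The class name `SpanExtractor`, the method `ReadNumberAfterLabel`, and the sample markup are made up for illustration; the real page structure is not shown in the question. Note that `XDocument.Parse` only accepts well-formed XML, so this assumes the page is valid XHTML (or has been cleaned up first); typical real-world HTML will make it throw an `XmlException`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SpanExtractor
{
    // Hypothetical helper: returns the integer inside the first <span> that
    // appears after the first text node containing `label`, or null when the
    // label or span is missing or the span content is not a number.
    public static int? ReadNumberAfterLabel(string xhtml, string label)
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument parsed = XDocument.Parse(xhtml);

        // DescendantNodes() yields nodes in document order, so skipping up to
        // the label text and taking the next <span> mirrors the IndexOf logic.
        XElement span = parsed.DescendantNodes()
            .SkipWhile(n => !(n is XText) || !((XText)n).Value.Contains(label))
            .OfType<XElement>()
            .FirstOrDefault(el => el.Name.LocalName == "span");

        if (span == null)
            return null;

        int value;
        return int.TryParse(span.Value.Trim(), out value) ? value : (int?)null;
    }
}
```

With the made-up input `<div><p>Games Played <span>42</span></p><p>League Games <span>30</span></p></div>`, calling `SpanExtractor.ReadNumberAfterLabel(xhtml, "Games Played")` returns 42. Matching on `Name.LocalName` keeps the lookup working when the document declares the XHTML namespace.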


I'm assuming you're doing this because you're getting the HTML from a web site/page/server.

Don't convert it on the device.

Create a wrapper/proxy site/server/page to do the conversion for you. While this has the downside of having to create the extra service, it has the following advantages:

  • Code on the server will be easier to update than code within a distributed app. (Experience with parsing HTML you don't directly control shows that you will need to change your parsing, because the original HTML is almost certain to throw something unexpected at you when it changes in the future.)
  • If you do it once on the server, you can cache the result rather than having every instance of the app do the conversion over again.
  • By virtue of the above 2 points, the app will run faster!

If you have the HTML file at design/build time then convert it to something easier to work with and avoid unnecessary computation at run time.


As a workaround, you could consider loading the HTML into a WebBrowser control and then querying the DOM via injected JavaScript (which calls back to .NET).
