HXT: Surprising behavior when reading and writing HTML to String in pure code
I want to read HTML from a String, process it, and return the changed document as a String using HXT. As this operation does not require IO, I would rather execute the Arrow with runLA than with runX.
The code looks like this (processing omitted for simplicity):
runLA (hread >>> writeDocumentToString [withOutputHTML, withIndent yes]) html
However, the surrounding html tag is missing in the result:
["\n <head>\n <title>Bogus</title>\n </head>\n <body>\n Some trivial bogus text.\n </body>\n",""]
When I use runX instead like this:
runX (readString [] html >>> writeDocumentToString [withOutputHTML, withIndent yes])
I get the expected result:
["<html>\n <head>\n <title>Bogus</title>\n </head>\n <body>\n Some trivial bogus text.\n </body>\n</html>\n"]
Why is that, and how can I fix it?
If you look at the XmlTrees for both, you'll see that readString adds a top-level "/" element. For the non-IO runLA version:
> putStr . formatTree show . head $ runLA xread html
---XTag "html" []
|
+---XText "\n "
|
+---XTag "head" []
...
And with runX:
> putStr . formatTree show . head =<< runX (readString [] html)
---XTag "/" [NTree (XAttr "transfer-Status") [NTree (XText "200")...
|
+---XTag "html" []
|
+---XText "\n "
|
+---XTag "head" []
...
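writeDocumentToString expects a tree with such a document root and uses getChildren to strip it off. When it is handed the parsed html element directly, as in the runLA version, getChildren strips the html tag instead, so only its contents are written.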
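To see why the html tag disappears, here is a toy model of the mechanism. It is not HXT's API: the `Item` type and the `render`, `writeDoc` and `wrapRoot` functions are hypothetical stand-ins. `writeDoc` imitates the step where writeDocumentToString applies getChildren to its input, and `wrapRoot` imitates wrapping the parsed trees in a "/" root with selem. It only needs `Data.Tree` from the containers package that ships with GHC.

```haskell
import Data.Tree (Tree (..))

-- Hypothetical, simplified node type (NOT HXT's XNode):
-- an element with a tag name, or a text node.
data Item = Elem String | Txt String
  deriving (Eq, Show)

-- Render a single tree as markup, including its own tag.
render :: Tree Item -> String
render (Node (Txt s) _) = s
render (Node (Elem name) kids) =
  "<" ++ name ++ ">" ++ concatMap render kids ++ "</" ++ name ++ ">"

-- Imitates writeDocumentToString's use of getChildren: the input is
-- assumed to be a document root, so only its children are rendered.
writeDoc :: Tree Item -> String
writeDoc = concatMap render . subForest

-- Imitates selem "/" [...]: put the parsed trees under an artificial root.
wrapRoot :: [Tree Item] -> Tree Item
wrapRoot = Node (Elem "/")

-- A tiny parsed document, as a parser like xread/hread might return it
-- (no "/" root on top).
htmlTree :: Tree Item
htmlTree = Node (Elem "html") [Node (Elem "body") [Node (Txt "hi") []]]
```

Here `writeDoc htmlTree` yields "<body>hi</body>", so the html tag is lost just like in the runLA version, while `writeDoc (wrapRoot [htmlTree])` yields "<html><body>hi</body></html>" because the artificial root is what gets stripped.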
One easy way around this is to use something like selem to wrap the output of xread in a similar root element, so that it looks like the kind of input writeDocumentToString expects:
> runLA (selem "/" [xread] >>> writeDocumentToString [withOutputHTML, withIndent yes]) html
["<html>\n <head>\n <title>Bogus</title>\n </head>\n <body>\n Some trivial bogus text.\n </body>\n</html>\n"]
This produces the desired output.
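Putting the pieces together, a minimal sketch of a pure round-trip helper, assuming HXT 9.x (the hxt package, imported via Text.XML.HXT.Core). The name `htmlRoundTrip` is made up for illustration. It uses hread, the HTML parser from the question, instead of xread, and wraps its output in a "/" root with selem as described above, so everything runs in the LA arrow without IO.

```haskell
import Text.XML.HXT.Core

-- Hypothetical helper: parse an HTML string and serialize it back, purely.
-- selem "/" [hread] puts the parsed nodes under an artificial "/" root,
-- which writeDocumentToString then strips instead of the html element.
htmlRoundTrip :: String -> [String]
htmlRoundTrip =
  runLA ( selem "/" [hread]
          >>> writeDocumentToString [withOutputHTML, withIndent yes]
        )
```

Your own processing arrow (omitted in the question) would go between `selem "/" [hread]` and `writeDocumentToString`; if it expects the html element rather than the root, it can be applied under the root with something like processChildren.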