How to truncate an HTML string without leaving it malformed?
I have to display the first N (say, 50 or 100) characters of an HTML string, and the result must be well-formed HTML. If I apply a simple substring, I get a malformed HTML string, e.g.:
Sample string : "<html><body><a href="http://foo.com">foo</a></body></html>"
Truncated string: "<html><body><a href="http://foo.com">foo<"
This leaves me with malformed HTML :(
Any ideas on how to achieve this?
You can try using the HTML Agility Pack - it will parse out the HTML for you, but you will need to figure out how to produce a truncated version yourself. It should make things a lot easier though.
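As a rough illustration of the "produce a truncated version yourself" part, here is a minimal sketch. It uses Python's standard `html.parser` rather than the HTML Agility Pack (which is a .NET library), and it assumes that N counts visible text characters, not markup. The names `TruncatingParser`, `truncate_html` and `VOID_TAGS` are made up for this example. The idea: copy tags through unchanged, keep a stack of open elements, stop copying once N text characters are out, then close whatever is still open.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TruncatingParser(HTMLParser):
    """Re-emits markup until `limit` text characters have been written."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # text characters still allowed
        self.out = []            # output fragments
        self.open_tags = []      # stack of elements not yet closed
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it, nothing to track.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close everything up to and including the matching element.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.remaining]
        self.out.append(escape(piece, quote=False))  # re-escape &, <, >
        self.remaining -= len(piece)
        if self.remaining <= 0:
            self.done = True


def truncate_html(html, limit):
    """Keep the first `limit` text characters and close any open tags."""
    parser = TruncatingParser(limit)
    parser.feed(html)
    parser.close()
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing
```

With the sample string from the question, `truncate_html(sample, 2)` gives `<html><body><a href="http://foo.com">fo</a></body></html>`. The sketch drops comments and doctype declarations and does not treat `<script>` or `<style>` content specially, so it is a starting point only.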
Parse the HTML into a DOM tree. Start with the deepest/innermost elements and
- remove the content of the innermost node, or the node itself if it has no content
- check the string length.
Rinse, lather, repeat.
This may truncate your string to the empty string, if your desired length is small enough.
For extra kicks, you could try removing attributes of the nodes as you go.
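The pruning loop above could be sketched like this. It is only an illustration under some assumptions: it uses Python's `xml.etree.ElementTree`, so the input must be well-formed XML (real-world HTML would need a proper HTML parser), the length limit applies to the serialized markup, and `deepest_leaf` and `prune_to_length` are names invented for this example. Attribute removal is left out.

```python
import xml.etree.ElementTree as ET


def deepest_leaf(node, parent=None, depth=0):
    """Return (depth, parent, leaf) for the deepest childless element."""
    if len(node) == 0:
        return depth, parent, node
    return max((deepest_leaf(child, node, depth + 1) for child in node),
               key=lambda found: found[0])


def serialize(root):
    # method="html" writes empty elements as <a></a> rather than <a />.
    return ET.tostring(root, encoding="unicode", method="html")


def prune_to_length(markup, limit):
    """Strip innermost content, then innermost nodes, until the markup fits."""
    root = ET.fromstring(markup)
    while len(serialize(root)) > limit:
        _, parent, leaf = deepest_leaf(root)
        if leaf.text:
            leaf.text = None       # first pass: drop the node's content
        elif parent is not None:
            parent.remove(leaf)    # next pass: drop the empty node (and its tail text)
        else:
            return ""              # only an empty root is left and it is still too long
    return serialize(root)
```

For the sample string in the question (58 characters), a limit of 55 leaves `<html><body><a href="http://foo.com"></a></body></html>`, a limit of 30 leaves `<html><body></body></html>`, and a limit of 5 yields the empty string, as noted above.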
I've seen some forum systems simply append a </b></u></i></s> after every single post. You could approach this in a similar fashion.
Of course, it's ugly, and it wouldn't fix that trailing <
That is by far the simplest method. A better method would be to actually generate a tree and kick nodes off until you meet the requirement.