Truncate a string with an ellipsis, making sure not to break any HTML entity
I have a database of items with XHTML content and I want to display the items with the HTML stripped off (done) and then truncate each item to a maximum length of 100 characters. If the string exceeds 100 characters, I cut it off and insert … (an ellipsis) at the end.
The problem is that my program doesn't understand HTML entities that are already in the string. E.g. if the string is something &amp; something, my function may truncate it as something &am... resulting in invalid XHTML.
What is the best way to go about this problem in ASP.NET/C#?
You could use HttpUtility.HtmlDecode to convert the HTML entities to plain characters, truncate the decoded string, and then encode the result again:
var decoded = HttpUtility.HtmlDecode(theEncodedString);
decoded = Truncate(decoded);
var result = HttpUtility.HtmlEncode(decoded);
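The snippet above leaves Truncate undefined. A minimal sketch of the whole round trip might look like the following; the class name, the method names and the maxLength parameter are hypothetical, and the ellipsis is appended after encoding so that the result does not depend on how the encoder treats that character.

```csharp
using System;
using System.Web;

public static class EntitySafeTruncation
{
    // Hypothetical helper: cuts plain (already decoded) text to at most
    // maxLength UTF-16 code units, without splitting a surrogate pair.
    public static string Truncate(string text, int maxLength)
    {
        if (string.IsNullOrEmpty(text) || text.Length <= maxLength)
            return text;

        int cut = maxLength;
        if (cut > 0 && char.IsHighSurrogate(text[cut - 1]))
            cut--; // do not leave half of a surrogate pair behind

        return text.Substring(0, cut);
    }

    // Decode, truncate, re-encode; append an ellipsis only if text was cut.
    public static string TruncateEncoded(string encoded, int maxLength = 100)
    {
        string decoded = HttpUtility.HtmlDecode(encoded) ?? string.Empty;
        string shortened = Truncate(decoded, maxLength);
        string result = HttpUtility.HtmlEncode(shortened);

        return shortened.Length < decoded.Length ? result + "\u2026" : result;
    }
}
```

Because the cut happens on the decoded text, an entity such as &amp; counts as one character toward the limit and is always written back out whole.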
You could use a regular expression that matches either an HTML entity (named or numeric) or a single character, repeated up to the length that you want. Something like:
^(&#?\w+;|.){0,100}
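As a rough sketch of how such a pattern could be applied with System.Text.RegularExpressions: the class name, method name and maxUnits parameter below are hypothetical. The pattern is written with an explicit lower bound ({0,n}) and an optional # so that numeric entities such as &#160; are treated as a single unit, and RegexOptions.Singleline lets . match line breaks as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntityAwareTruncation
{
    // Keeps at most maxUnits "units" from the start of an encoded string,
    // where a unit is either one whole entity or one single character.
    public static string Truncate(string encoded, int maxUnits = 100)
    {
        if (string.IsNullOrEmpty(encoded))
            return encoded;

        // The entity alternative is tried first, so "&amp;" is never split.
        string pattern = @"^(?:&#?\w+;|.){0," + maxUnits + "}";
        string head = Regex.Match(encoded, pattern, RegexOptions.Singleline).Value;

        // Append an ellipsis only when something was actually cut off.
        return head.Length < encoded.Length ? head + "\u2026" : head;
    }
}
```

Unlike the decode/encode approach, this works directly on the encoded string, so nothing has to be re-encoded afterwards. It does assume that every & followed by word characters and a semicolon really is an entity.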