PHP strip non-SGML characters from a string?

I've got nonstandard characters coming out of my database (due to line breaks).

My HTML validator is complaining about them.

Since my HTML validator is a direct extension of my ego, I'd like to keep the thing happy and green-ok-arrow-y.

Does someone who's done this before have a quick fix?

BTW I don't want to change the page's charset, doctype, or the data. Just looking for a utf8_decode() type thing that would clean up the string, but utf8_encode() and utf8_decode() don't work...

UPDATE

Sorry, "non-standard characters" is a bit vague, but then so is this error warning. Specifically, they're not SGML characters, which apparently don't fit the SGML parser...but now I get into the fuzzy territory, not sure what's going on.


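If by non-standard characters you mean the XHTML validator sees characters in your document that are not permitted by the XML specification (see http://www.w3.org/TR/xml/#charsets), then your solution is to use XML entities to escape them. For example, if you have the illegal character U+0004, you can turn it into &#x4; in PHP before writing it out. Be aware, though, that XML 1.0 also rejects character references to code points outside its Char production, so for a control character like this one, removing or replacing it is the safer fix.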
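Since the Char production at the link above excludes most control characters, one option is simply to drop anything outside it before output. A minimal sketch, assuming the string is valid UTF-8; the function name is made up for illustration and the character class is my transcription of the XML 1.0 Char production:

```php
<?php
// Hypothetical helper, not from the original answer.
// Removes every code point that XML 1.0 does not allow in a document.
function strip_xml_illegal_chars(string $text): string
{
    // Negated form of the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // The /u flag makes PCRE treat $text as UTF-8.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $clean = preg_replace($pattern, '', $text);

    // preg_replace() returns null if $text is not valid UTF-8;
    // in that case hand the input back unchanged.
    return $clean ?? $text;
}
```

For instance, `strip_xml_illegal_chars("a\x04b")` gives back `"ab"`, while tabs and newlines are left alone.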

If by non-standard characters you mean your byte sequence is so whacked that it is not a legal byte sequence of UTF-8 (i.e., it cannot be decoded), then you have a logic error in your application. Perhaps you are reading bytes instead of asking PHP to read characters and encode them properly.
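To find out which of the two cases you are in, you can test whether the bytes decode as UTF-8 at all. A rough sketch using only the PCRE functions that ship with PHP; the function name is hypothetical:

```php
<?php
// Hypothetical helper, not from the original answer.
// Reports whether $bytes is a well-formed UTF-8 byte sequence.
function is_valid_utf8(string $bytes): bool
{
    // An empty pattern with the /u flag matches any valid UTF-8 string;
    // preg_match() returns false when the subject is malformed UTF-8.
    return preg_match('//u', $bytes) === 1;
}
```

If this returns false for data coming out of the database, the problem is upstream (connection charset, column encoding, or a byte-level string operation), and no amount of escaping at output time will fix it.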

EDIT: In response to the comment above about the illegal character being number 30 (U+001E): that is indeed an illegal character in XML, and thus in XHTML. If you intend them to be line breaks, do a PHP regex substitution to replace \x1E with \n.
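That substitution could look something like this. It is only a sketch: the function name is invented here, and it assumes every U+001E in the data really is meant to be a line break.

```php
<?php
// Hypothetical helper, not from the original answer.
// Replaces the record separator control character (0x1E) with a newline.
function record_separator_to_newline(string $text): string
{
    // PCRE reads \x1E in the pattern as the character with code 0x1E.
    // 0x1E never appears inside a multibyte UTF-8 sequence, so no /u flag is needed.
    $result = preg_replace('/\x1E/', "\n", $text);

    // preg_replace() returns null on an internal error; fall back to the input.
    return $result ?? $text;
}
```

A plain `str_replace("\x1E", "\n", $text)` would do the same job without a regex.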
