How can I store UTF-8 in MySQL with PHP, sanitize it, echo it in XML and transform it with XSLT?

I am developing an MVC application in PHP that uses XML and XSLT to render the views. It needs to fully support UTF-8. MySQL is also correctly configured for UTF-8. My problem is the following.

I have an <input type="text"/> with a value like àáèéìíòóùú"><'@#~!¡¿?. This is processed before being added to the database: I call mysql_real_escape_string($_POST["name"]) and then run a MySQL INSERT. The escaping adds a backslash \ before " and '.

The MySQL database has DEFAULT CHARACTER SET utf8 and COLLATE utf8_spanish_ci. The table field is a normal VARCHAR.

Then I have to print this into XML that will be transformed with XSLT. I can use PHP in the XML, so I echo it with <?php echo TexUtils::obtainSqlText($value_obtained_from_sql); ?>. For now, the obtainSqlText() function simply returns the $value it was given unchanged; it is a placeholder until I settle on a final structure.

One of the first things I will need to do for this input is to convert > and < to &gt; and &lt;, because otherwise they will cause problems with start/end tags. This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>. This will also convert & to &amp;, " to &quot; and ' to &#039;. This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.

There is another problem. So far I've talked about the àáèéìíòóùú"><'@#~!¡¿? input, but I will also have some text from a CKEditor <textarea /> whose value will look like:

<p>
    <a href="http://stackoverflow.com/">àáèéìíòóùú"><'@#~!¡¿?</a>
</p>

How should I handle this? At first glance, if I want to print this second value correctly I will need to use <xsl:value-of select="value" disable-output-escaping="yes" />. Will "><' print right?

So what I am really looking for is how I need to handle these values and how I should print them. Do I need one approach if the value comes from a VARCHAR that doesn't allow HTML and another if it comes from a TEXT (for example) that does allow HTML? Will I need to use disable-output-escaping="yes" every time?

I also want to know whether doing this really protects me from XSS attacks.

Thank you in advance!


This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>.

Fine.

This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.

It shouldn't fail on htmlspecialchars() output, ever. &amp; is a predefined entity in XML and &#039; is a character reference, which is always allowed. htmlspecialchars() should produce XML-compatible output, unlike the usually-a-mistake htmlentities(). What is the error you are seeing?
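To see the difference for yourself, here is a minimal sketch that feeds the sample value from the question through both functions and tries to parse each result as XML. The <value> wrapper element and the variable names are just assumptions for the demo, and it assumes the script file itself is saved as UTF-8.

```php
<?php
// Sample value from the question: accented letters plus XML-significant characters.
$value = 'àáèéìíòóùú"><\'@#~!¡¿?';

// htmlspecialchars() rewrites only & < > " ' and leaves the UTF-8 letters untouched.
$safe = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// htmlentities() also emits named HTML entities such as &agrave; that bare XML does not define.
$risky = htmlentities($value, ENT_QUOTES, 'UTF-8');

libxml_use_internal_errors(true); // collect parse errors quietly instead of raising warnings

// <value> is a made-up wrapper element, used only to get a complete XML document.
$good = new DOMDocument();
$goodOk = $good->loadXML('<value>' . $safe . '</value>');

$bad = new DOMDocument();
$badOk = $bad->loadXML('<value>' . $risky . '</value>');

var_dump($goodOk); // the htmlspecialchars() version should parse
var_dump($badOk);  // the htmlentities() version should be rejected (undefined entities)
```

If the first one fails for you as well, the problem is probably somewhere else in the pipeline (for example an encoding mismatch), not in htmlspecialchars().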

<a href="http://stackoverflow.com/">àáèéìíòóùú"><'@#~!¡¿?</a>

Urgh, an HTML rich text editor produced that invalid markup? What a dodgy editor.

If you have to allow users to input arbitrary HTML, it's going to need some processing. Unless you really trust those users, you'll need a purifier (to stop them using dangerous scripting elements and XSS-ing each other), and a tidier (to remove malformed markup either due to crap rich-text-editor output or deliberate sabotage). If you intend to put the content directly into XML you will also need to convert it to XHTML output and replace HTML entity references.

A simple way to do this in PHP would be DOMDocument->loadHTML followed by a walk of the DOM tree removing all but known-good elements/attributes/URL-schemes, followed by DOMDocument->saveXML.
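Roughly, that could look like the sketch below. The whitelist contents, the function names cleanNode() and purifyToXml(), and the <meta> charset wrapper (a common workaround to make libxml read the fragment as UTF-8) are all my own assumptions for illustration. Treat it as a starting point and not a vetted purifier; for real user input a maintained library is the safer choice.

```php
<?php
// Hypothetical whitelist: element name => attribute names allowed on it.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'strong' => [], 'em' => [],
    'ul' => [], 'ol' => [], 'li' => [],
    'a' => ['href'],
];
// Hypothetical list of URL schemes accepted in href.
const ALLOWED_SCHEMES = ['http', 'https', 'mailto'];

function cleanNode(DOMNode $node): void
{
    // Copy the child list first, because the loop mutates the tree.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text is kept; the serializer escapes it
        }
        if (!($child instanceof DOMElement) || !isset(ALLOWED_TAGS[$child->nodeName])) {
            // Comments, scripts and unknown elements are dropped together with their content.
            $node->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            if (!in_array($attr->nodeName, ALLOWED_TAGS[$child->nodeName], true)) {
                $child->removeAttribute($attr->nodeName);
            }
        }
        if ($child->hasAttribute('href')) {
            // Relative URLs have no scheme and are removed too in this sketch.
            $scheme = strtolower((string) parse_url($child->getAttribute('href'), PHP_URL_SCHEME));
            if (!in_array($scheme, ALLOWED_SCHEMES, true)) {
                $child->removeAttribute('href');
            }
        }
        cleanNode($child);
    }
}

function purifyToXml(string $html): string
{
    libxml_use_internal_errors(true); // malformed input is expected; keep warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    cleanNode($body);

    // Serialize each surviving child as XML, so <br> comes out as an empty-element tag.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

echo purifyToXml('<p onclick="x()">Hola <a href="javascript:alert(1)">àé</a><script>alert(1)</script><br></p>');
```

The result is a well-formed fragment you can drop into your XML and copy through in the stylesheet, so the decision about what markup is allowed is made once, in PHP, before the value gets anywhere near XSLT.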

Will "><' print right?

Well, it'll print as in your example, yes. But that's equally invalid as both HTML and XML.
