How can I store UTF8 in MySQL with PHP, sanitize it, echo it with XML and transform it with XSLT?
I am developing an MVC application in PHP that uses XML and XSLT to render the views. It needs to fully support UTF-8. I also use MySQL, correctly configured for UTF-8. My problem is the following.
I have an <input type="text"/> with a value like àáèéìíòóùú"><'@#~!¡¿?. This is processed before being added to the database: I use mysql_real_escape_string($_POST["name"]) and then run a MySQL INSERT. This adds a backslash \ before " and '. The MySQL database has DEFAULT CHARACTER SET utf8 and COLLATE utf8_spanish_ci. The table field is a normal VARCHAR.
Then I have to print this into an XML document that will be transformed with XSLT. I can use PHP in the XML, so I echo it with <?php echo TexUtils::obtainSqlText($value_obtained_from_sql); ?>. For now the obtainSqlText() function returns the value unchanged; it is a placeholder waiting for its final implementation.
One of the first things I will need to do with this input is convert > and < to &gt; and &lt;, because otherwise they cause problems with start/end tags. This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>. This also converts & to &amp;, " to &quot; and ' to &#039;. This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.
There is another problem. So far I've talked about the àáèéìíòóùú"><'@#~!¡¿? input, but I will also have text from a CKEditor <textarea /> whose value will look like:
<p>
<a href="http://stackoverflow.com/">àáèéìíòóùú"><'@#~!¡¿?</a>
</p>
How should I handle this? To begin with, if I want to print this second value correctly, I will need to use <xsl:value-of select="value" disable-output-escaping="yes" />. Will "><' print right?
So what I am really looking for is how to handle these values and how to print them. Do I need one approach for a value that comes from a VARCHAR and doesn't allow HTML, and another for a value that comes from a TEXT (for example) and does allow HTML? Will I need to use disable-output-escaping="yes" every time?
I also want to know whether doing this really protects the application against XSS attacks.
Thank you in advance!
This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>.
Fine.
This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.
It shouldn't fail on htmlspecialchars() output, ever. &amp; is a predefined entity in XML and &#039; is a character reference, which is always allowed. htmlspecialchars() should produce XML-compatible output, unlike the usually-a-mistake htmlentities(). What is the error you are seeing?
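To make the difference concrete, here is a minimal sketch that feeds both outputs to an XML parser. The isWellFormedXml() helper and the <value> wrapper element are made up for illustration, and it assumes the script file itself is saved as UTF-8.

```php
<?php
// Sample input taken from the question.
$value = 'àáèéìíòóùú"><\'@#~!¡¿?';

// Hypothetical helper: wraps an already-escaped string in a <value>
// element and reports whether an XML parser accepts the result.
function isWellFormedXml(string $escaped): bool
{
    // Collect parser errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML('<value>' . $escaped . '</value>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Escapes only & < > " ' and leaves the accented letters alone.
$special = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// Also turns accented letters into HTML-only named entities like &agrave;.
$entities = htmlentities($value, ENT_QUOTES, 'UTF-8');

var_dump(isWellFormedXml($special));  // bool(true)
var_dump(isWellFormedXml($entities)); // bool(false)
```

The htmlentities() version is rejected because names such as &agrave; are defined by HTML, not by XML, so a parser working without a DTD has no way to resolve them.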
<a href="http://stackoverflow.com/">àáèéìíòóùú"><'@#~!¡¿?</a>
Urgh, an HTML rich text editor produced that invalid markup? What a dodgy editor.
If you have to allow users to input arbitrary HTML, it's going to need some processing. Unless you really trust those users, you'll need a purifier (to stop them using dangerous scripting elements and XSS-ing each other), and a tidier (to remove malformed markup either due to crap rich-text-editor output or deliberate sabotage). If you intend to put the content directly into XML you will also need it to convert to XHTML output and replace HTML entity references.
A simple way to do this in PHP would be DOMDocument->loadHTML followed by a walk of the DOM tree removing all but known-good elements/attributes/URL-schemes, followed by DOMDocument->saveXML.
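A rough sketch of that approach, and not a replacement for a proper purifier library. The allow-lists, cleanNode() and htmlToXmlFragment() are assumptions made up for this example, and the meta element is only a commonly used workaround to get loadHTML to read the input as UTF-8.

```php
<?php
// Assumed allow-lists, for illustration only; choose your own.
const ALLOWED_ELEMENTS   = ['p', 'a', 'em', 'strong', 'ul', 'ol', 'li', 'br'];
const ALLOWED_ATTRIBUTES = ['a' => ['href']];
const ALLOWED_SCHEMES    = ['http', 'https', 'mailto'];

// Recursively removes everything below $node that is not on the allow-lists.
function cleanNode(DOMNode $node): void
{
    // Copy the child list first: the live list changes while nodes are removed.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMElement) {
            $name = strtolower($child->nodeName);
            if (!in_array($name, ALLOWED_ELEMENTS, true)) {
                $node->removeChild($child); // drops the element and its content
                continue;
            }
            $allowed = ALLOWED_ATTRIBUTES[$name] ?? [];
            foreach (iterator_to_array($child->attributes, false) as $attr) {
                if (!in_array(strtolower($attr->nodeName), $allowed, true)) {
                    $child->removeAttributeNode($attr);
                }
            }
            if ($child->hasAttribute('href')) {
                // Only absolute URLs with an allowed scheme survive this check.
                $scheme = parse_url(trim($child->getAttribute('href')), PHP_URL_SCHEME);
                if (!is_string($scheme) || !in_array(strtolower($scheme), ALLOWED_SCHEMES, true)) {
                    $child->removeAttribute('href');
                }
            }
            cleanNode($child);
        } elseif (!($child instanceof DOMText)) {
            $node->removeChild($child); // comments, processing instructions, ...
        }
    }
}

// Parses untrusted HTML, cleans it, and serializes the result as XML.
function htmlToXmlFragment(string $html): string
{
    $previous = libxml_use_internal_errors(true); // tolerate malformed input quietly
    $doc = new DOMDocument();
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    cleanNode($body);

    $xml = '';
    foreach ($body->childNodes as $child) {
        $xml .= $doc->saveXML($child); // XML serialization of each cleaned node
    }
    return $xml;
}

echo htmlToXmlFragment(
    '<p><a href="http://stackoverflow.com/" onclick="evil()">link</a><script>alert(1)</script></p>'
);
```

In this sketch the script element and the onclick attribute are dropped, and the markup that remains is serialized by saveXML so it can be embedded in the XML view. Note that the scheme check also strips relative links, which may or may not be what you want.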
Will "><' print right?
Well, it'll print as in your example, yes. But that's equally invalid as both HTML and XML.