Parsing HTML with phpQuery: how to handle C++ code inside a pre tag?
In the database I have some content like this:
```html
Some text
<pre>
#include <cstdio>
int x = 1;
</pre>
Some text
```
When I try to parse this with phpQuery, parsing fails because <cstdio> is interpreted as a tag.
I could use htmlspecialchars(), but to apply it only inside pre tags I would still need to do some parsing. I could use a regex, but that would be much more difficult (I would need to handle the possible attributes of the pre tag), and the whole point of using a parser was to avoid this kind of regex.
What's the best way to do this?
Remember to HTML-encode special characters (&, <, > and so on) before assembling the HTML.
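A minimal sketch of that advice, assuming the code snippet is still available as plain text before the page is assembled; the helper name wrapCodeInPre() is hypothetical. Once < and > are encoded, a parser such as phpQuery should see "#include &lt;cstdio&gt;" as text rather than as a tag.

```php
<?php
// Hypothetical helper: encode raw code, then wrap it in a <pre> element.
// ENT_QUOTES also encodes ' and ", which is harmless inside <pre>.
function wrapCodeInPre(string $code): string
{
    return '<pre>' . htmlspecialchars($code, ENT_QUOTES, 'UTF-8') . '</pre>';
}

$code = "#include <cstdio>\nint x = 1;";
echo "Some text\n" . wrapCodeInPre($code) . "\nSome text\n";
```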
I finally went the regex way, handling only simple attributes on the pre tag (no '>' inside attribute values):
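```php
foreach (array('pre', 'code') as $sTag) {
    $s = preg_replace_callback("#<($sTag)\b([^>]*)>(.+?)</$sTag>#si",
        function ($matches) {
            // Decode entities that are already present so they are not encoded twice.
            // strtr() replaces in a single pass, so "&amp;lt;" is not turned into "<".
            $content = strtr($matches[3], array('&amp;' => '&', '&lt;' => '<', '&gt;' => '>'));
            // Rebuild the element with its original attributes and the encoded content.
            return "<{$matches[1]}{$matches[2]}>" . htmlentities($content, ENT_COMPAT, 'UTF-8') . "</{$matches[1]}>";
        },
        $s);
}
```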
It also handles characters that have already been converted to HTML entities (we don't want to encode them twice).
It's not a perfect solution, but given the data I need to apply it to, it does the job.
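PHP's htmlspecialchars() (and htmlentities()) accept a fourth parameter, $double_encode; when it is false, existing entities such as &lt; are left as they are while bare <, > and & are still encoded. A minimal sketch of using it instead of decoding and re-encoding; the encodeOnce() name is hypothetical. The trade-off is the same as with decoding existing entities first: code that literally contains the text "&lt;" can no longer be told apart from an encoded "<".

```php
<?php
// Hypothetical helper: encode special characters but leave existing
// entities (e.g. &lt;, &amp;) untouched via $double_encode = false.
// Unlike htmlentities(), htmlspecialchars() leaves non-ASCII UTF-8 text as is.
function encodeOnce(string $code): string
{
    return htmlspecialchars($code, ENT_COMPAT, 'UTF-8', false);
}

echo encodeOnce('#include <cstdio>'), "\n";       // #include &lt;cstdio&gt;
echo encodeOnce('#include &lt;cstdio&gt;'), "\n"; // #include &lt;cstdio&gt;
echo encodeOnce('if (a && b)'), "\n";             // if (a &amp;&amp; b)
```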
The real problem is that your database contains HTML with text that was never correctly encoded.
So, if you want to save time and have a correct solution, make sure that the HTML in your database is correctly encoded. That means everything must be encoded (using htmlspecialchars()) before it is saved to your database!
Otherwise you are just saving garbage in your database, and you will have to write special code to "prettify that garbage".
Any other solution is a workaround, and workarounds will cost you precious time in the future.
So the best solution is to make sure that anything you write to your database is correctly encoded.
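A minimal sketch of encoding at write time, assuming PHP 7.4+ with the pdo_sqlite extension; the posts table, its body column, the <p> wrappers and the savePost() helper are all hypothetical. Each plain-text part goes through htmlspecialchars() exactly once, before the INSERT, so what ends up in the database is already HTML that a parser can load.

```php
<?php
// Hypothetical helper: build the stored HTML from plain-text parts,
// encoding each part exactly once, then insert it with a prepared statement.
function savePost(PDO $db, string $intro, string $code, string $outro): void
{
    $e = fn (string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    $html = '<p>' . $e($intro) . '</p>'
          . '<pre>' . $e($code) . '</pre>'
          . '<p>' . $e($outro) . '</p>';

    $stmt = $db->prepare('INSERT INTO posts (body) VALUES (?)');
    $stmt->execute([$html]);
}

// Usage against a throwaway in-memory database (hypothetical schema).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE posts (body TEXT)');
savePost($db, 'Some text', "#include <cstdio>\nint x = 1;", 'Some text');

echo $db->query('SELECT body FROM posts')->fetchColumn(), "\n";
```

The prepared statement keeps SQL escaping separate from HTML encoding, so the database stores exactly the encoded string and nothing needs to be decoded or "prettified" later.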