Stop greater than sign converting to HTML entity
I'm writing some XML in PHP that is not validating because the closing greater than sign on a CDATA element is getting converted to an HTML entity. The code is as follows:
$xml .= '<item number="'.$i.'">
<sku>'.$this->get_product_sku($key, $value).'</sku>
<description>
<![CDATA[
'.get_the_title($value['prodid']).'
]]>
</description>
<qty>'.$value['quantity'].'</qty>
<price>'.$value['price'].'</price>
<extended>'.$value['quantity']*$value['price'].'</extended>
</item>';
The resulting XML looks something like the following when printed out using var_dump or print_r:
<item number="2">
<sku>45NK2</sku>
<description>
<![CDATA[
Test Product
]]&gt;
</description>
<qty>2</qty>
<price>1500.00</price>
<extended>3000.00</extended>
</item>
The closing > turns into &gt; and the XML does not validate. Can someone help me fix this problem?
Thanks!
EDIT: Here is the whole function that generates the XML. I only call and print this function. There is nothing done to the string that is invalidating it.
function build_xml($p, $c)
{
global $wpdb;
// Make the billing and shipping data available
$this->determine_shipping_details($p, $c);
$this->determine_billing_details($p, $c);
// Build the XML
$xml = '<?xml version="1.0" ?>
<orderdata batch="'.$p['id'].'">
<order id="'.$p['id'].'">
<orderdate>'.date('m/d/Y h:i:s', $p['date']).'</orderdate>
<store>'.$this->store_id.'</store>
<adcode>OL</adcode>
<username>'.$this->username.'</username>
<password>'.$this->password.'</password>
<billingaddress>
<firstname>'.$this->billing_details['first_name'].'</firstname>
<lastname>'.$this->billing_details['last_name'].'</lastname>
<address1>'.$this->billing_details['address'].'</address1>
<city>'.$this->billing_details['city'].'</city>
<state>'.$this->billing_details['state'].'</state>
<zipcode>'.$this->billing_details['zip'].'</zipcode>
<country>'.$this->billing_details['country'].'</country>
<phone>'.$this->billing_details['phone'].'</phone>
<email>'.$this->billing_details['email'].'</email>
</billingaddress>
<shippingaddress>
<firstname>'.$this->shipping_details['first_name'].'</firstname>
<lastname>'.$this->shipping_details['last_name'].'</lastname>
<address1>'.$this->shipping_details['address'].'</address1>
<city>'.$this->shipping_details['city'].'</city>
<state>'.$this->shipping_details['state'].'</state>
<zipcode>'.$this->shipping_details['zip'].'</zipcode>
<country>'.$this->shipping_details['country'].'</country>
<phone>'.$this->shipping_details['phone'].'</phone>
<email>'.$this->shipping_details['email'].'</email>
</shippingaddress>
<orderdetails>';
// Add the individual items' information to the XML
$i = 1;
foreach($c as $key => $value)
{
$xml .= '<item number="'.$i.'">
<sku>'.$this->get_product_sku($key, $value).'</sku>
<description>
<![CDATA[
'.get_the_title($value['prodid']).'
]]>
</description>
<qty>'.$value['quantity'].'</qty>
<price>'.$value['price'].'</price>
<extended>'.str_replace(stripslashes( get_option('wpsc_thousands_separator') ), '', trim(wpsc_currency_display($value['quantity']*$value['price'], array('display_currency_symbol' => false, 'display_decimal_point' => true, 'display_currency_code' => false, 'display_as_html' => false)))).'</extended>
</item>';
$i++;
}
// Add the order totals
$xml .= '<subtotal>'.str_replace(stripslashes( get_option('wpsc_thousands_separator') ), '', trim(wpsc_currency_display($p['totalprice']-$p['wpec_taxes_total']-$p['base_shipping'], array('display_currency_symbol' => false, 'display_decimal_point' => true, 'display_currency_code' => false, 'display_as_html' => false)))).'</subtotal>
<shipping code="'.'FEG'.'" rate="'.$p['base_shipping'].'" thirdparty="">'.'FEDEX GROUND SERVICE'.'</shipping>
<tax rate="'.$p['wpec_taxes_rate'].'">'.$p['wpec_taxes_total'].'</tax>
<total>'.$p['totalprice'].'</total>
<amountpaid>'.$p['totalprice'].'</amountpaid>
</orderdetails>';
// Close out the tags
$xml .= '</order>
</orderdata>';
return $xml;
}
When I run it on my web server, it is formatted correctly. Are you setting the header?
Try
header('Content-type: text/xml');
echo $xml;
From the information you provided with your question, it's hard to say specifically why the output gets mangled.
So you need to step through your program and look at each point where your XML is built (already part of your question) and then processed further by your WordPress setup with its various plugins and themes.
For that, you need to understand where such modifications can occur.
Additionally, you need a way to see the output as-is, that is, unchanged. If you look at the source code in your browser, this is often not the case: browsers change the output before they display it, so it's often better to dump request responses on the command line with an HTTP client like curl, which can optionally write the output to a file that you can then inspect unchanged in an editor.
Let's recap:
- The creation of the XML must be correct firsthand.
- The XML might get changed by WordPress.
- The XML might get changed by the browser.
This can be a lot of points to check:
1. The creation of the XML must be correct firsthand.
First of all, I would look at the return value of get_the_title($value['prodid']) alone, so you actually know what you are dealing with. Perhaps it already contains the &gt;? That would explain where that single &gt; might come from. It would be valid to use it within <![CDATA[...]]>, however. This is just to get a feel for what might happen later on.
Besides the single value in question, you should make sure the XML itself looks correct before it is processed further, which means at the end of the function. You can do so by outputting it before returning from the method and then exiting the application to prevent further processing:
echo "Test output:\n\n", $xml; die();
Then look at the output. Does it look correct? Is the problematic &gt; already there at the end of the CDATA section in question? If yes, you know that the problem is already inside the function. If not, you know that the problem is unrelated to the function in question and that the XML is mangled later on. Depending on the outcome, you know where to look for the defect.
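The check described above can also be done without looking at rendered output at all. The following is a minimal sketch, not part of the original code: the helper name count_cdata_terminators is hypothetical, and the sample string stands in for the return value of build_xml(). It counts literal and escaped CDATA terminators and writes the result to the PHP error log, so no browser or output filter sits between the string and what you read.

```php
<?php
// Hypothetical helper (not from the question): counts how often the
// literal CDATA terminator "]]>" and its escaped form "]]&gt;" occur.
function count_cdata_terminators(string $xml): array
{
    return [
        'literal' => substr_count($xml, ']]>'),
        'escaped' => substr_count($xml, ']]&gt;'),
    ];
}

// Stand-in for the string returned by build_xml(); an assumption for
// illustration only.
$sample = '<description><![CDATA[Test Product]]&gt;</description>';
$counts = count_cdata_terminators($sample);

// Log instead of echoing, so nothing downstream can rewrite the result.
error_log('CDATA terminators: ' . json_encode($counts));
```

If 'escaped' is already non-zero right before the return statement, the mangling happens inside the function; if it is zero there, something later in the request changes the string.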
2. The XML might get changed by WordPress.
In comments you asked:
Why would var_dump be filtered? I'm running this in a plugin I'm building. Not sure why this would be filtered.
Besides filtering done by the viewing program (browser, source viewer, etc.), WordPress itself or one of its add-ons (plugins, themes) might filter the output. Your comment says that you don't know why this can happen, so you don't know where it can happen either.
You have not yet shared how the XML is output. Are you just echoing it to the browser? Is it passed to some function that handles the output? This is most likely very important for finding the cause of your issue. For example, is your plugin answering an XML-RPC request? In your question you focus a lot on the invalid XML, but you didn't share much about the purpose the XML is created for, where it goes, and why. This information would be useful for understanding the bigger picture.
If you take care of the output yourself (echo, print, etc.), some code might have installed an output buffer. That means your output gets buffered and possibly processed later on. These output-buffer-related issues are harder to track down. First of all, you can disable all other plugins and themes and see what happens. WordPress itself does not make much use of output buffering (Output Buffering Control [Docs]), so this could narrow it down quite fast, because then only the default output buffering would interfere with your output.
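To see whether an output buffer is active at the point where the XML is printed, PHP's own ob_get_level() and ob_list_handlers() can be queried. The following is a minimal sketch: the helper name describe_output_buffers is hypothetical, and the ob_start() call only simulates a buffer that a plugin or theme might have opened.

```php
<?php
// Hypothetical helper: reports the nesting level of output buffering
// and the handlers attached to the active buffers.
function describe_output_buffers(): array
{
    return [
        'level'    => ob_get_level(),
        'handlers' => ob_list_handlers(),
    ];
}

// Simulate a buffer opened elsewhere (assumption for illustration).
ob_start();
$info = describe_output_buffers();
ob_end_clean();

// Log the result so the buffers themselves cannot alter it.
error_log(print_r($info, true));
```

In a real plugin, you would call the helper immediately before echoing the XML. A handler other than the default one is a candidate for whatever rewrites the output.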
If you use a WordPress function to output the XML, then filters can be in action. WordPress has a built-in filter system that lets it hook into and change various data. Additionally, WordPress core functions themselves are always "trying hard" to escape output. So there can be a lot of points where this filtering takes place. "Not sure why this would be filtered." - There might be no specific reason in your case; it's just that it always happens.
These issues can be located more easily with an interactive debugger with breakpoints and variable inspection. It allows you to look into the program while it executes, so you can see "live" what happens to the data. However, you don't always have one available. The alternative is to set breakpoints yourself (die) and do the output yourself (echo, var_dump, etc.).
3. The XML might get changed by the browser.
I already wrote about this at the beginning and in between. Basically, if you're not seeing the source as-is but mangled by the browser, you may suspect the wrong cause of the error. It's like wearing the wrong glasses: it only hinders you from tracking things down in the first place. So know your tools.
Things are not always easy to detect. You need to look into the right area in the first place and you need to consistently track things down. There can be various reasons why things happen if the software is more complex like Wordpress.
Try using html_entity_decode() or htmlspecialchars_decode(). Either should work for this case.
http://www.php.net/manual/en/function.html-entity-decode.php http://www.php.net/manual/en/function.htmlspecialchars-decode.php
Encode it on purpose:
$xml .= '<item number="'.$i.'">
<sku>'.$this->get_product_sku($key, $value).'</sku>
<description>
<![CDATA[
'.get_the_title($value['prodid']).'
]]>
</description>
<qty>'.$value['quantity'].'</qty>
<price>'.$value['price'].'</price>
<extended>'.$value['quantity']*$value['price'].'</extended>
</item>';
Then decode it on display:
echo html_entity_decode($xml);
I know this is an old thread that I am reviving, but I thought I'd share this so that others looking for a solution to a similar problem might benefit, especially since this whole discussion doesn't have the right answer.
The solution is very simple. The problem is that WordPress processes this as HTML rather than as a script and converts the greater-than symbol > to &gt;. The offending code is in /wp-includes/post-template.php and looks like this:
function the_content($more_link_text = null, $stripteaser = false) {
$content = get_the_content($more_link_text, $stripteaser);
$content = apply_filters('the_content', $content);
/** $content = str_replace(']]>', ']]&gt;', $content); */
As you may notice, the last line converts ]]> to ]]&gt;. Commenting it out will solve the problem.
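Changes to core files are lost whenever WordPress is updated, so it may be worth avoiding the ]]> sequence altogether. The following is a minimal sketch of that alternative, assuming the system that consumes the XML accepts escaped text in place of a CDATA section; the helper name xml_text and the sample title are hypothetical.

```php
<?php
// Hypothetical helper: escapes a value for use as XML element text, so
// no CDATA section (and therefore no "]]>") is needed.
function xml_text(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Sample title; in the question this would come from get_the_title().
$title = 'Nuts & Bolts <large>';
$xml = '<description>' . xml_text($title) . '</description>';

echo $xml, "\n";
```

With no ]]> in the string, the replacement in the_content() has nothing to rewrite.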