Special characters from MySQL database (e.g. curly apostrophes) are mangling my XML
I have a MySQL database of newspaper articles. There's a volume table, an issue table, and an article table. I have a PHP file that generates a property list that is then pulled in and read by an iPhone app. The plist holds each article as a dictionary inside each issue, and each issue as a dictionary inside each volume. The plist doesn't actually hold the whole article -- just a title and URL.
Some article titles contain special characters, like curly apostrophes. Looking at the generated XML plist, whenever it hits a special character, it unpredictably gobbles up a whole bunch of text, leaving the XML mangled and unreadable.
(...in Chrome, anyway, and I'm guessing on the iPhone. Firefox actually handles it pretty well, showing a white ? in a black diamond in place of any special characters and not gobbling anything.)
Example well-formed plist snippet:
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Rows</key>
<array>
<dict>
<key>Title</key>
<string>Vol. 133 (2003-2004)</string>
<key>Children</key>
<array>
<dict>
<key>Title</key>
<string>No. 18 (Apr 2, 2004)</string>
<key>Children</key>
<array>
<dict>
<key>Title</key>
<string>Basketball concludes historic season</string>
<key>URL</key>
<string>http://orient.bowdoin.edu/orient/article_iphone.php?date=2004-04-02&section=1&id=1</string>
</dict>
<!-- ... -->
</array>
</dict>
</array>
</dict>
</array>
</dict>
</plist>
Example of what happens when it hits a curly apostrophe (this is from Chrome). This time it ate 5,998 characters, by MS Word's count, skipping to midway through the opening <string> tag of a pizza story's title; if I reload, it behaves differently and eats some other amount. The proper title is: Singer-songwriter Farrell ’05 finds success beyond the bubble
<dict>
<key>Title</key>
<string>Singer-songwriter Farrell ing>Students embrace free pizza, College objects to solicitation</string>
<key>URL</key>
<string>http://orient.bowdoin.edu/orient/article_iphone.php?date=2009-09-18&section=1&id=9</string>
</dict>
In MySQL, that title is stored as the following bytes (shown in hexadecimal):
53 69 6E 67 |65 72 2D 73 |6F 6E 67 77 |72 69 74 65
72 20 46 61 |72 72 65 6C |6C 20 C2 92 |30 35 20 66
69 6E 64 73 |20 73 75 63 |63 65 73 73 |20 62 65 79
6F 6E 64 20 |74 68 65 20 |62 75 62 62 |6C
Any ideas how I can encode/decode things properly? If not, any idea how I can get around the problem some other way?
I don't have a clue what I'm talking about, haha; let me know if there's any way I can help you help me. :) And many thanks!
Here are a few options:
- Use htmlentities() to encode special characters when inserting into the table.
- Change everything to UTF-8.
- Try wrapping the titles in CDATA, i.e.
<string><![CDATA[ BLAH BLAH BLAH ]]></string>
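To make the UTF-8 option concrete, here is a rough sketch rather than a definitive fix. In the hex dump from the question, the apostrophe is stored as C2 92, which is the UTF-8 encoding of U+0092 (a control character), not of U+2019 (the curly apostrophe, E2 80 99 in UTF-8). That pattern usually means a Windows-1252 byte (0x92) was treated as Latin-1 somewhere along the way. The helper names repairCp1252() and plistString() below are made up for illustration, and the sketch assumes the titles come out of MySQL as valid UTF-8. It uses htmlspecialchars() rather than htmlentities(), because htmlentities() in its default HTML mode emits named entities such as &rsquo;, which XML does not predefine.

```php
<?php
// Hypothetical helper: map stray C1 control characters (U+0080..U+009F)
// to the punctuation that Windows-1252 assigns to the same byte values.
function repairCp1252(string $utf8): string
{
    $fixed = preg_replace_callback(
        '/[\x{80}-\x{9F}]/u',
        function (array $m): string {
            // U+0080..U+009F is encoded as C2 80..C2 9F in UTF-8, so the
            // second byte is the original Windows-1252 byte.
            $converted = @iconv('CP1252', 'UTF-8', $m[0][1]);
            // Bytes that Windows-1252 leaves undefined are kept unchanged.
            return $converted === false ? $m[0] : $converted;
        },
        $utf8
    );
    // preg_replace_callback() returns null if the input is not valid UTF-8.
    return $fixed ?? $utf8;
}

// Hypothetical helper: wrap a title in a plist <string> element,
// escaping the characters that are special in XML (&, <, >, quotes).
function plistString(string $title): string
{
    $flags = ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE;
    return '<string>' . htmlspecialchars($title, $flags, 'UTF-8') . '</string>';
}

// The title from the question, with the apostrophe stored as C2 92.
$title = "Singer-songwriter Farrell \xC2\x9205 finds success beyond the bubble";
echo plistString(repairCp1252($title)), "\n";
// prints: <string>Singer-songwriter Farrell ’05 finds success beyond the bubble</string>
```

For the browser and the iPhone to read the result correctly, the PHP file would presumably also need to declare UTF-8, for example with header('Content-Type: text/xml; charset=utf-8') and an <?xml version="1.0" encoding="UTF-8"?> line at the top of the output. Setting the database connection's character set (for instance with mysqli's set_charset()) should keep new titles from being mis-encoded in the first place. The same escaping would also take care of the bare & characters in the URLs.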