Issue with encoding UTF-8 when FTPing files
I am able to have my application upload files via FTP using the FTPClient Java library.
(I happen to be uploading to an Oracle XML DB repository.)
Everything uploads fine unless the xml file has curly quotes in it. In which cas开发者_高级运维e I get the error: LPX-00200: could not convert from encoding UTF-8 to UCS2
I can upload what I believe to be the same file using the Windows CMD line FTP tool. I am wondering if there is some encoding setting that the windows CMD line tool uses that maybe I need to set in my Java code.
Anyone know stuff about this? Thanks!!
I don't know that application but you could try to use -Dfile.encoding=UTF-8 on your JVM command line
Not familiar with Oracle XML DB repositories—can they accept compressed uploads? Zipping or gzipping your file would save resources and frustrate any ASCII file type autodetection in use.
In binary this problem goes away.
FTPClient.setType(FTPClient.TYPE_BINARY);
http://www.sauronsoftware.it/projects/ftp4j/manual.php#3
If your file contains curly quotes, they are in the high-order bit set range in iso-8859-1 and windows-1252 character sets. In UTF-8, those characters usually take two bytes in UTF-8.
It's quite possible that you've accidentally encoded the xml file in one of these encodings instead of UTF-8. That would result in a conversion error, because the high-order bit being set is only allowed in sequences of multiple UTF-8 octets.
If you're in Windows, open the file in Notepad and try re-saving the document using Save As... with the UTF-8 encoding, and upload the changed file.. In Unix, use iconv or a similar tool to convert from iso-8859-1 to UTF-8 before uploading.
If the XML document explicitly marks its encoding, make sure it's marked with the correct encoding (e.g. UTF-8). In many xml parsers, you can parse iso-8859-1 or windows-1252 character set encoded XML as long as it's marked as such.
精彩评论