How to handle non-ASCII Characters in Java while using PDPageContentStream/PDDocument
I am using PDFBox to create PDF from my web application. The web application is built in Java and uses JSF. It takes the content from a web based form and puts the contents into a PDF document.
Example: A user fills in an h:inputTextarea (JSF tag) in the form and that is converted to a PDF. I am unable to handle non-ASCII characters.
How should I handle the non-ASCII characters, or at least strip them out before putting the text on the PDF? Please help me with any suggestions or point me to any resources. Thanks!
Since you're using JSF on JSP instead of Facelets (which already uses UTF-8 by default), take the following steps to avoid falling back to the platform default charset (often ISO-8859-1, which cannot represent the majority of "non-ASCII" characters):
Add the following line to the top of all JSPs:
<%@ page pageEncoding="UTF-8" %>
This sets the response encoding to UTF-8 and sets the charset of the HTTP response content type header to UTF-8. The latter instructs the client (web browser) to display the page and submit the form using UTF-8.
Create a Filter which does the following in its doFilter() method, before continuing the filter chain:
request.setCharacterEncoding("UTF-8");
Map this on the FacesServlet as follows:
<filter-mapping>
    <filter-name>nameOfYourCharacterEncodingFilter</filter-name>
    <servlet-name>nameOfYourFacesServlet</servlet-name>
</filter-mapping>
This sets the request encoding of all JSF POST requests to UTF-8.
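To see why the charset matters here, a minimal sketch in plain Java (the class and method names are made up for illustration, they are not part of JSF or PDFBox). It encodes a character as UTF-8, the way a browser submits a UTF-8 form, and then decodes those bytes with whatever charset the server assumes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are illustrative only.
final class CharsetMismatchDemo {

    // Encode text as UTF-8 bytes, roughly what a browser sends for a UTF-8 form.
    static byte[] submitAsUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode the submitted bytes using the charset the server assumes.
    static String decode(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        byte[] submitted = submitAsUtf8(original);

        String wrong = decode(submitted, StandardCharsets.ISO_8859_1);
        String right = decode(submitted, StandardCharsets.UTF_8);

        // The wrong charset turns one character into two unrelated ones.
        System.out.println("ISO-8859-1 round trip intact: " + original.equals(wrong));
        System.out.println("UTF-8 round trip intact: " + original.equals(right));
    }
}
```

Decoding with ISO-8859-1 yields two characters (U+00C3 and U+00A9) instead of the single one that was typed, which is the kind of garbling the filter above is meant to prevent.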
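This should fix the Unicode problem on the JSF side. I have never used PDFBox, so I can't vouch for that part. Note that it is an Apache library of its own and is not built on iText, so check that the font you write the text with actually covers the characters you need (the standard built-in PDF fonts only cover a limited, mostly Latin character set). Let me know if it still doesn't work after applying the above fixes.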
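If you would rather strip the non-ASCII characters as a fallback, as asked in the question, here is a rough sketch using only the JDK. AsciiFallback and toAscii are hypothetical names chosen for this example. It first decomposes accented letters so the base letter survives, then drops everything that is still outside the ASCII range:

```java
import java.text.Normalizer;

// Hypothetical helper; not part of PDFBox or JSF.
final class AsciiFallback {

    // Returns a lossy ASCII-only version of the input.
    static String toAscii(String input) {
        // NFD splits an accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the combining marks, keeping the base letters.
        String withoutMarks = decomposed.replaceAll("\\p{M}", "");
        // Remove any remaining character outside the 7-bit ASCII range.
        return withoutMarks.replaceAll("[^\\x00-\\x7F]", "");
    }
}
```

You would call toAscii() on the submitted text just before drawing it on the page. Keep in mind that this throws information away: text in non-Latin scripts disappears entirely, so it is only a stopgap until the encoding and font are sorted out.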
See also:
- Unicode - How to get the characters right?