Parse PDF to text in Java
I have an Arabic PDF, and I want to parse it into a text document using Java. I have tried many times: the English words are extracted correctly, but the Arabic words are not.
Can anyone recommend a solution that will convert the Arabic words properly as well?
Several libraries come to mind: Apache Tika, iText, or PDFBox will each more or less solve your problem. That said, I would put in a word for Tika, since it supports language detection and can handle other document types as well.
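Whichever library you pick, two things can make Arabic look broken even when extraction itself worked. First, some PDFs store Arabic as presentation-form glyph codes (for example U+FE70-U+FEFF) instead of the base letters. Second, the output file may be written in a platform default charset that cannot hold Arabic, so the letters turn into "?". The sketch below assumes you already have the extracted text as a String (for example from PDFBox's PDFTextStripper.getText); the class and method names are hypothetical, and it uses only the JDK's java.text.Normalizer (NFKC) and java.nio.file.Files. It does not address text that comes out in reversed (visual) order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;

/**
 * Hypothetical post-processing helper for text already extracted from a PDF
 * (by PDFBox, Tika, iText, ...). It does not parse PDFs itself.
 */
public class ArabicTextCleanup {

    /**
     * Maps Arabic presentation-form letters (e.g. the initial, medial, final and
     * isolated forms in U+FE70-U+FEFF) back to their base letters using Unicode
     * compatibility normalization (NFKC). Plain ASCII text is left unchanged.
     */
    static String normalizeArabic(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    /** Writes the text explicitly as UTF-8 rather than the platform default charset. */
    static void writeUtf8(Path out, String text) throws IOException {
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical extractor output: the Arabic word for "book"
        // (kaf, teh, alef, beh) stored as presentation forms.
        String extracted = "\uFEDB\uFE98\uFE8E\uFE8F";
        String cleaned = normalizeArabic(extracted);

        // Print code points so the result is readable even on a non-UTF-8 console.
        cleaned.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();

        // Output path from the command line, or a temporary file if none is given.
        Path out = args.length > 0 ? Paths.get(args[0]) : Files.createTempFile("extracted-", ".txt");
        writeUtf8(out, cleaned);
        System.out.println("Wrote " + out);
    }
}
```

Running it prints U+0643 U+062A U+0627 U+0628, i.e. the base letters. Note that NFKC also rewrites other compatibility characters (the "fi" ligature becomes "f" + "i", full-width digits become ASCII digits), which is usually fine for a plain-text dump but worth knowing.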
I think you can use iText for PDF manipulation in Java. It supports Arabic, too.