Parsing a PDF to text in Java

I have an Arabic PDF that I want to parse into a text document using Java. I have tried several times: the English words are extracted correctly, but the Arabic words are not.

Can anyone recommend a solution that extracts the Arabic words correctly as well?


Several libraries come to mind: Apache Tika, iText, and Apache PDFBox will all more or less solve your problem. I would put in a word for Tika, though, since it supports language detection and can handle other document types too.
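A minimal sketch of the PDFBox route, assuming PDFBox 2.x is on the classpath (in PDFBox 3.x, `PDDocument.load` is replaced by `Loader.loadPDF`). The class name `PdfToText` and the helper `writeUtf8` are hypothetical. Tika delegates PDF parsing to PDFBox, so its output on the same file should be similar. One thing worth checking whichever library you use: before Java 18, writers such as `FileWriter` use the platform default charset, and if that charset cannot represent Arabic, those characters are silently replaced with `?` while English survives. That matches the symptom described, so the sketch writes UTF-8 explicitly.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical class name; assumes the PDFBox 2.x API.
public class PdfToText {

    /** Extracts the text of every page using PDFBox's PDFTextStripper. */
    static String extractText(File pdf) throws IOException {
        // try-with-resources closes the document and frees its resources
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    /** Hypothetical helper: writes text as UTF-8 so Arabic is not replaced with '?'. */
    static void writeUtf8(Path out, String text) throws IOException {
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PdfToText <input.pdf> <output.txt>");
            System.exit(1);
        }
        String text = extractText(new File(args[0]));
        writeUtf8(Paths.get(args[1]), text);
    }
}
```

Run it as `java PdfToText input.pdf output.txt` and open the result in an editor set to UTF-8. If the Arabic is still wrong after that, the PDF's fonts may lack a usable Unicode mapping, in which case no text extractor can recover the characters and OCR may be the only option.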


I think you can use iText for PDF manipulation in Java. It supports Arabic too.
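A comparable sketch with iText, assuming iText 7 (the `kernel` module); the older iText 5 API keeps `PdfTextExtractor` in a different package. The class name `ITextPdfToText` is hypothetical. Whether Arabic comes out in logical reading order, rather than reversed or as presentation-form glyphs, depends on how the PDF was produced, so it is worth comparing the output against another extractor on the same file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

// Hypothetical class name; assumes the iText 7 kernel API.
public class ITextPdfToText {

    /** Extracts text page by page and joins the pages with newlines. */
    static String extractText(String pdfPath) throws IOException {
        StringBuilder text = new StringBuilder();
        // Closing the PdfDocument also closes the underlying PdfReader
        try (PdfDocument pdf = new PdfDocument(new PdfReader(pdfPath))) {
            // iText page numbers are 1-based
            for (int page = 1; page <= pdf.getNumberOfPages(); page++) {
                text.append(PdfTextExtractor.getTextFromPage(pdf.getPage(page)));
                text.append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ITextPdfToText <input.pdf> <output.txt>");
            System.exit(1);
        }
        String text = extractText(args[0]);
        // Write UTF-8 explicitly; a non-UTF-8 default charset can turn Arabic into '?'
        Files.write(Paths.get(args[1]), text.getBytes(StandardCharsets.UTF_8));
    }
}
```

If the output contains Arabic presentation-form characters (U+FB50 to U+FDFF and U+FE70 to U+FEFF), `java.text.Normalizer.normalize(text, Normalizer.Form.NFKC)` maps them back to the base letters; it does not fix reversed word order.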
