
Parsing PDF files hosted on web servers

I have used iText to parse PDF files. It works well on local files, but I want to parse PDF files that are hosted on web servers, like this one:

"http://protege.stanford.edu/publications/ontology_development/ontology101.pdf"

I don't know how to do this, though. Could you please tell me how to do it using iText or another library? Thanks.


You need to download the bytes of the PDF file. You can do this with:

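// needs java.io.InputStream, java.net.URL, java.net.HttpURLConnection
URL url = new URL("http://.....");
// getResponseCode() is defined on HttpURLConnection, so cast the result of openConnection()
HttpURLConnection conn = (HttpURLConnection) url.openConnection();

if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) { ..error.. }
// constant-first equals() avoids a NullPointerException when the header is missing
if ( ! "application/pdf".equals(conn.getContentType())) { ..error.. }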

InputStream byteStream = conn.getInputStream();
try {
  ... // give bytes from byteStream to iText
} finally { byteStream.close(); }
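
The exact string comparison on the content type above may reject a valid PDF response if the server adds parameters or uses different casing in the header. A minimal sketch of a more tolerant check follows; the class name PdfContentType and the method isPdf are hypothetical, introduced only for illustration:

```java
import java.util.Locale;

class PdfContentType {
    // Hypothetical helper: returns true when the Content-Type header value
    // names the PDF media type, ignoring case, surrounding whitespace,
    // and any parameters after ';'. A missing header (null) yields false.
    public static boolean isPdf(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType;
        return mediaType.trim().toLowerCase(Locale.ROOT).equals("application/pdf");
    }

    public static void main(String[] args) {
        System.out.println(isPdf("application/pdf"));                 // true
        System.out.println(isPdf("Application/PDF; charset=binary")); // true
        System.out.println(isPdf("text/html"));                       // false
        System.out.println(isPdf(null));                              // false
    }
}
```

With a helper like this, the check would become something like if ( ! PdfContentType.isPdf(conn.getContentType())) { ..error.. }. Some servers send PDFs with a generic type such as application/octet-stream, so whether to treat a mismatch as an error is a judgment call.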


Use the URLConnection class:

URL reqURL = new URL("http://www.mysite.edu/mydoc.pdf" );
URLConnection urlCon = reqURL.openConnection();

Then read the content from the connection's input stream. The easiest way:

InputStream is = urlCon.getInputStream();
byte[] b = new byte[1024]; // buffer size; any positive size works
int len;
while((len = is.read(b)) != -1){
    //Store the content in preferred way
}
is.close();
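
One possible way to "store the content" from the loop above is to collect it in memory with a ByteArrayOutputStream. This is only a sketch: the class name StreamBytes and the method readAll are hypothetical, and the demo reads from an in-memory stream instead of a real connection so that it runs without network access:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamBytes {
    // Hypothetical helper: drains the stream into memory and returns all bytes.
    // The caller remains responsible for closing the stream.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            // Write only the bytes actually read in this pass.
            out.write(buffer, 0, len);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for urlCon.getInputStream(); not a real PDF.
        byte[] sample = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            byte[] copy = readAll(in);
            System.out.println("read " + copy.length + " bytes");
        }
    }
}
```

The resulting byte array can then be handed to iText, for example through the PdfReader constructor that accepts a byte[]; check the JavaDoc of your iText version to confirm the exact signature. Buffering the whole file in memory is fine for small documents but may not suit very large ones.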


Nothing to it. You can pass a URL directly into PdfReader, and let it handle the streaming for you:

URL url = new URL("http://protege.stanford.edu/publications/ontology_development/ontology101.pdf" );
PdfReader reader = new PdfReader( url );

The JavaDoc is your friend.
