Parsing PDF files hosted on web servers
I have used iText to parse PDF files. It works well on local files, but I want to parse PDF files that are hosted on web servers, like this one:
"http://protege.stanford.edu/publications/ontology_development/ontology101.pdf"
but I don't know how. Could you please explain how to do this using iText or another library? Thanks.
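You need to download the bytes of the PDF file. You can do this with:
// needs: java.io.InputStream, java.net.HttpURLConnection, java.net.URL
URL url = new URL("http://.....");
// openConnection() returns a URLConnection; cast it to reach the HTTP-specific methods
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) { ..error.. }
// "literal".equals(...) avoids a NullPointerException when no Content-Type header is sent
if ( ! "application/pdf".equals(conn.getContentType())) { ..error.. }
InputStream byteStream = conn.getInputStream();
try {
    ... // give bytes from byteStream to iText
} finally { byteStream.close(); }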
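One caveat about the Content-Type check above: a server may send the header with parameters (for example "application/pdf; charset=binary"), with different letter case, or not at all, so an exact string comparison can reject a perfectly good PDF. Here is a minimal sketch of a more lenient check. The class name PdfContentType and the method isPdf are hypothetical helpers made up for this illustration; they are not part of iText or the JDK.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of iText or the JDK.
final class PdfContentType {

    // Returns true when the header's media type is application/pdf,
    // ignoring case, surrounding whitespace, and any ";param=value" suffix.
    static boolean isPdf(String contentType) {
        if (contentType == null) {
            return false; // header missing
        }
        int semi = contentType.indexOf(';');
        String mediaType = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return mediaType.trim().toLowerCase(Locale.ROOT).equals("application/pdf");
    }

    public static void main(String[] args) {
        System.out.println(isPdf("application/pdf; charset=binary"));
        System.out.println(isPdf("text/html"));
    }
}
```

You would call it as PdfContentType.isPdf(conn.getContentType()). Even then the header is only a hint from the server, so the parser may still fail on the actual bytes.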
Use the URLConnection class:
URL reqURL = new URL("http://www.mysite.edu/mydoc.pdf" );
URLConnection urlCon = reqURL.openConnection();
Then you can read the content from the connection's input stream. The easiest way:
InputStream is = urlCon.getInputStream();
try {
    byte[] b = new byte[1024]; // buffer size; any positive size works
    int len;
    while ((len = is.read(b)) != -1) {
        // b[0..len) holds the bytes just read; store them however you prefer
    }
} finally {
    is.close(); // close the stream even if reading fails
}
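The read loop above leaves the storage step open. One common choice is to collect everything into a ByteArrayOutputStream and keep the resulting byte array, sketched below. The class name StreamBytes and the method readAll are hypothetical names for this example, and the demo in main uses an in-memory stream in place of urlCon.getInputStream() so that it runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration.
final class StreamBytes {

    // Reads the stream until end-of-stream and returns all bytes as one array.
    // The caller remains responsible for closing the stream.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len); // copy only the bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for urlCon.getInputStream(): 5000 bytes held in memory.
        InputStream in = new ByteArrayInputStream(new byte[5000]);
        byte[] content = readAll(in);
        System.out.println("Read " + content.length + " bytes");
    }
}
```

As far as I know, iText's PdfReader also has a constructor that takes a byte array, so the result of readAll could be handed to it directly. Keep in mind that this holds the whole file in memory, which is fine for ordinary documents but worth reconsidering for very large ones.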
Nothing to it. You can pass a URL directly into PdfReader, and let it handle the streaming for you:
URL url = new URL("http://protege.stanford.edu/publications/ontology_development/ontology101.pdf" );
PdfReader reader = new PdfReader( url );
The JavaDoc is your friend.