In Java, how do I track the size of a web page download while it is in progress?

I want to do this:

I have a maximum download size (e.g. 10 MB). I start downloading a web page. If the download has not finished by the time the limit is reached, I stop it.

I asked a similar question here: "In Java, is it possible to determine the size of a web page before downloading it?". That question was about finding the size before the download starts, but some servers don't send this information. Now I need to enforce the limit during the download.

I was told to use CountInputStream. Is that the right approach? I am using HttpURLConnection, so isn't the download done through getInputStream()?


If the web server supports it, you could look at the Content-Length header, which tells you how big the response body will be:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

If the server or resource doesn't provide a Content-Length, you have to read the body and count the bytes as they arrive, stopping once you pass your limit.
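As a rough sketch of that header check, the snippet below looks at the declared length before any of the body is read. The class name ContentLengthCheck, the Verdict enum, and the method names are made up for illustration; getContentLengthLong() is the standard URLConnection method and returns -1 when the server sent no usable Content-Length.

```java
import java.net.URLConnection;

// Hypothetical helper: decides what to do based on the declared Content-Length.
final class ContentLengthCheck {
    enum Verdict { TOO_BIG, FITS, UNKNOWN }

    // Pure decision logic, kept separate so it can be tested without a network.
    static Verdict judge(long declaredLength, long maxBytes) {
        if (declaredLength < 0) {
            return Verdict.UNKNOWN; // header missing: must count while reading
        }
        return declaredLength > maxBytes ? Verdict.TOO_BIG : Verdict.FITS;
    }

    // Reads the header from an opened connection (this triggers the request).
    static Verdict check(URLConnection conn, long maxBytes) {
        return judge(conn.getContentLengthLong(), maxBytes);
    }
}
```

A TOO_BIG verdict lets you skip the download entirely. FITS and UNKNOWN should both still be followed by counting during the read, since the header is only what the server claims.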

The answer you linked to seems to contain most of the rest of the information you need. Isn't it almost exactly the same question as yours?


If you are using HttpURLConnection to read a remote resource over HTTP, then you are reading the data it returns through HttpURLConnection.getInputStream().

To count the bytes as you read from the connection, simply count them as you read from the InputStream. For example:

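// Needs: java.io.InputStream, java.net.HttpURLConnection
HttpURLConnection conn = ...;
byte[] dataBuffer = new byte[8192];   // fixed-size chunk buffer, reused on every read
long totalRead = 0;
try (InputStream stream = conn.getInputStream()) {
    int bytesRead;
    // read() returns -1 at end of stream, so test it before adding to the total
    while ((bytesRead = stream.read(dataBuffer)) != -1) {
        totalRead += bytesRead;
        if (totalRead > MAX_BYTES) throw new FileTooBigException(...);
        // process dataBuffer[0 .. bytesRead) here
    }
}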
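The same counting can be packaged as a stream wrapper, which is roughly what the "CountInputStream" suggestion in the question amounts to. The sketch below is a hand-written illustration, not the class from any particular library; the name LimitedInputStream and its methods are assumptions. It extends java.io.FilterInputStream, counts every byte that passes through read(), and throws an IOException once the limit is exceeded.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: counts bytes read and fails once maxBytes is exceeded.
// Note: bytes skipped via skip() are not counted in this sketch.
final class LimitedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count;

    LimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    // Total number of bytes read so far.
    long getCount() {
        return count;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            advance(1);
        }
        return b;
    }

    // read(byte[]) in FilterInputStream delegates here, so it is counted too.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            advance(n);
        }
        return n;
    }

    private void advance(long n) throws IOException {
        count += n;
        if (count > maxBytes) {
            throw new IOException("Download exceeded " + maxBytes + " bytes");
        }
    }
}
```

Wrapping the connection's stream, for example new LimitedInputStream(conn.getInputStream(), 10L * 1024 * 1024), lets the rest of the code read normally; closing the stream after the exception abandons the rest of the download.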


You can do an HTTP HEAD request, but that only returns the Content-Length of the page itself.

The size of a web page is a funny thing, as a web page references a lot of other documents (graphics, for instance). Content-Length is not quite the "entire size" of the document, and even if you ask for the content length at this moment, there is no guarantee that it will be the same a mere millisecond later.

For static pages, Content-Length can probably be trusted; for dynamic content, however, I would reckon that Content-Length is either sometimes wrong or always wrong.


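If you make sure HTTP/1.1 keep-alive is enabled (Connection: keep-alive) and the server agrees, the server has to delimit the response body so that the connection can be reused. It often does this with a Content-Length header, but it may use chunked transfer encoding (Transfer-Encoding: chunked) instead, in which case no Content-Length is sent.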
