Convert InputStream to String with encoding given in stream data

My input is an InputStream that contains an XML document. The encoding used in the XML is unknown; it is declared in the first line of the XML document. From this InputStream, I want to read the whole document into a String.

To do this, I use a BufferedInputStream to mark the beginning of the file and start reading the first line. I read this first line to get the encoding, and then I use an InputStreamReader to generate a String with the correct encoding.

This does not seem to be the best way to achieve the goal, because it produces an OutOfMemoryError.

Any ideas on how to do it?

public static String streamToString(final InputStream is) {
    String result = null;

    if (is != null) {
        BufferedInputStream bis = new BufferedInputStream(is);
        bis.mark(Integer.MAX_VALUE);
        final StringBuilder stringBuilder = new StringBuilder();
        try {
            // reader used only to extract the declared encoding from the first line
            final InputStreamReader readerForEncoding = new InputStreamReader(bis, "UTF-8");
            final BufferedReader bufferedReaderForEncoding = new BufferedReader(readerForEncoding);

            String encoding = extractEncodingFromStream(bufferedReaderForEncoding);
            if (encoding == null) {
                encoding = DEFAULT_ENCODING;
            }

            // rewind to the mark and re-read the whole content with the detected encoding
            bis.reset();
            final InputStreamReader readerForContent = new InputStreamReader(bis, encoding);
            final BufferedReader bufferedReaderForContent = new BufferedReader(readerForContent);

            String line = bufferedReaderForContent.readLine();
            while (line != null) {
                stringBuilder.append(line); 
                line  = bufferedReaderForContent.readLine();
            } 
            bufferedReaderForContent.close();
            bufferedReaderForEncoding.close();
        } catch (IOException e) { 
            // reset string builder
            stringBuilder.delete(0, stringBuilder.length());
        }  
        result = stringBuilder.toString();
    }else {
        result = null;
    }
    return result;
}


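The call to mark(Integer.MAX_VALUE) is causing the OutOfMemoryError. With a read limit of 2GB, the BufferedInputStream has to keep everything read after the mark in memory so that reset() can still work, so its internal buffer keeps growing as the document is read.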

You can solve this by using an iterative approach. Set the mark readLimit to a reasonable value, say 8K. In 99% of cases this will work, but in pathological cases, e.g. 16K of spaces between the attributes in the declaration, you will need to try again. Thus, have a loop that tries to find the encoding; if it doesn't find it within the given mark region, it tries again, doubling the requested mark readLimit each time.

To be sure you don't advance the input stream past the mark limit, you should read the InputStream yourself, up to the mark limit, into a byte array. You then wrap the byte array in a ByteArrayInputStream and pass that to the constructor of the InputStreamReader assigned to 'readerForEncoding'.
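
The loop described above could look roughly like the following. This is only a minimal sketch under a few assumptions: the class name XmlStreamDecoder, the helper names, the 1 MB cap on the read limit and the regular expression are all made up for illustration, and the declaration is assumed to be ASCII-compatible (so UTF-16 input is not handled). Instead of wrapping the byte array in a ByteArrayInputStream, it decodes the bytes directly as ISO-8859-1, which maps every byte to exactly one char.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlStreamDecoder {
    private static final String DEFAULT_ENCODING = "UTF-8";
    private static final int INITIAL_LIMIT = 8 * 1024;
    private static final int MAX_LIMIT = 1024 * 1024; // arbitrary cap (assumption)
    // Simplified pattern for the encoding pseudo-attribute (assumption).
    private static final Pattern ENCODING_ATTR =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

    static String streamToString(InputStream is) throws IOException {
        if (is == null) {
            return null;
        }
        try (BufferedInputStream bis = new BufferedInputStream(is)) {
            String encoding = detectEncoding(bis);
            // The stream was reset, so decoding starts at the first byte again.
            // Note: a leading byte order mark, if present, is not stripped.
            Reader reader = new InputStreamReader(bis, encoding);
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[8192];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n); // keeps line breaks, unlike readLine()
            }
            return sb.toString();
        }
    }

    // Looks for the XML declaration within a bounded, growing mark region.
    static String detectEncoding(BufferedInputStream bis) throws IOException {
        for (int limit = INITIAL_LIMIT; limit <= MAX_LIMIT; limit *= 2) {
            bis.mark(limit);
            byte[] head = new byte[limit];
            int n = readUpTo(bis, head); // never reads past the mark limit
            bis.reset();
            String text = new String(head, 0, n, StandardCharsets.ISO_8859_1);
            int start = text.indexOf("<?xml");
            if (start < 0 || start > 3) {
                return DEFAULT_ENCODING; // no declaration (up to 3 BOM bytes tolerated)
            }
            int end = text.indexOf("?>", start);
            if (end >= 0) {
                Matcher m = ENCODING_ATTR.matcher(text.substring(start, end));
                return m.find() ? m.group(1) : DEFAULT_ENCODING;
            }
            if (n < limit) {
                return DEFAULT_ENCODING; // end of stream before the declaration closed
            }
            // Declaration not finished within this region: retry with a doubled limit.
        }
        return DEFAULT_ENCODING;
    }

    // Reads until the array is full or the stream ends; returns the byte count.
    private static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

Because the mark limit stays small, the BufferedInputStream only has to retain the declaration itself rather than the whole document. The final String still has to fit in memory, of course.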


You can use this method to convert an InputStream to a String. It might help you:

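private String convertStreamToString(InputStream input) throws IOException {
    StringBuilder sb = new StringBuilder();

    // Note: InputStreamReader without a charset argument uses the JVM's default
    // charset, so the encoding declared in the XML document is ignored here.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
        String line;
        // readLine() strips line terminators, so line breaks are not preserved.
        while ((line = reader.readLine()) != null) {
            sb.append(line);
        }
    } // closing the reader also closes the underlying input stream

    return sb.toString();
}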