
GZIPInputStream to String

I am attempting to convert the gzipped body of an HTTP response to plaintext. I've taken the byte array of this response and wrapped it in a ByteArrayInputStream, which I've then wrapped in a GZIPInputStream. I now want to read the GZIPInputStream and store the final decompressed HTTP response body as a plaintext String.

This code will store the final decompressed contents in an OutputStream, but I want to store the contents as a String:

public static int sChunk = 8192;
ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
GZIPInputStream gzis = new GZIPInputStream(bais);
byte[] buffer = new byte[sChunk];
int length;
while ((length = gzis.read(buffer, 0, sChunk)) != -1) {
        out.write(buffer, 0, length);
}


To decode bytes from an InputStream, you can use an InputStreamReader. Then, a BufferedReader will allow you to read your stream line by line.

Your code will look like:

ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
GZIPInputStream gzis = new GZIPInputStream(bais);
InputStreamReader reader = new InputStreamReader(gzis, "UTF-8"); // pass the charset explicitly; otherwise the JVM default is used
BufferedReader in = new BufferedReader(reader);

String line;
while ((line = in.readLine()) != null) {
    System.out.println(line);
}
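Since the question asks for a String rather than printed lines, the same reader chain can collect the lines instead. The following is a minimal sketch, not a definitive implementation: the class name, gunzipToString and gzip are hypothetical helpers, the body is assumed to be UTF-8, and line terminators are normalized to "\n" (a trailing newline is dropped).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLinesToString {

    // Hypothetical helper: decompress gzipped bytes and join the decoded lines.
    // Assumes UTF-8 content; original line terminators are replaced by "\n".
    static String gunzipToString(byte[] responseBytes) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(responseBytes)),
                StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }

    // Hypothetical helper used only to produce sample input for the demo.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String body = gunzipToString(gzip("line one\nline two"));
        System.out.println(body);
    }
}
```

If the exact line terminators matter, reading characters into a buffer (rather than reading line by line) preserves them.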


It would be better to obtain the response as an InputStream rather than as a byte[]. You can then decompress it with GZIPInputStream, decode it as character data with InputStreamReader, and finally collect the characters into a String with StringWriter.

String body = null;
String charset = "UTF-8"; // You should determine it based on response header.

try (
    InputStream gzippedResponse = response.getInputStream();
    InputStream ungzippedResponse = new GZIPInputStream(gzippedResponse);
    Reader reader = new InputStreamReader(ungzippedResponse, charset);
    Writer writer = new StringWriter();
) {
    char[] buffer = new char[10240];
    for (int length = 0; (length = reader.read(buffer)) > 0;) {
        writer.write(buffer, 0, length);
    }
    body = writer.toString();
}

// ...
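The comment on the charset variable says it should be determined from the response header. One possible way to do that is to pull the charset parameter out of the Content-Type header value. This is only a sketch: the class name, charsetOf and the fallback parameter are hypothetical, and the regular expression handles the common `charset=...` forms rather than every legal header syntax.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharset {

    // Matches charset=value or charset="value", case-insensitively.
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("(?i)\\bcharset\\s*=\\s*\"?([^\\s;\"]+)");

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the fallback when the header is missing, has no charset parameter,
    // or names a charset this JVM does not know.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("application/json", StandardCharsets.UTF_8));
    }
}
```

The resulting Charset can be passed straight to the InputStreamReader constructor in place of the charset name.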

See also:

  • Java IO tutorial
  • How to use URLConnection to fire/handle HTTP requests

If your final intent is to parse the response as HTML, then I strongly recommend using an HTML parser such as Jsoup. It's then as easy as:

String html = Jsoup.connect("http://google.com").get().html();


Use the try-with-resources idiom (which automatically closes any resources opened in try(...) on exit from the block) to make code cleaner.

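Use Apache Commons IO's IOUtils to convert the InputStream to a String, passing the charset explicitly rather than relying on the platform default.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import org.apache.commons.io.IOUtils;

public static String gzipFileToString(File file) throws IOException {
    try (GZIPInputStream gzipIn = new GZIPInputStream(new FileInputStream(file))) {
        return IOUtils.toString(gzipIn, StandardCharsets.UTF_8); // assumes the file holds UTF-8 text
    }
}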


Use Apache Commons IO to read the GZIPInputStream into a byte array.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import org.apache.commons.io.IOUtils;

public static byte[] decompressContent(byte[] pByteArray) throws IOException {
    GZIPInputStream gzipIn = null;
    try {
        gzipIn = new GZIPInputStream(new ByteArrayInputStream(pByteArray));
        return IOUtils.toByteArray(gzipIn);
    } finally {
        // Close the stream whether or not decompression succeeded.
        if (gzipIn != null) {
            gzipIn.close();
        }
    }
}

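To convert the uncompressed byte array to a String, do something like this:

// compressedBytes is the gzipped byte[]; the content is assumed to be UTF-8 text
String uncompressedContent = new String(decompressContent(compressedBytes), StandardCharsets.UTF_8);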
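If adding Commons IO is not an option, the same bytes-then-String approach can be sketched with the JDK alone, assuming Java 9 or later for InputStream.readAllBytes(). The class and method names below are hypothetical, and the charset is a parameter because it depends on the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GunzipReadAllBytes {

    // Hypothetical helper: decompress fully into memory, then decode once.
    static String gunzipToString(byte[] compressed, Charset charset) throws IOException {
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzipIn.readAllBytes(), charset); // readAllBytes() needs Java 9+
        }
    }

    // Hypothetical helper used only to produce sample input for the demo.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8));
        System.out.println(gunzipToString(compressed, StandardCharsets.UTF_8));
    }
}
```

Decoding the whole byte array at once also avoids splitting a multi-byte character across buffer boundaries, which can happen if each chunk is converted to a String separately.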


You can use a StringWriter to collect the decoded characters into a String.
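A short sketch of the StringWriter approach, assuming Java 10 or later for Reader.transferTo(Writer); on older versions a manual char[] copy loop does the same job. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipStringWriter {

    // Hypothetical helper: decode the decompressed stream and copy every
    // character into a StringWriter, which accumulates them in memory.
    static String gunzip(byte[] compressed, Charset charset) throws IOException {
        StringWriter writer = new StringWriter();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), charset)) {
            reader.transferTo(writer); // Java 10+
        }
        return writer.toString();
    }

    // Hypothetical helper used only to produce sample input for the demo.
    static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(charset));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("Hi Stackoverflow and GitHub", StandardCharsets.UTF_8);
        System.out.println(gunzip(compressed, StandardCharsets.UTF_8));
    }
}
```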


GZip is a file format and a software application used for file compression and decompression. gzip is a single-file/stream lossless data compression utility; the resulting compressed file generally has the suffix .gz.

String(Plain) ➢ Bytes ➤ GZip-Data(Compress) ➦ Bytes ➥ String(Decompress)

String zipData = "Hi Stackoverflow and GitHub";
        
// String to Bytes
byte[] byteStream = zipData.getBytes("UTF-8"); // encode with the same charset used for decoding below
System.out.println("String Data:"+ new String(byteStream, "UTF-8"));

// Bytes to Compressed-Bytes then to String.
// Compressed data is binary, so decoding it as UTF-8 is lossy; this String is for display only.
byte[] gzipCompress = gzipCompress(byteStream);
String gzipCompressString = new String(gzipCompress, "UTF-8");
System.out.println("GZIP Compressed Data:"+ gzipCompressString);

// Bytes to DeCompressed-Bytes then to String.
byte[] gzipDecompress = gzipDecompress(gzipCompress);
String gzipDecompressString = new String(gzipDecompress, "UTF-8");
System.out.println("GZIP Decompressed Data:"+ gzipDecompressString);

GZip-Bytes(Compress) ➥ File (*.gz) ➥ String(Decompress)

The GZip filename extension is .gz and the Internet media type is application/gzip.

GZIPInputStream to String

File textFile = new File("C:/Yash/GZIP/archive.gz.txt");
File zipFile = new File("C:/Yash/GZIP/archive.gz");
org.apache.commons.io.FileUtils.writeByteArrayToFile(textFile, byteStream);
org.apache.commons.io.FileUtils.writeByteArrayToFile(zipFile, gzipCompress);

byte[] fileGZIPBytes;
// try-with-resources closes the file stream once the bytes have been read
try (FileInputStream inStream = new FileInputStream(zipFile)) {
    fileGZIPBytes = IOUtils.toByteArray(inStream);
}
byte[] gzipFileDecompress = gzipDecompress(fileGZIPBytes);
System.out.println("GZIPFILE Decompressed Data:"+ new String(gzipFileDecompress, "UTF-8"));

The following functions are used for compression and decompression.

public static byte[] gzipCompress(byte[] uncompressedData) {
    byte[] result = new byte[]{};
    try (
        ByteArrayOutputStream bos = new ByteArrayOutputStream(uncompressedData.length);
        GZIPOutputStream gzipOS = new GZIPOutputStream(bos)
        ) {
        gzipOS.write(uncompressedData);
        gzipOS.close(); // You need to close it before using ByteArrayOutputStream
        result = bos.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return result;
}

public static byte[] gzipDecompress(byte[] compressedData) {
    byte[] result = new byte[]{};
    try (
        ByteArrayInputStream bis = new ByteArrayInputStream(compressedData);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPInputStream gzipIS = new GZIPInputStream(bis)
        ) {
        //String gZipString= IOUtils.toString(gzipIS);
        byte[] buffer = new byte[1024];
        int len;
        while ((len = gzipIS.read(buffer)) != -1) {
            bos.write(buffer, 0, len);
        }
        result = bos.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return result;
}


import java.io.*;
import java.util.zip.*;

public class Ex1 {

    public static void main(String[] args) throws Exception{
        String str ;

        H h1 = new H();
        h1.setHcfId("PH12345658");
        h1.setHcfName("PANA HEALTH ACRE FACILITY");

        str = h1.toString();
        System.out.println(str);

        if (str == null || str.length() == 0) {
            return ;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(str.length());
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(str.getBytes());
        gzip.close();
        out.close();

        String s =  out.toString() ;
        System.out.println( s );
        byte[] ba = out.toByteArray();
        System.out.println( "---------------BREAK-------------" );

        ByteArrayInputStream in = new ByteArrayInputStream(ba);
        // try-with-resources closes the reader and the streams it wraps
        try (BufferedReader pr = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(in)))) {
            String line;
            while ((line = pr.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

}


You can also do:

try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(pByteArray)))
{
....
}

AutoCloseable is a good thing: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
