How do I get the filename of a file inside a gzip in Java?
int BUFFER_SIZE = 4096;
byte[] buffer = new byte[BUFFER_SIZE];
try {
    InputStream input = new GZIPInputStream(new FileInputStream("a_gunzipped_file.gz"));
    OutputStream output = new FileOutputStream("current_output_name");
    int n = input.read(buffer, 0, BUFFER_SIZE);
    while (n >= 0) {
        output.write(buffer, 0, n);
        n = input.read(buffer, 0, BUFFER_SIZE);
    }
    // Release both file handles once everything has been copied
    input.close();
    output.close();
} catch (IOException e) {
    System.out.println("error: \n\t" + e.getMessage());
}
Using the above code I can successfully extract a gzip's contents, although the extracted file's name will, as expected, always be current_output_name (I know that's because I declared it that way in the code). My problem is that I don't know how to get the file's name while it is still inside the archive.
Though java.util.zip provides a ZipEntry, I couldn't use it on gzip files. Any alternatives?
I mostly agree with Michael Borgwardt's reply, but it is not entirely true: the gzip file specification includes an optional file name stored in the header of the .gz file. Sadly, there is no way (as far as I know) of getting that name in current Java (1.6). As seen in the OpenJDK implementation of GZIPInputStream, the readHeader method skips over the file name:
// Skip optional file name
if ((flg & FNAME) == FNAME) {
while (readUByte(in) != 0) ;
}
I have modified the GZIPInputStream class to read the optional filename out of the gzip archive (I'm not sure if I am allowed to do that) (download the original version from here). You only need to add a member String filename; to the class and change the above code to:
// Read optional file name (zero-terminated) instead of skipping it
if ((flg & FNAME) == FNAME) {
    filename = "";
    int _byte = 0;
    while ((_byte = readUByte(in)) != 0) {
        filename += (char) _byte;
    }
}
This worked for me.
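If patching the JDK class is not an option, the same header field can be read with a small standalone parser. The following is only a sketch based on the header layout in RFC 1952; the class name GzipHeaderName and the method readOriginalName are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of any library.
final class GzipHeaderName {
    private static final int FEXTRA = 4; // FLG bit 2: extra field present
    private static final int FNAME = 8;  // FLG bit 3: original file name present

    /** Returns the name stored in the gzip header, or null if there is none. */
    static String readOriginalName(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);

        // Fixed part of the header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
        byte[] fixed = new byte[10];
        data.readFully(fixed);
        if ((fixed[0] & 0xff) != 0x1f || (fixed[1] & 0xff) != 0x8b) {
            throw new IOException("Not in GZIP format");
        }
        int flg = fixed[3] & 0xff;

        // The optional extra field comes before the name: 2-byte little-endian length, then data
        if ((flg & FEXTRA) != 0) {
            int xlen = data.readUnsignedByte() | (data.readUnsignedByte() << 8);
            data.readFully(new byte[xlen]);
        }

        if ((flg & FNAME) == 0) {
            return null; // the writer did not store a name
        }

        // The name is a zero-terminated ISO 8859-1 string
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = data.readUnsignedByte()) != 0) {
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Call it on a fresh FileInputStream for the .gz file, then open a second stream for GZIPInputStream, since the parser consumes the header bytes. A null result means the name was never stored, which is common for files written by tools that omit the field.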
Apache Commons Compress offers two options for obtaining the filename:
With metadata (Java 7+ sample code)
try (GzipCompressorInputStream gcis =
        new GzipCompressorInputStream(new FileInputStream("a_gunzipped_file.gz"))) {
    String filename = gcis.getMetaData().getFilename();
}
With "the convention"
String filename = GzipUtils.getUncompressedFilename("a_gunzipped_file.gz");
References
- Apache Commons Compress
- GzipCompressorInputStream
- See also: GzipUtils#getUncompressedFilename
Actually, the GZIP file format, which is made up of one or more members, does allow the original filename to be specified: a member whose FLG byte has the FNAME bit set carries the name in its header. I do not see a way to do this in the Java libraries, though.
http://www.gzip.org/zlib/rfc-gzip.html#specification
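GZIPOutputStream offers no setter for the name, but a member with FNAME set can be written by hand using only java.util.zip. This is a rough sketch following the member layout in the specification linked above; NamedGzipWriter and its methods are hypothetical names, MTIME is left at zero, and the whole payload is held in memory for simplicity.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper, not part of the JDK or of any library.
final class NamedGzipWriter {

    /** Writes one gzip member whose header carries the given original file name. */
    static void write(OutputStream out, String name, byte[] data) throws IOException {
        // Header: ID1, ID2, CM = 8 (deflate), FLG = 8 (FNAME), MTIME = 0, XFL = 0, OS = 255 (unknown)
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 8, 0, 0, 0, 0, 0, (byte) 255});
        // Original file name: ISO 8859-1 bytes followed by a zero terminator
        out.write(name.getBytes(StandardCharsets.ISO_8859_1));
        out.write(0);

        // Body: raw deflate stream (nowrap = true means no zlib header or checksum)
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        DeflaterOutputStream body = new DeflaterOutputStream(out, deflater);
        body.write(data);
        body.finish(); // flush compressed data without closing the underlying stream
        deflater.end();

        // Trailer: CRC-32 of the uncompressed data, then its length, both little-endian
        CRC32 crc = new CRC32();
        crc.update(data);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, data.length);
    }

    private static void writeIntLE(OutputStream out, int v) throws IOException {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }
}
```

The result should still be readable by GZIPInputStream, which simply skips the name field.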
Following the answers above, here is an example that creates a file "myTest.csv.gz" that contains a file "myTest.csv". Note that you can't change the internal file name, and you can't add more files to the .gz file.
@Test
public void gzipFileName() throws Exception {
File workingFile = new File( "target", "myTest.csv.gz" );
GZIPOutputStream gzipOutputStream = new GZIPOutputStream( new FileOutputStream( workingFile ) );
PrintWriter writer = new PrintWriter( gzipOutputStream );
writer.println("hello,line,1");
writer.println("hello,line,2");
writer.close();
}
Gzip is purely compression. There is no archive; it's just the file's data, compressed.
The convention is for gzip to append .gz to the filename, and for gunzip to remove that extension. So logfile.txt becomes logfile.txt.gz when compressed, and logfile.txt again when it's decompressed. If you rename the file, the name information is lost.
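The naming convention can be applied in code with a simple suffix check. This is a minimal sketch; the class GzipNames and the method stripGzSuffix are made-up names, and only the plain .gz suffix is handled (variants such as .tgz are ignored).

```java
import java.util.Locale;

// Hypothetical helper illustrating the ".gz" naming convention.
final class GzipNames {

    /** Returns the name gunzip would produce by convention, or the input unchanged. */
    static String stripGzSuffix(String compressedName) {
        String lower = compressedName.toLowerCase(Locale.ROOT);
        // Require at least one character before ".gz" so that ".gz" alone is left as-is
        if (lower.endsWith(".gz") && compressedName.length() > 3) {
            return compressedName.substring(0, compressedName.length() - 3);
        }
        return compressedName;
    }
}
```

For example, stripGzSuffix("logfile.txt.gz") gives "logfile.txt", while a file that was renamed to something without the extension comes back unchanged.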