Decompress gz file using R
I have used ?unzip in the past to get at contents of a zipped file using R. This time around, I am having a hard time extracting the files from a .gz file, which can be found here.
I have tried ?gzfile and ?gzcon but have not been able to get it to work. Any help you can provide will be greatly appreciated.
Here is a worked example that may help illustrate what gzfile() and gzcon() are for:
foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
foo
# a b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
write.table(foo, file="/tmp/foo.csv")
system("gzip /tmp/foo.csv") # being very explicit
Now that the file is written, instead of the implicit use of file(), use gzfile():
read.table(gzfile("/tmp/foo.csv.gz"))
# a b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
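The example above only exercises gzfile(). As a rough sketch of where gzcon() fits, the snippet below wraps an already-open binary connection, which is the situation you are in when the compressed bytes come from url() or a similar source. The temporary path and the variable names (tmp, raw_con, gz, txt, bar) are illustrative assumptions, not part of the original example.

```r
# Write a small gzip-compressed table without shelling out to gzip.
foo <- data.frame(a = LETTERS[1:3], b = c(0.5, 1.5, 2.5),
                  stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".csv.gz")   # hypothetical temporary path
out <- gzfile(tmp, "w")
write.table(foo, out)
close(out)

# gzcon() wraps an existing binary connection and decompresses on the fly.
raw_con <- file(tmp, "rb")             # "rb" gives the raw compressed bytes
gz <- gzcon(raw_con)
txt <- readLines(gz)
close(gz)                              # also closes the wrapped connection

# read.table() needs text, so parse the decompressed lines.
bar <- read.table(text = txt, stringsAsFactors = FALSE)
bar
```

For a file that already sits on disk, gzfile() is the shorter route; gzcon() is mainly useful when you only have a connection rather than a path.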
The file you point to is a compressed tar archive, and as far as I know, R itself has no interface to tar archives. These are commonly used to distribute source code, as for example for R packages and R sources.
To un-gz a file in R you can do
library(R.utils)
gunzip("file.gz", remove=FALSE)
or
gunzip("file.gz")
With the second form you get the default behavior (remove=TRUE), in which the input file is removed once the output file has been fully created and closed.
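If installing R.utils is not an option, roughly the same thing can be sketched in base R by streaming the bytes out of a gzfile() connection. The helper name gunzip_base and its arguments are hypothetical, and unlike R.utils::gunzip() this sketch never removes the input file.

```r
# Hypothetical helper: decompress src to dest in chunks, keeping src.
gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk = 1e6) {
  stopifnot(dest != src)                 # refuse to overwrite the input
  inp <- gzfile(src, "rb")
  on.exit(close(inp), add = TRUE)
  out <- file(dest, "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    buf <- readBin(inp, what = "raw", n = chunk)
    if (length(buf) == 0L) break         # end of the compressed stream
    writeBin(buf, out)
  }
  invisible(dest)
}

# Usage on a throwaway file
src <- tempfile(fileext = ".txt.gz")
con <- gzfile(src, "w")
writeLines(c("alpha", "beta"), con)
close(con)
dest <- gunzip_base(src)
readLines(dest)
```

Reading in chunks keeps memory use bounded for large files; the chunk size here is an arbitrary choice.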
If you really want to uncompress the file, just use the untar() function, which does support gzip. E.g.:
untar('chadwick-0.5.3.tar.gz')
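Before extracting, it can help to look inside the archive and to pick a target directory. The sketch below builds a tiny .tar.gz with utils::tar() so that it is self-contained, then uses the list and exdir arguments of untar(). The directory and file names (pkg, README, out) are made up for illustration.

```r
# Build a throwaway archive so the example does not depend on a download.
work <- tempfile("untar-demo")
dir.create(work)
old <- setwd(work)
dir.create("pkg")
writeLines("hello", file.path("pkg", "README"))
tar("pkg.tar.gz", files = "pkg", compression = "gzip")
setwd(old)

archive <- file.path(work, "pkg.tar.gz")

# List the contents without extracting anything.
listing <- untar(archive, list = TRUE)
listing

# Extract into a chosen directory instead of the working directory.
out <- file.path(work, "out")
dir.create(out)
untar(archive, exdir = out)
readLines(file.path(out, "pkg", "README"))
```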
http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html
R added transparent decompression for certain kinds of compressed files in version 2.10 (the latest at the time of writing). If your files are compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. They should have the proper filename extensions.
The command
myData <- read.table('myFile.gz')
# gzip-compressed files have a "gz" extension
will work just as if 'myFile.gz' were the raw text file.
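To see the transparent decompression in action without an existing data file, here is a small sketch that writes the same table through gzfile(), bzfile(), and xzfile(), then reads each one back by passing only the path to read.table(). The data frame and the temporary file names are assumptions made for the demonstration.

```r
foo <- data.frame(a = LETTERS[1:3], b = c(0.5, 1.5, 2.5),
                  stringsAsFactors = FALSE)

# One writer per compression format supported by base R connections.
openers <- list(gz = gzfile, bz2 = bzfile, xz = xzfile)

results <- lapply(names(openers), function(ext) {
  path <- tempfile(fileext = paste0(".txt.", ext))
  con <- openers[[ext]](path, "w")
  write.table(foo, con)
  close(con)
  # No gzfile()/bzfile()/xzfile() on the read side: just the file name.
  read.table(path, stringsAsFactors = FALSE)
})
names(results) <- names(openers)
results$gz
```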
library(vroom)
columns3 = c('A', 'B',...) ## define column names
Data1 <- vroom(".../XXX.tsv", col_names = columns3)
This also works fine with .tsv.gz files.
If it's a comma/tab-separated file, you can use data.table's fread(). It can handle zipped (.zip, .gz) files:
fread('myFile.csv.gz')