Piping unzip in a SAS infile
Suppose I do the following in SAS:
filename tmp pipe 'unzip -c -qq ./data_xml.zip';
libname tmp xml xmlmap=TMMap access=READONLY;
data header; set tmp.header; run;
data owners; set tmp.owners; run;
This will unzip the data_xml.zip file and use the SAS xmlmap file to generate two data sets, header and owners.
My question is: how many times will unzip run on data_xml.zip? Will the unzipping happen just once, or will it happen twice because I'm setting a data set from the tmp libref twice?
The short answer is yes: it will unzip the file twice.
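As I understand it, unzip -c essentially turns the data into a sequential source, because the output streams from the unzip command directly into the PIPE fileref. A stream like that cannot be rewound, so each data step that sets a data set from the tmp libref reopens the fileref, and reopening it runs the unzip command again.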
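If you want to confirm this on your own system, one option is to have the pipe command log every invocation before it runs unzip. The sketch below assumes a UNIX host (so the PIPE command goes through a shell that understands `;` and `>>`) and that XCMD is enabled, as the original PIPE fileref already requires. The macro variable `calllog` and the file `unzip_calls.log` are hypothetical names used only for this check, and `TMMap` is assumed to be assigned as in the question.

```sas
/* Hypothetical log file, used only to count how often the pipe command runs */
%let calllog = %sysfunc(pathname(work))/unzip_calls.log;

/* Each time SAS opens the PIPE fileref, the shell appends one line to the
   log before streaming the unzipped XML to stdout as before */
filename tmp pipe "echo run >> '&calllog'; unzip -c -qq ./data_xml.zip";
libname tmp xml xmlmap=TMMap access=READONLY;

data header; set tmp.header; run;
data owners; set tmp.owners; run;

/* Count the lines in the log: one line per unzip invocation */
data _null_;
  infile "&calllog" end=eof;
  input;
  calls + 1;
  if eof then put 'unzip ran ' calls 'time(s).';
run;
```

The count is written to the SAS log. If the engine also opens the fileref at other points (for example when the libref is assigned), the number may differ from the number of SET statements, which is exactly what this check would reveal.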
Presumably, you want to stream via -c and the PIPE because of disk space and/or performance concerns with landing the file on disk first. Unfortunately, I'm fairly certain that, with things set up this way, the only way to avoid the CPU cost of a second unzip is to land the data in a temporary file on disk first.
However, depending on the size of the file, the CPU hit for a second unzip might not outweigh the I/O hit for having to read an expanded file from disk at least one extra time.