What impacts SAS data set performance more - number of observations or number of variables?
After working with different data sets in SAS for a month or two, it seems to me that the more variables a data set has, the more time it takes to run PROCs and other operations on the data set. Yet if I have for example 5 variables, but 1 million observations, performance is not impacted too much.
While I'm interested in if observations or var开发者_JAVA技巧iables affect performance, I was also wondering if there are other factors I'm missing in looking at SAS performance?
Thanks!
For the same size data set (rows*columns) the one with more variables will usually be slower I believe. I tried creating two data sets with either 1 row and 10000 columns, or 1 column and 10000 rows. The one with more variables took a lot more memory and time.
options fullstimer;
data a;
retain var1-var10000 1;
run;
data b(drop=i);
do i=1 to 10000;
var1=i;
output;
end;
run;
On the Log
31 options fullstimer;
32 data a;
33 retain var1-var10000 1;
34 run;
NOTE: The data set WORK.A has 1 observations and 10000 variables.
NOTE: DATA statement used (Total process time):
real time 0.23 seconds
user cpu time 0.20 seconds
system cpu time 0.03 seconds
Memory 5382k
OS Memory 14208k
Timestamp 10/14/2009 2:03:57 PM
35 data b(drop=i);
36 do i=1 to 10000;
37 var1=i;
38 output;
39 end;
40 run;
NOTE: The data set WORK.B has 10000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
Memory 173k
OS Memory 12144k
Timestamp 10/14/2009 2:03:57 PM
You should also check out BUFNO= and BUFSIZE=. If you have to access a data set many times, you might consider using SASFILE as well to store the entire data set in memory.
I can't quite elucidate (and am making an educated guess), but I imagine it has something to do with a combination of factors, including that a whole record is read into the PDV, which means more data sits in memory with many variables.
It might be worth doing some measurements with compressed datasets, because I/O is often the bottleneck.
SAS dataset option:
data foo(compress=yes);
...
run;
精彩评论