What impacts SAS data set performance more - number of observations or number of variables?

After working with different data sets in SAS for a month or two, it seems to me that the more variables a data set has, the longer it takes to run PROCs and other operations on it. Yet if I have, for example, 5 variables but 1 million observations, performance is not affected much.

While I'm interested in whether observations or variables affect performance, I was also wondering if there are other factors I'm missing when looking at SAS performance?

Thanks!


For data sets of the same size (rows*columns), I believe the one with more variables will usually be slower. I tried creating two data sets, one with 1 row and 10000 columns and one with 1 column and 10000 rows. The one with more variables took a lot more memory and time.

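/* Write detailed timing and memory statistics to the log */
options fullstimer;

/* Data set A: 1 observation, 10000 variables */
data a;
    retain var1-var10000 1;
run;

/* Data set B: 10000 observations, 1 variable */
data b(drop=i);
    do i=1 to 10000;
        var1=i;
        output;
    end;
run;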

From the log:

31   options fullstimer;
32   data a;
33       retain var1-var10000 1;
34   run;

NOTE: The data set WORK.A has 1 observations and 10000 variables.
NOTE: DATA statement used (Total process time):
      real time           0.23 seconds
      user cpu time       0.20 seconds
      system cpu time     0.03 seconds
      Memory                            5382k
      OS Memory                         14208k
      Timestamp            10/14/2009  2:03:57 PM


35   data b(drop=i);
36       do i=1 to 10000;
37       var1=i;
38       output;
39       end;
40   run;

NOTE: The data set WORK.B has 10000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      Memory                            173k
      OS Memory                         12144k
      Timestamp            10/14/2009  2:03:57 PM

You should also check out the BUFNO= and BUFSIZE= options. If you have to access a data set many times, you might also consider using the SASFILE statement to hold the entire data set in memory.
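A minimal sketch of how those options might be used. WORK.BIG, the variable x, and the buffer values are all made up for illustration and are not tuned recommendations; measure with FULLSTIMER on your own data.

```sas
options fullstimer;

/* WORK.BIG is a hypothetical data set created only for this sketch. */
/* BUFSIZE= sets the page size when the data set is created.          */
data work.big(bufsize=64k);
    do i = 1 to 100000;
        x = ranuni(1);
        output;
    end;
run;

/* Load the whole data set into memory once ... */
sasfile work.big load;

/* ... so that repeated passes read it from memory instead of disk */
proc means data=work.big n mean;
    var x;
run;

proc means data=work.big min max;
    var x;
run;

/* Release the memory when done */
sasfile work.big close;

/* BUFNO= can also be set per step, here when reading the data set */
proc means data=work.big(bufno=20) n mean;
    var x;
run;
```

SASFILE only pays off when the same data set is read several times and fits in available memory.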


I can't fully explain it (this is an educated guess), but I imagine it comes down to a combination of factors, one being that a whole record is read into the PDV (program data vector), so a data set with many variables means more data sitting in memory.
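One way to test that guess is to read the same wide data set twice, once with every variable and once with a KEEP= data set option so that only one variable goes into the PDV, and compare the FULLSTIMER notes. This is only a sketch; the data set name WIDE and its dimensions are made up for illustration.

```sas
options fullstimer;

/* Hypothetical wide data set, built only for this sketch */
data wide;
    retain var1-var1000 1;
    do i = 1 to 10000;
        output;
    end;
run;

/* All 1001 variables (var1-var1000 and i) are brought into the PDV */
data _null_;
    set wide;
run;

/* Only var1 is brought into the PDV */
data _null_;
    set wide(keep=var1);
run;
```

Note that the pages are still read from disk in both cases, so any difference reflects the per-variable work rather than I/O.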

It might be worth doing some measurements with compressed datasets, because I/O is often the bottleneck.

The SAS data set option for this is COMPRESS=:

data foo(compress=yes);
...
run;
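A rough sketch of such a measurement, assuming made-up data set names (WIDE_PLAIN, WIDE_COMP) and sizes. It writes the same data with and without compression and then reads each back, so the FULLSTIMER notes in the log can be compared side by side.

```sas
options fullstimer;

/* Uncompressed copy (hypothetical names and sizes) */
data wide_plain;
    retain var1-var1000 1;
    do i = 1 to 1000;
        output;
    end;
run;

/* Same data, written with compression turned on */
data wide_comp(compress=yes);
    retain var1-var1000 1;
    do i = 1 to 1000;
        output;
    end;
run;

/* Read each data set once and compare real time and CPU time in the log */
data _null_;
    set wide_plain;
run;

data _null_;
    set wide_comp;
run;
```

Compression trades I/O for CPU, so whether it helps depends on the data and on where the bottleneck actually is.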