What impacts SAS data set performance more - number of observations or number of variables?

2022-12-08 06:01 问答作者：

After working with different data sets in SAS for a month or two, it seems to me that the more variables a data set has, the more time it takes to run PROCs and other operations on the data set. Yet if I have for example 5 variables, but 1 million observations, performance is not impacted too much.

While I'm interested in if observations or var开发者_JAVA技巧iables affect performance, I was also wondering if there are other factors I'm missing in looking at SAS performance?

Thanks!

For the same size data set (rows*columns) the one with more variables will usually be slower I believe. I tried creating two data sets with either 1 row and 10000 columns, or 1 column and 10000 rows. The one with more variables took a lot more memory and time.

options fullstimer;
data a;
    retain var1-var10000 1;
run;
data b(drop=i);
    do i=1 to 10000;
    var1=i;
    output;
    end;
run;

On the Log

31   options fullstimer;
32   data a;
33       retain var1-var10000 1;
34   run;

NOTE: The data set WORK.A has 1 observations and 10000 variables.
NOTE: DATA statement used (Total process time):
      real time           0.23 seconds
      user cpu time       0.20 seconds
      system cpu time     0.03 seconds
      Memory                            5382k
      OS Memory                         14208k
      Timestamp            10/14/2009  2:03:57 PM


35   data b(drop=i);
36       do i=1 to 10000;
37       var1=i;
38       output;
39       end;
40   run;

NOTE: The data set WORK.B has 10000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      Memory                            173k
      OS Memory                         12144k
      Timestamp            10/14/2009  2:03:57 PM

You should also check out BUFNO= and BUFSIZE=. If you have to access a data set many times, you might consider using SASFILE as well to store the entire data set in memory.

I can't quite elucidate (and am making an educated guess), but I imagine it has something to do with a combination of factors, including that a whole record is read into the PDV, which means more data sits in memory with many variables.

It might be worth doing some measurements with compressed datasets, because I/O is often the bottleneck.

SAS dataset option:

data foo(compress=yes);
...
run;

继续阅读：dataset performance sas

What impacts SAS data set performance more - number of observations or number of variables?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？