Does SAS Proc SQL ever use an index when merging

2022-12-08 11:36 问答作者：

Consider the following (admittedly long) example.

The sample code creates two data sets, data one with "key" variables i,j,k and data two with key variables j,k and a "value" variable x. I'd like to merge these two data sets as efficiently as possible. Both of the data sets are indexed with respect to j and k: the index for the first data should not be needed but it's there anyway.

Proc SQL does not use the index in the data two, which I suppose would be the case if the data were in a relational database. Is this just a limitation of the query optimizer I have to accept?

EDIT: The answer to this question is yes, SAS can use an index to optimize a PROC SQL join. In the following example, the relative sizes of the data sets matters: If you modify the code so that data two becomes relatively larger than data one, index will be used. Whether the data sets are sorted or not, does not matter.

* Just to control the size of the data;
%let j_max=10000;

* Create data sets;
data one;
    do i=1 to 3;
        do j=1 to &j_max;
            do k=1 to 4;
                if ranuni(0)<0.9 then output;
            end;
        end;
    end;
run;

data two;
    do j=1 to &j_max;
        do k=1 to 4;
            x=ranuni(0);
            if ranuni(0)<0.9 then output;
        end;
    end;
run;

* Create indices;
proc datasets library=work nolist;
    modify one;
    index create idx_j_k=(j 开发者_如何学运维k);
    modify two;
    index create idx_j_k=(j k) / unique;
run;quit;

* Test the use of an index for the other data set:
* Log should display "INFO: Index idx_j_k selected for WHERE clause optimization.";
options msglevel=i;
data _null_;
    set two(where=(j<100));
run;

* Merge the data sets with proc sql - no index is used;
proc sql;
    create table onetwo as
    select
        one.*,
        two.x
    from one, two
    where
        one.j=two.j and
        one.k=two.k;
quit;

You may be comparing apples and oranges. For the join you do with proc sql, the index may not help because the observations are already ordered by j and k and there are faster ways to do "merging" than using indices.

For the subsetting you do with the data _null_ step, on the other hand, an index on j would surely help. If you do the same subsetting with the proc sql, you will see that it is using the index.

proc sql;
  select * from two where j < 100;
quit;
/* on log
INFO: Index idx_j_k selected for WHERE clause optimization.
*/

By the way, you can use the undocumented _method option to examine how proc sql executes your query. On my sas 9.2 on windows, it reports that it is doing what is called a "hash join":

proc sql _method;
  create table onetwo as
  select
    one.*,
    two.x
  from one, two
  where
    one.j=two.j and
    one.k=two.k;
quit;

/* on log
NOTE: SQL execution methods chosen are:

  sqxcrta
      sqxjhsh
          sqxsrc( WORK.ONE )
          sqxsrc( WORK.TWO )
*/

See Paul Kent's Tech note for more information .

继续阅读：indexing performance sas sql

Does SAS Proc SQL ever use an index when merging

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？