how to make selecting random rows in oracle faster with table with millions of rows

2023-01-05 10:28 问答作者：

Is there a way to make selecting random rows faster in oracle with a table 开发者_开发技巧that has million of rows. I tried to use sample(x) and dbms_random.value and its taking a long time to run.

Thanks!

Using appropriate values of sample(x) is the fastest way you can. It's block-random and row-random within blocks, so if you only want one random row:

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum = 1

I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum <= 5

    FILENO    BLOCKNO     OFFSET
---------- ---------- ----------
       152    2454936         11
       152    2463140         32
       152    2335208          2
       152    2429207         23
       152    2746125         28

I suspect you should probably tune your SAMPLE clause to use an appropriate sample size for what you're fetching.

Start with Adam's answer first, but if SAMPLE just isn't fast enough, even with the ROWNUM optimization, you can use block samples:

....FROM [table] SAMPLE BLOCK (0.01)

This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.

Here's the same question on AskTom:

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522

If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.

Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861

create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
  for exp_v in -6..2 loop
    exit when (urowid_t.count > 0);
    if (exp_v < 2) then
      sql_v := 'select rowid from ' || table_name
      || ' sample block (' || power(10, exp_v) || ')';
    else
      sql_v := 'select rowid from ' || table_name;
    end if;
    cursor_v := dbms_sql.open_cursor;
    dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
    dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
    status_v := dbms_sql.execute(cursor_v);
    loop
      rows_v := dbms_sql.fetch_rows(cursor_v);
      dbms_sql.column_value(cursor_v, 1, urowid_t);
      exit when rows_v != 100;
    end loop;
    dbms_sql.close_cursor(cursor_v);
  end loop;
  if (urowid_t.count > 0) then
    return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
  end if;
  return null;
exception when others then
  if (dbms_sql.is_open(cursor_v)) then
    dbms_sql.close_cursor(cursor_v);
  end if;
  raise;
end;
/
show errors

Below Solution to this question is not the exact answer but in many scenarios you try to select a row and try to use it for some purpose and then update its status with "used" or "done" so that you do not select it again.

Solution:

Below query is useful but that way if your table is large, I just tried and see that you definitely face performance problem with this query.

SELECT * FROM ( SELECT * FROM table ORDER BY dbms_random.value ) WHERE rownum = 1

So if you set a rownum like below then you can work around the performance problem. By incrementing rownum you can reduce the possiblities. But in this case you will always get rows from the same 1000 rows. If you get a row from 1000 and update its status with "USED", you will almost get different row everytime you query with "ACTIVE"

SELECT * FROM
( SELECT * FROM table
where rownum < 1000
  and status = 'ACTIVE'
  ORDER BY dbms_random.value  )
WHERE rownum = 1

update the rows status after selecting it, If you can not update that means another transaction has already used it. Then You should try to get a new row and update its status. By the way, getting the same row by two different transaction possibility is 0.001 since rownum is 1000.

Someone told sample(x) is the fastest way you can. But for me this method works slightly faster than sample(x) method. It should take fraction of the second (0.2 in my case) no matter what is the size of the table. If it takes longer try to use hints (--+ leading(e) use_nl(e t) rowid(t)) can help

SELECT *
  FROM My_User.My_Table
 WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
                  FROM (SELECT o.Data_Object_Id,
                               e.Relative_Fno,
                               e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
                          FROM Dba_Extents e
                          JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                         WHERE e.Segment_Name = 'MY_TABLE'
                           AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                              (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                      MAX(e.Owner)        AS Owner,
                                      MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                                 FROM Dba_Extents e
                                WHERE e.Segment_Name = 'MY_TABLE'
                                  AND e.Owner = 'MY_USER'
                                  AND e.Segment_Type = 'TABLE')) e
                  JOIN My_User.My_Table t
                    ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                   AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))

Version with retries when no rows returned:

WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
                     MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
                FROM (SELECT o.Data_Object_Id,
                             e.Relative_Fno,
                             e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id 
                        FROM Dba_Extents e
                        JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                       WHERE e.Segment_Name = 'MY_TABLE'
                         AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                            (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                    MAX(e.Owner)        AS Owner,
                                    MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                               FROM Dba_Extents e
                              WHERE e.Segment_Name = 'MY_TABLE'
                                AND e.Owner = 'MY_USER'
                                AND e.Segment_Type = 'TABLE')) e
                JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                                                  AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
  Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
                             FROM Dual
                             LEFT JOIN gen ON 1=1
                            UNION ALL
                           SELECT Cnt + 1, gen.Row_Id
                             FROM Retries
                             LEFT JOIN gen ON 1=1
                            WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
  FROM MY_USER.MY_TABLE
 WHERE ROWID = (SELECT Row_Id
                  FROM Retries
                 WHERE Row_Id IS NOT NULL)

Can you use pseudorandom rows?

select * from (
  select * from ... where... order by ora_hash(rowid)
) where rownum<100

继续阅读：oracle

how to make selecting random rows in oracle faster with table with millions of rows

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？