开发者

how to make selecting random rows in oracle faster with table with millions of rows

Is there a way to make selecting random rows faster in oracle with a table 开发者_开发技巧that has million of rows. I tried to use sample(x) and dbms_random.value and its taking a long time to run.

Thanks!


Using appropriate values of sample(x) is the fastest way you can. It's block-random and row-random within blocks, so if you only want one random row:

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum = 1

I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum <= 5

    FILENO    BLOCKNO     OFFSET
---------- ---------- ----------
       152    2454936         11
       152    2463140         32
       152    2335208          2
       152    2429207         23
       152    2746125         28

I suspect you should probably tune your SAMPLE clause to use an appropriate sample size for what you're fetching.


Start with Adam's answer first, but if SAMPLE just isn't fast enough, even with the ROWNUM optimization, you can use block samples:

....FROM [table] SAMPLE BLOCK (0.01)

This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.


Here's the same question on AskTom:

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522

If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.

Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861

create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
  for exp_v in -6..2 loop
    exit when (urowid_t.count > 0);
    if (exp_v < 2) then
      sql_v := 'select rowid from ' || table_name
      || ' sample block (' || power(10, exp_v) || ')';
    else
      sql_v := 'select rowid from ' || table_name;
    end if;
    cursor_v := dbms_sql.open_cursor;
    dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
    dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
    status_v := dbms_sql.execute(cursor_v);
    loop
      rows_v := dbms_sql.fetch_rows(cursor_v);
      dbms_sql.column_value(cursor_v, 1, urowid_t);
      exit when rows_v != 100;
    end loop;
    dbms_sql.close_cursor(cursor_v);
  end loop;
  if (urowid_t.count > 0) then
    return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
  end if;
  return null;
exception when others then
  if (dbms_sql.is_open(cursor_v)) then
    dbms_sql.close_cursor(cursor_v);
  end if;
  raise;
end;
/
show errors


Below Solution to this question is not the exact answer but in many scenarios you try to select a row and try to use it for some purpose and then update its status with "used" or "done" so that you do not select it again.

Solution:

Below query is useful but that way if your table is large, I just tried and see that you definitely face performance problem with this query.

SELECT * FROM ( SELECT * FROM table ORDER BY dbms_random.value ) WHERE rownum = 1

So if you set a rownum like below then you can work around the performance problem. By incrementing rownum you can reduce the possiblities. But in this case you will always get rows from the same 1000 rows. If you get a row from 1000 and update its status with "USED", you will almost get different row everytime you query with "ACTIVE"

SELECT * FROM
( SELECT * FROM table
where rownum < 1000
  and status = 'ACTIVE'
  ORDER BY dbms_random.value  )
WHERE rownum = 1

update the rows status after selecting it, If you can not update that means another transaction has already used it. Then You should try to get a new row and update its status. By the way, getting the same row by two different transaction possibility is 0.001 since rownum is 1000.


Someone told sample(x) is the fastest way you can. But for me this method works slightly faster than sample(x) method. It should take fraction of the second (0.2 in my case) no matter what is the size of the table. If it takes longer try to use hints (--+ leading(e) use_nl(e t) rowid(t)) can help

SELECT *
  FROM My_User.My_Table
 WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
                  FROM (SELECT o.Data_Object_Id,
                               e.Relative_Fno,
                               e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
                          FROM Dba_Extents e
                          JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                         WHERE e.Segment_Name = 'MY_TABLE'
                           AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                              (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                      MAX(e.Owner)        AS Owner,
                                      MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                                 FROM Dba_Extents e
                                WHERE e.Segment_Name = 'MY_TABLE'
                                  AND e.Owner = 'MY_USER'
                                  AND e.Segment_Type = 'TABLE')) e
                  JOIN My_User.My_Table t
                    ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                   AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))


Version with retries when no rows returned:

WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
                     MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
                FROM (SELECT o.Data_Object_Id,
                             e.Relative_Fno,
                             e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id 
                        FROM Dba_Extents e
                        JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                       WHERE e.Segment_Name = 'MY_TABLE'
                         AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                            (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                    MAX(e.Owner)        AS Owner,
                                    MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                               FROM Dba_Extents e
                              WHERE e.Segment_Name = 'MY_TABLE'
                                AND e.Owner = 'MY_USER'
                                AND e.Segment_Type = 'TABLE')) e
                JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                                                  AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
  Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
                             FROM Dual
                             LEFT JOIN gen ON 1=1
                            UNION ALL
                           SELECT Cnt + 1, gen.Row_Id
                             FROM Retries
                             LEFT JOIN gen ON 1=1
                            WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
  FROM MY_USER.MY_TABLE
 WHERE ROWID = (SELECT Row_Id
                  FROM Retries
                 WHERE Row_Id IS NOT NULL)


Can you use pseudorandom rows?

select * from (
  select * from ... where... order by ora_hash(rowid)
) where rownum<100
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜