How to split up a massive data query into multiple queries
I have to select all rows from a table with millions of rows (to preload a Coherence data grid). How do I split up this query into multiple queries that can be concurrently executed by multiple threads?
I first thought of getting a count of all records and doing:
SELECT ...
WHERE ROWNUM BETWEEN (packetNo * packetSize) AND ((packetNo + 1) * packetSize)
but that didn't work. Now I'm stuck.
Any help will be greatly appreciated.
If you have the Enterprise Edition license, the easiest way of achieving this objective is parallel query.
For one-off or ad hoc queries use the PARALLEL hint:
select /*+ parallel(your_table, 4) */ *
from your_table
/
The number in the hint is the degree of parallelism you are requesting; in this case the database will run four parallel slave processes.
If you want every query issued on the table to be parallelizable then permanently alter the table definition:
alter table your_table parallel (degree 4)
/
Note that the database won't always use parallel query; the optimizer will decide whether it's appropriate. Parallel query only works with full table scans or index range scans which cross multiple partitions.
There are a number of caveats. Parallel query requires us to have sufficient cores to satisfy the proposed number of threads; if we only have a single dual-core CPU setting a parallel degree of 16 isn't going to magically speed up the query. Also, we need spare CPU capacity; if the server is already CPU bound then parallel execution is only going to make things worse. Finally, the I/O and storage subsystems need to be capable of satisfying the concurrent demand; SANs can be remarkably unhelpful here.
As always in matters of performance, it is crucial to undertake some benchmarking against realistic volumes of data in a representative environment before going into production.
What if you don't have Enterprise Edition? Well, it is possible to mimic parallel execution by hand. Tom Kyte calls it "Do-It-Yourself Parallelism". I have used this technique myself, and it works well.
The key thing is to work out the total range of ROWIDs which apply to the table and split them across multiple jobs. Unlike some of the other solutions proposed in this thread, each job only selects the rows it needs. Mr Kyte summarized the technique in an old AskTom thread, including the vital split script: find it here.
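This isn't Kyte's script itself (his version derives the ranges from dba_extents, so the table need not be scanned up front), but a minimal sketch of the same idea, assuming eight jobs and a table called your_table:

select grp,
       min(rid) as lo_rid,
       max(rid) as hi_rid
from  (
        -- deal every rowid into one of eight evenly sized, contiguous buckets
        select rowid as rid,
               ntile(8) over (order by rowid) as grp
        from   your_table
      )
group by grp
order by grp;

Each job then issues select ... from your_table where rowid between :lo_rid and :hi_rid, so it reads only the blocks in its own slice.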
Splitting the table and starting off threads is a manual task: fine as a one-off but rather tiresome to undertake frequently. So if you are running 11g release 2 you ought to know that there is a new PL/SQL package DBMS_PARALLEL_EXECUTE which automates this for us.
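A minimal sketch of how the package might be driven (the task name, chunk size, and staging insert are illustrative assumptions; substitute whatever per-chunk statement suits your load):

begin
  dbms_parallel_execute.create_task('load_grid');

  -- carve the table into rowid ranges of roughly 10,000 rows each
  dbms_parallel_execute.create_chunks_by_rowid(
      task_name   => 'load_grid',
      table_owner => user,
      table_name  => 'YOUR_TABLE',
      by_row      => true,
      chunk_size  => 10000);

  -- execute the statement once per chunk, four scheduler jobs at a time;
  -- :start_id and :end_id are bound to each chunk's rowid range
  dbms_parallel_execute.run_task(
      task_name      => 'load_grid',
      sql_stmt       => 'insert into staging_table
                           select * from your_table
                          where rowid between :start_id and :end_id',
      language_flag  => dbms_sql.native,
      parallel_level => 4);

  dbms_parallel_execute.drop_task('load_grid');
end;
/

Note that run_task uses the scheduler under the covers, so the executing user needs the CREATE JOB privilege.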
Are you sure a parallel execution of the query will be faster? This will only be the case if the huge table is stored on a disk array with many disks, or if it is partitioned over several disks. In all other cases, a sequential access of the table will be many times faster.
If you really have to split the query, you have to split it in a way that a sequential access for each part is still possible. Please post the DDL of the table so we can give a specific answer.
If the processing of the data or the loading into the data grid is the bottleneck, then you are better off reading the data with a single process and splitting the data before processing it further.
Assuming that reading is fast and further data processing is the bottleneck, you could, for example, read the data and write it into very simple text files (such as fixed-length or CSV). After every 10,000 rows you start a new file and spawn a thread or process to process the just-finished file.
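By way of illustration, a rough PL/SQL sketch of that producer side, assuming a directory object named EXPORT_DIR and illustrative columns id and payload (the workers that consume each finished file live outside the database):

declare
  l_file utl_file.file_type;
  l_rows pls_integer := 0;
  l_part pls_integer := 0;
begin
  l_file := utl_file.fopen('EXPORT_DIR', 'chunk_0.csv', 'w', 32767);
  for r in (select id, payload from your_table) loop
    utl_file.put_line(l_file, r.id || ',' || r.payload);
    l_rows := l_rows + 1;
    if l_rows = 10000 then
      -- the file is complete: a worker can safely pick it up now
      utl_file.fclose(l_file);
      l_part := l_part + 1;
      l_rows := 0;
      l_file := utl_file.fopen('EXPORT_DIR', 'chunk_' || l_part || '.csv', 'w', 32767);
    end if;
  end loop;
  utl_file.fclose(l_file);
end;
/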
Try something like this. ROWNUM is assigned to rows as they pass the filter, so a predicate like ROWNUM BETWEEN 11 AND 20 can never be true: the first candidate row gets ROWNUM 1, fails the test, and ROWNUM never advances. You have to materialize ROWNUM in an inline view first:
select * from
  ( select a.*, ROWNUM rnum from
      ( <your_query_goes_here, with order by> ) a
    where ROWNUM <= :MAX_ROW_TO_FETCH )
where rnum >= :MIN_ROW_TO_FETCH;
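Mapped onto the packetNo/packetSize scheme from the question, each worker would bind, for example:

:MIN_ROW_TO_FETCH = packetNo * packetSize + 1
:MAX_ROW_TO_FETCH = (packetNo + 1) * packetSize

The inner query must carry a deterministic ORDER BY (on a unique key, say); otherwise the packets can overlap or miss rows between executions.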
Have you considered using MOD on ROWNUM to pull the data one tenth at a time? Note that MOD(ROWNUM, 10) straight in the WHERE clause falls into the same trap as the attempt in the question (the failing filter stops ROWNUM from ever advancing past 1), so materialize ROWNUM in an inline view and give each of ten threads its own bucket:
SELECT *
FROM  (SELECT A.*, ROWNUM rnum FROM your_table A)
WHERE MOD(rnum, 10) = :bucket;
with :bucket running from 0 to 9. Bear in mind that each thread still scans the whole table only to discard nine tenths of it.