=\'Sat Jan 01 00:00:00 EST 2011\' and \"DATETIME\"<=\'Th" />
开发者

Why do my JDBC cursors perform so slowly?

I'm following verbatim the code outlined in this page (Example 5.2).

My query is quite simple

SELECT * FROM "LSERAW" WHERE "DATETIME">='Sat Jan 01 00:00:00 EST 2011' and "DATETIME"<='Thu Jun 30 00:00:00 EST 2011' ORDER BY "DATETIME" LIMIT 10000000

The database is indexed on DATETIME. However when I increase LIMIT from 10 million to 100 million my c开发者_运维技巧ode takes an extremely long time to enter the while loop. At 10 million it is very quick. I thought that using cursors the while loop should always begin in a timely fashion. Am I doing something wrong with my code?

The underlying database is PostgreSQL on Windows.


Your JDBC Connection autocommit is set to true. The driver is loading the entire result set in to RAM first.

Change autocommit to false, and it'll come back much more quickly.


ORIGINAL ATTEMPT (probably wrong, see edits below): I'm not an expert with cursors, but I'm pretty sure that using a cursor does not mean that the SELECT statement will return right away. The server still has to execute it like any other query, which can take time. I recommend trying to figure out which part of the system is blocking.

After the Java program "hangs", run SELECT * FROM pg_stat_activity using a separate database client (e.g. psql). If you see your query in the list, then it means the server is still crunching it.

You can also change the PostgreSQL server's log_min_duration_statement to (as an example) 1000. Then, after re-starting PostgreSQL (or using pg_ctl reload or something like that), run the program again. Once the query completes, you should see a line in the PostgreSQL log file indicating how long it took.

Good luck!


EDIT: I found this blog entry which describes a similar problem.

[The query] took several seconds to come back, and so created a negative initial experience for the user. What I've learned since then is that I needed to simply apply some environment settings.

Here they are:

set enable_sort = off
set enable_seqscan = off

You can look these up. They simply prevent PostgreSQL from trying to do a filesort or filemerge or sequential scan if there is any index present. Now, outside of a cursor, you will still get a significant delay, because all rows will be delivered regardless. There must be a client-side methodology for paging, or else the server would not be able to deliver the data in chunks. Capice?

But within a cursor, the above works pretty great. Almost instantaneous.

I don't really have any tables laying around with 100 million rows, so I can't test this. But you may want to give it a try.


EDIT 2: Section 11.4 of the PostgreSQL manual explains why enable_sort specifically matters so much:

An important special case is ORDER BY in combination with LIMIT n: an explicit sort will have to process all the data to identify the first n rows, but if there is an index matching the ORDER BY, the first n rows can be retrieved directly, without scanning the remainder at all.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜