concurrent queries on separate tables (same database) in PostgreSQL
Could anyone please help me understand the internals of PostgreSQL? If I access two separate tables (in the same database) from two clients, the query response time increases compared to when I run the queries on the separate tables at different times. I am confused about how simultaneous queries on different tables result in longer execution times. In theory, if I run queries simultaneously on different tables, the tables are unlocked for both clients and the execution time should stay the same.
I am wondering if there is any overhead from shared resources, since both tables live in the same database.
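For context, this is roughly how I am measuring it (a minimal sketch; table_a and table_b are placeholder names):

    -- In psql, turn on client-side timing, then compare running these
    -- one after another vs. from two concurrent sessions.
    \timing on
    SELECT count(*) FROM table_a;
    SELECT count(*) FROM table_b;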
Please help!
Well, let's take the most basic case.
You have two large tables stored on disk.
You then run two queries, one against each of those tables.
But while they are two separate logical tables as far as the DB is concerned, they in fact share the same physical disk.
So, as each Postgres process reads its table, the disk head skates back and forth trying to serve up the data, and the two processes end up fighting each other over the disk drive, the single shared resource.
It's like having a bank with two teller windows and two lines, but only one teller in the back doing all of the work.
This doesn't take into account any of the zillion other factors that could come into play to slow or speed up your query. It's just a basic example of one thing that could be happening.
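If you want to check whether the two backends really are stuck waiting on I/O, newer PostgreSQL versions (9.6 and later) expose wait events; a minimal check, run from a third session while both queries are active:

    -- Shows what each active backend is currently waiting on;
    -- wait_event_type = 'IO' suggests the disk contention described above.
    SELECT pid, state, wait_event_type, wait_event, query
    FROM pg_stat_activity
    WHERE state = 'active';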
There are a bunch of things to keep in mind here.
First, there is a huge difference between sequential and random disk I/O. With sequential access the operating system helps you with read-ahead (prefetch); with random access it cannot. So during a sequential read, the operating system pulls in much more data than was initially requested, and we get to it while the OS is fetching data from the other table for the other query. With random access you do get the contention effect described above, but you are waiting for the disk heads to move anyway, concurrency or not, precisely because the reads are random.
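PostgreSQL's planner actually bakes this difference into its cost model; out of the box it assumes a random page fetch costs four times a sequential one:

    SHOW seq_page_cost;     -- 1 by default
    SHOW random_page_cost;  -- 4 by default: random I/O assumed 4x costlier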
The second thing to keep in mind is that different query plans have different I/O profiles. If we are pulling, say, 10% of a table's pages off disk and we have an index, we might choose to sequentially load the index, traverse it in logical order to find our records (while the disk serves the other query), and then accept the overhead of pulling a few pages from disk via random access. That random access involves the same waiting for heads to move, and concurrency may or may not make it worse. So it is entirely possible that query 1 will do an index scan while query 2 does a sequential scan and pulls most of the data it needs while query 1 is still processing the index. Then maybe you get some of the issues Will describes, but probably not many.
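You can see which plan each query actually gets, and how much of its data came from cache versus disk, with EXPLAIN; a sketch using made-up table names:

    -- BUFFERS reports "shared hit" (found in cache) vs. "read" (from disk)
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE customer_id = 42;   -- likely an index scan

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM big_log;                         -- full sequential scan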
The final thing, and this is really critical, is caching. Databases tend to cache a lot of pages because that avoids the disk I/O altogether. So here you may actually see something very different: query 1 might work mostly or entirely from memory while query 2 hits the disk. In general, if you have enough memory that most of the data you are interested in fits comfortably in RAM, with room to spare for other software like the kernel, then none of the potential disk I/O problems will be actual problems, and the only time you will typically hit disk is to write out WAL segments on commit.
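You can get a rough sense of how well each table is cached in shared buffers (this does not count the OS page cache) from the statistics views:

    -- Per-table hit ratio in PostgreSQL's own buffer cache.
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read DESC;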
So the answer is "it depends." It depends on your system. It depends on your RAM and data size. It depends on your hard drives and operating system. It depends on the specific queries. It depends on other use patterns. It depends.