开发者

Automatic reformulation of condition in PostgreSQL view

I have a table of the following form:

[mytable]
id,    min,     max,    funobj
-----------------------------------------
1      15       23      {some big object}
1      23       41      {another big object}
1      19       27      {next big object}

Now suppose I have a view created like this:

CREATE VIEW functionval开发者_运维问答ues AS
SELECT id, evaluate(funobj)
FROM mytable

where evaluate is a set-returning function evaluateing the large funobj. The result of the view could be something like this:

id,  evaluate
--------------
1    15
1    16
1    ...
1    23
2    23
2    24
2    ...
2    41
...

I do not have any information on the specific values evaluate will return, but I know that they will alway be between the min- and max-values given in mytable (including boundaries)

Finally, I (or better, a third party application) makes a query on the view:

SELECT * FROM functionvalues 
WHERE evaluate BETWEEN somevalue AND anothervalue

In this case, Postgres does evaluate the function evaluate for every row in mytable, whereas, depending on the where clause, the function does not have to be evaluated if it's max and min are not between the given values. As evaluate is a rather slow function, this gives me a very bad performance.

The better way would be to query the table directly by using

SELECT *
FROM (
    SELECT id, evaluate(funobj)
    FROM mytable
    WHERE
       max BETWEEN somevalue AND anothervalue
       OR min BETWEEN somevalue AND anothervalue
       OR (min < somevalue AND max > anothervalue)
) AS innerquery
WHERE evaluate BETWEEN somevalue AND anothervalue

Is there any way I can tell postgres to use a query like the above (by clever indices or something like that) without changing the way the third party application queries the view?

P.S.: Feel free to suggest a better title to this question, the one I gave is rather... well... unspecific.


I have no complete answer, but some of your catchwords ring a distant bell in my head:

  • you have a view
  • you want a more intelligent view
  • you want to "rewrite" the view definition

That calls for the PostgreSQL Rule System, especially the part "Views and the Rules System". Perhaps you can use that for your advantage.

Be warned: This is treacherous stuff. First you will find it great, then you will pet it, then it will rip of your arm without a warning while still purring. Follow the links in here.


Postgres cannot push the restrictions down the query tree into the function; the function always has to scan and return the entire underlying table. And rejoin it with the same table. sigh. "Breaking up" the function's body and combining it with the rest of the query would require a macro-like feature instead of a function.

A better way would probably be to not use an unrestricted set-returning function, but to rewrite the function as a a scalar function, taking only one data row as an argument, and yielding its value.

There is also the problem of sorting-order: the outer query does not know about the order delivered by the function, so explicit sort and merge steps will be necessary, except maybe for very small result sets (for function results statistics are not available, only the cost and estimated rowcount, IIRC.)


Sometimes, the right answer is "faster hardware". Given the way the PostgreSQL optimizer works, your best bet might be to move the table and its indexes onto a solid state disk.

Documentation for Tablespaces

Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.

In October, 2011, you can get a really good 128 gig SSD drive for less than $300, or 300 gigs for less than $600.

If you're looking for an improvement of two orders of magnitude, and you already know the bottleneck is your evaluate() function, then you will probably have to accept smaller gains from many sources. If I were you, I'd look at whether any of these things might help.

  • solid-state disk (speedup factor of 50 over HDD, but you say you're not IO bound, so let's estimate "2")
  • faster CPU (speedup of 1.2)
  • more RAM (speedup of 1.02)
  • different algorithms in evaluate() (say, 2)
  • different data structures in evaluate() (in a database function, probably 0)
  • different C compiler optimizations (0)
  • different C compiler (1.1)
  • rewrite critical parts of evaluate() in assembler (2.5)
  • different dbms platform
  • different database technology

Those estimates suggest a speedup by a factor of only 13. (But they're little more than guesswork.)

I might even consider targeting the GPU for calculation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜