
What is the "maximal" size (complexity) of a DB query that is tractable in practice in your RDBMS? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

As a query grows in size, answering it can easily become computationally intractable for the RDBMS you use in practice. So, I suppose, in order to use databases in practice (i.e., to program with a DB as a backend), you must know where the bound on the complexity/size of an admissible query lies.

If you write programs that need to issue complex queries to relational databases, what is the "maximal" size/complexity of a query that you can expect to be effectively answerable by the RDBMS you use?

And what is the usual size of the queries posed to relational database systems? How far below that maximal bound does it lie?

The motivation for asking is the following theoretical consideration: it seems to be known that to find an answer to a query Q over a database D, one needs time |D|^|Q|, and one cannot get rid of the exponent |Q|. (Looking for a clique is an example of such a worst-case query.) Since D can be very large in practice, one wonders why databases work at all.
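To make the worst case concrete, here is a sketch of what a clique query looks like in SQL, over a hypothetical edge(src, dst) table that stores each undirected edge in both directions; the point is that the query text itself grows with the clique size k:

```sql
-- Hypothetical schema: edge(src, dst), each undirected edge stored in both directions.
-- A 4-clique query: every pair among n1..n4 must be adjacent.
SELECT e12.src AS n1, e12.dst AS n2, e13.dst AS n3, e14.dst AS n4
FROM edge e12
JOIN edge e13 ON e13.src = e12.src                          -- n1-n3
JOIN edge e14 ON e14.src = e12.src                          -- n1-n4
JOIN edge e23 ON e23.src = e12.dst AND e23.dst = e13.dst    -- n2-n3
JOIN edge e24 ON e24.src = e12.dst AND e24.dst = e14.dst    -- n2-n4
JOIN edge e34 ON e34.src = e13.dst AND e34.dst = e14.dst    -- n3-n4
WHERE e12.src < e12.dst AND e12.dst < e13.dst AND e13.dst < e14.dst;  -- each clique once
```

A k-clique needs one edge alias per pair of nodes, so |Q| grows quadratically in k, and that |Q| is exactly what ends up in the exponent of |D|^|Q|.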


As a side note, I'd point out an issue with your question: you're assuming you'll always want a precise answer to a query. This is not the case in practice. When mining large amounts of data, an approximation of the answer is usually good enough.

In the case of PostgreSQL, I'm not aware of any hard-coded limit to the number of joins, but depending on the transaction isolation level I'd expect to run out of locks long before it's reached.
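If you want to see where those limits sit on a given installation, the relevant settings can be inspected like this (just a sketch; the values returned are whatever your server happens to be configured with):

```sql
-- Settings relevant to the "running out of locks" scenario:
SHOW max_locks_per_transaction;        -- object locks available per transaction slot
SHOW default_transaction_isolation;    -- isolation level new transactions start with
```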

Queries thrown at an RDBMS, in my experience, have a few joins at most and are written in such a way that they can use indexes. When they aren't, the developer is usually doing something very wrong.
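As a minimal sketch of what "written so they can use indexes" means, with made-up customers/orders tables:

```sql
-- Hypothetical tables: customers(id, name) and orders(id, customer_id, placed_at).
-- An index on the join column lets the planner use an index scan instead of a seq scan.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);

SELECT c.name, o.placed_at
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 42;    -- selective predicate; the join side is served by the index above
```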

There is, arguably, the occasional reporting query that tends to be slower. These can involve much more complicated statements, with dozens of joins, unions, aggregates and whatnot. But in that case a genetic algorithm (PostgreSQL's GEQO planner) kicks in, for one. And upon reaching the collapse limits, the planner respects the join order as written, which makes it possible to write the query in an optimal way given advance knowledge of how the data is distributed.
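In PostgreSQL terms, the thresholds involved are the stock planner parameters, which you can inspect or override per session (a sketch, not a tuning recommendation):

```sql
-- Join-order planning thresholds:
SHOW geqo_threshold;        -- above this many FROM items, the genetic optimizer (GEQO) takes over
SHOW join_collapse_limit;   -- above this, explicit JOIN order is no longer reordered
SHOW from_collapse_limit;   -- same idea for subqueries pulled up into the FROM list

-- If you know the data better than the planner, force your written join order for this session:
SET join_collapse_limit = 1;
```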

I've seen PostgreSQL swallow queries with two dozen joins without a hiccup... More typically, though, it's possible and more efficient to split such queries into smaller, bite-sized chunks, and/or to pre-aggregate some of the results they'll need.
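One common way to split such a query, sketched against hypothetical sales and products tables, is to materialize the pre-aggregated part first and join against it afterwards:

```sql
-- Pre-aggregate the heavy part once into a temporary table...
CREATE TEMP TABLE monthly_sales AS
SELECT product_id,
       date_trunc('month', sold_at) AS month,
       sum(amount)                  AS total
FROM sales
GROUP BY product_id, date_trunc('month', sold_at);

-- ...then keep the final query small, with a single cheap join.
SELECT p.name, m.month, m.total
FROM monthly_sales m
JOIN products p ON p.id = m.product_id
WHERE m.month >= date '2010-01-01';
```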

For row counts on large queries or data sets, running EXPLAIN and returning the planner's estimated number of rows is usually enough: there's little point in knowing that there are exactly 9,992 matching rows.
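For example (table and predicate are made up; the rows= figure in the top plan node is the estimate you'd read off):

```sql
-- EXPLAIN (without ANALYZE) plans the query but does not run it; the top plan node
-- carries the planner's estimate, e.g. "... rows=9992 ..." (numbers are illustrative).
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';
```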


This is a very good question, in my opinion. In a typical scenario, human-written queries seem to be small and simple (for instance, they contain few cycles, if any), and RDBMSs are really efficient at them. Now imagine a situation where you formulate your query in a vocabulary available to the user, which has to be translated by a computer into the vocabulary of the relational database (say, on the Web). This is a typical Semantic Web scenario, for which languages like OWL 2 have been designed. In this case your original query may be small, but the resulting query, posed to the RDBMS, can be exponentially larger.
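To sketch that blow-up with invented table names: suppose the ontology says that professors and students are persons; then even a one-atom user query for "all persons" already rewrites into a union, and with several applicable axioms per atom the unions multiply:

```sql
-- The ontology states that professor and student are subclasses of person.
-- The user's one-atom query "all persons" rewrites into a union over every way
-- a person can be derived:
SELECT id FROM person
UNION
SELECT id FROM professor
UNION
SELECT id FROM student;
-- With q atoms and s applicable subclass axioms per atom, the rewritten query
-- can grow to roughly (s + 1)^q unions of conjunctive queries.
```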
