What is the "maximal" size (complexity) of a DB query that is tractable in practice in your RDBMS? [closed]
As a query grows in size, it can easily become computationally intractable for the RDBMS you use in practice. So, I suppose, in order to use DBs in practice (to do programming with a DB as a backend), you must know where the bound on the complexity/size of an admissible query lies.
If you write programs that need to issue complex queries to relational databases, what is the "maximal" size/complexity of the queries that you can expect the RDBMS you use to answer effectively?
And what is the usual size of the queries posed to relational database systems? How far below that maximal bound does it lie?
The motivation for asking this is the following theoretical speculation: it seems to be known that to find an answer to a query Q over a database D, one needs time |D|^|Q| in the worst case, and one cannot in general get rid of the exponent |Q|. (Looking for a clique is an example of such a worst-case query.) Since D can be very large in practice, one wonders why databases work at all.
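To make the exponent concrete, here is a sketch of the clique example (the table and column names are invented for illustration): a k-clique query needs k copies of the edge relation and k(k-1)/2 join conditions, so the query text grows with k while a naive evaluation inspects on the order of |D|^k tuples.

    -- Hypothetical schema: edge(src, dst) stores an undirected graph,
    -- with both (a, b) and (b, a) present for each edge.
    -- A 3-clique (triangle) already needs three copies of the relation.
    SELECT e1.src AS a, e1.dst AS b, e2.dst AS c
    FROM   edge e1
    JOIN   edge e2 ON e2.src = e1.dst
    JOIN   edge e3 ON e3.src = e2.dst AND e3.dst = e1.src
    WHERE  e1.src < e1.dst AND e1.dst < e2.dst;   -- report each triangle only once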
As a side note, I'd point out an issue in your question: you're assuming you'll always want a precise answer to a query. This is not the case in practice. When mining large amounts of data, an approximation of the answer is usually good enough.
In the case of PostgreSQL, I'm not aware of any hard-coded limit to the number of joins, but depending on the transaction isolation level I'd expect to run out of locks long before it's reached.
Queries thrown at an RDBMS, in my experience, have a few joins at most and are written in such a way that they can use indexes (see the sketch below). When they aren't, the developer is usually doing something very wrong.
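As an illustration of "written so they can use indexes" (the table, columns and index are hypothetical), the same filter can be phrased in an index-friendly or an index-hostile way:

    -- Hypothetical table orders(id, customer_id, created_at) with an index on created_at.
    -- Index-friendly: the indexed column is compared directly to a constant range.
    SELECT * FROM orders
    WHERE  created_at >= DATE '2023-01-01'
    AND    created_at <  DATE '2023-02-01';

    -- Index-hostile: wrapping the column in a function usually forces a sequential scan
    -- (unless a matching expression index exists).
    SELECT * FROM orders
    WHERE  date_trunc('month', created_at) = DATE '2023-01-01';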
There is, of course, the occasional report query that tends to be slower. These can involve much more complicated statements, with dozens of joins, unions, aggregates and whatnot. But in that case PostgreSQL's genetic query optimizer (GEQO) kicks in, for one. And once the collapse limits are reached, the planner respects the explicit join order, which makes it possible to write the query in an optimal way when you know in advance how the data is distributed.
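For reference, the relevant PostgreSQL settings look roughly like this (defaults shown are the documented ones; tune them to your own workload):

    -- Past geqo_threshold relations (12 by default), the planner switches to the
    -- genetic query optimizer instead of exhaustively searching join orders.
    SHOW geqo_threshold;

    -- Past join_collapse_limit items (8 by default), explicit JOIN syntax is no
    -- longer reordered, so the written join order is the one the planner uses.
    SHOW join_collapse_limit;

    -- Forcing the written join order for one session, when you know the data
    -- better than the statistics do:
    SET join_collapse_limit = 1;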
I've seen PostgreSQL swallow queries with two dozen joins without a hiccup... More typically, though, it's possible and more efficient to split such queries into smaller, bite-sized chunks, and/or to pre-aggregate some of the results they'll need.
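A sketch of the pre-aggregation idea (table and column names are invented): rather than feeding a large fact table straight into a many-way join, aggregate it once in a CTE or temporary table and join the much smaller summary.

    -- Hypothetical tables: sales(product_id, amount), products(id, name).
    -- Pre-aggregate the big table once, then join the small summary.
    WITH sales_per_product AS (
        SELECT product_id, SUM(amount) AS total
        FROM   sales
        GROUP  BY product_id
    )
    SELECT p.name, s.total
    FROM   products p
    JOIN   sales_per_product s ON s.product_id = p.id;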
For row counts on large queries or data sets, running EXPLAIN and returning the planner's estimated number of rows is usually enough: there's little point in knowing that there are exactly 9,992 matching rows.
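For instance (the table is hypothetical and the plan output is abbreviated), the estimate appears in the plan's top row and no rows are actually fetched:

    EXPLAIN SELECT * FROM orders WHERE created_at >= DATE '2023-01-01';
    --                           QUERY PLAN
    -- Seq Scan on orders  (cost=0.00..431.00 rows=9992 width=64)
    --   Filter: (created_at >= '2023-01-01'::date)
    -- The "rows=9992" figure is the planner's estimate, not an exact count.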
This is a very good question, in my opinion. In a typical scenario, human-written queries seem to be small and simple (for instance, they contain few cycles, if any), and RDBMSs are really efficient at them. Now imagine a situation where you formulate your query in a certain vocabulary available to the user, which then has to be translated by a computer into the vocabulary of the relational database (say, on the Web). This is a typical Semantic Web scenario, for which languages like OWL 2 have been designed. In this case, your original query may be small, but the resulting query, posed to the RDBMS, can be exponentially larger.