开发者

Why don't the mainstream DBMSs have graph functionality?

Relational databases are frequently used to store graphs in all their many flavors (trees, directed graphs, undirected graphs, ...).

Why then do none of the major DBMSs (Microsoft, MySql, Oracle, PostgreSQL, SqlLite, just to name a few in alphabetical order) include library support for treating relations as graphs?

Some desirable features, by way of example:

  • Constraint checking (connectedness, acyclicity, planarity, ...)
  • Commonly needed functions (shortest path, minimu开发者_开发百科m spanning tree, transitive closure, max flow/min cut, clique detection, Hamiltonian/Eulerian cycles ...)
  • Auxiliary data structures needed to improve performance for any of the above

Building support for some of these things outside the database is complicated because (among other reasons):

  • It's inherently complicated (libraries help here)
  • Short answers are often supported by lots of data: an external client running a shortest path algorithm would need to either be very "chatty" with the database or would need to retrieve a much-larger-than-needed amount of data; either choice is bad for the network
  • Maintaining integrity when integrity depends on a graph-theoretic constraint requires access to all proposed updates, hence a trigger, and access to existing graph libraries from triggers is complicated in many systems
  • The DBMS storage manager and optimizer are uniquely positioned to address the question of auxiliary data structures, as they do with indexes

This isn't a rhetorical question, I actually want to know if there are interesting technical (or historical) reasons.


I've worked in a research group, interested among others in delevoping a database for RDF(S) data, which is basically labeled graphs, or triples [subject, predicate, object], which are basically graph edges: [sourceNode, edgeLabel, targetNode].

The question to ask, to appreciate the hardness of the problem: What kind of indices are you going to build for a labeled graph? You have to take advantage of common "properties" (each "predicate" is a property of the subject, with the value of object), and index edges accordingly, so you can quickly find if "there is an edge called 'hasAge' on a Person with value greater than 18".

For illustration, here is a simple approach, which is schema-oblivious (and goes quite to the opposite direction of traditional database research which quite unanimously agrees that schemata are good to have). It completely ignores any schema information (this paper provides useful context). Just store everything in three big tables (s: subject, p: predicate, o: object):

  1. [s, p, o]
  2. [p, o, s]
  3. [o, s, p]

These three suffice to answer any efficiently evaluate any query with (at most) a subject, (at most) a predicate, and (at most) an object (i.e. queries of the form (s, *, *), (*, p, *), (*, *, o), (s, p, *), (s, *, o), (*, p, o), (s, p, o)). Complex queries though consist of many "path expressions" (i.e. you describe data for which you can find certain paths satisfying some criteria), each of which is translated to a self-join on one of these (big!) tables, which is not all that efficient, which is a problem.

There, that's a simple graph database in a pocket. :)

Concluding, this is the field of active research. I'm not up to date with the current state of art, but I've seen products like AllegroGraph and others that claim very good results.


Oracle has support for graph functionality (Oracle Locator/Oracle Spatial) and semantic web functionality.


I suspect that your question contains the beginnings of its own answer.

The commonly needed functions you list are not, for general purpose databases, commonly needed at all. Yes they are certainly needed for graph operations, but rarely for, say, customer billing. Of course, a relational database can store graphs in tables but the graph operations are beyond the powers of any version of SQL I've seen.

You write building support for some of these things outside the database is complicated. True that and it's why we all get paid so much. But it would be just as complicated to build support for those things into the database wouldn't it ?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜