开发者

query language for graph sets: data modeling question

Suppose I have a set of directed graphs. I need to qu开发者_JAVA技巧ery those graphs. I would like to get a feeling for my best choice for the graph modeling task. So far I have these options, but please don't hesitate to suggest others:

  • Proprietary implementation (matrix) and graph traversal algorithms.

  • RDBM and SQL option (too space consuming)

  • RDF and SPARQL option (too slow)

What would you guys suggest? Regards.

EDIT: Just to answer Mad's questions:

  • Each one is relatively small, no more than 200 vertices, 400 edges. However, there are hundreds of them.

  • Frequency of querying: hard to say, it's an experimental system.

  • Speed: not real time, but practical, say 4-5 seconds tops.


You didn't give us enough information to respond with a well thought out answer. For example: what size are these graphs? With what frequencies do you expect to query these graphs? Do you need real-time response to these queries? More information on what your application is for, what is your purpose, will be helpful.

Anyway, to counter the usual responses that suppose SQL-based DBMSes are unable to handle graphs structures effectively, I will give some references:

  • Graph Transformation in Relational Databases (.pdf), by G. Varro, K. Friedl, D. Varro, presented at International Workshop on Graph-Based Tools (GraBaTs) 2004;

    5 Conclusion and Future Work

    In the paper, we proposed a new graph transformation engine based on off-the-shelf relational databases. After sketching the main concepts of our approach, we carried out several test cases to evaluate our prototype implementation by comparing it to the transformation engines of the AGG [5] and PROGRES [18] tools.
    The main conclusion that can be drawn from our experiments is that relational databases provide a promising candidate as an implementation framework for graph transformation engines. We call attention to the fact that our promising experimental results were obtained using a worst-case assessment method i.e. by recalculating the views of the next rule to be applied from scratch which is still highly inefficient, especially, for model transformations with a large number of independent matches of the same rule. ...

    They used PostgreSQL as DBMS, which is probably not particularly good at this kind of applications. You can try LucidDB and see if it is better, as I suspect.

  • Incremental SQL Queries (more than one paper here, you should concentrate on " Maintaining Transitive Closure of Graphs in SQL "): "

    .. we showed that transitive closure, alternating paths, same generation, and other recursive queries, can be maintained in SQL if some auxiliary relations are allowed. In fact, they can all be maintained using at most auxiliary relations of arity 2. ..

  • Incremental Maintenance of Shortest Distance and Transitive Closure in First Order Logic and SQL.

  • Edit: you give more details so... I think the best way is to experiment a little with both a main-memory dedicated graph library and with a DBMS-based solution, then evaluate carefully pros and cons of both solutions.

    For example: a DBMS need to be installed (if you don't use an "embeddable" DBMS like SQLite), only you know if/where your application needs to be deployed and what your users are. On the other hand, a DBMS gives you immediate benefits, like persistence (I don't know what support graph libraries gives for persisting their graphs), transactions management and countless other. Are these relevant for your application? Again, only you know.


    The first option you mentioned seems best. If your graph won't have many edges (|E|=O(|V|)) then you might earn better complexity of time and space using Dictionary:

    var graph = new Dictionary<Vertex, HashSet<Vertex>>();
    

    An interesting graph library is QuickGraph. Never used it but it seems promising :)


    I wrote and designed quite a few graph algorithms for various programming contests and in production code. And I noticed that every time I need one, I have to develop it from scratch, assembling together concepts from graph theory (BFS, DFS, topological sorting etc).

    Perhaps a lack of experience is a reason, but it seems to me that there's still no reasonable general-purpose query language to solve graph problems. Pick a couple of general-purpose graph libraries and solve your particular task in a programming (not query!) language. That will give you best performance and space consumption, but will also require understanding of graph theory basic concepts and of their limitations.

    And the last one: do not use SQL for graphs.

    0

    上一篇:

    下一篇:

    精彩评论

    暂无评论...
    验证码 换一张
    取 消

    最新问答

    问答排行榜