What are the best ways to store Graphs in persistent storage

2023-01-02 00:50 问答作者：

I am wondering what the best ways to store graphs in persistent storage are, for later analysis, search, clustering, etc.

I see开发者_如何学JAVA neo4j being an option, I am curious if there are also other graph databases available. Does anyone have any insights into how larger social networks store their graph based data (or other sites that require the storage of graph like models, e.g. RDF).

What about options like Cassandra, or MySQL?

Graph Databases:

HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
InfoGrid: an Internet Graph Database with a many additional software components that make the development of REST-ful web applications on a graph foundation easy.
vertexdb: a high performance graph database server that supports automatic garbage collection.

Source: http://nosql.mypopescu.com/post/498705278/quick-review-of-existing-graph-databases

Graph Libraries:

WebGraph is a framework to study the web graph. From their page - "It provides simple ways to manage very large graphs, exploiting modern compression techniques."
Dex is a high performance library to manage very large graphs or networks.
This blog post - On Building a Stupidly Fast Graph Database - provides some guidelines on building a graph database - the technique they use is "memory-mapped I/O, disk-based linear-hashing".

Disclaimer: I am speaking form the graph analysis standpoint.

There are several file formats for storing graph data: GraphML, GXL and several others. But storage usually is not a problem. Working with the graphs without fully loading them into RAM is the tricky part.

The RDF model is too generic to do serious graph analysis stuff. If you don't mind your analysis being slow and programming the algorithms yourself, go with the existing graph databases - see wikipedia on this.

For real analysis, load all data into RAM using existing graph analysis libraries, like SNAP or see This question.

There is no absolutely correct answer here; there is a large variety of options, the choice of which seriously depends on your needs. With large-scale retrievals/traversals (e.g. social networks and similar back-ends) you're quickly going to run into the random I/O bottleneck; I believe storing your graph in RAM is currently the only practical course of action. Less latency-sensitive applications have quite a wide variety of options, including neo4j (open source with a commercial flavor) and Allegrograph (commercial with a limited free edition).

At Delver we ended up implementing our own denormalized data model (essentially an adjacency list to represent the graph) in RAM on top of GigaSpaces (some info can be found in this presentation), with custom map-reduce code for queries and data analysis. If you go this route, Cassandra seems to be a viable open source platform to build on.

You could look at InfiniteGraph, which will be released for beta very soon (http://www.infinitegraph.com/)

If this is for commercial use then you'll see it's targeted towards sites that will have larger graphs. The social networking sites built custom solutions, which worked for them at the time. But they're in-house solutions are more limiting than using something like InfiniteGraph. Products like Cassandra or MySQL weren't designed for this many-to-many problem set. Can you do it? Sure, but it's a lot of hand-written coding, and not scalable. Let us know if you have a real project, we could help you figure out you graph requirements. Thanks, Warren wdavidson@objectivity.com

继续阅读：database graph neo4j persistent storage

What are the best ways to store Graphs in persistent storage

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？