How to Model Real-World Relationships in a Graph Database (like Neo4j)?

2023-04-07 09:35

I have a general question about modeling in a graph database that I just can't seem to wrap my head around.

How do you model this type of relationship: "Newton invented Calculus"?

In a simple graph, you could model it like this:

Newton (node) -> invented (relationship) -> Calculus (node)

...so you'd have a bunch of "invented" graph relationships as you added more people and inventions.

The problem is, you start needing to add a bunch of properties to the relationship:

invention_date
influential_concepts
influential_people
books_inventor_wrote


...and you'll want to start creating relationships between those properties and other nodes, such as:

influential_people: relationship to person nodes
books_inventor_wrote: relationship to book nodes

So now it seems like the "real-world relationships" ("invented") should actually be a node in the graph, and the graph should look like this:

Newton (node) -> (relationship) -> Invention of Calculus (node) -> (relationship) -> Calculus (node)

And to complicate things more, other people also participated in the invention of Calculus, so the graph now becomes something like:

Newton (node) -> 
  (relationship) -> 
    Newton's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)
Leibniz (node) -> 
  (relationship) -> 
    Leibniz's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)
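
To make that last shape concrete, here is a rough sketch of it in plain Java. Everything in it is an assumption for illustration: Node, Rel, and ReifiedInvention are a made-up in-memory stand-in rather than any Neo4j API, the relationship types MADE, PART_OF, and RESULTED_IN are names I picked, and the years are placeholder values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a property graph; not a Neo4j API.
final class Node {
    final String name;
    final Map<String, Object> props = new LinkedHashMap<>();
    final List<Rel> in = new ArrayList<>(); // incoming relationships

    Node(String name) {
        this.name = name;
    }

    // Add a directed, typed relationship from this node to target.
    Rel link(String type, Node target) {
        Rel rel = new Rel(type, this, target);
        target.in.add(rel);
        return rel;
    }
}

final class Rel {
    final String type;
    final Node from;
    final Node to;

    Rel(String type, Node from, Node to) {
        this.type = type;
        this.from = from;
        this.to = to;
    }
}

final class ReifiedInvention {
    // Builds the two-inventor graph from the diagram and returns the Calculus node.
    static Node build() {
        Node calculus = new Node("Calculus");
        Node invention = new Node("Invention of Calculus");
        invention.link("RESULTED_IN", calculus);
        addContribution("Newton", 1666, invention);   // placeholder year
        addContribution("Leibniz", 1674, invention);  // placeholder year
        return calculus;
    }

    // One node per person's contribution, so it can hold its own properties
    // and could later be linked to book or person nodes.
    static void addContribution(String person, int year, Node invention) {
        Node contribution = new Node(person + "'s Calculus Invention");
        contribution.props.put("invention_date", year);
        new Node(person).link("MADE", contribution);
        contribution.link("PART_OF", invention);
    }

    // Walk backwards: concept <- invention <- contribution <- person.
    static List<String> inventorsOf(Node concept) {
        List<String> names = new ArrayList<>();
        for (Rel resultedIn : concept.in) {
            if (!resultedIn.type.equals("RESULTED_IN")) continue;
            for (Rel partOf : resultedIn.from.in) {
                if (!partOf.type.equals("PART_OF")) continue;
                for (Rel made : partOf.from.in) {
                    if (made.type.equals("MADE")) names.add(made.from.name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(inventorsOf(build()));
    }
}
```

Note that answering "who invented Calculus?" now takes three hops instead of one, which is part of what makes me unsure about this design.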

So I ask the question because it seems like you don't want to set properties on the actual graph database "relationship" objects, because you may want to at some point treat them as nodes in the graph.

Is this correct?

I have been studying the Freebase Metaweb Architecture, and they seem to be treating everything as a node. For example, Freebase has the idea of a Mediator/CVT, where you can create a "Performance" node that links an "Actor" node to a "Film" node, like here: http://www.freebase.com/edit/topic/en/the_last_samurai. Not quite sure if this is the same issue though.

What are some guiding principles you use to figure out if the "real-world relationship" should actually be a graph node rather than a graph relationship?

If there are any good books on this topic I would love to know. Thanks!

Some of these things, such as invention_date, can be stored as properties on the edges: in most graph databases, edges can have properties in the same way that vertices can. For example, you could do something like this (the code uses TinkerPop's Blueprints API):

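// Package names as in Blueprints 2.x; adjust for other versions.
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;

Graph graph = new Neo4jGraph("/tmp/my_graph");

// A person vertex; the "type" property distinguishes kinds of vertices.
Vertex newton = graph.addVertex(null);
newton.setProperty("given_name", "Isaac");
newton.setProperty("surname", "Newton");
newton.setProperty("birth_year", 1643); // use Gregorian dates...
newton.setProperty("type", "PERSON");

Vertex calculus = graph.addVertex(null);
calculus.setProperty("type", "KNOWLEDGE");

// The date of discovery lives on the edge itself.
Edge newton_calculus = graph.addEdge(null, newton, calculus, "DISCOVERED");
newton_calculus.setProperty("year", 1666);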

Now, let's expand it a little and add Leibniz:

Vertex leibniz = graph.addVertex(null);
leibniz.setProperty("given_name", "Gottfried");
leibniz.setProperty("surname", "Leibniz");
leibniz.setProperty("birth_year", 1646);
leibniz.setProperty("type", "PERSON");

Edge leibniz_calculus = graph.addEdge(null, leibniz, calculus, "DISCOVERED");
leibniz_calculus.setProperty("year", 1674);

Adding in the books:

Vertex principia = graph.addVertex(null);
principia.setProperty("title", "Philosophiæ Naturalis Principia Mathematica");
principia.setProperty("year_first_published", 1687);
Edge newton_principia = graph.addEdge(null, newton, principia, "AUTHOR");
Edge principia_calculus = graph.addEdge(null, principia, calculus, "SUBJECT");

To find all of the books that Newton wrote on things he discovered, we can construct a graph traversal. We start with Newton, follow his outgoing links to the things he discovered, traverse links in reverse to get the books on each subject, and go in reverse again to get each book's author. If the author is Newton, we go back to the book and return the result. This query is written in Gremlin, a Groovy-based domain-specific language for graph traversals:

newton.out("DISCOVERED").in("SUBJECT").as("book").in("AUTHOR").filter{it == newton}.back("book").title.unique()
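
If the Gremlin is hard to read, here is roughly the same walk spelled out in plain Java over a hard-coded edge list. Treat it as a sketch only: BooksByDiscoverer and its methods out, in, and booksOnOwnDiscoveries are names made up for illustration and are not part of Blueprints or Gremlin, and plain strings stand in for the vertices.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the graph above; not the Blueprints or Gremlin API.
final class BooksByDiscoverer {
    // The same edges as the example, as {from, label, to} triples.
    static final String[][] EDGES = {
        {"Newton", "DISCOVERED", "Calculus"},
        {"Leibniz", "DISCOVERED", "Calculus"},
        {"Newton", "AUTHOR", "Principia"},
        {"Principia", "SUBJECT", "Calculus"},
    };

    // Targets of edges with this label that start at `from` (like Gremlin's out).
    static Set<String> out(String from, String label) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] e : EDGES) {
            if (e[0].equals(from) && e[1].equals(label)) result.add(e[2]);
        }
        return result;
    }

    // Sources of edges with this label that end at `to` (like Gremlin's in).
    static Set<String> in(String to, String label) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] e : EDGES) {
            if (e[2].equals(to) && e[1].equals(label)) result.add(e[0]);
        }
        return result;
    }

    // Books on a subject the person discovered that the same person authored.
    static Set<String> booksOnOwnDiscoveries(String person) {
        Set<String> books = new LinkedHashSet<>(); // a set, so duplicates collapse
        for (String topic : out(person, "DISCOVERED")) {
            for (String book : in(topic, "SUBJECT")) {
                // Keep the book only if walking AUTHOR backwards reaches the person.
                if (in(book, "AUTHOR").contains(person)) books.add(book);
            }
        }
        return books;
    }

    public static void main(String[] args) {
        System.out.println(booksOnOwnDiscoveries("Newton"));
    }
}
```

The nested loops correspond to the out and in steps, and the contains check plays the role of the filter followed by back("book").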

Thus, I hope I've shown a little of how a clever traversal can avoid the need to create intermediate nodes that represent edges. In a small database it won't matter much, but in a large database you're going to suffer large performance hits from those extra nodes.

Yes, it is sad that you can't associate edges with other edges in a graph, but that's a limitation of the data structures of these databases. Sometimes it makes sense to make everything a node; for example, in the Mediator/CVT case a performance has a bit more concreteness to it. Individuals may wish to address only Tom Cruise's performance in "The Last Samurai" in a review. However, for most graph databases I've found that applying some graph traversals gets me what I want out of the database.
