How to serialize a graph?
This is an interview question: how do you serialize a graph? I saw this answer, but I am not sure if it is enough.
It looks like a very confusing "open question", and candidates are probably expected to ask more questions about the requirements: what the nodes and edges are, how they are serialized themselves, whether the graph is weighted, directed, etc., and how many nodes/edges are in the graph. What about the infrastructure? Is it a plain file system, or should/can we use a database?
So, how would you answer this question?
I think the answer you provided is quite reasonable. IMO, you basically need to know the application background. I would ask at least:
- is it directed or not?
- what are the properties associated with the vertices, the edges, and the graph itself?
- is the graph sparse? (If so, we'd better not use an adjacency matrix.)
The simplest way is to store it as an edge list. However, different applications have their own classical formats. For example, in circuit simulation the graph is sparse, and the resulting graph/matrix can be stored in compressed-column form. If you are solving a (min-cost) max-flow problem, there is already the DIMACS format, which public solvers can read and write. A structured format is also a good choice if you want something human-readable: XML can provide self-validation (GraphML already exists as a standard). By the way, the DOT format is quite self-contained.
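To make the edge-list idea concrete, here is a minimal sketch in Python. The text layout (a header line with vertex count, edge count, and a directed/undirected flag, followed by one "u v" line per edge) and the function names are my own assumptions for illustration, not a standard format. It assumes vertices are numbered 0..n-1 and carry no extra properties.

```python
def serialize_edge_list(num_vertices, edges, directed=True):
    """Return a plain-text edge list: a header line, then one 'u v' line per edge."""
    kind = "directed" if directed else "undirected"
    # The header records the vertex count so isolated vertices are not lost.
    header = f"{num_vertices} {len(edges)} {kind}"
    lines = [header] + [f"{u} {v}" for u, v in edges]
    return "\n".join(lines) + "\n"


def deserialize_edge_list(text):
    """Parse the (hypothetical) format written by serialize_edge_list."""
    lines = text.splitlines()
    n_str, m_str, kind = lines[0].split()
    num_vertices, num_edges = int(n_str), int(m_str)
    edges = [tuple(int(x) for x in line.split())
             for line in lines[1:1 + num_edges]]
    return num_vertices, edges, kind == "directed"


# Vertex 3 has no edges; it survives only because the header stores the count.
text = serialize_edge_list(4, [(0, 1), (1, 2), (2, 0)])
```

Weights or other edge properties could go in extra columns on each edge line, which is roughly what formats like DIMACS do.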
Meh. Whatever you store it in, it's basically:
Output each vertex in the graph. If you don't have all the vertices first, it's a PITA to rebuild the graph when you're reading it back in.
Now you can store edges between vertices. Hopefully your vertices have some form of ID to uniquely identify them. The version of this I've seen is "store a (graph|tree) in a database". So, read in the nodes and store them in a hashtable or similar for O(1) amortized lookup. Then, for each edge, look up ID-source and ID-dest, and link.
Voila, you've deserialized it. If it's not a DB, the same idea generally holds - serialize nodes first, then edges.
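A rough sketch of the "nodes first, then edges" approach in Python, using JSON as the container. The `Node` class, its `label` field, and the JSON layout are hypothetical choices made for this example; the only requirement carried over from the description above is that every vertex has a unique ID.

```python
import json


class Node:
    """Hypothetical in-memory vertex holding direct references to neighbors."""

    def __init__(self, node_id, label):
        self.id = node_id
        self.label = label
        self.neighbors = []


def serialize(nodes):
    """Write all vertices first, then all edges as [source ID, dest ID] pairs."""
    return json.dumps({
        "nodes": [{"id": n.id, "label": n.label} for n in nodes],
        "edges": [[n.id, m.id] for n in nodes for m in n.neighbors],
    })


def deserialize(text):
    data = json.loads(text)
    # Pass 1: recreate every vertex and index it by ID in a hash table (dict).
    by_id = {d["id"]: Node(d["id"], d["label"]) for d in data["nodes"]}
    # Pass 2: look up both endpoints of each edge and link them.
    for src, dst in data["edges"]:
        by_id[src].neighbors.append(by_id[dst])
    return list(by_id.values())


a, b, c = Node(1, "a"), Node(2, "b"), Node(3, "c")
a.neighbors = [b, c]
c.neighbors = [a]  # a cycle is fine, because edges are stored as ID pairs
restored = deserialize(serialize([a, b, c]))
```

Because edges are written as ID pairs rather than nested objects, cycles and shared nodes need no special handling, and every lookup in the second pass is guaranteed to find its vertex.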