Tips for designing a serialization file format that will permit easy merging

2023-02-06 05:46

Say I'm building a UML modeling tool. There's some hierarchical organization to the data, and model elements need to be able to refer to others. I need some way to save model files to disk. If multiple people might be working on the files simultaneously, the time will come to merge these model files. Also, it would be nice to compare two revisions in source control and see what has changed. This seems like it would be a common problem across many domains.

For this to work well using existing difference and merge tools, the file format should be text, separated onto multiple lines.

What are some existing serialization formats that do a good job (or poor job) addressing such problems? Or, if designing a custom file format, what are some tips / guidelines / pitfalls?

Bonus question: Any additional guidance if I want to eventually split the model up into multiple files, each separately source controlled?

I solved that problem long ago for Octave/MATLAB; now I need something for C#. The task was to merge two Octave structs into one. I found no merge tool and no suitable serializer, so I had to come up with something myself.

The most important design decision was to split the struct tree into lines, each holding the complete path to a leaf and the content of that leaf.

The basic idea was:

Serialize the struct to lines, where each line represents one basic variable (matrix, string, float, ...).
An array or matrix of structs gets its index in the path.
Concatenate the two resulting text files and sort the lines.
Detect collisions and handle them (very easy, because the same properties end up directly under each other after the lines are sorted).
Deserialize the result.
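The concatenate, sort, and detect-collisions steps can be sketched roughly as follows. This is Python rather than Octave, and merge_lines is a hypothetical helper, not the original implementation. It assumes every line has the form path=value and treats two different values for the same path as a collision.

```python
from itertools import groupby

def merge_lines(lines_a, lines_b):
    """Merge two lists of 'path=value' lines; return (merged, conflicts)."""
    # Concatenate and sort so that entries for the same path become adjacent.
    entries = sorted(line.split("=", 1) for line in lines_a + lines_b)
    merged, conflicts = [], []
    for path, group in groupby(entries, key=lambda entry: entry[0]):
        # Identical values from both sides collapse into a single value.
        values = sorted({value for _, value in group})
        if len(values) > 1:
            conflicts.append((path, values))   # same path, different content
        else:
            merged.append(f"{path}={values[0]}")
    return merged, conflicts

a = ["root.f=3.1416", "root.s.a=3"]
b = ["root.s.a=5", "root.t=Textstring"]
merged, conflicts = merge_lines(a, b)
```

Here root.s.a is reported as a conflict because the two sides disagree, while the other two paths merge cleanly. Note that this is a two-way merge with no common ancestor, so it cannot tell which side made the change.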

Example:

>> s1

s1 =

scalar structure containing the fields:

b =

  2x2 struct array containing the fields:

    bruch

t = Textstring
f =  3.1416
s =

  scalar structure containing the fields:

    a =  3
    b =  4

will be serialized to

root.b(1,1).bruch=txt2base('isfloat|[ [ 0, 4 ] ; [ 1, 0 ] ; ]');
root.b(1,2).bruch=txt2base('isfloat|[ [ 1, 6 ] ; [ 1, 0 ] ; ]');
root.b(2,1).bruch=txt2base('isfloat|[ [ 2, 7 ] ; [ 1, 0 ] ; ]');
root.b(2,2).bruch=txt2base('isfloat|[ [ 7 ] ; [ 1 ] ; ]');
root.f=txt2base('isfloat|[3.1416]');
root.s.a=txt2base('isfloat|[3]');
root.s.b=txt2base('isfloat|[4]');
root.t=txt2base('ischar|Textstring');

The advantage of this method is that it is very easy to implement and the result is human-readable. First you write the two functions base2txt and txt2base, which convert basic types to strings and back. Then you walk the tree recursively and, for each struct property, write the path to the property (here separated by ".") and its content on one line.

The big disadvantage is that my implementation, at least, is very slow.

The answer to the second question, whether something like this already exists: I don't know, but I searched for a while without finding anything, so I don't think so.
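A minimal sketch of that recursive walk, in Python rather than Octave: nested dicts and lists stand in for structs and struct arrays, and repr is only a placeholder for base2txt. The name flatten and the 1-based (index) notation for list items are assumptions made for illustration.

```python
def flatten(value, path="root"):
    """Yield one 'path=value' line per leaf of a nested dict/list structure."""
    if isinstance(value, dict):
        # Visit fields in sorted order so the output does not depend on
        # insertion order (assumes all keys are strings).
        for key in sorted(value):
            yield from flatten(value[key], f"{path}.{key}")
    elif isinstance(value, list):
        # Array elements carry their 1-based index in the path.
        for index, item in enumerate(value, start=1):
            yield from flatten(item, f"{path}({index})")
    else:
        # Leaf: repr() is a stand-in for a real base2txt conversion.
        yield f"{path}={value!r}"

model = {"t": "Textstring", "f": 3.1416, "s": {"a": 3, "b": 4}, "b": [{"bruch": 0.25}]}
lines = list(flatten(model))
```

Each leaf becomes exactly one line, so a line-based diff tool shows one changed line per changed value.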

Some guidelines:

The format should be designed so that when only one thing has changed in a model, there is only one corresponding change in the file. Some counterexamples:

It's no good if the file format uses arbitrary reference IDs that change every time you edit and save the model.
It's no good if array items are stored with their indices listed explicitly, since inserting items into the middle of an array will cause all the following indices to get shuffled down. That will cause those items to show up in a 'diff' unnecessarily.
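The array-index counterexample can be checked with a small experiment. The sketch below uses Python's difflib to count the lines a line-based diff reports after one element is inserted into the middle of a list, once with explicit indices and once without. The line formats and helper names are made up for illustration.

```python
import difflib

def changed_lines(before, after):
    """Count the lines a line-based diff reports as added or removed."""
    diff = difflib.ndiff(before, after)
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))

def with_indices(items):
    # Hypothetical format that stores each item's index explicitly.
    return [f"item[{i}] = {name}" for i, name in enumerate(items)]

def without_indices(items):
    # Hypothetical format where position is implied by line order.
    return [f"item = {name}" for name in items]

old = ["Class A", "Class B", "Class C", "Class D"]
new = ["Class A", "Class X", "Class B", "Class C", "Class D"]

indexed = changed_lines(with_indices(old), with_indices(new))
plain = changed_lines(without_indices(old), without_indices(new))
```

With implicit ordering the diff contains only the inserted line. With explicit indices every item after the insertion point also shows up as changed.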

Regarding references: if IDs are created serially, then two people editing the same revision of the model could end up creating new elements with the same ID. This will become a problem when merging.
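The serial-ID collision is easy to reproduce. The sketch below first shows two editors starting from the same revision and each allocating the "next" ID, then contrasts that with randomly generated UUIDs. UUIDs are not proposed in the answer above; they are shown here only as one common way to avoid coordination, and the function names are hypothetical.

```python
import uuid

def next_serial_id(existing_ids):
    """Serial scheme: the next ID is one more than the largest existing ID."""
    return max(existing_ids, default=0) + 1

def new_random_id():
    """Random scheme: a version-4 UUID, generated without coordination."""
    return str(uuid.uuid4())

base_ids = [1, 2, 3]  # IDs present in the shared base revision

# Both editors start from the same revision and add one element each.
alice_serial = next_serial_id(base_ids)
bob_serial = next_serial_id(base_ids)
serial_collision = alice_serial == bob_serial  # both pick the same ID

# Random IDs make an accidental clash extremely unlikely.
alice_uuid = new_random_id()
bob_uuid = new_random_id()
```

The trade-off is that random IDs are long and meaningless to a human reader, which makes the files somewhat harder to read in a diff.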
