Why does Google Wave Operational Transform need annotations?
The operational transformation (OT) model used in Google Wave has a rather curious document format. A document is essentially a subset of XML: characters, start tags and end tags. In addition, the document has "annotations", which are metadata associated with ranges, i.e. a start position and an end position. The white paper justifies their presence with:
Wave document operations also support annotations. An annotation is some meta-data associated with an item range, i.e., a start position and an end position. This is particularly useful for describing text formatting and spelling suggestions, as it does not unecessarily complicate the underlying structured document format.
I can certainly see how it would be awkward if an arbitrary range of a document were selected and, for example, bolded: XML tag nesting is strict, so this would require a mess of open and close tag insertions.
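To make the nesting problem concrete, here is a minimal sketch in Python. It assumes a toy document made of `<p>` paragraphs and a selection given as flat character offsets. The function names and the annotation tuple layout are hypothetical, chosen for illustration, and are not Wave's actual format.

```python
from typing import List, Tuple

def bold_with_tags(paragraphs: List[str], start: int, end: int) -> str:
    """Bold the flat character range [start, end) using <b> tags.

    Strict nesting forces the range to be split at every paragraph
    boundary it crosses, so one user action becomes several tag pairs.
    """
    out = []
    offset = 0  # flat offset of the first character of the current paragraph
    for text in paragraphs:
        lo = max(start - offset, 0)
        hi = min(end - offset, len(text))
        offset += len(text)
        if lo < hi:  # the selection overlaps this paragraph
            text = text[:lo] + "<b>" + text[lo:hi] + "</b>" + text[hi:]
        out.append(f"<p>{text}</p>")
    return "".join(out)

def bold_with_annotation(start: int, end: int) -> Tuple[int, int, str, bool]:
    """Bold the same range as a single annotation (hypothetical layout).

    The document structure is not touched at all.
    """
    return (start, end, "bold", True)

print(bold_with_tags(["Hello", "World"], 3, 7))
# <p>Hel<b>lo</b></p><p><b>Wo</b>rld</p>
print(bold_with_annotation(3, 7))
# (3, 7, 'bold', True)
```

With tags, the number of insertions grows with the number of element boundaries the selection crosses. With an annotation, it stays one record regardless of structure.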
However, is this really a problem in practice? I mean, does one necessarily have to support such an operation, unless the goal is an editor that mimics the years-old word processing paradigm rather than a structured editor? Would pure XML operational transform, with the document structure being simply HTML5, be that terrible? Is it a performance issue that styles would be in the document as tags? Or does the operational transform model somehow produce unsatisfactory results for text formatting when it is represented with tags?
Also, a side question - how good would the pure "insert character, remove character, retain" operational transform model be on plain text representations? For example, editing HTML5 as text - or editing Wikipedia articles?
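For reference, the plain-text model in question can be sketched roughly as follows. This is an illustrative Python sketch, not Wave's implementation: the tuple-based operation encoding and the name `apply_op` are assumptions. An operation is a list of components that walks the whole document from start to end.

```python
from typing import List, Tuple, Union

Component = Tuple[str, Union[int, str]]  # ("retain", n), ("insert", s) or ("delete", n)

def apply_op(doc: str, op: List[Component]) -> str:
    """Apply a retain/insert/delete operation to a plain-text document."""
    out = []
    pos = 0  # cursor into the original document
    for kind, arg in op:
        if kind == "retain":    # copy the next `arg` characters unchanged
            out.append(doc[pos:pos + arg])
            pos += arg
        elif kind == "insert":  # emit new text, cursor does not move
            out.append(arg)
        elif kind == "delete":  # skip the next `arg` characters
            pos += arg
        else:
            raise ValueError(f"unknown component: {kind!r}")
    if pos != len(doc):
        raise ValueError("operation must span the whole document")
    return "".join(out)

print(apply_op("Hello world", [("retain", 6), ("delete", 5), ("insert", "Wave")]))
# Hello Wave
```

Note that nothing in this model knows about markup: it only counts characters, so any structure in the text (HTML tags, wiki syntax) is invisible to it.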
There are fundamental problems with using a hierarchical markup language with OT. See below for a worked example:
Does operational transformation work on structured documents such as HTML if simply treated as plain text?
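One way such a problem can show up is sketched below. This is an illustration written for this answer, not the example from the linked question. Two users concurrently format overlapping ranges of `<p>abc</p>` by inserting tags as plain text. Because both edits are insert-only, the converged result of a character-level OT is the same as applying all inserts at their original positions, which the hypothetical helper `merge_inserts` does as a simplified stand-in for a real transform function.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Insert = Tuple[int, str]  # (position in the ORIGINAL document, text to insert)

def merge_inserts(doc: str, *edits: List[Insert]) -> str:
    """Combine concurrent insert-only edits made against the same document.

    Ties at the same position are broken by edit order (a stand-in for a
    site id). Inserts are applied back to front so that the original
    positions stay valid.
    """
    tagged = [(pos, site, text)
              for site, edit in enumerate(edits)
              for pos, text in edit]
    tagged.sort(key=lambda t: (t[0], t[1]))
    for pos, _site, text in reversed(tagged):
        doc = doc[:pos] + text + doc[pos:]
    return doc

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

doc = "<p>abc</p>"
bold_ab = [(3, "<b>"), (5, "</b>")]    # user A bolds "ab"
italic_bc = [(4, "<i>"), (6, "</i>")]  # user B italicizes "bc"

print(merge_inserts(doc, bold_ab))             # <p><b>ab</b>c</p>
print(merge_inserts(doc, italic_bc))           # <p>a<i>bc</i></p>
print(merge_inserts(doc, bold_ab, italic_bc))  # <p><b>a<i>b</b>c</i></p>
print(is_well_formed(merge_inserts(doc, bold_ab, italic_bc)))  # False
```

Each edit is well-formed on its own, and the merge converges, but the converged text has misnested tags. A range-based annotation for each style would not have this problem, since overlapping ranges are perfectly legal.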
This choice makes sense to me as an optimization on several fronts:
- The underlying document remains as human-readable and parseable as possible
- The algorithms to parse the underlying XML remain as simple as possible (useful for compatibility with non-Google attempts at parsing the resulting documents, and for maintenance)
- Keeping styles out of the markup avoids the garbage that tags would accumulate after multiple edits, which can lead to large performance hits due to the sheer number of tags and/or the additional passes over the document needed to simplify it.