How to implement VARIANT in Protobuf
As part of my protobuf protocol I require the ability to send data of a dynamic type, a little bit like VARIANT. Roughly, I require the data to be an integer, string, boolean or "other", where "other" (e.g. DateTime) is serialized as a string. I need to be able to use these as a single field and in lists in a number of different locations in the protocol.
How can this best be implemented while keeping message size minimal and performance optimal?
I'm using protobuf-net with C#.
EDIT:
I've posted a proposed answer below which uses what I think is the minimum of memory required.
EDIT2:
Created a GitHub project at http://github.com/pvginkel/ProtoVariant with a complete implementation.
Jon's multiple optionals covers the simplest setup, especially if you need cross-platform support. On the .NET side (to ensure you don't serialize unnecessary values), simply return null from any property that isn't a match, for example:
public object Value { get; set; }
[ProtoMember(1)]
public int? ValueInt32 {
    get { return (Value is int) ? (int)Value : (int?)null; }
    set { Value = value; }
}
[ProtoMember(2)]
public string ValueString {
    get { return (Value is string) ? (string)Value : null; }
    set { Value = value; }
}
// etc
You can also do the same using the bool ShouldSerialize*() pattern if you don't like the nulls.
Wrap that up in a class and you should be fine to use that at either the field level or list level. You mention optimal performance; the only additional thing I can suggest there is to consider encoding it as a "group" rather than a "submessage", as this is easier to encode (and just as easy to decode, as long as you expect the data). To do that, use the Group data format via [ProtoMember], i.e.
[ProtoMember(12, DataFormat = DataFormat.Group)]
public MyVariant Foo {get;set;}
The difference here can be minimal, but a group avoids some back-tracking in the output stream to fix up the lengths. Either way, in terms of overhead a "submessage" takes at least 2 bytes: at least one for the field header (more if the 12 is actually 1234567) and at least one for the length, which gets bigger for longer messages. A group takes 2 x the field header (a start marker and an end marker), so if you use low field numbers this will be 2 bytes regardless of the length of the encapsulated data (it could be 5MB of binary).
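The byte counts above follow from protobuf's varint encoding, and can be checked with a small sketch. The class and method names here (WireOverhead, VarintSize and so on) are made up for illustration and are not part of protobuf-net; the code only does the arithmetic and does not serialize anything.

```csharp
using System;

// Hypothetical helper: estimates the framing overhead of a nested value.
static class WireOverhead
{
    // Number of bytes a base-128 varint needs for the given value.
    public static int VarintSize(ulong value)
    {
        int size = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            size++;
        }
        return size;
    }

    // A field header is the varint of (fieldNumber << 3 | wireType).
    // The 3 wire-type bits do not change the size for fieldNumber >= 1.
    public static int HeaderSize(int fieldNumber)
    {
        return VarintSize((ulong)fieldNumber << 3);
    }

    // Length-prefixed submessage: one header plus a varint length.
    public static int SubMessageOverhead(int fieldNumber, int payloadLength)
    {
        return HeaderSize(fieldNumber) + VarintSize((ulong)payloadLength);
    }

    // Group: a start-group header and an end-group header, no length.
    public static int GroupOverhead(int fieldNumber)
    {
        return 2 * HeaderSize(fieldNumber);
    }
}
```

For field 12 and a 5MB payload, SubMessageOverhead(12, 5 * 1024 * 1024) gives 5 bytes while GroupOverhead(12) gives 2. With a field number such as 1234567 the header alone is 4 bytes, so the group (8 bytes) loses its advantage for small payloads.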
A separate trick, useful for more complex scenarios but not as interoperable, is generic inheritance, i.e. an abstract base class that has ConcreteType<int>, ConcreteType<string> etc listed as subtypes - this, however, takes an extra 2 bytes (typically), so is not as frugal.
Taking another step further away from the core spec, if you genuinely can't tell what types you need to support, and don't need interoperability - there is some support for including (optimized) type information in the data; see the DynamicType option on ProtoMember - this takes more space than the other two options.
You could have a message like this:
message Variant {
    optional string string_value = 1;
    optional int32 int32_value = 2;
    optional int64 int64_value = 3;
    optional string other_value = 4;
    // etc
}
Then write a helper class - and possibly extension methods - to ensure that you only ever set one field in the variant.
You could optionally include a separate enum value to specify which field is set (to make it more like a tagged union), but since you can already check which optional field is present, that information is already in the data. It depends on whether you want the speed of finding the right field (in which case add the discriminator) or the space efficiency of only including the data itself (in which case don't add the discriminator).
That's a general Protocol Buffer approach. There may be something more protobuf-net specific, of course.
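The helper class mentioned above might look roughly like the sketch below. VariantMessage is a hypothetical, hand-written stand-in for whatever class your tooling generates from the Variant message; nullable members model the optional fields, and the private setters plus the From factory are what guarantee that only one field is ever set.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the generated Variant class (not generated code).
public sealed class VariantMessage
{
    public string StringValue { get; private set; }
    public int? Int32Value { get; private set; }
    public long? Int64Value { get; private set; }
    public string OtherValue { get; private set; }

    private VariantMessage() { }

    // The only way to create an instance, so exactly one field is populated.
    public static VariantMessage From(object value)
    {
        if (value == null) throw new ArgumentNullException("value");
        var result = new VariantMessage();
        if (value is string) result.StringValue = (string)value;
        else if (value is int) result.Int32Value = (int)value;
        else if (value is long) result.Int64Value = (long)value;
        // Anything else falls back to its invariant-culture string form.
        else result.OtherValue = Convert.ToString(value, CultureInfo.InvariantCulture);
        return result;
    }

    // Returns whichever field is set; "other" values come back as strings.
    public object Value
    {
        get
        {
            if (StringValue != null) return StringValue;
            if (Int32Value.HasValue) return Int32Value.Value;
            if (Int64Value.HasValue) return Int64Value.Value;
            return OtherValue;
        }
    }
}
```

Usage would be along the lines of VariantMessage.From(5) or VariantMessage.From("text"). No discriminator is stored here; the Value getter simply probes the fields in order, which is the space-efficient variant of the two options described above.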
Asking questions always helps me think. I found a way to get the number of bytes used for transfer to a bare minimum.
What I've done here is make use of optional properties. Say I want to send an int32. When the value isn't zero, I can just check a property on the message for whether it has a value. Otherwise, I set a type to INT32_ZERO. This way I can correctly store and reconstruct the value. The example below has this implementation for a number of types.
The .proto file:
message Variant {
    optional VariantType type = 1 [default = AUTO];
    optional int32 value_int32 = 2;
    optional int64 value_int64 = 3;
    optional float value_float = 4;
    optional double value_double = 5;
    optional string value_string = 6;
    optional bytes value_bytes = 7;
    optional string value_decimal = 8;
    optional string value_datetime = 9;
}
enum VariantType {
    AUTO = 0;
    BOOL_FALSE = 1;
    BOOL_TRUE = 2;
    INT32_ZERO = 3;
    INT64_ZERO = 4;
    FLOAT_ZERO = 5;
    DOUBLE_ZERO = 6;
    NULL = 7;
}
And accompanying partial .cs file:
using System;
using System.Globalization;
namespace ConsoleApplication6
{
    partial class Variant
    {
        // Builds a Variant that stores only what is needed to reconstruct the value.
        public static Variant Create(object value)
        {
            var result = new Variant();
            if (value == null)
                result.Type = VariantType.NULL;
            else if (value is string)
                result.ValueString = (string)value;
            else if (value is byte[])
                result.ValueBytes = (byte[])value;
            else if (value is bool)
                result.Type = (bool)value ? VariantType.BOOLTRUE : VariantType.BOOLFALSE;
            else if (value is int)
            {
                // Zero equals the field default and is not sent, so flag it via Type.
                if ((int)value == 0)
                    result.Type = VariantType.INT32ZERO;
                else
                    result.ValueInt32 = (int)value;
            }
            else if (value is long)
            {
                if ((long)value == 0L)
                    result.Type = VariantType.INT64ZERO;
                else
                    result.ValueInt64 = (long)value;
            }
            else if (value is float)
            {
                if ((float)value == 0f)
                    result.Type = VariantType.FLOATZERO;
                else
                    result.ValueFloat = (float)value;
            }
            else if (value is double)
            {
                if ((double)value == 0d)
                    result.Type = VariantType.DOUBLEZERO;
                else
                    result.ValueDouble = (double)value;
            }
            else if (value is decimal)
                // decimal does not support the "r" format specifier; the plain
                // invariant ToString() already preserves every digit.
                result.ValueDecimal = ((decimal)value).ToString(CultureInfo.InvariantCulture);
            else if (value is DateTime)
                result.ValueDatetime = ((DateTime)value).ToString("o", CultureInfo.InvariantCulture);
            else
                throw new ArgumentException(String.Format("Cannot store data type {0} in Variant", value.GetType().FullName), "value");
            return result;
        }
        // Reconstructs the original value from Type or from whichever field is set.
        public object Value
        {
            get
            {
                switch (Type)
                {
                    case VariantType.BOOLFALSE:
                        return false;
                    case VariantType.BOOLTRUE:
                        return true;
                    case VariantType.NULL:
                        return null;
                    case VariantType.DOUBLEZERO:
                        return 0d;
                    case VariantType.FLOATZERO:
                        return 0f;
                    case VariantType.INT32ZERO:
                        return 0;
                    case VariantType.INT64ZERO:
                        return 0L;
                    default:
                        if (ValueInt32 != 0)
                            return ValueInt32;
                        if (ValueInt64 != 0)
                            return ValueInt64;
                        if (ValueFloat != 0f)
                            return ValueFloat;
                        if (ValueDouble != 0d)
                            return ValueDouble;
                        if (ValueString != null)
                            return ValueString;
                        if (ValueBytes != null)
                            return ValueBytes;
                        if (ValueDecimal != null)
                            return Decimal.Parse(ValueDecimal, CultureInfo.InvariantCulture);
                        if (ValueDatetime != null)
                            // RoundtripKind keeps the Utc/Local kind written by "o".
                            return DateTime.Parse(ValueDatetime, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
                        return null;
                }
            }
        }
    }
}
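The decimal and DateTime fields depend on a lossless trip through a string. The sketch below isolates just that part using only the BCL; VariantText and its method names are hypothetical and not part of the generated code. Note that decimal does not accept the "r" format specifier (it is defined for float and double), so the plain invariant ToString() is used, and DateTimeStyles.RoundtripKind is needed to get the original Kind back from an "o" string.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for the string-encoded variant fields.
static class VariantText
{
    public static string FromDecimal(decimal value)
    {
        // Invariant ToString() keeps all digits, including trailing zeros.
        return value.ToString(CultureInfo.InvariantCulture);
    }

    public static decimal ToDecimal(string text)
    {
        return decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);
    }

    public static string FromDateTime(DateTime value)
    {
        // "o" is the round-trip format and includes the kind/offset marker.
        return value.ToString("o", CultureInfo.InvariantCulture);
    }

    public static DateTime ToDateTime(string text)
    {
        return DateTime.ParseExact(text, "o", CultureInfo.InvariantCulture,
            DateTimeStyles.RoundtripKind);
    }
}
```

Parsing with the exact "o" format rather than a general Parse is a choice made here to reject anything that was not produced by the matching writer; a looser parse would also work if other producers are involved.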
EDIT:
Further comments from @Marc Gravell have improved the implementation significantly. See the Git repository for a complete implementation of this concept.
Protobuf doesn't actually support any kind of VARIANT type.
You can try the union technique instead; see more details here.
The main idea is to define a wrapper message that has every existing message type as an optional field, and to use the union convention to indicate which type a concrete message holds.
See the example by following the link above.
I use ProtoInclude with an abstract base type and subclasses to get the type and single value statically set. Here's the start of what that could look like for Variant:
[ProtoContract]
[ProtoInclude(1, typeof(Integer))]
[ProtoInclude(2, typeof(String))]
public abstract class Variant
{
    // Each subtype must derive from Variant for ProtoInclude to apply.
    [ProtoContract]
    public sealed class Integer : Variant
    {
        [ProtoMember(1)]
        public int Value;
    }
    [ProtoContract]
    public sealed class String : Variant
    {
        [ProtoMember(1)]
        public string Value;
    }
}
Usage:
var foo = new Variant.String { Value = "Bar" };
var baz = new Variant.Integer { Value = 10 };
This approach takes a bit more space, as it encodes the length of the ProtoInclude'd class instance (e.g. 1 byte for an int and for strings under 125 bytes). I am willing to live with this for the benefit of controlling the type statically.