protobuf-csharp-port - streaming records from a file a bit like an axis function in LINQ-to-XML
I have built the standard address book tutorial that comes with protobuf-csharp-port and my code is as follows:
```csharp
using System;
using System.IO;
using Google.ProtocolBuffers; // CodedInputStream
// ...plus the namespace of the AddressBook/Person classes generated from the tutorial's .proto

class Program
{
    static void Main(string[] args)
    {
        CreateData();
        ShowData();
    }

    private static void CreateData()
    {
        AddressBook.Builder abb = new AddressBook.Builder();
        for (int i = 0; i < 2000000; i++)
        {
            Person.Builder pb = new Person.Builder();
            pb.Id = i;
            pb.Email = "mytest@thisisatest.com";
            pb.Name = "John" + i;
            abb.AddPerson(pb.Build());
        }
        // The complete AddressBook (all 2,000,000 Person objects) is in memory here.
        var ab = abb.Build();
        using (var fs = File.Create("c:\\testaddressbook.bin"))
        {
            ab.WriteTo(fs);
        }
    }

    private static void ShowData()
    {
        using (var fs = File.Open("c:\\testaddressbook.bin", FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            CodedInputStream cis = CodedInputStream.CreateInstance(fs);
            cis.SetSizeLimit(Int32.MaxValue);
            // ParseFrom materializes the whole AddressBook before the loop starts.
            AddressBook ab = AddressBook.ParseFrom(cis);
            Console.WriteLine("Person count: {0}", ab.PersonCount);
            for (int i = 0; i < ab.PersonCount; i++)
                Console.WriteLine("Name: " + ab.GetPerson(i).Name);
            Console.WriteLine("Person count: {0}", ab.PersonCount);
        }
    }
}
```
Writing the data takes up about 300 MB of RAM for 2 million records, and reading it back takes up about 415 MB.
In the XML world, I would stream the elements using an axis function. Is it possible to stream the records inside the address book model object? Or is there another way to implement this that uses memory more efficiently?
Thanks
Yes, you can stream both reading and writing.
There's a length-delimited format (each message is preceded by its size as a varint) supported by the official Java API and also by my C# API, using WriteDelimitedTo / ParseDelimitedFrom.
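A rough sketch of what the delimited format looks like on the wire may help. The C# below is hand-rolled for illustration, not protobuf-csharp-port's implementation: it assumes each record is simply a base-128 varint byte count followed by the message bytes, and the names DelimitedFraming, WriteDelimited and ReadDelimited are hypothetical. In real code the byte[] payloads would come from person.ToByteArray() and be turned back into objects with Person.ParseFrom(bytes), or you would just call person.WriteDelimitedTo(fs) and Person.ParseDelimitedFrom(fs).

```csharp
using System.Collections.Generic;
using System.IO;

// Hand-rolled sketch of length-delimited framing: each record is a base-128
// varint byte count followed by that many payload bytes. Not library code.
public static class DelimitedFraming
{
    // Appends one record: varint length prefix, then the payload bytes.
    public static void WriteDelimited(Stream output, byte[] payload)
    {
        WriteVarint(output, (uint)payload.Length);
        output.Write(payload, 0, payload.Length);
    }

    // Lazily yields one payload per record. Only the current record is held
    // in memory, so a foreach over millions of records stays small.
    public static IEnumerable<byte[]> ReadDelimited(Stream input)
    {
        while (TryReadVarint(input, out uint length))
        {
            var payload = new byte[length];
            ReadFully(input, payload);
            yield return payload;
        }
    }

    // 7 bits per byte, least significant group first; the high bit means
    // "more bytes follow".
    private static void WriteVarint(Stream output, uint value)
    {
        while (value >= 0x80)
        {
            output.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.WriteByte((byte)value);
    }

    // Returns false only on a clean end of stream before a new record starts.
    private static bool TryReadVarint(Stream input, out uint value)
    {
        value = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = input.ReadByte();
            if (b < 0)
            {
                if (shift == 0) return false;
                throw new EndOfStreamException("Truncated varint.");
            }
            value |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
        }
        throw new InvalidDataException("Varint longer than 5 bytes.");
    }

    // Stream.Read may return fewer bytes than requested, so loop until full.
    private static void ReadFully(Stream input, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = input.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) throw new EndOfStreamException("Truncated record.");
            offset += read;
        }
    }
}
```

Because ReadDelimited is an iterator, `foreach (var bytes in DelimitedFraming.ReadDelimited(fs))` pulls one record at a time, which is the same pull-style shape as an axis method written with yield return over a streaming XML reader.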
Alternatively, you can use MessageStreamWriter and MessageStreamIterator, which I introduced into my API before the delimited API came along.
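As far as I understand protobuf-csharp-port, MessageStreamWriter writes each message as if it were one element of a repeated message field numbered 1, and MessageStreamIterator reads that layout back. The sketch below hand-rolls that assumed framing (tag byte 0x0A, i.e. field 1 with wire type 2, then a varint length, then the message bytes); FieldOneFraming, WriteItem and ReadItems are hypothetical names, not library API.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of "repeated field 1" framing: before each message the
// tag byte 0x0A (field number 1, wire type 2 = length-delimited), then a
// varint byte count, then the message bytes. Not the library's code.
public static class FieldOneFraming
{
    private const byte FieldOneTag = (1 << 3) | 2; // == 0x0A

    public static void WriteItem(Stream output, byte[] payload)
    {
        output.WriteByte(FieldOneTag);
        WriteVarint(output, (uint)payload.Length);
        output.Write(payload, 0, payload.Length);
    }

    // Lazily yields each field-1 payload in order until the stream ends.
    // A real reader would skip other fields; this sketch rejects them.
    public static IEnumerable<byte[]> ReadItems(Stream input)
    {
        int tag;
        while ((tag = input.ReadByte()) >= 0)
        {
            if (tag != FieldOneTag)
                throw new InvalidDataException($"Unexpected tag byte 0x{tag:X2}.");
            var payload = new byte[ReadVarint(input)];
            int offset = 0;
            while (offset < payload.Length)
            {
                int read = input.Read(payload, offset, payload.Length - offset);
                if (read == 0) throw new EndOfStreamException("Truncated item.");
                offset += read;
            }
            yield return payload;
        }
    }

    private static void WriteVarint(Stream output, uint value)
    {
        while (value >= 0x80)
        {
            output.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.WriteByte((byte)value);
    }

    private static uint ReadVarint(Stream input)
    {
        uint value = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = input.ReadByte();
            if (b < 0) throw new EndOfStreamException("Truncated varint.");
            value |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
        throw new InvalidDataException("Varint longer than 5 bytes.");
    }
}
```

If your AddressBook declares `repeated Person person = 1;` and nothing else, as the standard tutorial .proto does, a file produced by ab.WriteTo(fs) should have this same layout, so a loop over FieldOneFraming.ReadItems(fs) calling Person.ParseFrom(bytes) would visit one Person at a time. Check the field number in your own .proto before relying on this.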
I can't comment on that implementation, but in protobuf-net streaming is fully possible. If all the objects you want to stream are first-level children of the root object, you can simply iterate over the outer sequence, using Serializer.DeserializeItems<T> if they are all the same type, or Serializer.NonGeneric.TryDeserializeWithLengthPrefix if different types of objects are involved.
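Mixed types work because each length-prefixed item carries a field number, and protobuf-net's non-generic API lets you supply a resolver that maps that number to a type. The sketch below shows the same idea without protobuf-net: each record is tagged with its field number and wire type 2, and a caller-supplied resolver (a hypothetical Func<int, Func<byte[], object>> standing in for protobuf-net's type resolver) chooses how to decode each payload. TaggedRecords and its methods are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: each record carries a field number (in a protobuf-style
// tag) that tells the reader which type its payload holds; a resolver maps the
// field number to a decoder. Illustrative only, not protobuf-net's code.
public static class TaggedRecords
{
    // Writes tag = (fieldNumber << 3) | 2 (length-delimited), length, payload.
    public static void Write(Stream output, int fieldNumber, byte[] payload)
    {
        WriteVarint(output, ((uint)fieldNumber << 3) | 2);
        WriteVarint(output, (uint)payload.Length);
        output.Write(payload, 0, payload.Length);
    }

    // Lazily decodes records one at a time; resolve returns null for an
    // unknown field number, which this sketch treats as an error.
    public static IEnumerable<object> Read(
        Stream input, Func<int, Func<byte[], object>> resolve)
    {
        while (TryReadVarint(input, out uint tag))
        {
            if ((tag & 7) != 2)
                throw new InvalidDataException("Expected a length-delimited record.");
            int fieldNumber = (int)(tag >> 3);
            if (!TryReadVarint(input, out uint length))
                throw new EndOfStreamException("Missing record length.");
            var payload = new byte[length];
            ReadFully(input, payload);
            Func<byte[], object> decode = resolve(fieldNumber)
                ?? throw new InvalidDataException($"No decoder for field {fieldNumber}.");
            yield return decode(payload);
        }
    }

    private static void WriteVarint(Stream output, uint value)
    {
        while (value >= 0x80)
        {
            output.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.WriteByte((byte)value);
    }

    // Returns false only on a clean end of stream before a new record starts.
    private static bool TryReadVarint(Stream input, out uint value)
    {
        value = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = input.ReadByte();
            if (b < 0)
            {
                if (shift == 0) return false;
                throw new EndOfStreamException("Truncated varint.");
            }
            value |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
        }
        throw new InvalidDataException("Varint longer than 5 bytes.");
    }

    private static void ReadFully(Stream input, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = input.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) throw new EndOfStreamException("Truncated record.");
            offset += read;
        }
    }
}
```

Building the resolver from a dictionary keyed by field number keeps the dispatch table in one place, and returning null for an unknown field makes the sketch fail loudly instead of silently dropping data.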
If the item you want to treat as a stream is in the middle of the tree, you can provide an alternative receiving model: by implementing IEnumerable and Add() on a fake collection, you can push the data through any API you want (an event-based, SAX-like API, for example).
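A minimal sketch of such a fake collection: it has the IEnumerable + Add() shape a deserializer looks for, but raises an event for each item instead of storing it. PushCollection<T> and ItemAdded are hypothetical names, and how a particular serializer (protobuf-net included) discovers and populates a member of this type is an assumption to check against your version.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical "receiving model": shaped like a collection (IEnumerable<T>
// plus Add) so a deserializer can feed it, but each item is forwarded to
// subscribers and then dropped instead of being stored.
public sealed class PushCollection<T> : IEnumerable<T>
{
    // Raised once per item, in the order the items arrive.
    public event Action<T> ItemAdded;

    // Called by whatever is "populating" the collection.
    public void Add(T item)
    {
        ItemAdded?.Invoke(item);
    }

    // Nothing is retained, so enumerating yields no items.
    public IEnumerator<T> GetEnumerator()
    {
        yield break;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Subscribing a handler such as `people.ItemAdded += p => Console.WriteLine(p.Name);` turns each incoming item into an event, SAX-style, and memory use no longer grows with the item count because nothing is kept.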
I should also note that you can serialize streaming data in exactly the same ways. It is not required to have a complete object model at any point.
If you want a more complete example, let me know.