Parallel Binary Deserialization?
I have a solution where I need to read objects into memory very quickly; however, the binary stream may be cached in memory in compressed form to save time on disk I/O.
I've tinkered with different solutions; unsurprisingly, XmlTextWriter and XmlTextReader weren't great, and neither was the built-in binary serialization. Protobuf-net is excellent but still a little too slow. Here are some stats:
File size XML: 217 KB
File size binary: 87 KB
Compressed binary: 26 KB
Compressed XML: 26 KB
Deserialize with XML (XmlTextReader): 8.4 sec
Deserialize with binary (protobuf-net): 6.2 sec
Deserialize with binary, without string interning (protobuf-net): 5.2 sec
Deserialize with binary from memory: 5.9 sec
Time to decompress binary file into memory: 1.8 sec
Serialize with XML (XmlTextWriter): 11 sec
Serialize with binary (protobuf-net): 4 sec
Serialize with binary, length-prefixed (protobuf-net): 3.8 sec
That got me thinking: it seems (correct me if I'm wrong) that the major culprit in deserialization is the actual byte-to-object conversion rather than the I/O. If that's the case, it should be a candidate for the new Parallel Extensions.
Since I'm a bit of a novice when it comes to binary I/O, I'd appreciate some input before I commit time to a solution, though :)
For simplicity's sake, say we want to deserialize a list of objects with no optional fields. My first idea was simply to store each object with a length prefix, read each object's byte[] into a list of byte[], and use PLINQ to do the byte[] -> object deserialization.
However, with that method I still need to read the byte[]s on a single thread, so perhaps one could read the whole binary stream into memory instead (how large a binary file is feasible for that, by the way?) and store at the beginning of the file how many objects there are, plus each object's length and offset. Then I should be able to create ArraySegments or something and do the chunking in parallel as well.
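Roughly, this is the kind of thing I have in mind. This is only a sketch: the Item type, the length-prefixed layout, and the helper name are placeholders, and it assumes a seekable stream (e.g. a MemoryStream holding the decompressed data):

using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;

[ProtoContract]
public class Item
{
    [ProtoMember(1)] public string Name { get; set; }
}

public static class ParallelDeserializer
{
    public static List<Item> DeserializeAll(Stream stream)
    {
        // Single-threaded pass: read each length-prefixed record into a byte[].
        var chunks = new List<byte[]>();
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                int length = reader.ReadInt32();
                chunks.Add(reader.ReadBytes(length));
            }
        }

        // Parallel pass: byte[] -> object via protobuf-net, preserving order.
        return chunks.AsParallel()
                     .AsOrdered()
                     .Select(bytes => Serializer.Deserialize<Item>(new MemoryStream(bytes)))
                     .ToList();
    }
}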
So what do you guys think, is it feasible?
I do things like this quite a lot, and nothing really beats using BinaryReader to read things in. As far as I know, there is no faster way than BinaryReader.ReadInt32 to read in a 32-bit integer.
You may also find that the overhead of making it parallel and joining the results back together is too much. If you really want to go the parallel route, I would advise using multiple threads to read multiple files, rather than multiple threads to read one file in multiple blocks.
You could also play around with the block size to make it match disk block size, but there are so many levels of abstraction in between your application and the disk that could make that a waste of time.
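As a rough sketch of the multiple-files idea, reusing protobuf-net from the question (the file names and the Item type are placeholders, and I'm assuming each file holds one serialized List<Item>):

using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;

var files = new[] { "items1.bin", "items2.bin", "items3.bin" };

// One file per worker; each worker deserializes its whole file.
var lists = files.AsParallel()
                 .Select(path =>
                 {
                     using (var fs = File.OpenRead(path))
                     {
                         return Serializer.Deserialize<List<Item>>(fs);
                     }
                 })
                 .ToList();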
A binary file can be read simultaneously by several threads. To do that, it must be opened with appropriate access/share modifiers, and each thread can then be given its own offset and length within the file. So reading in parallel is not a problem.
Let us assume that you stick to a simple binary format: each object is prefixed with its length. Knowing that, you can "scroll" through the file and determine the offset at which each deserializing thread should start.
The deserializing algorithm can look like this:
1. Analyze the file (divide it into several relatively large chunks; chunk borders should coincide with object borders).
2. Spawn the necessary number of deserializer threads and "instruct" each one with the appropriate offset and length to read.
3. Combine the results of all deserializer threads into one list.
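A minimal sketch of steps 2 and 3, assuming the length-prefixed format above; the Item type, the Chunk class, the file name, and the AnalyzeFile helper (step 1, not shown) are placeholders. Requires System.IO, System.Linq, System.Threading.Tasks and protobuf-net.

class Chunk { public long Offset; public long Length; }

static List<Item> DeserializeChunk(string path, long offset, long length)
{
    var result = new List<Item>();

    // Each worker opens its own stream with FileShare.Read, so several
    // threads can read the same file at different offsets at the same time.
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var reader = new BinaryReader(fs))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        long end = offset + length;
        while (fs.Position < end)
        {
            int size = reader.ReadInt32();
            byte[] bytes = reader.ReadBytes(size);
            result.Add(Serializer.Deserialize<Item>(new MemoryStream(bytes)));
        }
    }
    return result;
}

string path = "objects.bin";            // placeholder file name
List<Chunk> chunks = AnalyzeFile(path); // step 1: compute chunk offsets/lengths (not shown)

// Step 2: one task per chunk.
var tasks = chunks.Select(c => Task.Factory.StartNew(
                      () => DeserializeChunk(path, c.Offset, c.Length)))
                  .ToArray();
Task.WaitAll(tasks);

// Step 3: merge the results into one list.
var allItems = tasks.SelectMany(t => t.Result).ToList();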
"That got me thinking: it seems (correct me if I'm wrong) that the major culprit in deserialization is the actual byte conversion rather than the I/O."
Don't assume where the time is being spent; get yourself a profiler and find out.
When I deserialize a list of objects from XML larger than 1 MB, it takes less than 2 seconds with this code:
// Deserializes an XML string into a List<T> via XmlSerializer.
public static List<T> FromXML<T>(this string s) where T : class
{
    var ls = new List<T>();
    var xml = new XmlSerializer(typeof(List<T>));

    // Dispose the readers when done.
    using (var sr = new StringReader(s))
    using (var xmltxt = new XmlTextReader(sr))
    {
        if (xml.CanDeserialize(xmltxt))
        {
            ls = (List<T>)xml.Deserialize(xmltxt);
        }
    }
    return ls;
}
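For example, assuming a hypothetical MyItem class and an XML string produced by serializing a List<MyItem>:

List<MyItem> items = xmlString.FromXML<MyItem>();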
Try this and see if it is better for the XML case.