Why do we use the flush parameter with the Encoder.GetBytes method?

2023-01-19 03:02

This link explains the Encoder.GetBytes method, including a bool parameter called flush. The explanation of flush is:

true if this encoder can flush its state at the end of the conversion; otherwise, false. To ensure correct termination of a sequence of blocks of encoded bytes, the last call to GetBytes can specify a value of true for flush.

But I didn't understand what flush does, maybe I am drunk or something :). Can you explain it in more detail, please?

Suppose you receive data over a socket connection. You will receive a long text as several byte[] blocks.

It is possible that 1 Unicode character occupies 2+ bytes in a UTF-8 stream and that it is split over 2 byte blocks. Decoding the 2 byte blocks separately (and concatenating the strings) would produce corrupted text, because neither half is a valid character on its own.

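A stateful Decoder handles this by holding back the incomplete bytes until the next block arrives. So you should specify flush=true only on the last block. And of course, if you only have 1 block then that is also the last.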
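A minimal sketch of the split-character case, assuming C# 9 or later (top-level statements) and the standard System.Text.Decoder API. The sample bytes and the variable names are made up for illustration: "café" in UTF-8 is cut so that the two bytes of "é" land in different blocks.

```csharp
using System;
using System.Text;

// "café" in UTF-8 is 63 61 66 C3 A9; the two bytes of 'é' (C3 A9)
// are deliberately split across the two blocks.
byte[] block1 = { 0x63, 0x61, 0x66, 0xC3 };
byte[] block2 = { 0xA9 };

// Naive approach: decode each block on its own and concatenate.
// Each half of 'é' is invalid by itself, so the result is not "café".
string naive = Encoding.UTF8.GetString(block1) + Encoding.UTF8.GetString(block2);

// Stateful approach: one Decoder instance remembers the pending C3 byte.
Decoder decoder = Encoding.UTF8.GetDecoder();
var sb = new StringBuilder();
char[] chars = new char[16]; // large enough for this small sample

// Not the last block: flush = false keeps the incomplete byte sequence.
int n = decoder.GetChars(block1, 0, block1.Length, chars, 0, false);
sb.Append(chars, 0, n);

// Last block: flush = true completes the conversion and clears the state.
n = decoder.GetChars(block2, 0, block2.Length, chars, 0, true);
sb.Append(chars, 0, n);

string stateful = sb.ToString();

Console.WriteLine(naive == "caf\u00E9");    // expected to be False
Console.WriteLine(stateful == "caf\u00E9"); // expected to be True
```

If flush were true on the first call, the decoder would treat the lone C3 byte as the end of the input instead of waiting for the rest of the character.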

Tip: Use a TextReader (for example a StreamReader) and let it handle this problem for you.

Edit

The mirror problem (that was actually asked: GetBytes) is slightly harder to explain.

Using flush=true is much like calling Encoder.Reset() after GetBytes(...). It clears the 'state' of the encoder, including trailing characters at the end of the previous data block, such as an unmatched high surrogate.

The basic idea is the same: when converting from string to blocks of bytes, or vice versa, the blocks are not independent.
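The GetBytes direction can be sketched the same way, again assuming C# 9 or later and the standard System.Text.Encoder API. The sample text and variable names are illustrative only: the surrogate pair for U+1F600 is split across two char blocks, so the encoder has to carry the high surrogate over to the next call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// The text 'a', U+1F600, 'b' as UTF-16, split so that the surrogate pair
// (D83D DE00) straddles the two blocks.
char[] block1 = { 'a', '\uD83D' };
char[] block2 = { '\uDE00', 'b' };

Encoder encoder = Encoding.UTF8.GetEncoder();
var output = new List<byte>();
byte[] buffer = new byte[16]; // large enough for this small sample

// Not the last block: flush = false, so the trailing high surrogate is
// kept in the encoder's state instead of being written out.
int n = encoder.GetBytes(block1, 0, block1.Length, buffer, 0, false);
output.AddRange(buffer.Take(n));

// Last block: flush = true, so the pair is completed and the state cleared.
n = encoder.GetBytes(block2, 0, block2.Length, buffer, 0, true);
output.AddRange(buffer.Take(n));

// Reference result: encode the whole string in one go.
byte[] expected = Encoding.UTF8.GetBytes("a\uD83D\uDE00b");
bool chunkedMatches = output.SequenceEqual(expected);

// Encoding each block independently treats each surrogate as dangling.
byte[] naive = Encoding.UTF8.GetBytes(block1)
    .Concat(Encoding.UTF8.GetBytes(block2))
    .ToArray();
bool naiveMatches = naive.SequenceEqual(expected);

Console.WriteLine(chunkedMatches); // expected to be True
Console.WriteLine(naiveMatches);   // expected to be False
```

Passing flush=true on the first call would have the same effect as the independent encoding here: the encoder would have to deal with the lone high surrogate immediately rather than wait for its partner.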

Internally the Encoder would be implemented with a buffer - this buffer may need to be flushed (cleared) in order to end the read correctly or prepare the Encoder for the next read.

Here is one explanation of buffer flushing.

The exact usage of the flush parameter is described here:

true to clear the internal state of the encoder after the conversion; otherwise, false.

Flushing will reset the internal state of the encoder instance used to encode the text into bytes. Why does it need internal state, you ask? Well, to quote MSDN:

The flush parameter is useful for flushing a high-surrogate at the end of a stream that does not have a low-surrogate. For example, the Encoder created by UTF8Encoding.GetEncoder uses this parameter to determine whether to write out a dangling high-surrogate at the end of a character block.

Hence, if you make multiple GetBytes() calls, you want to flush the internal state on the last call to terminate any character sequences that need terminating, but only on the last call, since terminating sequences might otherwise be introduced in the middle of words.

Note that this may be a purely theoretical problem these days. And, you'd be better off using higher-level wrappers anyway. If you do, being drunk will not be a problem.

