What is faster: multiple `send`s or using buffering?

2022-12-26 15:32 Asked by:

I'm playing around with sockets in C/Python and I wonder what is the most efficient way to send headers from a Python dictionary to the client socket.

My ideas:

use a send call for every header. Pros: No memory allocation needed. Cons: many send calls -- probably error prone; error management should be rather complicated
use a buffer. Pros: one send call, error checking a lot easier. Cons: Need a buffer :-) malloc/realloc should be rather slow and using a (too) big buffer to avoid realloc calls wastes memory.

Any tips for me? Thanks :-)

Because of the way TCP congestion control works, it's more efficient to send data all at once. TCP maintains a window of how much data it will allow to be "in the air" (sent but not yet acknowledged). TCP measures the acknowledgments coming back to figure out how much data it can have "in the air" without causing congestion (i.e., packet loss). If there isn't enough data coming from the application to fill the window, TCP can't make accurate measurements so it will conservatively shrink the window.

If you only have a few small headers and your calls to send are in rapid succession, the operating system will typically buffer the data for you and send it all in one packet. In that case, TCP congestion control isn't really an issue. However, each call to send is a system call, which means a transition from user mode to kernel mode and back, and that incurs CPU overhead. In other words, you're still better off buffering in your application.
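To make "buffering in your application" concrete, here is a minimal sketch in Python. It assumes HTTP-style `Name: value` lines terminated by a blank line (the question doesn't say which protocol the headers belong to), and the names `build_header_block` and `send_headers` are made up for illustration.

```python
import socket


def build_header_block(headers):
    """Serialize a dict of headers into one bytes object (HTTP-style, assumed)."""
    lines = [f"{name}: {value}\r\n" for name, value in headers.items()]
    lines.append("\r\n")  # a blank line ends the header section
    return "".join(lines).encode("latin-1")


def send_headers(sock, headers):
    """Send every header with a single sendall() call."""
    sock.sendall(build_header_block(headers))


if __name__ == "__main__":
    # A local socket pair stands in for a real client connection.
    a, b = socket.socketpair()
    with a, b:
        send_headers(a, {"Content-Type": "text/plain", "Content-Length": "5"})
        print(b.recv(4096))
```

There is only one call whose result needs checking, which is also what makes the error handling simpler than with one send per header.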

There is (at least) one case where you're better off without buffering: when your buffer is slower than the system call overhead. If you write a complicated buffer in Python, that might very well be the case. A buffer written in pure Python (and run by CPython) is going to be quite a bit slower than the finely optimized buffer in the kernel. It's quite possible that buffering would cost you more than it buys you.

When in doubt, measure.
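A rough way to measure it in Python could look like the sketch below. It times both strategies over a local `socket.socketpair()`, so it only captures the per-call overhead and says nothing about real network behaviour such as congestion control. The header data and the function names are invented for the test.

```python
import socket
import threading
import time

# Made-up test data: 20 small headers.
HEADERS = {f"X-Header-{i}": "value" * 4 for i in range(20)}


def encode(name, value):
    return f"{name}: {value}\r\n".encode("latin-1")


def send_each(sock, headers):
    """One sendall() call per header."""
    for name, value in headers.items():
        sock.sendall(encode(name, value))


def send_joined(sock, headers):
    """Join everything first, then a single sendall() call."""
    sock.sendall(b"".join(encode(n, v) for n, v in headers.items()))


def drain(sock):
    """Read and discard data until the peer closes the connection."""
    while sock.recv(65536):
        pass


def time_strategy(send_func, rounds=2000):
    """Return the seconds spent sending HEADERS `rounds` times."""
    a, b = socket.socketpair()
    reader = threading.Thread(target=drain, args=(b,))
    reader.start()
    start = time.perf_counter()
    for _ in range(rounds):
        send_func(a, HEADERS)
    elapsed = time.perf_counter() - start
    a.close()  # EOF lets the reader thread finish
    reader.join()
    b.close()
    return elapsed


if __name__ == "__main__":
    for func in (send_each, send_joined):
        print(f"{func.__name__}: {time_strategy(func):.4f} s")
```

The numbers depend heavily on the machine and the OS, so run it several times on the target system before drawing conclusions.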

One word of caution though: premature optimization is the root of all evil. The difference in efficiency here is pretty small. If you haven't already established that this is a bottleneck for your application, go with whatever makes your life easier. You can always change it later.

Unless you're sending a truly huge amount of data, you're probably better off using one buffer. If you grow the buffer geometrically (for example, by doubling its capacity every time it fills up), the number of reallocations grows only logarithmically with the final size, so the amortized cost of each append is constant and the total time spent allocating stays small.
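The geometric growth idea can be sketched like this. It is written in Python only to show the bookkeeping you would do around realloc in C; the class `GrowableBuffer` and its `reallocations` counter are hypothetical names for illustration. In real Python code you would normally just use `bytearray` or `b"".join(...)`.

```python
class GrowableBuffer:
    """Append-only byte buffer that doubles its capacity when full."""

    def __init__(self, capacity=16):
        self._data = bytearray(capacity)  # pre-allocated storage
        self._length = 0                  # number of bytes actually used
        self.reallocations = 0            # counts how often we had to grow

    def append(self, chunk):
        needed = self._length + len(chunk)
        if needed > len(self._data):
            # Double until the chunk fits (geometric progression).
            new_capacity = max(len(self._data), 1)
            while new_capacity < needed:
                new_capacity *= 2
            new_data = bytearray(new_capacity)
            new_data[:self._length] = self._data[:self._length]
            self._data = new_data
            self.reallocations += 1
        self._data[self._length:needed] = chunk
        self._length = needed

    def getvalue(self):
        return bytes(self._data[:self._length])


if __name__ == "__main__":
    buf = GrowableBuffer(capacity=16)
    for _ in range(1000):
        buf.append(b"x")
    print(len(buf.getvalue()), "bytes,", buf.reallocations, "reallocations")
```

Appending 1000 single bytes to a 16-byte buffer only triggers a handful of reallocations (16 to 32, 32 to 64, and so on up to 1024), instead of one per append.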

A send() call implies a round-trip to the kernel (the part of the OS which deals with the hardware directly). Each call costs on the order of a few hundred clock cycles. This is harmless unless you are trying to call send() millions of times.

Usually, buffering is about calling send() only once in a while, when "enough data" has been gathered. "Enough" does not mean "the whole message" but something like "enough bytes so that the unit cost of the kernel round-trip is dwarfed". As a rule of thumb, an 8 kB buffer (8192 bytes) is traditionally considered good enough.
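A threshold-based buffer along those lines might look like the following sketch. `BufferedSender` is a made-up name, and the 8192-byte default is just the rule of thumb from above, not a tuned value. It works with any object that has a `sendall()` method.

```python
class BufferedSender:
    """Collect small writes and pass them to sendall() in larger batches."""

    def __init__(self, sock, threshold=8192):
        self._sock = sock
        self._threshold = threshold
        self._pending = bytearray()

    def write(self, data):
        self._pending += data
        # Only go to the kernel once "enough" bytes have piled up.
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self):
        """Send whatever is pending; call this at the end of a message."""
        if self._pending:
            self._sock.sendall(self._pending)
            self._pending.clear()


if __name__ == "__main__":
    import socket

    a, b = socket.socketpair()
    with a, b:
        sender = BufferedSender(a)
        sender.write(b"Content-Type: text/plain\r\n")
        sender.write(b"\r\n")
        sender.flush()  # nothing is sent before this point
        print(b.recv(4096))
```

Don't forget the final `flush()`, otherwise the tail of the message never leaves the process. Python also ships a ready-made variant of this: `sock.makefile("wb", buffering=8192)` returns a buffered file object on top of the socket.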

Anyway, for all performance-related questions, nothing beats an actual measurement. Try it. Most of the time, there is no actual performance problem worth worrying about.
