
Efficiency of select() versus recv() with MSG_PEEK (asynchronous)

I would like to know what would be most efficient when checking for incoming data (asynchronously). Let's say I have 500 connections. I have 3 scenarios (that I can think of):

  1. Using select() to check FD_SETSIZE sockets at a time, then iterating over all of them to receive the data. (Wouldn't this require two calls to recv() for each socket returned? One with MSG_PEEK to size a buffer, then recv() again, which would be the same as #3.)
  2. Using select() to check one socket at a time. (Wouldn't this also be like #3? It requires the two calls to recv.)
  3. Use recv() with MSG_PEEK on one socket at a time, allocate a buffer, then call recv() again, as sketched below. Wouldn't this be better because we can skip all the calls to select()? Or is the overhead of the extra recv() call too much?
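
To be concrete, this is roughly the two-call pattern I mean in #3 (just a sketch; PROBE_SIZE and the function name are placeholders I made up):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdlib.h>

    #define PROBE_SIZE 65536                /* arbitrary upper bound for the peek */

    /* Sketch of #3: peek to learn how much is pending, size a buffer, recv for real. */
    ssize_t peek_then_recv(int sock, char **out)
    {
        static char probe[PROBE_SIZE];

        /* MSG_PEEK reports pending bytes (up to PROBE_SIZE) without consuming them. */
        ssize_t pending = recv(sock, probe, PROBE_SIZE, MSG_PEEK);
        if (pending <= 0)
            return pending;                 /* error, or peer closed the connection */

        char *buf = malloc((size_t)pending);
        if (buf == NULL)
            return -1;

        /* The second call actually consumes the data. */
        ssize_t n = recv(sock, buf, (size_t)pending, 0);
        if (n <= 0) {
            free(buf);
            return n;
        }

        *out = buf;
        return n;
    }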

I've already coded solutions 1 and 2, but I'm not sure which one to use. Sorry if I'm a bit unclear.

Thanks


FD_SETSIZE is typically 1024, so you can check all 500 connections at once. Then you perform the two recv() calls only on those which are ready -- say, on a very busy system, half a dozen of them each time around. With the other approaches you make about 500 extra syscalls per pass (the huge number of "failing" recv() or select() calls you perform on the many hundreds of sockets that will not be ready at any given time!-).

In addition, with approach 1 you can block until at least one connection is ready (no overhead in that case, which won't be rare on systems that aren't all that busy) -- with the other approaches, you'll need to be "polling", i.e., churning continuously, burning huge amounts of CPU to no good purpose (or, if you sleep a while after each loop of checks, you'll have a delay in responding even though the system isn't busy at all -- eep!-).

That's why I consider polling an anti-pattern: frequently used, but destructive nevertheless. Sometimes you have absolutely no alternative (which basically tells you you're having to interact with very badly designed systems -- alas, sometimes in this imperfect life you do have to!-), but whenever any decent alternative exists, polling is really bad design practice and should be avoided.
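
Here's a minimal sketch of approach 1, assuming the connected descriptors live in a simple array (all names are illustrative, and real code needs error handling):

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Sketch of approach 1: block in select() on every connection, then
       call recv() only on the descriptors that are actually ready. */
    void poll_once(int conns[], int nconns)
    {
        fd_set readfds;
        char buf[8192];
        int i, maxfd = -1;

        FD_ZERO(&readfds);
        for (i = 0; i < nconns; i++) {
            FD_SET(conns[i], &readfds);
            if (conns[i] > maxfd)
                maxfd = conns[i];
        }

        /* NULL timeout: block until at least one socket is ready -- no spinning. */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) <= 0)
            return;

        for (i = 0; i < nconns; i++) {
            if (FD_ISSET(conns[i], &readfds)) {
                ssize_t n = recv(conns[i], buf, sizeof buf, 0);
                /* n == 0: peer closed; n < 0: error -- drop the connection. */
                (void)n;
            }
        }
    }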


You can simply compare the syscall counts of the three solutions in two scenarios:

Scenario A (0/500 incoming data)

  • for solution #1, you only invoke single select()
  • for solution #2, you need 500 select()
  • for solution #3, you need 500 recv()

Scenario B (250/500 incoming data)

  • for solution #1, single select() + (500 recv())
  • for solution #2, 500 select() + (500 recv())
  • for solution #3, 750 recv()

** assuming sockets with no incoming data are skipped without allocating a buffer

The answer is obvious :)


...most efficient when checking for incoming data (asynchronously). Let's say I have 500 connections. I have 3 scenarios (that I can think of):

Using select() to check FD_SETSIZE sockets at a time, then iterating over all of them to receive the data. (Wouldn't this require two calls to recv() for each socket returned? One with MSG_PEEK to size a buffer, then recv() again, which would be the same as #3.)

I trust you're carefully constructing your fd set with only the descriptors that are currently connected...? You then iterate over the set and only issue recv() for those that have read or exception/error conditions (the latter differing between BSD and Windows implementations). While that's OK functionally (and arguably elegant conceptually), in most real-world applications you don't need to peek before recv-ing: even if you're unsure of the message size and know you could peek it from a buffer, you should consider whether you can do one of the following (a sketch of the first option follows the list):

  • process the message in chunks (e.g. read whatever's a good unit of work - maybe 8k, process it, then read the next <=8k into the same buffer...)
  • read into a buffer that's big enough for most/all messages, and only dynamically allocate more if you find the message is incomplete
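
For instance, the first option might look roughly like this (a sketch only: handle_chunk() is a hypothetical application callback, and the socket is assumed to be non-blocking so recv() returns -1 with EAGAIN once the buffered data is drained):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stddef.h>

    /* Hypothetical application callback -- supply your own. */
    void handle_chunk(const char *data, size_t len);

    /* Process whatever is buffered in 8k units -- no MSG_PEEK, no sizing pass. */
    void drain_socket(int sock)
    {
        char chunk[8192];
        ssize_t n;

        while ((n = recv(sock, chunk, sizeof chunk, 0)) > 0)
            handle_chunk(chunk, (size_t)n);
    }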

Using select() to check one socket at a time. (Wouldn't this also be like #3? It requires the two calls to recv.)

Not good at all. If you stay single-threaded, you'd need to put a 0 timeout value on select() and spin like crazy through the listening and client descriptors. Very wasteful of CPU time, and it will vastly degrade latency.

Use recv() with MSG_PEEK on one socket at a time, allocate a buffer, then call recv() again. Wouldn't this be better because we can skip all the calls to select()? Or is the overhead of the extra recv() call too much?

(Ignoring that it's better to try to avoid MSG_PEEK) - how would you know which socket to MSG_PEEK or recv() on? Again, if you're single threaded, then either you'd block on the first peek/recv attempt, or you use non-blocking mode and then spin like crazy through all the descriptors hoping a peek/recv will return something. Wasteful.

So, stick to 1 or move to a multithreaded model. For the latter, the simplest approach to begin with is to have the listening thread loop calling accept, and each time accept yields a new client descriptor it should spawn a new thread to handle the connection. These client-connection handling threads can simply block in recv(). That way, the operating system itself does the monitoring and wake-up of threads in response to events, and you can trust that it will be reasonably efficient. While this model sounds easy, you should be aware that multi-threaded programming has lots of other complications - if you're not already familiar with it you may not want to try to learn that at the same time as socket I/O.
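
A bare-bones sketch of that accept-and-spawn model with POSIX threads (error handling mostly elided; names are illustrative):

    #include <pthread.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* One thread per client: block in recv() and let the OS wake us up. */
    static void *client_thread(void *arg)
    {
        int sock = (int)(intptr_t)arg;
        char buf[8192];
        ssize_t n;

        while ((n = recv(sock, buf, sizeof buf, 0)) > 0) {
            /* handle n bytes of application data here */
        }
        close(sock);
        return NULL;
    }

    /* Listening thread: accept forever, spawning a detached thread per client.
       Assumes 'listener' is already bound and listening. */
    void accept_loop(int listener)
    {
        for (;;) {
            int client = accept(listener, NULL, NULL);
            if (client < 0)
                continue;               /* real code should inspect errno */

            pthread_t tid;
            if (pthread_create(&tid, NULL, client_thread,
                               (void *)(intptr_t)client) == 0)
                pthread_detach(tid);
            else
                close(client);
        }
    }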
