Writing a super UDP server using as many CPU cores as possible
I have run into serious limitations with the code I am writing.
What I am trying to do is make my code run as efficiently as possible on an SMP Xeon machine with 24 hardware threads.
For this task I am using the commoncpp wrappers around native POSIX threads and sockets, plus the libev library to detect read events on socket file descriptors.
The goal is to have no data loss on the UDP socket connections, each of which should carry around 600 Mbit/s of data.
I found that as soon as I establish more than two connections, data gets lost.
I also discovered that the five threads (one per connection) are not well balanced across the CPU cores: only two cores are doing any work while the other 22 are left unused.
I can't hide it: I am a beginner at SMP development and I really need some help with setting up "hardware threads".
I would be glad to learn whether there is some POSIX capability/feature to force the use of hardware threads, or some how-to guide (for dummies like me :) ) that explains how to dedicate CPU cores to specific tasks.
As you may have understood, I would like to have one dedicated CPU core per connection.
Thank you all!
I can recommend an easy-to-implement approach that should give quite good performance: use Boost.Asio with Boost.Thread. Boost.Asio provides asynchronous networking and can be used in a multithreaded environment with little additional effort (a good example of tamed multithreading). Have a look at these examples:
- async UDP echo server: should give you an idea of how to use UDP asynchronously
- An HTTP server using an io_service-per-CPU design, or an HTTP server using a single io_service and a thread pool calling io_service::run(): these should give you ideas on how to use Asio in a multithreaded environment. It's hard to say which approach is better; I prefer "io_service and a thread pool".
Asio can be scary the first time you see it, but then you become addicted to it.
I once heard that the performance of Asio's internal dispatcher is not optimal. I cannot comment on this. So far, after using it in many projects with tough performance requirements, I have been satisfied with its performance.
To do this sort of high-speed networking, you might need to dig into the hardware and OS settings.
Check whether the network card supports multiple input queues and whether it can use MSI instead of regular interrupts. See if you can set up one input queue per CPU core, and whether there are options for how incoming packets are distributed across the queues.
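On the thread side, Linux has the non-portable pthread_setaffinity_np() call for pinning a thread to a core, which is one way to get the "one core per connection" layout the question asks about. A minimal sketch, assuming Linux with glibc; allowed_cpus, pin_current_thread and spawn_pinned are hypothetical helper names, not part of commoncpp or libev:

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>
#include <vector>

// Return the CPU ids the calling thread is currently allowed to run on.
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
    }
    return cpus;
}

// Restrict the calling thread to a single CPU; returns true on success.
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Start a thread that pins itself to `cpu` and then runs `work`.
// If pinning fails, `work` is not run (an assumption made for this sketch).
template <typename Fn>
std::thread spawn_pinned(int cpu, Fn work) {
    return std::thread([cpu, work]() {
        if (!pin_current_thread(cpu)) return;
        work();
    });
}
```

Each per-connection receive loop could then be started with something like spawn_pinned(cpus[i], loop_i) and joined at shutdown. Whether pinning actually helps depends on which cores service the NIC interrupts, so it is worth measuring before and after.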
Check the OS input buffer sizes. You may need to make them a lot bigger to avoid dropping UDP packets.
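For a single socket, a bigger buffer can be requested with SO_RCVBUF. A minimal sketch, assuming Linux, where the kernel caps the request at net.core.rmem_max and reports back double the value it stored; make_udp_socket_with_rcvbuf is a hypothetical helper name:

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a UDP socket and ask the kernel for a larger receive buffer.
// Returns the socket fd, or -1 on error. On success *granted_bytes holds
// the size the kernel reports, which may differ from the request.
int make_udp_socket_with_rcvbuf(int requested_bytes, int* granted_bytes) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested_bytes, sizeof(requested_bytes)) != 0) {
        close(fd);
        return -1;
    }

    // Read the value back: the request is silently capped, so this is
    // the only way to see what was actually granted.
    socklen_t len = sizeof(*granted_bytes);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, granted_bytes, &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If the granted size comes back much smaller than requested, the system-wide limit (net.core.rmem_max) probably needs raising with sysctl first.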