C++ socket programming: maximize throughput/bandwidth on localhost (I only get 3 GBit/s instead of 23 GBit/s)

I want to create a C++ server/client that maximizes the throughput over TCP socket communication on my localhost. As a preparation, I used iperf to find out what the maximum bandwidth is on my i7 MacBookPro.

------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (default)
------------------------------------------------------------
[  4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 51583
[  4]  0.0-120.0 sec   329 GBytes  23.6 Gbits/sec

Without any tweaking, iperf showed me that I can reach at least 23.2 GBit/s. Then I did my own C++ server/client implementation, you can find the full code here: https://gist.github.com/1116635

In that code I basically transfer a 1024-byte int array with each read/write operation. So my send loop on the server looks like this:

   int n;

   int x[256];

   //fill int array
   for (int i=0;i<256;i++)
   {
       x[i]=i;
   }

   for (int i=0;i<(4*1024*1024);i++)
   {
       n = write(sock,x,sizeof(x));
       if (n < 0) error("ERROR writing to socket");
   }

My receive loop on the client looks like this:

int x[256]; 

for (int i=0;i<(4*1024*1024);i++)
{
    n = read(sockfd,x,((sizeof(int)*256)));
    if (n < 0) error("ERROR reading from socket");
}

As mentioned in the headline, running this (compiled with -O3) results in the following execution time, which corresponds to about 3 GBit/s:

./client 127.0.0.1 1234
Elapsed time for Reading 4GigaBytes of data over socket on localhost: 9578ms

Where do I lose the bandwidth? What am I doing wrong? Again, the full code can be seen here: https://gist.github.com/1116635

Any help is appreciated!


  • Use larger buffers (i.e. make less library/system calls)
  • Use asynchronous APIs
  • Read the documentation (the return value of read/write is not simply an error condition; it also reports the number of bytes actually read/written, which may be less than you asked for)


My previous answer was mistaken. I have tested your programs and here are the results.

  • If I run the original client, I get 0m7.763s
  • If I use a buffer 4 times as large, I get 0m5.209s
  • With a buffer 8 times as large as the original, I get 0m3.780s

I only changed the client. I suspect more performance can be squeezed if you also change the server.

The fact that I got radically different results than you did (0m7.763s vs 9578ms) also suggests this is caused by the number of system calls performed (since we have different processors). To squeeze out even more performance:

  • Use scatter-gather I/O (readv and writev)
  • Use zero-copy mechanisms: splice(2), sendfile(2)


You can use strace -f iperf -s localhost to find out what iperf is doing differently. It seems that it's using significantly larger buffers (131072 bytes in version 2.0.5) than you.

Also, iperf uses multiple threads. If you have 4 CPU cores, using two threads on client and server will result in approximately doubled performance.


If you really want to get maximum performance, use mmap + splice/sendfile, and for localhost communication use unix domain stream sockets (AF_LOCAL).
