Problem: heavy I/O operations interfere with a network application listening for UDP and SCTP data
We have an application that uses two types of sockets: a listening UDP socket and an active SCTP socket.
At certain times, scripts with heavy I/O activity (such as dd, tar, ...) run on the same machine. Most of the time when these I/O-heavy scripts run, we see the following problems:
- The UDP socket closes
- The SCTP socket is still alive and we can see it in /proc/net/sctp/assocs; however, no traffic is received on this socket anymore (until we restart the application)
Why are these I/O operations affecting the network based application in such a way?
Are there any kernel configurations to avoid these problems? I would have expected some packets to be lost on the UDP socket and some retries on the SCTP socket, but not this behavior. The application is running on a 64-bit server with 4 quad-core CPUs and RHEL:
# uname -a
Linux server1 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
When you say the UDP socket closes, what exactly do you mean? You try to send and it fails?
For SCTP, can you collect wireshark or pcap traces at the time these I/O operations run (preferably run wireshark on the peer)? My guess (an educated guess, without looking at the code) is that when these I/O operations come into the picture, your process gets starved for CPU time. The other end sends SCTP Heartbeat messages to which it gets no replies. Or, if data was flowing, the peer is not receiving any SACKs, as they have not yet been processed by the SCTP stack at your end.
The peer therefore aborts the association internally and stops sending you data (since it sees all the paths as down, it does not send an ABORT; in that case, your SCTP stack will still think the association is alive).
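If you want to check that last point from inside your application, lksctp exposes the association state through getsockopt(SCTP_STATUS). A rough sketch in C, assuming a one-to-one style SCTP socket; the helper name and return convention are just for illustration:

    /* Ask the local stack what it thinks the association state is. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/sctp.h>

    int check_assoc_state(int sctp_fd)
    {
        struct sctp_status status;
        socklen_t len = sizeof(status);

        memset(&status, 0, sizeof(status));
        if (getsockopt(sctp_fd, IPPROTO_SCTP, SCTP_STATUS, &status, &len) < 0) {
            perror("getsockopt(SCTP_STATUS)");
            return -1;
        }
        /* sstat_state stays SCTP_ESTABLISHED while the local stack believes the
         * association is up; sstat_unackdata is data still waiting for SACKs. */
        printf("state=%d unacked=%u pending=%u\n", status.sstat_state,
               (unsigned)status.sstat_unackdata, (unsigned)status.sstat_penddata);
        return status.sstat_state == SCTP_ESTABLISHED ? 0 : 1;
    }

If this keeps reporting SCTP_ESTABLISHED while the peer has already torn the association down, you are in exactly the half-dead situation described above.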
Try to confirm the values for the Heartbeat timeout, RTO timeout, SACK timeout, maximum path retransmissions and maximum association retransmissions at the peer end. I haven't worked with kernel SCTP, but sysctl should be able to give you those values.
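If the peer is also Linux, those defaults show up under net.sctp.* in sysctl. For the values actually in effect on your own association, lksctp also exposes them through socket options; a sketch in C, assuming a connected one-to-one SCTP socket fd (the SACK delay is a separate option and is left out):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/sctp.h>

    void dump_sctp_timers(int fd)
    {
        struct sctp_rtoinfo rto;
        struct sctp_assocparams ap;
        struct sctp_paddrparams pp;
        socklen_t len;

        memset(&rto, 0, sizeof(rto));
        len = sizeof(rto);
        if (getsockopt(fd, IPPROTO_SCTP, SCTP_RTOINFO, &rto, &len) == 0)
            printf("RTO initial/min/max: %u/%u/%u ms\n",
                   rto.srto_initial, rto.srto_min, rto.srto_max);

        memset(&ap, 0, sizeof(ap));
        len = sizeof(ap);
        if (getsockopt(fd, IPPROTO_SCTP, SCTP_ASSOCINFO, &ap, &len) == 0)
            printf("max association retransmissions: %u\n",
                   (unsigned)ap.sasoc_asocmaxrxt);

        memset(&pp, 0, sizeof(pp));   /* zeroed address -> socket defaults */
        len = sizeof(pp);
        if (getsockopt(fd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS, &pp, &len) == 0)
            printf("heartbeat interval: %u ms, max path retransmissions: %u\n",
                   pp.spp_hbinterval, (unsigned)pp.spp_pathmaxrxt);
    }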
Either way, collecting pcap traces when you observe this problem would give us much better insight into what is going wrong. I hope this helps.
Here are some things I'd look into:
- What is the load on the UDP socket when the scripts are not running? Is it continuous or bursty?
- Does the socket ever spontaneously close when the scripts are not running?
- What is happening to the data being read off the socket? How much of it (raw or processed) is being written to disk?
- Can you monitor CPU, network, and disk I/O utilization to see if any of them are saturating?
- Can the scripts running the I/O operations be run at a lower priority or, conversely, can the process running the UDP socket be run at a higher priority? (See the sketch below.)
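On the priority point, this is roughly what raising the network process's priority looks like in C; the numbers are illustrative assumptions, not recommendations, and both calls normally need root:

    #include <stdio.h>
    #include <sched.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int boost_priority(void)
    {
        struct sched_param sp = { .sched_priority = 10 };  /* illustrative value */

        /* Nice -10: scheduled ahead of default-priority dd/tar, still time-shared. */
        if (setpriority(PRIO_PROCESS, 0, -10) < 0)
            perror("setpriority");

        /* Heavier hammer: SCHED_FIFO pre-empts all normal (SCHED_OTHER) processes. */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
            perror("sched_setscheduler");

        return 0;
    }

The same idea in reverse is to start the dd/tar scripts under nice (and ionice, where the I/O scheduler supports priorities) so they yield CPU and disk bandwidth to the network process.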
One thing a lot of people don't check is the return value of send, and they don't check for error conditions like EINTR on recv. Maybe the heavy I/O load is causing some of your send's or recv's to get interrupted, and your app is treating these transient errors as hard errors and closing the socket without you realizing it.
I've seen this kind of thing happen and you should definitely check for it by cranking up your log level and seeing if your app is calling close unexpectedly.
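A sketch of what the defensive handling looks like in C; the wrapper name and return codes are only for illustration:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    ssize_t recv_retry(int fd, void *buf, size_t len)
    {
        for (;;) {
            ssize_t n = recv(fd, buf, len, 0);
            if (n >= 0)
                return n;                 /* data, or 0 on orderly shutdown */
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return -2;                /* transient: nothing to read yet */
            perror("recv");               /* only this path is a real error */
            return -1;
        }
    }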