How to make a program NUMA ready?
My program uses shared memory as data storage. This data must be available to any running application, and fetching it must be fast. But some applications can run on different NUMA nodes, and data access for them is really expensive. Is duplicating the data for every NUMA node the only way to do this?
There are two primary sources of slowdown that can be attributed to NUMA. The first is the increased latency of remote access which can vary depending on the platform. On the platforms that I work with, there is about a 30% hit in latency.
The other source of performance loss can come from contention over the communication links and controllers between NUMA nodes.
The default allocation scheme for Linux is to place each page on the node of the thread that first touches it. If the majority of the data in the application is initialized by a single thread, then it all ends up on one node, which generates a lot of cross-NUMA-domain traffic and contention for that one memory node.
If your data is read-only, then replication is a good solution.
Otherwise, interleaving the data allocation across all your nodes will distribute the requests across all the nodes and will help relieve congestion.
To interleave the data, you can use set_mempolicy() from numaif.h if you are using Linux.
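As a rough sketch of what that call can look like: the helper names node_mask_for and interleave_across_nodes below are made up for illustration, and the code assumes the nodes are numbered 0..num_nodes-1 and that there are fewer nodes than bits in an unsigned long. It invokes the raw system call so that it builds without libnuma; with libnuma installed you would include numaif.h, call set_mempolicy() directly with the same three arguments, and link with -lnuma.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same value that <numaif.h> gives MPOL_INTERLEAVE; defined here only so
   the sketch compiles when libnuma's headers are not installed. */
#ifndef MPOL_INTERLEAVE
#define MPOL_INTERLEAVE 3
#endif

/* Hypothetical helper: bitmask with one bit set for each of the nodes
   0..num_nodes-1. Assumes fewer nodes than bits in an unsigned long. */
unsigned long node_mask_for(int num_nodes)
{
    assert(num_nodes > 0 && num_nodes < (int)(8 * sizeof(unsigned long)));
    return (1UL << num_nodes) - 1UL;
}

/* Hypothetical helper: ask the kernel to interleave the calling thread's
   future page allocations across nodes 0..num_nodes-1.
   Returns 0 on success, -1 with errno set on failure. */
int interleave_across_nodes(int num_nodes)
{
    unsigned long mask = node_mask_for(num_nodes);
    unsigned long maxnode = 8 * sizeof(mask); /* size of the mask in bits */
#ifdef SYS_set_mempolicy
    return (int)syscall(SYS_set_mempolicy, MPOL_INTERLEAVE, &mask, maxnode);
#else
    (void)mask;
    (void)maxnode;
    errno = ENOSYS; /* not a Linux build: the call is unavailable */
    return -1;
#endif
}
```

You would call something like interleave_across_nodes(2) in the process that populates the shared memory, before it first writes to the segment, since the policy only affects pages allocated after it is set. Check the return value: the call can fail, for example with EPERM inside a sandboxed container.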