realloc() for NUMA Systems using HWLOC
I have several custom allocators that allocate memory according to different policies. One of them allocates memory on a defined NUMA node. The interface to the allocator is straightforward:
#include <cstddef>
template<typename config>
class NumaNodeStrategy
{
public:
    static void *allocate(const size_t sz);
    static void *reallocate(void *old, size_t sz, size_t old_sz);
    static void deallocate(void *p, size_t sz);
};
The allocation itself is handled by the hwloc_alloc_membind_nodeset() function, with the appropriate parameters set for the allocation policy etc. However, hwloc only provides functions for allocating and freeing memory, and I was wondering how I should implement reallocate().
Two possible solutions:
- Allocate a new memory area and memcpy() the data.
- Use hwloc_set_membind_nodeset() to set the memory allocation / binding policy for the nodeset, then use plain malloc() / posix_memalign() and realloc().
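The first option can be sketched roughly as follows. This is only an illustration under stated assumptions, not hwloc API: `MallocBackend` and `CopyingRealloc` are hypothetical names, and std::malloc()/std::free() stand in for hwloc_alloc_membind_nodeset()/hwloc_free() so that the sketch compiles without hwloc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in backend: std::malloc/std::free take the place of
// hwloc_alloc_membind_nodeset()/hwloc_free() to keep the sketch self-contained.
struct MallocBackend {
    static void *allocate(std::size_t sz) { return std::malloc(sz); }
    static void deallocate(void *p, std::size_t /*sz*/) { std::free(p); }
};

// Option 1: reallocate by allocating a new block, copying, and freeing the old one.
template <typename Backend>
struct CopyingRealloc {
    static void *reallocate(void *old, std::size_t sz, std::size_t old_sz) {
        // A null pointer with a non-zero old size is a caller bug.
        assert(old != nullptr || old_sz == 0);
        if (old == nullptr) return Backend::allocate(sz);
        if (sz == 0) {
            Backend::deallocate(old, old_sz);
            return nullptr;
        }
        void *fresh = Backend::allocate(sz);
        if (fresh == nullptr) return nullptr;  // old block stays valid, as with realloc()
        // Copy only the bytes that exist in both the old and the new block.
        std::memcpy(fresh, old, std::min(sz, old_sz));
        Backend::deallocate(old, old_sz);
        return fresh;
    }
};
```

With a NUMA-aware backend the new block is bound before the copy touches it, so the data ends up on the requested node, at the price of always copying.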
Can anyone help me in getting this right?
Update:
I'll try to make the question more specific: is it possible to perform a realloc() using hwloc without allocating new memory and moving the pages around?
To reply to the edit: There's no realloc in hwloc, and we currently have no plan to add one. If you know precisely what you want (the C prototype of the function), feel free to add a ticket to https://svn.open-mpi.org/trac/hwloc
To reply to ogsx: The memory binding is specific to a virtual memory area, and possibly to a thread. If you realloc, the libc doesn't do anything special.
1) If it can realloc within the same page, you get memory on the same node. Good, but rare, especially for large buffers.
2) If it reallocs into a different page (the usual case for large buffers), the outcome depends on whether the corresponding page has already been allocated in physical memory by the malloc library in the past (malloc'ed and freed in virtual memory, but still allocated in physical memory).
2.a) If the virtual page has already been allocated, it may have been placed on another node for various reasons in the past, and you're out of luck.
2.b) If the new virtual page has not been allocated yet, the default is to allocate it on the current node. If you specified a binding with set_area_membind() or mbind() earlier, it will be allocated on the right node. You may be happy in this case.
In short, it depends on a lot of things. If you don't want to bother with the malloc lib doing complex/hidden internal things, and especially if your buffers are large, doing mmap(MAP_ANONYMOUS) instead of malloc is a simple way to be sure that pages are allocated when you really want them. And you even have mremap to do something similar to realloc.
- alloc becomes mmap(length) + set_area_membind
- realloc becomes mremap + set_area_membind (on the entire mremap'ed buffer)
Never used that but looks interesting.
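A rough, Linux-only sketch of the mmap/mremap mapping described above. `MmapRegion` is a hypothetical name, and the binding step is left as a comment because it needs a hwloc topology and nodeset that are not shown in this thread.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes mremap(); g++ normally defines this already
#endif
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical allocator built on anonymous mappings instead of malloc().
struct MmapRegion {
    static void *allocate(std::size_t sz) {
        void *p = mmap(nullptr, sz, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return nullptr;
        // Assumption: a real allocator would bind [p, p + sz) here, before the
        // first touch, e.g. with hwloc_set_area_membind_nodeset().
        return p;
    }

    static void *reallocate(void *old, std::size_t sz, std::size_t old_sz) {
        assert(old != nullptr && sz > 0);
        // MREMAP_MAYMOVE lets the kernel relocate the mapping if it cannot
        // grow in place; existing pages are remapped, not copied.
        void *p = mremap(old, old_sz, sz, MREMAP_MAYMOVE);
        if (p == MAP_FAILED) return nullptr;  // the old mapping is still valid
        // Assumption: rebind the entire [p, p + sz) range here so that the
        // newly added pages follow the same policy.
        return p;
    }

    static void deallocate(void *p, std::size_t sz) {
        if (p != nullptr) munmap(p, sz);
    }
};
```

Pages that were already touched keep their physical placement across mremap(); only the pages added by the growth are allocated on first touch, which is why the binding has to be reapplied to the whole buffer.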
The hwloc_set_area_membind_nodeset does the trick, doesn't it?
HWLOC_DECLSPEC int
hwloc_set_area_membind_nodeset (hwloc_topology_t topology,
                                const void *addr, size_t len,
                                hwloc_const_nodeset_t nodeset,
                                hwloc_membind_policy_t policy, int flags);
Bind the already-allocated memory identified by (addr, len) to the NUMA node(s) in nodeset.
Returns:
- -1 with errno set to ENOSYS if the action is not supported
- -1 with errno set to EXDEV if the binding cannot be enforced
On Linux, this call is implemented via mbind.
It works only if the pages in the area have not been touched, so it is just a more correct way to move the memory region in your second solution. UPDATE: there are MPOL_MF_MOVE* flags to move touched data.
The only syscall I know of that moves pages without reallocate-and-copy is move_pages.
move_pages moves a set of pages in the address space of a running process to a different NUMA node.
You're wrong. mbind can move pages that have been touched. You just need to add MPOL_MF_MOVE. That's what hwloc_set_area_membind_nodeset() does if you add the flag HWLOC_MEMBIND_MIGRATE.
move_pages is just a different way to do it (more flexible but a bit slower, because you can move independent pages to different places). Both mbind with MPOL_MF_MOVE and move_pages (and migrate_pages) end up using the exact same migrate_pages() function in mm/migrate.c once they have converted the input into a list of pages.