Aligned memory management?
I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.
1. What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.)
2. If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OSes. I don't care about obscure DS9K-like OSes.)
3. This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.
1. If your implementation has a standard data type that needs 16-byte alignment (long double on some platforms, for example), malloc already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object."
2. You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy.
3. See (1) above if you already have a 16-byte-alignment-required type.
Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency, although it's not guaranteed by the standard. If you require it, you can always implement your own allocator.
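To see what a particular platform actually does, a small check along these lines may help. It is only a sketch: `is_aligned` and `malloc_guaranteed_alignment` are hypothetical helper names, and `max_align_t` assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if p is aligned to `alignment` (must be a power of two). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* The strictest fundamental alignment, which C11 malloc() must satisfy. */
size_t malloc_guaranteed_alignment(void)
{
    return _Alignof(max_align_t);
}
```

Calling `is_aligned(malloc(n), 16)` for a range of sizes shows what the allocator happens to return; only the value from `malloc_guaranteed_alignment()` is something the standard lets you rely on.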
Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure:

    some padding for alignment (0-15 bytes)
    size of padding (1 byte)
    16-byte-aligned area

Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.
For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free.
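This is untested but should be a good start:

    #include <stdint.h>
    #include <stdlib.h>

    void *malloc16 (size_t s) {
        unsigned char *p;
        unsigned char *porig;
        if (s > SIZE_MAX - 16) return NULL;              // avoid overflow in s + 16
        porig = malloc (s + 16);                         // allocate extra
        if (porig == NULL) return NULL;                  // catch out of memory
        p = (unsigned char *)(((uintptr_t)porig + 16) & ~(uintptr_t)0xf); // insert padding
        *(p-1) = (unsigned char)(p - porig);             // store padding size (1..16)
        return p;
    }

    void free16 (void *p) {
        unsigned char *porig = p;                        // work out original
        if (p == NULL) return;                           // mirror free(NULL)
        porig = porig - *(porig-1);                      // by subtracting padding
        free (porig);                                    // then free that
    }

The magic line in malloc16 is p = (unsigned char *)(((uintptr_t)porig + 16) & ~(uintptr_t)0xf); which adds 16 to the address then sets the lower 4 bits to 0, in effect bringing it back down to the nearest 16-byte boundary (the +16 guarantees it is past the actual start of the malloc'ed block, leaving room for the padding-size byte). The casts through uintptr_t are needed because C does not allow & on a pointer.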
Now, I don't claim that the code above is anything but kludgey. You would have to test it on the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.
1. I'm not aware of any way of requesting that malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like):
   "GNU libc malloc() always returns 8-byte aligned memory addresses, so these routines are only needed if you require larger alignment values."
2. You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc().
3. Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.
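The realloc()-plus-memmove() idea can be combined with the padding-byte layout described earlier in the thread. The following is a minimal sketch under stated assumptions, not a tested library: `realloc16` is a hypothetical name, it takes the old payload size as an extra parameter because the layout does not record it, and `malloc16`/`free16` are repeated only so the block is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN16 16u

/* Distance from base to the next 16-byte boundary strictly above it (1..16). */
static size_t pad_for(const unsigned char *base)
{
    return ALIGN16 - (size_t)((uintptr_t)base & (ALIGN16 - 1));
}

void *malloc16(size_t size)
{
    if (size > SIZE_MAX - ALIGN16) return NULL;
    unsigned char *base = malloc(size + ALIGN16);
    if (base == NULL) return NULL;
    size_t pad = pad_for(base);
    base[pad - 1] = (unsigned char)pad;   /* padding length sits just below the payload */
    return base + pad;
}

void free16(void *p)
{
    if (p == NULL) return;
    unsigned char *user = p;
    free(user - user[-1]);                /* recover the pointer malloc() returned */
}

/* Hypothetical helper: resize a malloc16() block, keeping 16-byte alignment. */
void *realloc16(void *p, size_t old_size, size_t new_size)
{
    if (p == NULL) return malloc16(new_size);
    if (new_size > SIZE_MAX - ALIGN16) return NULL;

    unsigned char *user = p;
    size_t old_pad = user[-1];
    unsigned char *base = realloc(user - old_pad, new_size + ALIGN16);
    if (base == NULL) return NULL;        /* the old block is still valid */

    size_t new_pad = pad_for(base);
    if (new_pad != old_pad) {
        /* The block moved to an address with a different offset: slide the payload. */
        size_t keep = old_size < new_size ? old_size : new_size;
        memmove(base + new_pad, base + old_pad, keep);
    }
    base[new_pad - 1] = (unsigned char)new_pad;
    assert(((uintptr_t)(base + new_pad) & (ALIGN16 - 1)) == 0);
    return base + new_pad;
}
```

When the system realloc() grows the block in place, or moves it to an address with the same offset modulo 16, no extra copy happens; the memmove() only runs when the offset changes.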
You could write your own slab allocator to handle your objects. It could allocate pages at a time using mmap, maintain a cache of recently-freed addresses for fast allocations, handle all your alignment for you, and give you the flexibility to move/grow objects exactly as you need. malloc is quite good for general-purpose allocations, but if you know your data layout and allocation needs, you can design a system to hit those requirements exactly.
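As a rough illustration of the idea, here is a very small fixed-size slab on top of mmap(). It is a sketch only: the `slab` type and the `slab_*` functions are hypothetical names, it assumes a POSIX system with MAP_ANONYMOUS, and it leaves out growth, thread safety and everything else a real allocator needs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

typedef struct slab {
    void  *region;        /* page-aligned memory from mmap() */
    size_t region_size;
    void  *free_list;     /* singly linked list threaded through free slots */
} slab;

/* Create `count` slots of `slot_size` bytes, rounded up to a multiple of 16.
   Returns 0 on success, -1 on failure. */
int slab_init(slab *s, size_t slot_size, size_t count)
{
    slot_size = (slot_size + 15) & ~(size_t)15;
    if (slot_size == 0 || count == 0 || count > SIZE_MAX / slot_size) return -1;
    size_t bytes = slot_size * count;
    void *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    s->region = mem;
    s->region_size = bytes;
    s->free_list = NULL;
    /* Push slots in reverse so the first allocation returns slot 0. */
    for (size_t i = count; i-- > 0; ) {
        void **slot = (void **)((unsigned char *)mem + i * slot_size);
        *slot = s->free_list;
        s->free_list = slot;
    }
    return 0;
}

/* Pop a slot off the free list; returns NULL when the slab is exhausted. */
void *slab_alloc(slab *s)
{
    void **slot = s->free_list;
    if (slot != NULL) s->free_list = *slot;
    return slot;
}

/* Return a slot to the free list (the "cache of recently-freed addresses"). */
void slab_free(slab *s, void *p)
{
    assert(p != NULL);
    *(void **)p = s->free_list;
    s->free_list = p;
}

void slab_destroy(slab *s)
{
    munmap(s->region, s->region_size);
}
```

Because the region is page-aligned and every slot size is a multiple of 16, every slot is 16-byte aligned without any per-allocation bookkeeping.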
The trickiest requirement is obviously the third one, since any malloc() / realloc() based solution is hostage to realloc() moving the block to a different alignment.
On Linux, you could use anonymous mappings created with mmap() instead of malloc(). Addresses returned by mmap() are by necessity page-aligned, and the mapping can be extended with mremap().
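A minimal Linux-only sketch of that approach follows. The `map_alloc`, `map_resize` and `map_free` wrappers are hypothetical names; mremap() is a Linux extension that needs `_GNU_SOURCE` with glibc, and the caller has to remember the current size of each mapping.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* mremap() is a Linux/glibc extension */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Allocate `size` bytes of zero-filled, page-aligned memory; NULL on failure. */
void *map_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Grow or shrink a mapping. The kernel may relocate it, but the result is
   always page-aligned, so 16-byte alignment is preserved. NULL on failure,
   in which case the old mapping is left untouched. */
void *map_resize(void *p, size_t old_size, size_t new_size)
{
    assert(p != NULL);
    void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    return q == MAP_FAILED ? NULL : q;
}

void map_free(void *p, size_t size)
{
    munmap(p, size);
}
```

Mappings are handed out in whole pages, so this suits large buffers far better than many small objects.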
Starting with C11, you have the void *aligned_alloc( size_t alignment, size_t size ); function, where the parameters are:
alignment - specifies the alignment. Must be a valid alignment supported by the implementation.
size - number of bytes to allocate. An integral multiple of alignment.
Return value
On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc().
On failure, returns a null pointer.
Example:
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p1 = malloc(10*sizeof *p1);
        printf("default-aligned addr: %p\n", (void*)p1);
        free(p1);

        int *p2 = aligned_alloc(1024, 1024*sizeof *p2);
        printf("1024-byte aligned addr: %p\n", (void*)p2);
        free(p2);
    }
Possible output:
default-aligned addr: 0x1e40c20
1024-byte aligned addr: 0x1e41000
1. Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine). For example, 64-bit Linux on x86/64 has a 16-byte long double, which is 16-byte aligned - so all memory allocations are 16-byte aligned anyway. However, in a 32-bit program, long double is typically smaller and less strictly aligned, and memory allocations may be only 8-byte aligned.
2. Yes - you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster.
3. If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data - or that uses the system realloc() and adjusts the realigned data when necessary.
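The "allocate aligned, then copy" reallocator could look roughly like this on a POSIX system. It is a sketch under assumptions: `aligned_realloc_copy` is a hypothetical name, it relies on posix_memalign() (so it does not apply to Windows as written), and the caller supplies the old size because the allocator does not expose it portably.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a new block with the requested alignment,
   copy the old contents across and release the old block. `alignment` must
   be a power of two and a multiple of sizeof(void *). Returns NULL on
   failure, leaving the old block untouched, as realloc() does. */
void *aligned_realloc_copy(void *old, size_t old_size, size_t new_size,
                           size_t alignment)
{
    void *fresh = NULL;
    if (new_size == 0) new_size = 1;      /* avoid a possibly-NULL success result */
    if (posix_memalign(&fresh, alignment, new_size) != 0) return NULL;
    assert(((uintptr_t)fresh & (alignment - 1)) == 0);
    if (old != NULL) {
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);                        /* old must have come from malloc() or posix_memalign() */
    }
    return fresh;
}
```

This always copies, so it is never in place, but it is simple and the alignment is guaranteed rather than hoped for.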
Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want.
On Mac OS X, there is posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions documented by man malloc_zone_malloc, with the header <malloc/malloc.h>.
You might be able to jimmy (in Microsoft VC++ and maybe other compilers):
#pragma pack(16)
such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of:
ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct ));
If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well.
Just a thought.
-- pete