
Memory usage of a class - converting double to float didn't reduce memory usage as expected

I am creating millions of instances of the following class:

template<class T>
struct node
{
  //some functions
private:
  T m_data_1;
  T m_data_2;
  T m_data_3;

  node* m_parent_1;
  node* m_parent_2;
  node* m_child;
};

The purpose of the template is to let the user choose between float and double precision, the idea being that node<float> will occupy less memory (RAM) than node<double>.

However, when I switch from double to float, the memory footprint of my program does not decrease as I expect it to. I have two questions:

  1. Is it possible that the compiler/operating system is reserving more space than required for my floats (or even storing them as doubles)? If so, how do I stop this from happening? I'm using Linux on a 64-bit machine with g++.

  2. Is there a tool that lets me determine the amount of memory used by each of the different classes (i.e. some sort of memory profiling), to make sure that the memory isn't being gobbled up somewhere else that I haven't thought of?


If you are compiling for 64-bit, then each pointer will be 64 bits (8 bytes) in size, which also means it may need to be aligned to an 8-byte boundary. So if you store 3 floats (12 bytes), the compiler may have to insert 4 bytes of padding before the first pointer. Instead of saving 12 bytes, you only save 8. The padding will still be there whether the pointers are at the beginning of the struct or the end: the struct's total size must be a multiple of its strictest member alignment so that consecutive structs in an array stay aligned.

Also, your structure is primarily composed of 3 pointers. The 8 bytes you save take you from a 48-byte object to a 40-byte object. That's not exactly a massive decrease; again, this assumes you're compiling for 64-bit.

If you're compiling for 32-bit, then you're saving 12 bytes from a 36-byte structure, which is better percentage-wise. Potentially more if doubles have to be aligned to 8 bytes.
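
You can verify where the padding lands by printing the member offsets. Here is a minimal sketch; it uses a non-template, all-public mirror of your struct (node_f is just an illustrative name), since offsetof requires a standard-layout type:

#include <cstddef>   // offsetof
#include <iostream>

// All-public mirror of the questioner's node<float> layout.
struct node_f
{
    float m_data_1;
    float m_data_2;
    float m_data_3;

    node_f* m_parent_1;
    node_f* m_parent_2;
    node_f* m_child;
};

int main()
{
    // The floats occupy bytes 0-11, but the first pointer must start on
    // an 8-byte boundary, so 4 bytes of padding appear at offset 12.
    std::cout << "sizeof(node_f)            == " << sizeof(node_f) << '\n';
    std::cout << "offsetof(..., m_data_3)   == " << offsetof(node_f, m_data_3) << '\n';
    std::cout << "offsetof(..., m_parent_1) == " << offsetof(node_f, m_parent_1) << '\n';
    return 0;
}

On a typical x86-64 build this should print 40, 8, and 16: the first pointer starts at 16 rather than 12, confirming the 4 bytes of padding.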


The other answers are correct about the source of the discrepancy. However, pointers (and other types) on x86/x86-64 are not required to be aligned. It is just that performance is better when they are, which is why GCC keeps them aligned by default.

But GCC provides a "packed" attribute to let you exert control over this:

#include <iostream>

template<class T>
struct node
{
private:
    T m_data_1;
    T m_data_2;
    T m_data_3;

    node* m_parent_1;
    node* m_parent_2;
    node* m_child;
};

template<class T>
struct node2
{
private:
    T m_data_1;
    T m_data_2;
    T m_data_3;

    node2* m_parent_1;
    node2* m_parent_2;
    node2* m_child;
} __attribute__((packed)); // GCC-specific: suppress padding between members

int
main(int argc, char *argv[])
{
    std::cout << "sizeof(node<double>) == " << sizeof(node<double>) << std::endl;
    std::cout << "sizeof(node<float>) == " << sizeof(node<float>) << std::endl;
    std::cout << "sizeof(node2<float>) == " << sizeof(node2<float>) << std::endl;
    return 0;
}

On my system (x86-64, g++ 4.5.2), this program outputs:

sizeof(node<double>) == 48
sizeof(node<float>) == 40
sizeof(node2<float>) == 36

Of course, the "attribute" mechanism and the "packed" attribute itself are GCC-specific.


In addition to the valid points that Nicol makes:

When you call new/malloc, it doesn't necessarily correspond one-to-one with a call to the OS to allocate memory. To reduce the number of expensive system calls, the heap manager may allocate more than is requested and then "suballocate" chunks of that when you call new/malloc. Also, memory can typically only be requested from the OS a page at a time (usually 4 KB, the minimum page size). Essentially, there may be chunks of memory allocated that are not currently in active use, in order to speed up future allocations.
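
You can observe this over-allocation directly on glibc. A minimal sketch, assuming a glibc-based Linux system (malloc_usable_size is glibc-specific, declared in <malloc.h>):

#include <cstdlib>
#include <iostream>
#include <malloc.h> // glibc-specific: malloc_usable_size

int main()
{
    // Request an awkward size; the allocator rounds it up to its
    // internal chunk granularity, so the usable size comes back larger.
    void* p = std::malloc(13);
    std::cout << "requested 13 bytes, usable "
              << malloc_usable_size(p) << " bytes\n";
    std::free(p);
    return 0;
}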

To answer your questions directly:

1) Yes, the runtime will very likely allocate more memory than you asked for, but this memory is not wasted: it will be used for future calls to new/malloc, yet it will still show up in "task manager" or whatever tool you use. No, it will not promote floats to doubles. The more allocations you make, the less likely this edge condition is to be the cause of the size difference, and the items in Nicol's answer will dominate; for a smaller number of allocations, this item is likely to dominate (where "large" and "small" depend entirely on your OS and kernel).

2) The Windows Task Manager will give you the total memory allocated. Something like WinDbg will actually give you the virtual memory range chunks (usually allocated in a tree) that were allocated by the runtime. On Linux, this data is available in the files under /proc/<pid>/ for your process, e.g. /proc/<pid>/status (the VmSize and VmRSS lines) or /proc/<pid>/smaps.
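
For example, a process can read its own counters. A minimal Linux-only sketch (VmSize is the total virtual size, VmRSS the resident set size):

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // /proc/self/status exposes the kernel's memory counters for the
    // current process as "Name:\tvalue" lines.
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line))
        if (line.compare(0, 6, "VmSize") == 0 ||
            line.compare(0, 5, "VmRSS") == 0)
            std::cout << line << '\n';
    return 0;
}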
