开发者

Ask for suggestion on a data structure usage

For each element in data1, I need to figure out what elements in data2 are related to it. Also, for each element in data2, I need to figure out what elements in data1 are related to it. Therefore, I setup a mutual data structure as you can see below:

01  class data1 {
02    // class variables
03    int id;
04    float d1_v1;
05    float d1_v2;
06    map<string, float> m_to_data2; // string is the keyword of data2, float recorded the reference info when data1 and data2 are related.
07  };
08   
09  class data2 {
10    // class variables
11    int d2_v1;
      float d2_v2;
12    list <int> src_id;  // int is the id of data1
13  };
14   
15  map<string, data2 *> map_2;
16  map<int, data1 *> map_1;

Then I parse file and fill the map_1 and map_2. I found:

  1. the total memory usage after setting up the mutual-linked two maps: 498.7M.

  2. without set the link from data2 to data1 (not fill list <int> src_id), memory usage: 392.7M.

  3. Without fill map_1, without fill list <int> src_id in data2, memory usage: 182.0M

  4. Without fill map_1, fill list <int> src_id with ids of data1, memory usage: 289.7M

  5. without fill map m_to_data2, memory usage: 290.0M

  6. The size of map_1: 77737

  7. The size of map_2: 1830009

  8. The size of map<string, float> m_to_data2 for each element of map_1 in the range of 3 - 17522

  9. The size of list <int> s开发者_JS百科rc_id for each element of map_2 in the range of 1- 1377

I need to reduce the memory usage after setting up the mutual-linked maps (ideally less than 200M, currently 498M as you can see above). I was trying to token the string (keyword of data2) to int by setting up an extra map <string, int>, since int needs less memory than string, but it may not help much since I need extra memory for the map <string, int>. Any suggestions?

Your comments/suggestions are highly appreciated.


I would start by the following:

  • If possible (i.e. you have boost or tr1 available and ordering isn't important), change all maps to unordered_maps.
  • If possible, change list in data2 to vector.
  • Have an unordered_map that maps from string to an unsigned id and use that id everywhere you're currently using strings. Given the size range of 3-17k for m_to_data2, you're duplicating names a lot.


Looks like the float reference variable is taking up too much memory may be you can change it to a unsigned int.


If I understand your question correctly I would have rather used std::multimap instead of std::map

std::multimap<std::string, int> map2;
std::multimap<int, std::map<std::string, float> > map1;

that would make your code much easier to understand and proceed towards optimization.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜