Ask for suggestion on a data structure usage
For each element in data1
, I need to figure out what elements in data2
are related to it. Also, for each element in data2
, I need to figure out what elements in data1 are related to it. Therefore, I setup a mutual data structure as you can see below:
01 class data1 {
02 // class variables
03 int id;
04 float d1_v1;
05 float d1_v2;
06 map<string, float> m_to_data2; // string is the keyword of data2, float recorded the reference info when data1 and data2 are related.
07 };
08
09 class data2 {
10 // class variables
11 int d2_v1;
float d2_v2;
12 list <int> src_id; // int is the id of data1
13 };
14
15 map<string, data2 *> map_2;
16 map<int, data1 *> map_1;
Then I parse file and fill the map_1 and map_2. I found:
the total memory usage after setting up the mutual-linked two maps: 498.7M.
without set the link from data2 to data1 (not fill
list <int> src_id
), memory usage: 392.7M.Without fill map_1, without fill
list <int> src_id
in data2, memory usage: 182.0MWithout fill map_1, fill
list <int> src_id
with ids of data1, memory usage: 289.7Mwithout fill map m_to_data2, memory usage: 290.0M
The size of map_1: 77737
The size of map_2: 1830009
The size of
map<string, float>
m_to_data2 for each element of map_1 in the range of 3 - 17522The size of
list <int> s开发者_JS百科rc_id
for each element of map_2 in the range of 1- 1377
I need to reduce the memory usage after setting up the mutual-linked maps (ideally less than 200M, currently 498M as you can see above). I was trying to token the string (keyword of data2) to int by setting up an extra map <string, int>
, since int needs less memory than string, but it may not help much since I need extra memory for the map <string, int>
. Any suggestions?
Your comments/suggestions are highly appreciated.
I would start by the following:
- If possible (i.e. you have boost or tr1 available and ordering isn't important), change all
map
s tounordered_map
s. - If possible, change
list
indata2
tovector
. - Have an
unordered_map
that maps fromstring
to anunsigned
id
and use thatid
everywhere you're currently using strings. Given the size range of 3-17k form_to_data2
, you're duplicating names a lot.
Looks like the float reference variable is taking up too much memory may be you can change it to a unsigned int.
If I understand your question correctly I would have rather used std::multimap instead of std::map
std::multimap<std::string, int> map2;
std::multimap<int, std::map<std::string, float> > map1;
that would make your code much easier to understand and proceed towards optimization.
精彩评论