push_back of STL list got bad performance?
I wrote a simple program to test STL list performance against a simple C list-like data structure. It shows bad performan开发者_如何学Goce at "push_back()" line. Any comments on it?
$ ./test2
Build the type list : time consumed -> 0.311465
Iterate over all items: time consumed -> 0.00898
Build the simple C List: time consumed -> 0.020275
Iterate over all items: time consumed -> 0.008755
The source code is:
#include <stdexcept>
#include "high_resolution_timer.hpp"
#include <list>
#include <algorithm>
#include <iostream>
#define TESTNUM 1000000
/* The test struct */
struct MyType {
int num;
};
/*
* C++ STL::list Test
*/
typedef struct MyType* mytype_t;
void myfunction(MyType t) {
}
int test_stl_list()
{
std::list<mytype_t> mylist;
util::high_resolution_timer t;
/*
* Build the type list
*/
t.restart();
for(int i = 0; i < TESTNUM; i++) {
mytype_t aItem;
aItem->num = i;
mylist.push_back(aItem);
}
std::cout << " Build the type list : time consumed -> " << t.elapsed() << std::endl;
/*
* Iterate over all item
*/
t.restart();
std::for_each(mylist.begin(), mylist.end(), myfunction);
std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl;
return 0;
}
/*
* a simple C list
*/
struct MyCList;
struct MyCList{
struct MyType m;
struct MyCList* p_next;
};
int test_simple_c_list()
{
struct MyCList* p_list_head = NULL;
util::high_resolution_timer t;
/*
* Build it
*/
t.restart();
struct MyCList* p_new_item = NULL;
for(int i = 0; i < TESTNUM; i++) {
p_new_item = (struct MyCList*) malloc(sizeof(struct MyCList));
if(p_new_item == NULL) {
printf("ERROR : while malloc\n");
return -1;
}
p_new_item->m.num = i;
p_new_item->p_next = p_list_head;
p_list_head = p_new_item;
}
std::cout << " Build the simple C List: time consumed -> " << t.elapsed() << std::endl;
/*
* Iterate all items
*/
t.restart();
p_new_item = p_list_head;
while(p_new_item->p_next != NULL) {
p_new_item = p_new_item->p_next;
}
std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl;
return 0;
}
int main(int argc, char** argv)
{
if(test_stl_list() != 0) {
printf("ERROR: error at testcase1\n");
return -1;
}
if(test_simple_c_list() != 0) {
printf("ERROR: error at testcase2\n");
return -1;
}
return 0;
}
Oops, Yes. I modified the code, and it show:
$ ./test2
Build the type list : time consumed -> 0.163724
Iterate over all items: time consumed -> 0.005427
Build the simple C List: time consumed -> 0.018797
Iterate over all items: time consumed -> 0.004778
So, my question is, why my "push_back" code got bad performance?
Well one thing is that in C, you have a linked list of objects but in C++, you have a linked list of pointers (so for one thing, you are doing twice as many allocations). To compare apples to apples, your STL code should be:
int test_stl_list()
{
std::list<MyType> mylist;
util::high_resolution_timer t;
/*
* Build the type list
*/
t.restart();
for(int i = 0; i < TESTNUM; i++) {
MyItem aItem;
aItem.num = i;
mylist.push_back(aItem);
}
std::cout << " Build the type list : time consumed -> " << t.elapsed() << std::endl;
return 0;
}
Your STL codes create a memory piece twice for each cell. The following is from STL 4.1.1 on x86_64
void push_back(const value_type& __x)
{
this->_M_insert(end(), __x);
}
// Inserts new element at position given and with value given.
void _M_insert(iterator __position, const value_type& __x)
{
_Node* __tmp = _M_create_node(__x); // Allocate a new space ####
__tmp->hook(__position._M_node);
}
As you can see, also, push_back() function calls several more functions before returning to the caller, and few pointer-value copying occurs everytime one of the functions is called. Might be neligible because all the parameters are passed by const-reference though.
First, it looks like you're doing a push_front
, not a push_back
(in your own implementation, that is).
Second, you should also compare std::slist
for a fair comparison as the std::list
is double-linked.
Third, you need to use right compiler flags for a fair comparison. With gcc you should at least compile with -O2
. Without optimization, STL always sucks because no inlining is done and there is lots of function call overhead.
It would seem your high_resolution_timer
class is measuring more than just the routines you are trying to measure. I would refactor the code such that the only code between t.restart()
and t.elapsed()
is what you are keen on measuring. All other code therein could have unknown performance implications that could skew your results.
精彩评论