C++ and Java performance

Posted: 2022-12-08 09:42

This question is just speculative.

I have the following implementation in C++:

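#include <cstdio>    // sprintf
#include <iostream>  // cout, endl
#include <string>
#include <vector>

using namespace std;

// Builds 2000 strings "X-<n>" in a local vector and then discards them.
void testvector(int x)
{
  vector<string> v;
  char aux[20];
  int a = x * 2000;
  int z = a + 2000;
  string s("X-");
  for (int i = a; i < z; i++)
  {
    sprintf(aux, "%d", i);
    v.push_back(s + aux);
  }
}

int main()
{
  for (int i = 0; i < 10000; i++)
  {
    if (i % 1000 == 0) cout << i << endl;  // progress output every 1000 calls
    testvector(i);
  }
}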

On my box, this program runs in approx. 12 seconds; amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs a lot faster than my C++ application (approx. 2 seconds).

I know Java's HotSpot performs a lot of optimizations when compiling to native code, but I think that if such performance is possible in Java, it should be achievable in C++ too...

So, what do you think should be modified in the program above, in the libraries used, or in the memory allocator to reach similar performance? (Writing actual code for these things could take a long time, so just discussing it would be great.)

Thank you.

You have to be careful with performance tests because it's very easy to deceive yourself or not compare like with like.

However, I've seen similar results comparing C# with C++, and there are a number of well-known blog posts about the astonishment of native coders when confronted with this kind of evidence. Basically a good modern generational compacting GC is very much more optimised for lots of small allocations.

In C++'s default allocator, every block is treated the same, so every block is moderately expensive to allocate and free. In a generational GC, all blocks are very, very cheap to allocate (nearly as cheap as stack allocation) and if they turn out to be short-lived then they are also very cheap to clean up.

This is why the "fast performance" of C++ compared with more modern languages is - for the most part - mythical. You have to hand tune your C++ program out of all recognition before it can compete with the performance of an equivalent naively written C# or Java program.
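To make the "nearly as cheap as stack allocation" point concrete, here is a minimal sketch of bump-pointer allocation, the basic idea behind a young generation. It is not a garbage collector and not how any particular JVM or CLR implements its heap; the class name BumpArena and its interface are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump-pointer arena (illustration only, not a real GC).
// Allocating is an add and a bounds check; freeing is a single reset.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Allocate `size` bytes at an offset aligned to `align` (a power of two).
    // Alignment is relative to the start of the buffer, so this sketch only
    // supports alignments up to that of the buffer itself.
    // Returns nullptr when the arena is full.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (start > buffer_.size() || size > buffer_.size() - start) {
            return nullptr;
        }
        offset_ = start + size;
        return buffer_.data() + start;
    }

    // Release every block at once; no per-block bookkeeping is needed.
    void reset() { offset_ = 0; }

    // Number of bytes consumed so far, including alignment padding.
    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

A caller would create one arena, call allocate() for each short-lived block, and call reset() when the whole batch is dead. A general-purpose malloc/free, by contrast, has to track each block individually.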

All your program does is print the numbers 0..9000 in steps of 1000. The calls to testvector() do nothing and can be eliminated. I suspect that your JVM notices this, and is essentially optimising the whole function away.

You can achieve a similar effect in your C++ version by just commenting out the call to testvector()!

Well, this is a pretty useless test that only measures allocation of small objects. That said, simple changes made me get the running time down from about 15 secs to about 4 secs. New version:

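#include <cstdio>    // sprintf
#include <iostream>  // cout, endl
#include <string>
#include <vector>
#include <boost/pool/pool_alloc.hpp>  // boost::pool_allocator

using namespace std;

typedef vector<string, boost::pool_allocator<string> > str_vector;

// Overwrites the elements in [it, end) instead of building a new vector.
void testvector(int x, str_vector::iterator it, str_vector::iterator end)
{
    char aux[25] = "X-";
    int a = x * 2000;
    for (; it != end; ++a)
    {
        sprintf(aux+2, "%d", a);  // write the digits right after the "X-" prefix
        *it++ = aux;
    }
}

int main(int argc, char** argv)
{
    str_vector v(2000);  // allocated once and reused by every call
    for (int i = 0; i < 10000; i++)
    {
        if (i % 1000 == 0) cout << i << endl;
        testvector(i, v.begin(), v.begin()+2000);
    }
    return 0;
}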

real    0m4.089s
user    0m3.686s
sys     0m0.000s

The Java version has these times:

real    0m2.923s
user    0m2.490s
sys     0m0.063s

(This is my direct Java port of your original program, except it passes the ArrayList as a parameter to cut down on useless allocations).

So, to sum up, small allocations are faster in Java, and memory management is a bit more hassle in C++. But we knew that already :)

HotSpot optimises hot spots in code. Typically, it tries to optimise anything that gets executed 10000 times.

For this code, after 5 iterations it will try to optimise the inner loop that adds the strings to the vector. The optimisation will most likely include escape analysis of the variables in the method. As the vector is a local variable and never escapes the local context, it is very likely that HotSpot will remove all of the code in the method and turn it into a no-op. To test this, try returning the results from the method. Even then, be careful to do something meaningful with the result: just getting its length, for example, can be optimised away, as HotSpot can see that the result is always the same as the number of iterations in the loop.

All of this points to the key benefit of a dynamic compiler like HotSpot - using runtime analysis you can optimise what is actually being done at runtime and get rid of redundant code. After all, it doesn't matter how efficient your custom C++ memory allocator is - not executing any code is always going to be faster.

"In my box, this program gets executed in approx. 12 seconds; amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs lot faster than my C++ application (approx. 2 seconds)."

I cannot reproduce that result.

To account for the optimization mentioned by Alex, I've modified both the Java and the C++ code so that each prints the last element of the v vector at the end of the testvector method.

Now, the C++ code (compiled with -O3) runs about as fast as yours (12 sec). The Java code (straightforward, uses ArrayList instead of Vector although I doubt that this would impact the performance, thanks to escape analysis) takes about twice that time.

I did not do a lot of testing so this result is by no means significant. It just shows how easy it is to get these tests completely wrong, and how little single tests can say about real performance.
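For reference, a sketch of what the C++ side of such a modification could look like. This is not the exact code used for the timings above; returning the last element so the caller can print it is an assumption made here to keep the function easy to check.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Same loop as in the question, but the last element is handed back to the
// caller so the work has an observable result and cannot be discarded.
// (Returning the string instead of printing it inside the function is an
// assumption made for this sketch.)
std::string testvector(int x) {
    std::vector<std::string> v;
    char aux[20];
    const int a = x * 2000;
    const int z = a + 2000;
    const std::string s("X-");
    for (int i = a; i < z; i++) {
        std::snprintf(aux, sizeof aux, "%d", i);
        v.push_back(s + aux);
    }
    return v.back();  // depends on the loop having actually run
}
```

The driver loop would then print the returned string on each call, for example with `std::cout << testvector(i) << '\n';`.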

Just for the record, the tests were run on the following configuration:

$ uname -ms
Darwin i386
$ java -version
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03-226)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-92, mixed mode)
$ g++ --version
i686-apple-darwin9-g++-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5490)

It should help if you use vector::reserve to reserve space for the 2000 elements in v before the loop (however, the same thing should also speed up the Java equivalent of this code).
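A small sketch of the effect, assuming a C++11 compiler (for std::to_string). The helper name count_reallocations is made up for illustration; it counts how often the vector's capacity changes while n strings are appended.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns how many times the capacity of the vector
// changed (i.e. how often it had to grow) while appending n strings.
std::size_t count_reallocations(std::size_t n, bool use_reserve) {
    std::vector<std::string> v;
    if (use_reserve) {
        v.reserve(n);  // one up-front allocation, no elements created
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back("X-" + std::to_string(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With use_reserve set, the count is zero for any n; without it, the vector grows several times, and each growth moves or copies every string stored so far.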

To suggest why the performance of the C++ and Java versions differs, it would be essential to see the source for both. I can see a number of performance issues in the C++; for some of them it would be useful to know whether you were doing the same in the Java. For example, std::endl flushes the output stream: do you call System.out.flush() in Java, or just append a '\n'? If the latter, then you've just given the Java version a distinct advantage.
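The flushing difference can be made visible with a stream buffer that counts flush requests. This is only a sketch: FlushCountingBuf and count_flushes are names invented here, and an in-memory buffer stands in for the real console.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical stream buffer that counts how often it is asked to flush.
class FlushCountingBuf : public std::stringbuf {
public:
    int flushes = 0;

protected:
    // Called through pubsync() whenever the owning stream is flushed.
    int sync() override {
        ++flushes;
        return 0;  // report success
    }
};

// Writes `lines` numbered lines and returns how many flushes were requested.
int count_flushes(int lines, bool use_endl) {
    FlushCountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl) {
            out << i << std::endl;  // newline plus flush
        } else {
            out << i << '\n';       // newline only
        }
    }
    return buf.flushes;
}
```

In the program from the question this only affects ten output lines, so it is unlikely to explain a gap of several seconds, but it is the kind of difference worth ruling out when comparing two implementations.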

What are you actually trying to measure here? Putting ints into a vector?

You can start by pre-allocating space in the vector, since its final size is known:

instead of:

void testvector(int x)
{
  vector<string> v;
  char aux[20];
  int a = x * 2000;
  int z = a + 2000;
  string s("X-");
  for (int i = a; i < z; i++)
  {
    sprintf(aux, "%d", i);
    v.push_back(s + aux);
  }
}

try:

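void testvector(int x)
{
  char aux[20];
  int a = x * 2000;
  int z = a + 2000;
  string s("X-");
  vector<string> v;
  // Reserve capacity for the 2000 elements up front.
  // Note: vector<string> v(z) would instead create z empty strings.
  v.reserve(z - a);
  for (int i = a; i < z; i++)
  {
    sprintf(aux, "%d", i);
    v.push_back(s + aux);
  }
}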

In your inner loop, you are pushing ints into a string vector. If you just single-step that at the machine-code level, I'll bet you find that a lot of that time goes into allocating and formatting the strings, and then some time goes into the push_back (not to mention deallocation when you release the vector).

This could easily vary between run-time-library implementations, based on the developer's sense of what people would reasonably want to do.
