开发者

Thread-ring benchmark

Today I was doing the thread ring exercise from the Programming Erlang book and googled for other solutions to compare. I found that the language shootout has exactly the same problem as a benchmark. I had the impression that this is an area where Erlang should be fast, but turns out that C and C++ are again on top. My suspicion is that the C/C++ programs are not following the rules which say "pass the token from thread to thread". It seems, after reading them, that they both manipulate some shared memory and global variables which is different from the Erlang code but I could be wrong.

My question is: are they doi开发者_如何学Pythonng the same thing or the C/C++ code is conceptually different (and faster) from the Erlang one?

And another question: why is Haskell faster than Erlang when the solutions are very similar?


The C version is using LWP, which I think is a user-space threading library. To what extent this is "unfair" is up for debate: I'd look at things like whether it supports true pre-emptive concurrency in the sense that you can make blocking system calls in a thread without blocking all the other threads (you can do this in Haskell, can you in Erlang?).

Haskell's threads are slightly more lightweight than Erlang's, because as I understand it an Erlang thread comes with a local heap (in the standard implementation) whereas Haskell threads all share the same heap.


Ultimately, message-passing on modern machines is implemented using some form of shared memory to pass the messages (along with either locks or atomic instructions). So all the C and C++ implementations are really doing is inlining the implementation of message-passing straight into their code. A similar benchmark that uses a fast message-passing library in C, also benchmarked against Haskell and Erlang, can be found in this paper: http://www.cs.kent.ac.uk/pubs/2009/2928/index.html (section 5.1)

The speed of the various approaches is really determined by the concurrent run-time systems involved. Haskell has had a lot of good work done in this area, which leaves it ahead of Erlang. Of course, measuring speed on micro-benchmarks is often mis-leading, and leaves out important factors like the readability of the code. A question to bear in mind might be: which of the solutions in the shoot-out would you be happy to maintain?


I don't think I'd call it cheating. The primary, fundamental difference between multiple threads and multiple processes is that multiple threads share a single address space. As such, specifying multiple threads (rather than multiple processes) seems to me like tacit permission to take advantage of the shared address space (at least in the absence of some very specific definition of "passing" that prohibited this).

What it comes down to is this: Erlang doesn't really have threads, as such -- it has processes with asynchronous communications. The processes are (intentionally) isolated from each other to a large degree. On one hand, this makes development considerably easier -- in particular, one process can only affect another via specific, well-defined channels of communication. Under the hood, it uses lots of tricks (almost certainly including shared memory) to optimize its processes and take advantage of what's possible in a specific implementation/situation (such as all the processes running in a single, shared address space). Nonetheless, having to keep all the tricks hidden keeps it from being quite as efficient as something like the C version where the "tricks" are all explicit and completely exposed.

I'd use a real-life analogy to explain the difference. Think of the threads/processes as people. Erlang enforces a professional working relationship where communications are all carefully recorded in memos. The C and C++ versions are more like a husband and wife who might communicate with a single word that doesn't mean anything to anybody else, or even just a single quick glance.

The latter is extremely efficient when it works -- but it's a lot more open to subtle misunderstandings, and if the two have a fight you probably don't want to be in the same room. For the manager, people in purely professional relationships are a lot easier to manage even if their peak efficiency isn't quite a high.


why is Haskell faster than Erlang when the solution are very similar?

Haskell GHC is a compiled, native-code optimized language implementation with very fast threads. It is generally much faster than Erlang/HIPE. Erlang doesn't have a monopoly on lightweight threads :-)


I would answer by another question: how is the Erlang runtime implemented under the hood ?

Chances are it's implemented in C or a similar system language (I doubt they did it all in assembly). Or at the very least, that the concepts they express can be expressed as efficiently in C.

Now, why do you find it so strange that an optimized C version (the shootout certainly does not show average level code) would beat the Erlang version, considering that Erlang adds its own level of complexity / indirection ?

Whatever the type of benchmark, it's always possible for a C implementation to beat the most polished program in another language... built on top of C, simply because you take the C-version it generates then removes the bits that you don't need.

On the other hand:

  • how much time did it take you to write your code ?
  • what's your degree of trust that it does the right thing ? won't deadlock ?
  • which one would you rather maintain ?

Sometimes "faster" just isn't worth the cost.


not following the rules

Given how many really quite different approaches there are to programming concurrency, I did find it very difficult to both be inclusive enough to bring in different language implementations and yet retain some vague comparability.

Now look at the performance of the same programs measured with different run time configuration and note how much that matters -

SMP quad core 
1.0 C GNU gcc #3    4.16    4.17    16,952  607   100% 0% 1% 0%
4.2 Haskell GHC     17.99   17.42   4,020   307   100% 1% 1% 1%
7.5 Erlang HiPE     31.12   31.13   10,504  273   0% 0% 0% 100%

No SMP one core
1.0 C GNU gcc #3    4.16    4.16    16,952  607   1% 0% 0% 100%
1.4 Haskell GHC     5.89    5.89    3,204   307   0% 0% 0% 100%
2.6 Erlang HiPE     10.64   10.63   9,276   273   0% 0% 0% 100%


One thing to note in that benchmark is that there is only one token to pass around. Which means that in practice it's just a single threaded program reading and writing from/to memory.

I would expect the result to come out different on a multiprocessor machine (or make that a cluster) where the threads/processes have to pass M tokens around in some random order.

Hmm. Also give the developers' of the benchmark solutions equal number of hours to finish their solution. Then I would expect Erlang to come out on top (or close to the top at least).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜