F# performance in scientific computing
I am curious as to how F# performance compares to C++ performance? I asked a similar question wit开发者_开发百科h regards to Java, and the impression I got was that Java is not suitable for heavy numbercrunching.
I have read that F# is supposed to be more scalable and more performant, but how is this real-world performance compares to C++? specific questions about current implementation are:
- How well does it do floating-point?
- Does it allow vector instructions
- how friendly is it towards optimizing compilers?
- How big a memory foot print does it have? Does it allow fine-grained control over memory locality?
- does it have capacity for distributed memory processors, for example Cray?
- what features does it have that may be of interest to computational science where heavy number processing is involved?
- Are there actual scientific computing implementations that use it?
Thanks
I am curious as to how F# performance compares to C++ performance?
Varies wildly depending upon the application. If you are making extensive use of sophisticated data structures in a multi-threaded program then F# is likely to be a big win. If most of your time is spent in tight numerical loops mutating arrays then C++ might be 2-3× faster.
Case study: Ray tracer My benchmark here uses a tree for hierarchical culling and numerical ray-sphere intersection code to generate an output image. This benchmark is several years old and the C++ code has been improved upon dozens of times over the years and read by hundreds of thousands of people. Don Syme at Microsoft managed to write an F# implementation that is slightly faster than the fastest C++ code when compiled with MSVC and parallelized using OpenMP.
I have read that F# is supposed to be more scalable and more performant, but how is this real-world performance compares to C++?
Developing code is much easier and faster with F# than C++, and this applies to optimization as well as maintenance. Consequently, when you start optimizing a program the same amount of effort will yield much larger performance gains if you use F# instead of C++. However, F# is a higher-level language and, consequently, places a lower ceiling on performance. So if you have infinite time to spend optimizing you should, in theory, always be able to produce faster code in C++.
This is exactly the same benefit that C++ had over Fortran and Fortran had over hand-written assembler, of course.
Case study: QR decomposition This is a basic numerical method from linear algebra provided by libraries like LAPACK. The reference LAPACK implementation is 2,077 lines of Fortran. I wrote an F# implementation in under 80 lines of code that achieves the same level of performance. But the reference implementation is not fast: vendor-tuned implementations like Intel's Math Kernel Library (MKL) are often 10x faster. Remarkably, I managed to optimize my F# code well beyond the performance of Intel's implementation running on Intel hardware whilst keeping my code under 150 lines of code and fully generic (it can handle single and double precision, and complex and even symbolic matrices!): for tall thin matrices my F# code is up to 3× faster than the Intel MKL.
Note that the moral of this case study is not that you should expect your F# to be faster than vendor-tuned libraries but, rather, that even experts like Intel's will miss productive high-level optimizations if they use only lower-level languages. I suspect Intel's numerical optimization experts failed to exploit parallelism fully because their tools make it extremely cumbersome whereas F# makes it effortless.
How well does it do floating-point?
Performance is similar to ANSI C but some functionality (e.g. rounding modes) is not available from .NET.
Does it allow vector instructions
No.
how friendly is it towards optimizing compilers?
This question does not make sense: F# is a proprietary .NET language from Microsoft with a single compiler.
How big a memory foot print does it have?
An empty application uses 1.3Mb here.
Does it allow fine-grained control over memory locality?
Better than most memory-safe languages but not as good as C. For example, you can unbox arbitrary data structures in F# by representing them as "structs".
does it have capacity for distributed memory processors, for example Cray?
Depends what you mean by "capacity for". If you can run .NET on that Cray then you could use message passing in F# (just like the next language) but F# is intended primarily for desktop multicore x86 machines.
what features does it have that may be of interest to computational science where heavy number processing is involved?
Memory safety means you do not get segmentation faults and access violations. The support for parallelism in .NET 4 is good. The ability to execute code on-the-fly via the F# interactive session in Visual Studio 2010 is extremely useful for interactive technical computing.
Are there actual scientific computing implementations that use it?
Our commercial products for scientific computing in F# already have hundreds of users.
However, your line of questioning indicates that you think of scientific computing as high-performance computing (e.g. Cray) and not interactive technical computing (e.g. MATLAB, Mathematica). F# is intended for the latter.
In addition to what others said, there is one important point about F# and that's parallelism. The performance of ordinary F# code is determined by CLR, although you may be able to use LAPACK from F# or you may be able to make native calls using C++/CLI as part of your project.
However, well-designed functional programs tend to be much easier to parallelize, which means that you can easily gain performance by using multi-core CPUs, which are definitely available to you if you're doing some scientific computing. Here are a couple of relevant links:
- F# and Task-Parallel library (blog by Jurgen van Gael, who is doing machine-learning stuff)
- Another interesting answer at SO regarding parllelism
- An example of using Parallel LINQ from F#
- Chapter 14 of my book discusses parallelism (source code is available)
Regarding distributed computing, you can use any distributed computing framework that's available for the .NET platform. There is a MPI.NET project, which works well with F#, but you may be also able to use DryadLINQ, which is a MSR project.
- Some articles: F# MPI tools for .NET, Concurrency with MPI.NET
- DryadLINQ project hompepage
- F# does floating point computation as fast as the .NET CLR will allow it. Not much difference from C# or other .NET languages.
- F# does not allow vector instructions by itself, but if your CLR has an API for these, F# should not have problems using it. See for instance Mono.
- As far as I know, there is only one F# compiler for the moment, so maybe the question should be "how good is the F# compiler when it comes to optimisation?". The answer is in any case "potentially as good as the C# compiler, probably a little bit worse at the moment". Note that F# differs from e.g. C# in its support for inlining at compile time, which potentially allows for more efficient code which rely on generics.
- Memory foot prints of F# programs are similar to that of other .NET languages. The amount of control you have over allocation and garbage collection is the same as in other .NET languages.
- I don't know about the support for distributed memory.
- F# has very nice primitives for dealing with flat data structures, e.g. arrays and lists. Look for instance at the content of the Array module: map, map2, mapi, iter, fold, zip... Arrays are popular in scientific computing, I guess due to their inherently good memory locality properties.
- For scientific computation packages using F#, you may want to look at what Jon Harrop is doing.
As with all language/performance comparisons, your mileage depends greatly on how well you can code.
F# is a derivative of OCaml. I was surprised to find out that OCaml is used a lot in the financial world, where number crunching performance is very important. I was further surprised to find out that OCaml is one of the faster languages, with performance on par with the fastest C and C++ compilers.
F# is built on the CLR. In the CLR, code is expressed in a form of bytecode called the Common Intermediate Language. As such, it benefits from the optimizing capabilities of the JIT, and has performance comparable to C# (but not necessarily C++), if the code is written well.
CIL code can be compiled to native code in a separate step prior to runtime by using the Native Image Generator (NGEN). This speeds up all later runs of the software as the CIL-to-native compilation is no longer necessary.
One thing to consider is that functional languages like F# benefit from a more declarative style of programming. In a sense, you are over-specifying the solution in imperative languages such as C++, and this limits the compiler's ability to optimize. A more declarative programming style can theoretically give the compiler additional opportunities for algorithmic optimization.
It depends on what kind of scientific computing you are doing.
If you are doing traditional heavy computing
, e.g. linear algebra, various optimizations, then you should not put your code in .Net framework, at least not suitable in F#. Because this is at the algorithm level, most of the algorithms must be coded in an imperative languages to have good performance in running time and memory usage. Others mentioned parallel, I must say it is probably useless when you doing low level stuff like parallel an SVD implementation. Because when you know how to parallel an SVD, you simply won't use an high level languages, Fortran, C or modified C(e.g. cilk) are your friends.
However, a lot of the scientific computing today is not of this kind, which is some kind of high level applications, e.g. statistical computing and data mining. In these tasks, aside from some linear algebra, or optimization, there are also a lot of data flows, IOs, prepossessing, doing graphics, etc. For these tasks, F# is really powerful, for its succinctness, functional, safety, easy to parallel, etc.
As others have mentioned, .Net well supports Platform Invoke, actually quite a few projects inside MS are use .Net and P/Invoke together to improve the performance at the bottle neck.
I don't think that you'll find a lot of reliable information, unfortunately. F# is still a very new language, so even if it were ideally suited for performance heavy workloads there still wouldn't be that many people with significant experience to report on. Furthermore, performance is very hard to accurately gauge and microbenchmarks are hard to generalize. Even within C++, you can see dramatic differences between compilers - are you wondering whether F# is competitive with any C++ compiler, or with the hypothetical "best possible" C++ executable?
As to specific benchmarks against C++, here are some possibly relevant links: O'Caml vs. F#: QR decomposition; F# vs Unmanaged C++ for parallel numerics. Note that as an author of F#-related material and as the vendor of F# tools, the writer has a vested interest in F#'s success, so take these claims with a grain of salt.
I think it's safe to say that there will be some applications where F# is competitive on execution time and likely some others where it isn't. F# will probably require more memory in most cases. Of course the ultimate performance will also be highly dependent on the skill of the programmer - I think F# will almost certainly be a more productive language to program in for a moderately competent programmer. Furthermore, I think that at the moment, the CLR on Windows performs better than Mono on most OSes for most tasks, which may also affect your decisions. Of course, since F# is probably easier to parallelize than C++, it will also depend on the type of hardware you're planning to run on.
Ultimately, I think that the only way to really answer this question is to write F# and C++ code representative of the type of calculations that you want to perform and compare them.
Here are two examples I can share:
Matrix multiplication: I have a blog post comparing different matrix multiplication implementations.
LBFGS
I have a large scale logistic regression solver using LBFGS optimization, which is coded in C++. The implementation is well tuned. I modified some code to code in C++/CLI, i.e. I compiled the code into .Net. The .Net version is 3 to 5 times slower than the naive compiled one on different datasets. If you code LBFGS in F#, the performance can not be better than C++/CLI or C#, (but would be very close).
I have another post on Why F# is the language for data mining, although not quite related to the performance issue you concern here, it is quite related to scientific computing in F#.
If I say "ask again in 2-3 years" I think that will answer your question completely :-)
First, don't expect F# to be any different than C# perf-wise, unless you are doing some convoluted recursions on purpose and I'd guess you are not since you asked about numerics.
Floating-point wise it is bound to be better than Java since CLR doesn't aim at cross-platform uniformity, meaning that JIT will go to 80-bits whenever it can. On the other side you don't control over that beyond watching the number of variables to make sure there's enough FP registers.
Vector-wise, if you scream loud enough maybe something happens in 2-3 yr since Direct3D is entering .NET as a general API anyway and C# code done in XNA runs on Xbox whihc is as close to the bare metal you can get with CLR. That still means that you'd need do so some intermediary code on your own.
So don't expect CUDA or even ability to just link NVIDIA libs and get going. You'd have much more luck trying that approach with Haskell if for some reason you really, really need a "functional" language since Haskell was designed to be linking-friendly out of pure necessity.
Mono.Simd has been mentioned already and while it should be back-portable to CLR it might be quite some work to actually do it.
There,s quite some code in a social.msdn posting on using SSE3 in .NET, vith C++/CLI and C#, come array blitting, injecting SSE3 code for perf etc.
There was some talk about running CECIL on compiled C# to extract parts into HLSL, compile into shaders and link a glue code to schedule it (CUDA is doing the equivalent anyway) but I don't think that there's anything runnable coming out of that.
A thing that might be worth more to you if you want to try something soon is PhysX.Net on codeplex. Don't expect it to just unpack and do the magic. However, ih has currently active author and the code is both normal C++ and C++/CLI and yopu can probably get some help from the author if you want to go into details and maybe use similar approach for CUDA. For full speed CUDA you'll still need to compile your own kernels and then just interface to .NET so the easier that part goes the happier you are going to be.
There is a CUDA.NET lib which is supposed to be free but the page gives just e-mail address so expect some strings attached, and while the author writes a blog he's not particularly talkative about what's inside the lib.
Oh and if you have the budget yo might give that Psi Lambda a look (KappaCUDAnet is the .NET part). Apparently they are going to jack up the prices in Nov (if it's not a sales trick :-)
Firstly C is significantly faster than C++.. So if you need so much speed you should make the lib etc in c.
With regards to F# most bench marks use Mono which is up to 2 * slower than MS CLR due t partially to its use of the boehm GC ( they have a new GC and LVVM but these are still immature dont support generics etc).
.NEt languages itself are compiled to an IR ( the CIL) which compile to native code as efficiently as C++. There is one problem set that most GC languages suffer in and that is large amounts of mutable writes ( this includes C++ .NET as mentioned above) . And there is a certain scientific problem set that requires this , these when needed should probably use a native library or use the Flyweight pattern to reuse objects from a pool ( which reduces writes) . The reason is there is a write barrier in the .NET CLR where when updating a reference field (including a box) it will set a bit in a table saying this table is modified . If your code consists of lots of such writes it will suffer.
That said a .NET app like C# using lots of static code , structs and ref/out on the structs can produce C like performance but it is very difficult to code like this or maintain the code ( like C) .
Where F# shines however is parralelism over immutable data which goes hand and hand with more read based problems. Its worth noting most benchmarks are much higher in mutable writes than real life applications.
With regard to floating point , you should use an alternative lib ( ie the .Net one) to the oCaml ones due to it being slow. C/C++ allows faster for lower precision which oCaml doesnt by default.
Lastly i woudl argue a high level language like C#, F# and proper profiling will give you betetr pefromance than c and C++ for the same developer time. If you change a bottle neck to a c lib pinvoke call you will also end up with C like performance for critical areas. That said if you have unlimited budget and care more about speed then maintenance than C is the way to go ( not C++) .
Last I knew, most scientific computing was still done in FORTRAN. It's still faster than anything else for linear algebra problems - not Java, not C, not C++, not C#, not F#. LINPACK is nicely optimized.
But the remark about "your mileage may vary" is true of all benchmarks. Blanket statements (except mine) are rarely true.
精彩评论