High Performance Computing Terminology: What's a GF/s? [closed]
I'm reading this Dr. Dobb's article on CUDA:
In my system, the global memory bandwidth is slightly over 60 GB/s. This is excellent until you consider that this bandwidth must service 128 hardware threads -- each of which can deliver a large number of floating-point operations. Since a 32-bit floating-point value occupies four (4) bytes, global memory bandwidth limited applications on this hardware will only be able to deliver around 15 GF/s -- or only a small percentage of the available performance capability.
Question: does GF/s mean gigaflops per second?
Gigaflops per second would be it!
GF/s, or GFLOPS, is gigaFLOPS: 10^9 FLoating-point Operations Per Second. (GF/s is a somewhat unusual abbreviation of GigaFLOP/s = GigaFLOPS; see e.g. here, "Gigaflops (GF/s) = 10^9 flops", or here, "gigaflops per second (GF/s)".)
To be clear, GF/s is not GFLOPS/s (it is not an acceleration).
You should remember that floating-point operations on CPUs and GPUs are usually counted differently. For most CPUs, 64-bit (double-precision) floating-point operations are counted; for GPUs, 32-bit (single-precision) operations are, because GPUs deliver much higher 32-bit floating-point performance.
What types of operations are counted? Additions, subtractions, and multiplications are; loads and stores are not. Loading and storing data is still necessary to move values to and from memory, though, and it sometimes limits the FLOPS achieved in a real application. The article you cited describes exactly this case: a "memory bandwidth limited application", where the CPU/GPU could deliver a lot of FLOPS but memory cannot feed it data fast enough. A back-of-envelope check of the article's 15 GF/s figure is sketched below.
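Here is a minimal sketch of that bandwidth arithmetic, assuming a purely streaming workload in which every floating-point operation consumes one fresh 4-byte float from global memory (the 60 GB/s figure is the one quoted in the article; everything else is an illustrative assumption):

```c
#include <stdio.h>

int main(void) {
    /* Values from the quoted article: 60 GB/s bandwidth, 4-byte floats. */
    double bandwidth_gb_s  = 60.0; /* global memory bandwidth in GB/s   */
    double bytes_per_float = 4.0;  /* one 32-bit float per operation    */

    /* If every FLOP needs one fresh float from memory, the ceiling is: */
    double gf_s = bandwidth_gb_s / bytes_per_float;
    printf("Bandwidth-limited ceiling: %.0f GF/s\n", gf_s); /* -> 15 */
    return 0;
}
```

Real kernels reuse data in registers and caches, so this is a worst-case floor, which is exactly the article's point.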
How are FLOPS counted for a given chip or computer? There are two different metrics. The first is the theoretical upper limit (peak) FLOPS for the chip, computed by multiplying the number of cores, the clock frequency, and the number of floating-point operations per core per clock cycle (4 for Core 2 and 8 for Sandy Bridge CPUs); see the sketch below.
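As an illustration, a tiny sketch of that peak formula; the core count and clock frequency are made-up example values, and 8 FLOPs/cycle is the Sandy Bridge figure mentioned above:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical example: a 4-core, 3.0 GHz Sandy Bridge CPU. */
    double cores           = 4.0;
    double clock_ghz       = 3.0;
    double flops_per_cycle = 8.0; /* per core, Sandy Bridge     */

    /* peak GFLOPS = cores * GHz * FLOPs per core per cycle     */
    double peak_gflops = cores * clock_ghz * flops_per_cycle;
    printf("Theoretical peak: %.1f GFLOPS\n", peak_gflops); /* -> 96.0 */
    return 0;
}
```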
The other metric is something like real-world FLOPS, measured by running the LINPACK benchmark (solving a huge dense system of linear equations). This benchmark relies heavily on matrix-matrix multiplication and is a rough approximation of real-world FLOPS. The Top500 list of supercomputers is ranked with a parallel version of the LINPACK benchmark, HPL. On a single CPU, LINPACK can reach 90-95% of theoretical peak FLOPS; for huge clusters it falls in the 50-85% range. A toy version of this kind of measurement is sketched below.
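A LINPACK-flavored toy measurement, under the simplifying assumption that a naive triple-loop multiply stands in for the optimized BLAS routines real LINPACK uses (the matrix size N is an arbitrary example value):

```c
#include <stdio.h>
#include <time.h>

#define N 512  /* small enough to finish quickly, big enough to time */

static double a[N][N], b[N][N], c[N][N];

int main(void) {
    /* Fill the input matrices with arbitrary values. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
            c[i][j] = 0.0;
        }

    /* Naive matrix-matrix multiply: 2*N^3 floating-point operations
       (one multiply + one add per innermost step). */
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    double gflops = 2.0 * N * N * N / secs / 1e9;
    printf("Achieved: %.2f GFLOPS in %.3f s\n", gflops, secs);
    return 0;
}
```

The gap between this number and the theoretical peak from the previous sketch is exactly the difference the two metrics capture.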
GF in this case is gigaFLOPS, but FLOPS already means "floating-point operations per second". I'm fairly certain the author does not intend "floating-point operations per second per second", so GF/s is, strictly speaking, an error. (Unless you are talking about a computer that increases its performance at runtime, I guess.) The author probably means GFLOPS.