开发者

FPGA measure accurate times

We are checking how fast is an a开发者_如何学JAVAlgorithm running at the FPGA vs Normal Quad x86 computer.

Now at the x86 we run the algorithm lots of times, and take a median in order to eliminate OS overhead, also this "cleans" the curve from errors. Thats not the problem.

The measure in the FPGA algorithm is in cycles and then take the cycles to time, with the FSMD is trivial to count cycles anyway...

We think that count cycles is too "pure" measure, and this could be done theoretically and dont need to make a real measure or running the algorithm in the real FPGA.

I want to know is there exist a paper or some idea, to do a real time measure.


If you are trying to establish that your FPGA implementation is competitive or superior, and therefore might be useful in the real world, then I encourage you to compare ** wall clock times ** on the multiprocessor vs. the FPGA implementation. That will also help ensure you do not overlook performance effects beyond the FSM + datapath (such as I/O delays).

I agree that reporting cycle counts only is not representative because the FPGA cycle time can be 10X that of off the shelf commodity microprocessors.

Now for some additional unsolicited advice. I have been to numerous FCCM conferences, and similar, and I have heard many dozens of FPGA implementation vs. CPU implementation performance comparison papers. All too often, a paper compares a custom FPGA implementation that took months, vs. a CPU+software implementation wherein the engineer just took the benchmark source code off the shelf, compiled it, and ran it in one afternoon. I do not find such presentations particularly compelling.

A fair comparison would evaluate a software implementation that uses best practices, best available libraries (e.g. Intel MKL or IPP), that used multithreading across multiple cores, that used vector SIMD (e.g. SSE, AVX, ...) instead of scalar computation, that used tools like profilers to eliminate easily fixed waste and like Vtune to understand and tune the cache+memory hierarchy. Also please be sure to report the actual amount of engineering time spent on the FPGA vs. the software implementations.

More free advice: In these energy focused times where results/joule may trump results/second, consider also reporting the energy efficiency of your implementations.

More free advice: to get most repeatable times on the "quad x86" be sure to quiesce the machine, shut down background processors, daemons, services, etc., disconnect the network.

Happy hacking!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜