K&R exercise: a follow-up question to an answer about the original question

I kept trying to wrap my head around a solution to K&R problem 7-8, until I found this solution (with the original problem) right on this site. I am unable to comment on the answer (probably due to its age); the only way I could actually have input on that question is to post an answer to it, which I felt was inappropriate. So I've decided to create this heavily related question based on the chosen "answer," which seemed logical to me up until I got to this point (concerning implementing a function as a macro):

"Repeat that as a macro very often and the 'space saving' rapidly becomes a cost as the bit masking has a fixed size."

The only problem is that function calls take time as well. "Jumping" to the function location, setting storage aside for the local variable(s), then actually computing the comparison, all take time.

So how exactly is a macro that tests the ASCII value of the character slower than the first function, which uses a table lookup (with this in mind)?

How is it possible that a function call can take less time than comparing two integers, one of which is already in memory, and one of which is a constant? It would seem to me that repeated calls to the function and macro, over time, would still result in the macro being faster.

Is my way of thinking wrong? I'm thinking it has to be since it wasn't brought up in the original question.

I would be glad if someone would shed some light on this.


First of all, note that the cost they mention is size, not speed. Just for example, let's assume that the macro expands to 16 bytes of code. Let's further assume that the function compiles to 32 bytes of code, and calling the function takes 6 bytes of code (of course, none of these is guaranteed, but they're all probably at least in the general ballpark of correct for 32-bit code).

In this case, if you use a function, but only call it from one place, you end up with 38 bytes of code. If you use the macro instead you only get 16 bytes of code, for a savings of 22 bytes. If you use the macro in two places, you get 32 bytes of code, vs. 44 bytes if you'd used a function -- still a savings, but a smaller one. Jumping ahead a bit, let's assume you used it from 10 different places in your code. In this case, the macro would take 160 bytes, but the function would only take 92 bytes.

On a modern processor, I can also see a fairly reasonable argument that a function could be faster as well though. Most modern processors use caching. If you use the function enough that it's usually going to be in the cache when you call it, that can well be faster than using a macro, where each time you use the code, you're (more) likely to have to fetch the code from memory again. The reason is pretty simple: a modern processor runs a lot faster than memory.

Even at best, you can plan on a latency of at least 50 ns to fetch some data from memory (and 75-100 ns is fairly common). Just as an average let's assume 75 ns. A typical modern CPU executes around 1.8 instructions per clock, and at (say) 2.5 GHz, has a clock cycle time of .4 ns. That means in 75 ns, it can (on average) execute something like 75/0.4*1.8 = 337.5 instructions. Calling, executing, and returning from a function like we're talking about here is somewhere on the order of a half dozen instructions -- so in a tight loop, by the time you'd fetched the code for the macro from memory once you could execute the function from the cache somewhere around 56 times.

Of course, if you're executing only that in a tight loop, the macro will be in the cache most of the time too. The advantage for the function happens when you have calls to that function from enough different places in the code that it'll usually be in the cache even on the first iteration of a loop, which usually won't be the case for a macro.
