Why is bounds checking not implemented in some languages?

According to Wikipedia (http://en.wikipedia.org/wiki/Buffer_overflow):

Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array. Bounds checking can prevent buffer overflows.

So, why is bounds checking not implemented in languages like C and C++?


Basically, it's because bounds checking means that every time you index into an array, you have to do an if statement.

Let's consider a simple C for loop:

int ary[X] = {...};  // Purposefully leaving size and initializer unknown

for(int ix=0; ix< 23; ix++){
    printf("ary[%d]=%d\n", ix, ary[ix]);
}

If we have bounds checking, the generated code for ary[ix] has to be something like:

LOOP:
    CMP IX, 23      ; while test
    JGE END         ; exit the loop when IX reaches 23
    CMP IX, X       ; compare IX and X (the bounds check)
    JGE ERROR       ; if IX >= X jump to ERROR
    LD  R1, IX      ; put the value of IX into register 1
    LD  R2, ARY+IX  ; put the array value in R2
    LA  R3, Str42   ; STR42 is the format string
    JSR PRINTF      ; now we call the printf routine
    INC IX          ; add 1 to IX
    J   LOOP        ; go back to the top of the loop

;;; somewhere else in the code
ERROR:
    HCF             ; halt and catch fire

If we don't have that bounds check, then we can write instead:

    LD  R1, IX      ; keep IX in register 1
LOOP:
    CMP R1, 23      ; while test
    JGE END         ; exit the loop when the index reaches 23
    LD  R2, ARY+R1  ; put the array value in R2
    JSR PRINTF      ; call the printf routine
    INC R1          ; add 1 to the index
    J   LOOP        ; go back to the top of the loop

This saves 3-4 instructions in the loop, which (especially in the old days) meant a lot.

In fact, on the PDP-11 machines it was even better, because there was something called "auto-increment addressing". On a PDP, all of the register stuff etc. turned into something like:

CZ  -(IX), END    ; decrement IX, then compare it to zero; jump to END if zero

(And anyone who happens to remember the PDP better than I do, don't give me trouble about the precise syntax etc; you're an old fart like me, you know how these things slip away.)


It's all about the performance. However, the assertion that C and C++ have no bounds checking is not entirely correct. It is quite common to have "debug" and "optimized" versions of each library, and it is not uncommon to find bounds-checking enabled in the debugging versions of various libraries.

This has the advantage of quickly and painlessly finding out-of-bounds errors when developing the application, while at the same time eliminating the performance hit when running the program for realz.
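
To make that concrete, here is a minimal sketch of that debug/release split (the checked_array wrapper and its names are hypothetical, not a real library class): an assert on the index catches the overflow in a debug build, and compiling with -DNDEBUG, which release builds typically do, removes the check entirely.

#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical wrapper: bounds-checked only in debug builds.
// Under -DNDEBUG the assert compiles away and operator[] costs
// the same as a raw array access.
template <typename T, std::size_t N>
struct checked_array {
    T data[N];

    T& operator[](std::size_t i) {
        assert(i < N && "index out of bounds");   // disappears under NDEBUG
        return data[i];
    }
};

int main() {
    checked_array<int, 10> ary{};
    for (std::size_t ix = 0; ix < 23; ++ix)            // same overflowing loop as in the first answer
        std::printf("ary[%zu]=%d\n", ix, ary[ix]);     // a debug build aborts at ix == 10
}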

I should also add that the performance hit is non-negligible, and many languages other than C and C++ will provide various high-level functions operating on buffers that are implemented directly in C and C++ specifically to avoid the bounds checking. For example, in Java, if you compare the speed of copying one array into another using pure Java vs. using System.arraycopy (which does bounds checking once, but then straight-up copies the array without bounds-checking each individual element), you will see a decently large difference in the performance of those two operations.
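
The same trade-off can be sketched in C++ (an assumed analogue of the Java comparison, not code from either runtime; copy_checked and copy_bulk are just names for this sketch): copying through a checked accessor pays for a test on every element, while a bulk copy validates the ranges once and then moves the bytes with no further checks.

#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Element-by-element copy through a checked accessor:
// one bounds check per element on each side.
void copy_checked(const std::vector<int>& src, std::vector<int>& dst) {
    for (std::size_t i = 0; i < src.size(); ++i)
        dst.at(i) = src.at(i);        // at() throws std::out_of_range on a bad index
}

// Bulk copy: validate the ranges once, then copy with no per-element checks.
void copy_bulk(const std::vector<int>& src, std::vector<int>& dst) {
    if (dst.size() < src.size())
        throw std::out_of_range("destination too small");
    std::memcpy(dst.data(), src.data(), src.size() * sizeof(int));
}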


It is easier to implement, and it is faster both to compile and at run time. It also simplifies the language definition, since quite a few things can be left out if bounds checking is skipped.

Currently, when you do:

int *p = (int*)malloc(sizeof(int));  // allocate room for exactly one int
*p = 50;                             // store to it; nothing verifies the write is in bounds

C (and C++) just says, "Okey dokey! I'll put something in that spot in memory".

If bounds checking were required, C would have to say, "Ok, first let's see if I can put something there? Has it been allocated? Yes? Good. I'll insert now." By skipping the test to see whether there is something there that can be written to, you are saving a very costly step. On the other hand (she wore a glove), we now live in an era where "optimization is for those who cannot afford RAM," so the arguments about speed are getting much weaker.
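
As a rough sketch of what "has it been allocated?" would require in practice (the checked_int_ptr type below is hypothetical), every pointer would have to carry the bounds of its allocation, and every store would pay for a comparison against them:

#include <cstddef>
#include <cstdlib>
#include <stdexcept>

// Hypothetical "fat pointer": it carries the size of its allocation so that
// every access can be validated. A plain C/C++ pointer carries no such
// information, which is why the check cannot be done for free.
struct checked_int_ptr {
    int*        base;
    std::size_t count;

    int& at(std::size_t i) {
        if (i >= count)               // the extra test every checked access pays for
            throw std::out_of_range("access past end of allocation");
        return base[i];
    }
};

int main() {
    // (error handling for a failed malloc is omitted in this sketch)
    checked_int_ptr p{static_cast<int*>(std::malloc(sizeof(int))), 1};
    p.at(0) = 50;      // inside the allocation: fine
    // p.at(1) = 50;   // would throw instead of silently scribbling on memory
    std::free(p.base);
}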


The primary reason is the performance overhead of adding bounds checking to C or C++. While this overhead can be reduced substantially with state-of-the-art techniques (to 20-100% overhead, depending upon the application), it is still large enough to make many folks hesitate. I'm not sure whether that reaction is rational -- I sometimes suspect that people focus too much on performance, simply because performance is quantifiable and measurable -- but regardless, it is a fact of life. This fact reduces the incentive for major compilers to put effort into integrating the latest work on bounds checking into their compilers.
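
One readily available way to get a feel for that cost (this is ordinary compiler instrumentation, not necessarily the state-of-the-art techniques referred to above) is to build the same program with and without AddressSanitizer, which both GCC and Clang support; the file name below is just a placeholder.

// overflow.cc: build it twice and compare the runtimes.
//   g++ -O2 overflow.cc -o overflow                     # uninstrumented baseline
//   g++ -O2 -fsanitize=address overflow.cc -o overflow  # checked build; it also reports the overflow below
#include <cstdio>

int main() {
    int ary[10] = {0};
    for (int ix = 0; ix < 23; ++ix)
        std::printf("ary[%d]=%d\n", ix, ary[ix]);   // the instrumented build flags the out-of-bounds reads
}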

A secondary reason involves concerns that bounds checking might break your app. Particularly if you do funky stuff with pointer arithmetic and casting that violates the standard, bounds checking might block something your application is currently doing. Large applications sometimes do amazingly crufty and ugly things. If the compiler breaks the application, then there's no point in blaming the crufty code for the problem; people aren't going to keep using a compiler that breaks their application.

Another major reason is that bounds checking competes with ASLR + DEP (address space layout randomization and data execution prevention), which are perceived as solving, oh, 80% of the problem or so. That reduces the perceived need for full-fledged bounds checking.


Because it would cripple those general-purpose languages for HPC (high-performance computing) requirements. There are plenty of applications where buffer overflows really do not matter one iota, simply because they do not happen. Such features are much better off in a library (where in fact you can already find examples for C/C++). For domain-specific languages it may make sense to bake such features into the language definition and trade the resulting performance hit for increased security.
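
One concrete example of the library route already exists in standard C++: std::vector (and std::array) expose both an unchecked operator[] and a checked at(), so the caller decides where the safety is worth the cost.

#include <cstdio>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> ary(10, 0);

    ary[3] = 7;            // operator[]: no bounds check, maximum speed
    ary.at(3) = 7;         // at(): checked access

    try {
        ary.at(23) = 7;    // out of range: throws instead of corrupting memory
    } catch (const std::out_of_range& e) {
        std::printf("caught: %s\n", e.what());
    }
}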
