开发者

const member and assignment operator. How to avoid the undefined behavior?

I answered the question about std::vector of objects and const-correctness, and received a comment about undefined behavior. I do not agree and therefore I have a question.

Consider the class with const member:

class A { 
public: 
    const int c; // must not be modified! 
    A(int c) : c(c) {} 
    A(const A& copy) : c(copy.c) { }     
    // No assignment operator
}; 

I want to have an assignment operator but I do not want to use const_cast like in the following code from one of the 开发者_JAVA技巧answers:

A& operator=(const A& assign) 
{ 
    *const_cast<int*> (&c)= assign.c;  // very very bad, IMHO, it is undefined behavior
    return *this; 
} 

My solution is

// Custom-defined assignment operator
A& operator=(const A& right)  
{  
    if (this == &right) return *this;  

    // manually call the destructor of the old left-side object
    // (`this`) in the assignment operation to clean it up
    this->~A(); 
    // use "placement new" syntax to copy-construct a new `A` 
    // object from `right` into left (at address `this`)
    new (this) A(right); 
    return *this;  
}  

Do I have undefined behavior (UB)?

What would be a solution without UB?


Your code causes undefined behavior.

Not just "undefined if A is used as a base class and this, that or the other". Actually undefined, always. return *this is already UB, because this is not guaranteed to refer to the new object.

Specifically, consider 3.8/7:

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:

...

— the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type,

Now, "after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied" is exactly what you are doing.

Your object is of class type, and it does contain a non-static data member whose type is const-qualified. Therefore, after your assignment operator has run, pointers, references and names referring to the old object are not guaranteed to refer to the new object and to be usable to manipulate it.

As a concrete example of what might go wrong, consider:

A x(1);
B y(2);
std::cout << x.c << "\n";
x = y;
std::cout << x.c << "\n";

Expect this output?

1
2

Wrong! It's plausible you might get that output, but the reason const members are an exception to the rule stated in 3.8/7, is so that the compiler can treat x.c as the const object that it claims to be. In other words, the compiler is allowed to treat this code as if it was:

A x(1);
B y(2);
int tmp = x.c
std::cout << tmp << "\n";
x = y;
std::cout << tmp << "\n";

Because (informally) const objects do not change their values. The potential value of this guarantee when optimizing code involving const objects should be obvious. For there to be any way to modify x.c without invoking UB, this guarantee would have to be removed. So, as long as the standard writers have done their job without errors, there is no way to do what you want.

[*] In fact I have my doubts about using this as the argument to placement new - possibly you should have copied it to a void* first, and used that. But I'm not bothered whether that specifically is UB, since it wouldn't save the function as a whole.


First: When you make a data member const, you're telling the compiler and all the world that this data member never changes. Of course then you cannot assign to it and you certainly must not trick the compiler into accepting code that does so, no matter how clever the trick.
You can either have a const data member or an assignment operator assigning to all data members. You can't have both.

As for your "solution" to the problem:
I suppose that calling the destructor on an object within a member function invoked for that objects would invoke UB right away. Invoking a constructor on uninitialized raw data to create an object from within a member function that's been invoked for an object that resided where now the constructor is invoked on raw data... also very much sounds like UB to me. (Hell, just spelling this out makes my toenails curl.) And, no, I don't have chapter and verse of the standard for that. I hate reading the standard. I think I can't stand its meter.

However, technicalities aside, I admit that you might get away with your "solution" on just about every platform as long as the code stays as simple as in your example. Still, this doesn't make it a good solution. In fact, I'd argue it's not even an acceptable solution, because IME code never stays as simple as that. Over the years it will get extended, changed, mutated, and twisted and then it will silently fail and require a mind-numbing 36hrs shift of debugging in order to find the problem. I don't know about you, but whenever I find a piece of code like this responsible for 36hrs of debugging fun I want to strangle the miserable dumb-wit who did this to me.

Herb Sutter, in his GotW #23, dissects this idea piece by piece and finally concludes that it "is full of pitfalls, it's often wrong, and it makes life a living hell for the authors of derived classes... never use the trick of implementing copy assignment in terms of copy construction by using an explicit destructor followed by placement new, even though this trick crops up every three months on the newsgroups" (emphasize mine).


How can you possibly assign to an A if it has a const member? You're trying to accomplish something that's fundamentally impossible. Your solution has no new behaviour over the original, which is not necessarily UB but yours most definitely is.

The simple fact is, you're changing a const member. You either need to un-const your member, or ditch the assignment operator. There is no solution to your problem- it's a total contradiction.

Edit for more clarity:

Const cast does not always introduce undefined behaviour. You, however, most certainly did. Apart from anything else, it is undefined not to call all destructors- and you didn't even call the right one- before you placed into it unless you knew for certain that T is a POD class. In addition, there's owch-time undefined behaviours involved with various forms of inheritance.

You do invoke undefined behaviour, and you can avoid this by not trying to assign to a const object.


If you definitely want to have an immutable (but assignable) member, then without UB you can lay things out like this:

#include <iostream>

class ConstC
{
    int c;
protected:
    ConstC(int n): c(n) {}
    int get() const { return c; }
};

class A: private ConstC
{
public:
    A(int n): ConstC(n) {}
    friend std::ostream& operator<< (std::ostream& os, const A& a)
    {
        return os << a.get();
    }
};

int main()
{
    A first(10);
    A second(20);
    std::cout << first << ' ' << second << '\n';
    first = second;
    std::cout << first << ' ' << second << '\n';
}


According to the newer C++ standard draft version N4861 it seems to be no longer undefined behaviour (link):

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if the original object is transparently replaceable (see below) by the new object. An object o1 is transparently replaceable by an object o2 if:

  • the storage that o2 occupies exactly overlays the storage that o1 occupied, and
  • o1 and o2 are of the same type (ignoring the top-level cv-qualifiers), and
  • o1 is not a complete const object, and
  • neither o1 nor o2 is a potentially-overlapping subobject ([intro.object]), and
  • either o1 and o2 are both complete objects, or o1 and o2 are direct subobjects of objects p1 and p2, respectively, and p1 is transparently replaceable by p2.

Here you can find only "o1 is not a complete const object" regarding const, which is true in this case. But of course you have to ensure that all other conditions are not violated, too.


First off, the whole motivation for your (quite ingenious I might say) usage of "placement new" as a means of implementing the assignment operator, operator=(), as instigated by this question (std::vector of objects and const-correctness), is now nullified. As of C++11, that question's code now has no errors. See my answer here.

Secondly, C++11's emplace() functions now do pretty much exactly what your usage of placement new was doing, except that they are all virtually guaranteed by the compilers themselves now to be well-defined behavior, per the C++ standard.

Third, when the accepted answer states:

because this is not guaranteed to refer to the new object

I wonder if this is because the value contained in the this variable might be changed by the placement new copy-construction operation, NOT because anything using that instance of the class might retain a cached value of it, with the old instance data, rather than read a new value of the object instance from memory. If the former, it seems to me you could ensure this is correct inside the assignment operator function by using a temporary copy of the this pointer, like this:

// Custom-defined assignment operator
A& operator=(const A& right)  
{  
    if (this == &right) return *this;  

    // manually call the destructor of the old left-side object
    // (`this`) in the assignment operation to clean it up
    this->~A(); 

    // Now back up `this` in case it gets corrupted inside this function call
    // only during the placement new copy-construction operation which 
    // overwrites this objct:
    void * thisBak = this;

    // use "placement new" syntax to copy-construct a new `A` 
    // object from `right` into left (at address `this`)
    new (this) A(right); 

    // Note: we cannot write to or re-assign `this`. 
    // See here: https://stackoverflow.com/a/18227566/4561887

    // Return using our backup copy of `this` now
    return *thisBak;  
}  

But, if it has to do with an object being cached and not re-read each time it is used, I wonder if volatile would solve this! ie: use volatile const int c; as the class member instead of const int c;.

Fourth, in the rest of my answer I focus on the usage of volatile, as applied to the class members, to see if this might solve the 2nd of these two potential undefined behavior cases:

  1. The potential UB in your own solution:

     // Custom-defined assignment operator
     A& operator=(const A& right)  
     {  
         if (this == &right) return *this;  
    
         // manually call the destructor of the old left-side object
         // (`this`) in the assignment operation to clean it up
         this->~A(); 
         // use "placement new" syntax to copy-construct a new `A` 
         // object from `right` into left (at address `this`)
         new (this) A(right); 
         return *this;  
     }  
    
  2. The potential UB you mention may exist in the other solution.

     // (your words, not mine): "very very bad, IMHO, it is 
     // undefined behavior"
     *const_cast<int*> (&c)= assign.c;
    

Although I think perhaps adding volatile might fix both cases above, my focus in the rest of this answer is on the 2nd case just above.

tldr;

It seems to me this (the 2nd case just above, in particular) becomes valid and well-defined behavior by the standard if you add volatile and make the class member variable volatile const int c; instead of just const int c;. I can't say this is a great idea, but I think casting away const and writing to c then becomes well-defined behavior and perfectly valid. Otherwise, the behavior is undefined only because reads of c may be cached and/or optimized out since it is only const, and not also volatile.

Read below for more details and justification, including a look at some examples and a little assembly.

const member and assignment operator. How to avoid the undefined behavior?

Writing to const members is only undefined behavior...

...because the compiler may optimize out further reads to the variable, since it's const. In other words, even though you've correctly updated the value contained at a given address in memory, the compiler may tell the code to just regurgitate whatever was last in the register holding the value it first read, rather than going back to the memory address and actually checking for a new value each time you read from that variable.

So this:

// class member variable:
const int c;    

// anywhere
*const_cast<int*>(&c) = assign.c;

probably is undefined behavior. It may work in some cases but not others, on some compilers but not others, or in some versions of compilers, but not others. We can't rely on it to have predictable behavior because the language does not specify what should happen each and every time we set a variable as const and then write to and read from it.

This program, for instance (see here: https://godbolt.org/z/EfPPba):

#include <cstdio>
int main() {
  const int i = 5;
  *(int*)(&i) = 8;
  printf("%i\n", i);
  return 0;
}

prints 5 (although we wanted it to print 8) and produces this assembly in main. (Note that I'm no assembly expert). I've marked the printf lines. You can see that even though 8 is written to that location (mov DWORD PTR [rax], 8), the printf lines do NOT read out that new value. They read out the previously-stored 5 because they don't expect it to have changed, even though it did. The behavior is undefined, so the read is omitted in this case.

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     DWORD PTR [rbp-4], 5
lea     rax, [rbp-4]
mov     DWORD PTR [rax], 8

// printf lines
mov     esi, 5
mov     edi, OFFSET FLAT:.LC0
mov     eax, 0
call    printf

mov     eax, 0
leave
ret

Writing to volatile const variables, however, is not undefined behavior...

...because volatile tells the compiler it better read the contents at the actual memory location on every read to that variable, since it might change at any time!

You might think: "Does this even make sense?" (having a volatile const variable. I mean: "what might change a const variable to make us need to mark it volatile!?) The answer is: "well, yes! It does make sense!" On microcontrollers and other low-level memory-mapped embedded devices, some registers, which could change at any moment by the underlying hardware, are read-only. To mark them read-only in C or C++ we make them const, but to ensure the compiler knows it better actually read the memory at their address location every single time we read the variable, rather than relying on optimizations which retain previously-cached values, we also mark them as volatile. So, to mark address 0xF000 as a read-only 8-bit register named REG1, we'd define it like this in a header file somewhere:

// define a read-only 8-bit register
#define REG1 (*(volatile const uint8_t*)(0xF000))

Now, we can read to it at our whim, and each and every time we ask the code to read the variable, it will. This is well-defined behavior. Now, we can do something like this, and this code will NOT get optimized out, because the compiler knows that this register value actually could change at any given time, since it's volatile:

while (REG1 == 0x12)
{
    // busy wait until REG1 gets changed to a new value
}

And, to mark REG2 as an 8-bit read/write register, of course, we'd just remove const. In both cases, however, volatile is required, as the values could change at any given time by the hardware, so the compiler better not make any assumptions about these variables or try to cache their values and rely on cached readings.

// define a read/write 8-bit register
#define REG2 (*(volatile uint8_t*)(0xF001))

Therefore, the following is not undefined behavior! This is very well-defined behavior as far as I can tell:

// class member variable:
volatile const int c;    

// anywhere
*const_cast<int*>(&c) = assign.c;

Even though the variable is const, we can cast away const and write to it, and the compiler will respect that and actually write to it. And, now that the variable is also marked as volatile, the compiler will read it every single time, and respect that too, the same as reading REG1 or REG2 above.

This program, therefore, now that we added volatile (see it here: https://godbolt.org/z/6K8dcG):

#include <cstdio>
int main() {
  volatile const int i = 5;
  *(int*)(&i) = 8;
  printf("%i\n", i);
  return 0;
}

prints 8, which is now correct, and produces this assembly in main. Again, I've marked the printf lines. Notice the new and different lines I've marked too! These are the only changes to the assembly output! Every other line is exactly identical. The new line, marked below, goes out and actually reads the new value of the variable and stores it into register eax. Next, in preparation for printing, instead of moving a hard-coded 5 into register esi, as was done before, it moves the contents of register eax, which is just read, and which now contains an 8, into register esi. Solved! Adding volatile fixed it!

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     DWORD PTR [rbp-4], 5
lea     rax, [rbp-4]
mov     DWORD PTR [rax], 8

// printf lines
mov     eax, DWORD PTR [rbp-4]  // NEW!
mov     esi, eax                // DIFFERENT! Was `mov     esi, 5`
mov     edi, OFFSET FLAT:.LC0
mov     eax, 0
call    printf

mov     eax, 0
leave
ret

Here's a bigger demo (run it online: https://onlinegdb.com/HyU6fyCNv). You can see that we can write to a variable by casting it to a non-const reference OR a non-const pointer.

In all cases (casting to both non-const references or non-const pointers in order to modify the const value), we can use C++-style casts, OR C-style casts.

In the simple example above, I verified that in all four cases (even using a C-style cast to cast to a reference: (int&)(i) = 8;, oddly enough, since C doesn't have references :)) the assembly output was the same.

#include <stdio.h>

int main()
{
    printf("Hello World\n");

    // This does NOT work!
    const int i1 = 5;
    printf("%d\n", i1);
    *const_cast<int*>(&i1) = 6;
    printf("%d\n\n", i1); // output is 5, when we want it to be 6!
    
    // BUT, if you make the `const` variable also `volatile`, then it *does* work! (just like we do
    // for writing to microcontroller registers--making them `volatile` too). The compiler is making
    // assumptions about that memory address when we make it just `const`, but once you make it
    // `volatile const`, those assumptions go away and it has to actually read that memory address
    // each time you ask it for the value of `i`, since `volatile` tells it that the value at that
    // address could change at any time, thereby making this work.

    // Reference casting: WORKS! (since the `const` variable is now `volatile` too)

    volatile const int i2 = 5;
    printf("%d\n", i2);
    const_cast<int&>(i2) = 7;
    // So, the output of this is 7:
    printf("%d\n\n", i2);
    
    // C-style reference cast (oddly enough, since C doesn't have references :))
    
    volatile const int i3 = 5;
    printf("%d\n", i3);
    (int&)(i3) = 8;
    printf("%d\n\n", i3);
    

    // It works just fine with pointer casting too instead of reference casting, ex:
    
    volatile const int i4 = 5;
    printf("%d\n", i4);
    *(const_cast<int*>(&i4)) = 9;
    printf("%d\n\n", i4);

    // or C-style:
    
    volatile const int i5 = 5;
    printf("%d\n", i5);
    *(int*)(&i5) = 10;
    printf("%d\n\n", i5);


    return 0;
}

Sample output:

Hello World
5
5

5
7

5
8

5
9

5
10

Notes:

  1. I've also noticed that the above works when modifying const class members even when they are NOT volatile. See my "std_optional_copy_test" program! Ex: https://onlinegdb.com/HkyNyTt4D. This, however, is probably undefined behavior. To make it well-defined, make the member variable volatile const instead of just const.
  2. The reason you don't have to cast from volatile const int to volatile int (ie: why just to int reference or int pointer) works just fine, is because volatile affects the reading of the variable, NOT the writing of the variable. So, so long as we read the variable through a volatile variable means, which we do, our reads are guaranteed not to be optimized out. That's what gives us the well-defined behavior. The writes always worked--even when the variable wasn't volatile.

Refences:

  1. [my own answer] What uses are there for "placement new"?
  2. x86 Assembly Guide
  3. Change 'this' pointer of an object to point different object
  4. Compiler Explorer outputs, with assembly, from godbolt.org:
    1. Here: https://godbolt.org/z/EfPPba
    2. And here: https://godbolt.org/z/6K8dcG
  5. [my answer] Register-level GPIO access on STM32 microcontrollers: Programing STM32 like STM8(register level GPIO )


In absence of other (non-const) members, this doesn't make any sense at all, regardless of undefined behavior or not.

A& operator=(const A& assign) 
{ 
    *const_cast<int*> (&c)= assign.c;  // very very bad, IMHO, it is UB
    return *this; 
}

AFAIK, this is no undefined behavior happening here because c is not a static const instance, or you couldn't invoke the copy-assignment operator. However, const_cast should ring a bell and tell you something is wrong. const_cast was primarily designed to work around non const-correct APIs, and it doesn't seem to be the case here.

Also, in the following snippet:

A& operator=(const A& right)  
{  
    if (this == &right) return *this;  
    this->~A() 
    new (this) A(right); 
    return *this;  
}

You have two major risks, the 1st of which has already been pointed out.

  1. In presence of both an instance of derived class of A and a virtual destructor, this will lead to only partial reconstruction of the original instance.
  2. If the constructor call in new(this) A(right); throws an exception, your object will be destroyed twice. In this particular case, it won't be a problem, but if you happen to have significant cleanup, you're going to regret it.

Edit: if your class has this const member that is not considered "state" in your object (i.e. it is some sort of ID used for tracking instances and is not part of comparisons in operator== and the like), then the following might make sense:

A& operator=(const A& assign) 
{ 
    // Copy all but `const` member `c`.
    // ...

    return *this;
}


Have a read of this link:

http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=368

In particular...

This trick allegedly prevents code reduplication. However, it has some serious flaws. In order to work, C’s destructor must assign NULLify every pointer that it has deleted because the subsequent copy constructor call might delete the same pointers again when it reassigns a new value to char arrays.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜