开发者

union members may not have constructors, but `std::pair` okay?

union members may not have destructors or constructors. So I can't template the following class Foo on my own MyClass if MyClass has a constructor:

template<class T>
struct Foo {
  T val;
  Foo(T val_) : val(val_) {}
  size_t hash() const {
    union {T f; size_t s;} u = { val };
    return u.s;
  }
};
struct MyClass {
  bool a;
  double b;
  MyClass(bool a_, double b_) : a(a_), b(b_) {}
};

If I do it anyway I get this error:

member 'MyClass Foo<T>::hash() const 
[with T = MyClass]::<anonymous union>::f' with constructor 
not allowed in union

To get around it I created MyClass with an awkward construction function, whi开发者_开发问答ch copies the thing around first:

struct MyClass {
  bool a;
  double b;
};
MyClass createMyClass(bool a, double b) {
  MyClass m;
  m.a = a;
  m.b = b;
  return m;
}

But I'm wondering if there is a better way than using this createMyClass function. A constructor would be more efficient, and as a critical component MyClass and Foo<MyClass> are constructed millions of times in my code.

std::pair

I'm also a bit surprised that it is possible to use std::pair in the union:

Foo<std::pair<bool, double> > f2(std::make_pair(true, 3.12));

To my knowledge, std::pair (see code) has a constructor?


EDIT: My original stance on std::pair was wrong, it shouldn't be allowed in a union. For a class to be a valid member of a union it must have a trivial constructor according to standard 9.5.1. The definition of a trivial constructor is this, from paragraph 12.1.5:

If there is no user-declared constructor for class X, a default constructor is implicitly declared. An implicitly-declared default constructor is an inline public member of its class. A constructor is trivial if it is an implicitly-declared default constructor and if:

  • its class has no virtual functions and no virtual base classes, and
  • all the direct base classes of its class have trivial constructors, and
  • for all the nonstatic data members of its class that are of class type (or array thereof), each such class has a trivial constructor

Paragraph 20.2.2.2 states that the following constructor must be available in a pair:

pair(const T1& x, const T2& y);

as soon as this constructor is supplied no default constructor will be implicitly declared.

The funny thing here is that my compiler (Visual Studio 2008) seems to give std::pair special treatment. If I copy the code from the std::pair implementation and place it in my own namespace foo the unions don't work :)

namespace foo {
    template<class _Ty1, class _Ty2> struct pair {
        typedef _Ty1 first_type;
        typedef _Ty2 second_type;
        pair() : first(_Ty1()), second(_Ty2()) {
        }
    }
}

//This doesn't work in VC2008
union Baz {
    foo::pair<bool, double> a;
    int b;
}
//This works in VC2008
union Buz {
    std::pair<bool, double> a;
    int b;
}

Your solution is a common way of getting around this problem. I usually prepend the class name with a C (short for construct) to partially mimic the ordinary constructor syntax, this would in your case become CMyClass(a, b).

As Steve and Matthieu has pointed out you're not using a very good hash function though. Firstly there's no real guarantee (I think, please correct me if I'm wrong) that f and s in the union will even partially occupy the same memory space, and secondly even if they in practice will probably will share the first min(sizeof(s), sizeof(f)) bytes this means that for MyClass you're only hashing on part of the value. In this case you will hash on the value of the bool a, in this case there's two options:

  1. Your compiler uses int as the internal representation for the bool in which case your hash function will only return two values, one for true and one for false.
  2. Your compiler uses char as the internal representation for the bool. In this case the value will probably be padded to at least sizeof(int), either with zeroes in which case you have the same situation as 1. or with whatever random data is on the stack when MyClass is allocated which means you get random hash values for the same input.

If you need to hash by the entire value of T I would copy the data into a temporary buffer like Steve suggests and then use one of the variable-length hash functions discussed here.


I would replace this:

size_t hash() const {
    union {T f; size_t s;} u = { val };
    return u.s;
}

With this:

size_t hash() const {
    size_t s = 0;
    memcpy(&s, &val, std::min(sizeof(size_t), sizeof(T)));
    return s;
}

Copies the smaller of the two sizes rather than the larger, and if memcpy is an intrinsic on your compiler then you're looking good for optimisation. Most importantly, though, it doesn't matter what constructors T has.

It's not a good hash function, though, if T is a large type. In your example MyClass, you might find that bool and size_t are the same size in your implementation, hence the double doesn't participate in the hash at all so there are only two possible hashed values.

Still, it could be worse. If T has any virtual functions, you'll probably find that all instances hash to the same value: the address of the vtable...


Regarding the use of an std::pair as a union member, I think it should be disallowed. The standard says (§12.1):

A union member shall not be of a class type (or array thereof) that has a non-trivial constructor.

So any class with a user-defined constructor cannot be used in a union, since the default constructor will no longer be implicitly declared. Now in the specification of std::pair (§20.2.2), it is explicitly stated that pair implementations must provide a parameterized constructor to initialize both values. Consequently, either the pair implementation or union implementation you use does not comply to the standard.

N.B. : Testing the code you gave on Comeau gives the following error:

"ComeauTest.c", line 8: error: invalid union member -- class
          "std::pair<bool, double>" has a disallowed member function
      union {T f; size_t s;} u = { val };
               ^
          detected during instantiation of "unsigned int Foo<T>::hash() const
                    [with T=std::pair<bool, double>]" at line 22


I have only one question: why using a union ?

From what I understand, the hash should correspond to the first few bytes of your objects. If you are going to do this, why not:

size_t hash() const {
  return reinterpret_cast<size_t>(val);
}

which should accomplish the same trick (I think) with more efficiency since there is no allocation of an object of size sizeof(T) on the stack.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜