Using pointers, references, handles to generic datatypes, as generic and flexible as possible

In my application I have lots of different data types, e.g. Car, Bicycle, Person, ... (they're actually other data types, but this is just for the example).

Since I also have quite some 'generic' code in my application, and the application was originally written in C, pointers to Car, Bicycle, Person, ... are often passed as void-pointers to these generic modules, together with an identification of the type, like this:

Car myCar;
ShowNiceDialog ((void *)&myCar, DATATYPE_CAR);

The 'ShowNiceDialog' method now uses meta-information (functions that map DATATYPE_CAR to interfaces to get the actual data out of Car) to get information of the car, based on the given data type. That way, the generic logic only has to be written once, and not every time again for every new data type.
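To make the idea concrete, here is a minimal sketch of what such meta-information could look like. It is not the actual code of the application: TypeInterface, registry, registerType and describe are names invented for this illustration, and the Car and Bicycle members are made up.

```cpp
#include <map>
#include <string>

enum { DATATYPE_CAR = 1, DATATYPE_BICYCLE = 2 };  // hypothetical type ids

struct Car     { std::string model; };  // hypothetical members
struct Bicycle { std::string brand; };

// Meta-information for one data type: accessors that take the untyped pointer.
struct TypeInterface {
    std::string (*getName)(const void* data);
};

// Each accessor casts the void pointer back to the type it was registered for.
std::string carName(const void* data)     { return static_cast<const Car*>(data)->model; }
std::string bicycleName(const void* data) { return static_cast<const Bicycle*>(data)->brand; }

// Registry keyed by type id; new entries can be added at run time.
std::map<int, TypeInterface>& registry() {
    static std::map<int, TypeInterface> r;
    return r;
}

void registerType(int id, TypeInterface iface) { registry()[id] = iface; }

// Generic logic, written once: it only sees a void pointer and a type id.
std::string describe(const void* data, int datatype) {
    std::map<int, TypeInterface>::const_iterator it = registry().find(datatype);
    if (it == registry().end()) return "<unknown type>";
    return it->second.getName(data);
}
```

Nothing here stops a caller from passing a Bicycle pointer together with DATATYPE_CAR, which is exactly the type-safety problem the question is about.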

Of course, in C++ you could make this much easier by using a common root class, like this

class RootClass
   {
   public:
      virtual std::string getName() const = 0;
   };

class Car : public RootClass
   {
   ...
   };

void ShowNiceDialog (RootClass *root);

The problem is that in some cases, we don't want to store the data type in a class, but in a totally different format to save memory. In some cases we have hundreds of millions of instances that we need to manage in the application, and we don't want to make a full class for every instance. Suppose we have a data type with 2 characteristics:

  • A quantity (double, 8 bytes)
  • A boolean (1 byte)

Although we only need 9 bytes to store this information, putting it in a class means that we need at least 16 bytes (because of the padding), and with the v-pointer we possibly even need 24 bytes. For hundreds of millions of instances, every byte counts (I have a 64-bit variant of the application and in some cases it needs 6 GB of memory).
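The arithmetic can be checked with sizeof. This is a minimal sketch, assuming a typical 64-bit ABI where double is 8-byte aligned and a v-pointer takes 8 bytes; exact sizes are implementation-defined. PlainItem, VirtualItem, kPayloadBytes and overheadBytes are names made up for the illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two-member data type described above.
struct PlainItem {
    double quantity;  // 8 bytes
    bool   flag;      // 1 byte, usually followed by tail padding
};

// Same members, but with a virtual function, which adds a hidden v-pointer.
struct VirtualItem {
    virtual ~VirtualItem() {}
    double quantity;
    bool   flag;
};

// Bytes of real data needed per instance.
const std::size_t kPayloadBytes = sizeof(double) + sizeof(bool);

// Bytes per instance that carry no data (padding and, if present, the v-pointer).
template <class T>
std::size_t overheadBytes() { return sizeof(T) - kPayloadBytes; }
```

On common 64-bit compilers sizeof(PlainItem) is 16 and sizeof(VirtualItem) is 24, which matches the figures above.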

The void-pointer approach has the advantage that we can almost encode anything in a void-pointer and decide how to use it if we want information from it (use it as a real pointer, as an index, ...), but at the cost of type-safety.

Templated solutions don't help since the generic logic forms quite a big part of the application, and we don't want to templatize all this. Additionally, the data model can be extended at run time, which also means that templates won't help.

Are there better (and type-safer) ways to handle this than a void-pointer? Any references to frameworks, whitepapers, research material regarding this?


If you don't want a full class, you should read up on the Flyweight pattern. It's designed to save memory.

EDIT: sorry, lunch-time pause ;)

The typical Flyweight approach is to separate the properties that are common to a great number of objects from the properties that are specific to a given instance.

Generally, it means:

struct Light
{
  kind_type mKind;
  specific1 m1;
  specific2 m2;
};

The kind_type is often a pointer, but it need not be. In your case a pointer would be a real waste: on a 64-bit build it takes 8 bytes, almost as much as the 9 bytes of "useful" information.

Here I think we could exploit padding to store the id. After all, as you said it's going to be expanded to 16 bytes even though we only use 9 of them, so let's not waste the other 7!

struct Object
{
  double quantity;
  bool flag;
  unsigned char const id;
};

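Note that the order of elements is important. In the schematic layouts below (cells are not to scale), the first two orderings fit in 16 bytes, while the third puts quantity between the two small members and needs padding twice (marked --), growing to 24 bytes on a typical 64-bit platform: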

0x00    0x01    0x02    0x03
[      ][      ][      ][      ]
   quantity       flag     id

0x00    0x01    0x02    0x03
[      ][      ][      ][      ]
   id     flag     quantity

0x00            0x02            0x04
[      ][      ][      ][      ][      ][      ]
   id     --        quantity      flag     --
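The effect of member order can be verified directly. This is a small sketch under the assumption of a typical ABI with 8-byte alignment for double; QuantityFirst, IdFirst and paddingBytes are names chosen for the example, and the sizes in the comments are what common 64-bit compilers produce, not guarantees.

```cpp
#include <cstddef>

// Widest member first: flag and id share the tail padding after quantity
// (typically 16 bytes in total).
struct QuantityFirst {
    double        quantity;
    bool          flag;
    unsigned char id;
};

// The double sits between the small members, so padding is typically needed
// both before quantity and after flag (typically 24 bytes in total).
struct IdFirst {
    unsigned char id;
    double        quantity;
    bool          flag;
};

// Number of bytes in T that hold no member data.
template <class T>
std::size_t paddingBytes() {
    return sizeof(T) - (sizeof(double) + sizeof(bool) + sizeof(unsigned char));
}
```

A compile-time check such as static_assert(sizeof(QuantityFirst) == 16, "unexpected padding") can guard the layout on a given platform (C++11 or later).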

I don't understand the "extended at runtime" bit. Seems scary. Is this some sort of self-modifying code?

Templates allow you to create a very interesting form of Flyweight: Boost.Variant.

typedef boost::variant<Car,Dog,Cycle, ...> types_t;

The variant can hold any of the types cited here. It can be manipulated by "normal" functions:

void doSomething(types_t const& t);

Can be stored in containers:

typedef std::vector<types_t> vector_t;

And finally, the way to operate over it:

struct DoSomething: boost::static_visitor<>
{
  void operator()(Dog const& dog) const;
  void operator()(Car const& car) const;
  void operator()(Cycle const& cycle) const;
  void operator()(GenericVehicle const& vehicle) const;

  // catch-all for every type not handled above
  template <class T>
  void operator()(T const&) const {}
};

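It's very interesting to note the behavior here. Normal function overload resolution occurs, therefore:

  • If you have a Car or a Cycle you'll use those; every other child of GenericVehicle will use the 4th version
  • It's possible to specify a template version as a catch-all, and specialize it appropriately. Beware that when both are present, the template is an exact match for a derived type and therefore wins over the GenericVehicle overload, which needs a derived-to-base conversion.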

Note that the non-template overloads can perfectly well be defined in a .cpp file.

In order to apply this visitor, you use the boost::apply_visitor function:

types_t t;
boost::apply_visitor(DoSomething(), t);

// or

boost::apply_visitor(DoSomething())(t);

The second way seems odd, but it means you can use it in a most interesting fashion, as a function object passed to an algorithm:

vector_t vec = /**/;
std::for_each(vec.begin(), vec.end(), boost::apply_visitor(DoSomething()));

Read up on variant, it's most interesting.

  • Compile-time check: if you miss an operator() (and provide no catch-all), the compiler reports an error
  • No need for RTTI: no virtual pointer, no dynamic type --> as fast as using a union, but with increased safety

You can of course segment your code, by defining multiple variants. If some sections of the code only deal with 4/5 types, then use a specific variant for it :)
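For readers without Boost at hand, the same visitor idea can be tried with std::variant and std::visit from C++17, the standard-library analogue of Boost.Variant. This is a sketch of the technique, not the Boost API itself; the Car, Cycle and Dog members, the Describe visitor and describeAll are invented for the example.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical value types; no common base class and no virtual functions.
struct Car   { int doors; };
struct Cycle { int gears; };
struct Dog   { std::string name; };

typedef std::variant<Car, Cycle, Dog> types_t;

// Visitor: one overload per type of interest plus a template catch-all.
struct Describe {
    std::string operator()(Car const& c) const {
        return "car with " + std::to_string(c.doors) + " doors";
    }
    std::string operator()(Cycle const& c) const {
        return "cycle with " + std::to_string(c.gears) + " gears";
    }
    // Chosen for every alternative that has no dedicated overload (here: Dog).
    template <class T>
    std::string operator()(T const&) const { return "something else"; }
};

// Applies the visitor to every element of a container of variants.
std::vector<std::string> describeAll(std::vector<types_t> const& items) {
    std::vector<std::string> out;
    for (types_t const& item : items)
        out.push_back(std::visit(Describe(), item));
    return out;
}
```

Removing the template catch-all turns a forgotten overload for Dog into a compile error, which is the compile-time check mentioned above.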


In this case, it sounds like you should simply use overloading. For example:

#ifdef __cplusplus // Only enable this awesome thing for C++:
#   define PROVIDE_OVERLOAD(CLASS,TYPE) \
    inline void ShowNiceDialog(const CLASS& obj){ \
         /* the C function takes a non-const void*, so constness is cast away */ \
         ShowNiceDialog(const_cast<CLASS*>(&obj),TYPE); \
    }

    PROVIDE_OVERLOAD(Car,DATATYPE_CAR)
    PROVIDE_OVERLOAD(Bicycle,DATATYPE_BICYCLE)
    // ...

#undef PROVIDE_OVERLOAD // undefine it so that we don't pollute with macros
#endif // end C++ only

If you create overloads for your various types, then you will be able to invoke ShowNiceDialog in a simple and type safe manner, but you will still be able to leverage your optimized C variant of it.

With the code above, you could, in C++, write something like the following:

 Car c;
 // ...
 ShowNiceDialog(c);

If you changed the type of c, then it would still use the appropriate overload (or give an error if there was no overload). It doesn't prevent one from using the existing type-unsafe C variant, but since the typesafe version is easier to invoke, I would expect that other developers would prefer it, anyway.

Edit
I should add that the above answers the question of how to make the API typesafe, not about how to make the implementation typesafe. This will help those using your system to avoid unsafe invocations. Also note that these wrappers provide a typesafe means for using types known already at compile-time... for dynamic types, it really would be necessary to use the unsafe versions. However, another possibility is that you could provide a wrapper class like the following:

class DynamicObject
{
    public:
         DynamicObject(void* data, int id) : _datatype_id(id), _datatype_data(data) {}
         // ...
         void showNiceDialog()const{ ShowNiceDialog(_datatype_data,_datatype_id); }
         // ...
    private:
         int _datatype_id;
         void* _datatype_data;
};

For those dynamic types, you would still not have much safety when it comes to constructing the object, but once the object is constructed, you have a much safer mechanism. It would be reasonable to combine this with a typesafe factory so that users of your API never construct the DynamicObject class themselves, and so never need to invoke the unsafe constructor.
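One possible shape for such a factory, as a sketch only: the constructor becomes private and a static member template derives the id from the C++ type. DataTypeIdOf, make, id() and data() are hypothetical names, and the enum values and struct members are placeholders for the application's own.

```cpp
// Hypothetical type ids and data types, standing in for the application's own.
enum DataTypeId { DATATYPE_CAR = 1, DATATYPE_BICYCLE = 2 };
struct Car     { int doors; };
struct Bicycle { int gears; };

// Maps a C++ type to its id at compile time. The primary template is left
// undefined, so using an unregistered type is a compile error.
template <class T> struct DataTypeIdOf;
template <> struct DataTypeIdOf<Car>     { static const int value = DATATYPE_CAR; };
template <> struct DataTypeIdOf<Bicycle> { static const int value = DATATYPE_BICYCLE; };

class DynamicObject
{
    public:
         int   id()   const { return _datatype_id; }
         void* data() const { return _datatype_data; }

         // The only public way to build a DynamicObject: pointer and id
         // always come from the same type T, so they cannot disagree.
         template <class T>
         static DynamicObject make(T& obj)
         {
              return DynamicObject(&obj, DataTypeIdOf<T>::value);
         }
    private:
         DynamicObject(void* data, int id) : _datatype_id(id), _datatype_data(data) {}
         int   _datatype_id;
         void* _datatype_data;
};
```

Types that only exist at run time have no C++ type to key on, so they would still need a separate, carefully reviewed creation path.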


It's perfectly possible to change the packing of a class in, say, Visual Studio: you can use __declspec(align(x)) or #pragma pack(x), and there's an option in the property pages.

I would suggest that the solution is to store your classes in, say, vectors of each data member individually, then each class will hold just a reference to the master class and an index into these vectors. If the master class were to be a singleton, then this could be improved further.

class VehicleBase {
public:
    virtual std::string GetCarOwnerFirstName() = 0;
    virtual ~VehicleBase() {}
};
class Car : public VehicleBase {
    int index;
public:
    // GetSingleton() returns the master object holding one vector per data member
    std::string GetCarOwnerFirstName() { return GetSingleton().carownerfirstnames[index]; }
};

Of course, this leaves some implementation details to be desired, such as the memory management of Car's data members. However, Car itself is trivial and can be created/destroyed at any time, and the vectors in GetSingleton will pack data members quite efficiently.
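Applied to the quantity/flag type from the question, the idea could look roughly like this. It is a sketch under assumptions, with no removal or memory management; ItemStore, ItemRef and bytesPerItem are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// One vector per data member: no per-object padding and no v-pointer.
class ItemStore {
public:
    // Appends an item and returns its index, which serves as the handle.
    std::size_t add(double quantity, bool flag) {
        quantities_.push_back(quantity);
        flags_.push_back(flag ? 1 : 0);
        return quantities_.size() - 1;
    }
    double      quantity(std::size_t i) const { return quantities_[i]; }
    bool        flag(std::size_t i) const     { return flags_[i] != 0; }
    std::size_t size() const                  { return quantities_.size(); }

    // Payload bytes stored per item (ignoring spare vector capacity).
    static std::size_t bytesPerItem() { return sizeof(double) + sizeof(unsigned char); }
private:
    std::vector<double>        quantities_;
    std::vector<unsigned char> flags_;  // one byte per flag
};

// Lightweight view created on demand; it is not stored per item.
class ItemRef {
public:
    ItemRef(const ItemStore& store, std::size_t index) : store_(&store), index_(index) {}
    double quantity() const { return store_->quantity(index_); }
    bool   flag() const     { return store_->flag(index_); }
private:
    const ItemStore* store_;
    std::size_t      index_;
};
```

With this layout each item costs 9 bytes of payload instead of 16 or 24, at the price of an extra indirection through the store.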


I would use traits:

template <class T>
struct DataTypeTraits
{
};

template <>
struct DataTypeTraits<Car>
{
   // put things that describe Car here
   // Example: Give the type a name
   static std::string getTypeName()
   {
      return "Car";
   }
};
template <>
struct DataTypeTraits<Bicycle>
{
   // the same for bicycles
   static std::string getTypeName()
   {
      return "Bicycle";
   }
};

template <class T>
void ShowNiceDialog(const T& t)
{
   // Extract details of given object
   std::string typeName(DataTypeTraits<T>::getTypeName());
   // more stuff
}

This way you don't need to change ShowNiceDialog() whenever you add a new type you want to apply it to. All you need is a specialization of DataTypeTraits for the new type.
