
C++ handling specific impl - #ifdef vs private inheritance vs tag dispatch

I have some classes implementing some computations which I have to optimize for different SIMD implementations, e.g. Altivec and SSE. I don't want to pollute the code with #ifdef ... #endif blocks for each method I have to optimize, so I tried a couple of other approaches, but unfortunately I'm not very satisfied with how they turned out, for reasons I'll try to clarify. So I'm looking for some advice on how I could improve what I have already done.

1. Different implementation files with crude includes

I have the same header file describing the class interface, with different "pseudo" implementation files for plain C++, Altivec and SSE, covering only the relevant methods:

// Algo.h
#ifndef ALGO_H_INCLUDED_
#define ALGO_H_INCLUDED_
class Algo
{
public:
    Algo();
    ~Algo();

    void process();
protected:
    void computeSome();
    void computeMore();
};
#endif

// Algo.cpp
#include "Algo.h"
Algo::Algo() { }

Algo::~Algo() { }

void Algo::process()
{
    computeSome();
    computeMore();
}

#if defined(ALTIVEC)
#include "Algo_Altivec.cpp" 
#elif defined(SSE)
#include "Algo_SSE.cpp"
#else
#include "Algo_Scalar.cpp"
#endif

// Algo_Altivec.cpp
void Algo::computeSome()
{
}
void Algo::computeMore()
{
}
... same for the other implementation files

Pros:

  • the split is quite straightforward and easy to do
  • there is no "overhead" (I don't know how to say it better) for objects of my class, by which I mean no extra inheritance, no additional member variables, etc.
  • much cleaner than #ifdef-ing all over the place

Cons:

  • I have three additional files to maintain; I could put the Scalar implementation in the Algo.cpp file and end up with just two, but the inclusion part would look and feel a bit dirtier
  • they are not compilable units per se and have to be excluded from the project structure
  • if I do not have the specific optimized implementation yet for, let's say, SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation file
  • I cannot fall back to the plain C++ implementation if needed; is it even possible to do that in the described scenario?
  • I do not feel any structural cohesion in the approach

2. Different implementation files with private inheritance

// Algo.h
#include "AlgoImpl.h"

class Algo : private AlgoImpl
{
 ... as before
};

// AlgoImpl.h
#ifndef ALGOIMPL_H_INCLUDED_
#define ALGOIMPL_H_INCLUDED_
class AlgoImpl
{
protected:
    AlgoImpl();
    ~AlgoImpl();

   void computeSomeImpl();
   void computeMoreImpl();
};
#endif

// Algo.cpp
...
void Algo::computeSome()
{
    computeSomeImpl();
}
void Algo::computeMore()
{
    computeMoreImpl();
}

// Algo_SSE.cpp
AlgoImpl::AlgoImpl()
{
}
AlgoImpl::~AlgoImpl()
{
}
void AlgoImpl::computeSomeImpl()
{
}
void AlgoImpl::computeMoreImpl()
{
}

Pros:

  • the split is quite straightforward and easy to do
  • much cleaner than #ifdef-ing all over the place
  • still there is no "overhead" for my class - EBCO (the empty base class optimization) should kick in
  • the semantics of the class are much cleaner, at least compared to the above: private inheritance == is implemented in terms of
  • the different files are compilable, can be included in the project and selected via the build system

Cons:

  • I have three additional files for maintenance
  • if I do not have the specific optimized implementation yet for, let's say, SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation file
  • I cannot fall back to the plain C++ implementation if needed

3. Is basically method 2 but with virtual functions in the AlgoImpl class. That would allow me to overcome the duplicate implementation of plain C++ code if needed, by providing a default implementation in the base class and overriding it in the derived class, although I would have to disable that behavior when I actually implement the optimized version. Also, the virtual functions will bring some "overhead" to objects of my class.

4. A form of tag dispatching via enable_if<>

Pros:

  • the split is quite straightforward and easy to do
  • much cleaner than #ifdef ing all over the place
  • still there is no "overhead" to my class
  • will eliminate the need for different files for different implementations

Cons:

  • templates will be a bit more "cryptic" and seem to bring unnecessary overhead (at least for some people in some contexts)
  • if I do not have the specific optimized implementation yet for, let's say, SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation
  • I cannot fall back to the plain C++ implementation if needed

What I couldn't figure out yet for any of the variants is how to properly and cleanly fall back to the plain C++ implementation.

Also, I don't want to over-engineer things, and in that respect the first variant seems the most "KISS"-like, even considering the disadvantages.


You could use a policy-based approach with templates, kind of like the way the standard library does for allocators, comparators and the like. Each implementation has a policy class which defines computeSome() and computeMore(). Your Algo class takes a policy as a template parameter and defers to its implementation.

template <class policy_t>
class algo_with_policy_t {
    policy_t policy_;
public:
    algo_with_policy_t() { }
    ~algo_with_policy_t() { }

    void process()
    {
        policy_.computeSome();
        policy_.computeMore();
    }
};

struct altivec_policy_t {
    void computeSome();
    void computeMore();
};

struct sse_policy_t {
    void computeSome();
    void computeMore();
};

struct scalar_policy_t {
    void computeSome();
    void computeMore();
};

// let user select exact implementation
typedef algo_with_policy_t<altivec_policy_t> algo_altivec_t;
typedef algo_with_policy_t<sse_policy_t> algo_sse_t;
typedef algo_with_policy_t<scalar_policy_t> algo_scalar_t;

// let user have default implementation
typedef
#if defined(ALTIVEC)
    algo_altivec_t
#elif defined(SSE)
    algo_sse_t
#else
    algo_scalar_t
#endif
    algo_default_t;

This lets you have all the different implementations defined within the same file (like solution 1) and compiled into the same program (unlike solution 1). It has no performance overhead (unlike virtual functions). You can either select the implementation at run time or get a default implementation chosen by the compile-time configuration.

template <class algo_t>
void use_algo(algo_t algo)
{
    algo.process();
}

void select_algo(bool use_scalar)
{
    if (!use_scalar) {
        use_algo(algo_default_t());
    } else {
        use_algo(algo_scalar_t());
    }
}


As requested in the comments, here's a summary of what I did:

Set up policy_list helper template utility

This maintains a list of policies and gives each a runtime check before calling the first suitable implementation.

#include <cassert>

template <typename P, typename N=void>
struct policy_list {
  static void apply() {
    if (P::runtime_check()) {
      P::impl();
    }
    else {
      N::apply();
    }
  }
};

template <typename P>
struct policy_list<P,void> {
  static void apply() {
    assert(P::runtime_check());
    P::impl();
  }
};

Set up specific policies

These policies implement both a runtime test and an actual implementation of the algorithm in question. For my actual problem, impl took another template parameter specifying what exactly it was they were implementing; here the example assumes there is only one thing to implement. The runtime tests are cached in a static bool because for some (e.g. the Altivec one I used) the test was really slow. For others (e.g. the OpenCL one) the test is actually "is this function pointer NULL?" after one attempt at setting it with dlsym().

#include <iostream>

// runtime SSE detection (That's another question!)
extern bool have_sse();

struct sse_policy {
  static void impl() {
    std::cout << "SSE" << std::endl;
  }

  static bool runtime_check() {
    static bool result = have_sse();
    // have_sse lives in another TU and does some cpuid asm stuff
    return result;
  }
};

// Runtime OpenCL detection
extern bool have_opencl();

struct opencl_policy {
  static void impl() {
    std::cout << "OpenCL" << std::endl;
  }

  static bool runtime_check() {
    static bool result = have_opencl();
    // have_opencl lives in another TU and does some LoadLibrary or dlopen()
    return result;
  }
};

struct basic_policy {
  static void impl() {
    std::cout << "Standard C++ policy" << std::endl;
  }

  static bool runtime_check() { return true; } // All implementations do this
};

Set up a per-architecture policy_list

This trivial example sets one of two possible lists based on the ARCH_HAS_SSE preprocessor macro. You might generate this from your build script, or use a series of typedefs, or hack in support for "holes" in the policy_list that might be void on some architectures, skipping straight to the next one without trying to check for support. GCC sets some preprocessor macros for you that might help, e.g. __SSE2__.

#ifdef ARCH_HAS_SSE
typedef policy_list<opencl_policy,
        policy_list<sse_policy,
        policy_list<basic_policy
                    > > > active_policy;
#else
typedef policy_list<opencl_policy,
        policy_list<basic_policy
                    > > active_policy;
#endif

You can use this to compile multiple variants on the same platform too, e.g. an SSE and a no-SSE binary on x86.

Use the policy list

Fairly straightforward: call the apply() static method on the policy_list and trust that it will call the impl() method of the first policy that passes its runtime test.

int main() {
  active_policy::apply();
}

If you take the "per operation template" approach I mentioned earlier it might be something more like:

int main() {
  Matrix m1, m2;
  Vector v1;

  active_policy::apply<matrix_mult_t>(m1, m2);
  active_policy::apply<vector_mult_t>(m1, v1);
}

In that case you end up making your Matrix and Vector types aware of the policy_list so that they can decide how/where to store the data. You can also use heuristics here, e.g. "a small vector/matrix lives in main memory no matter what", and make runtime_check() or another function test the appropriateness of a particular implementation for a specific instance.

I also had a custom allocator for containers, which always produced suitably aligned memory on any SSE/Altivec-enabled build, regardless of whether the specific machine had Altivec support. It was just easier that way, although it could be a typedef in a given policy, as long as you assume that the highest-priority policy has the strictest allocator needs.

Example have_altivec():

I've included a sample have_altivec() implementation for completeness, simply because it's the shortest and therefore the most appropriate for posting here. The x86/x86_64 CPUID one is messy because you have to support the compiler-specific ways of writing inline ASM. The OpenCL one is messy because we check some of the implementation limits and extensions too.

#include <csetjmp>
#include <csignal>
#ifdef __APPLE__
#include <sys/sysctl.h>   // sysctl(), CTL_HW, HW_VECTORUNIT
#endif

#if HAVE_SETJMP && !(defined(__APPLE__) && defined(__MACH__))
jmp_buf jmpbuf;

void illegal_instruction(int sig) {
   // Bad in general - https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not+call+longjmp%28%29+from+inside+a+signal+handler
   // But actually Ok on this platform in this scenario
   longjmp(jmpbuf, 1);
}
#endif

bool have_altivec()
{
    volatile sig_atomic_t altivec = 0;
#ifdef __APPLE__
    int selectors[2] = { CTL_HW, HW_VECTORUNIT };
    int hasVectorUnit = 0;
    size_t length = sizeof(hasVectorUnit);
    int error = sysctl(selectors, 2, &hasVectorUnit, &length, NULL, 0);
    if (0 == error)
        altivec = (hasVectorUnit != 0);
#elif HAVE_SETJMP
    void (*handler) (int sig);
    handler = signal(SIGILL, illegal_instruction);
    if (setjmp(jmpbuf) == 0) {
        asm volatile ("mtspr 256, %0\n\t" "vand %%v0, %%v0, %%v0"::"r" (-1));
        altivec = 1;
    }
    signal(SIGILL, handler);
#endif

    return altivec;
}

Conclusion

Basically you pay no penalty on platforms that can never support an implementation (the compiler generates no code for them) and only a small penalty (potentially just a very CPU-predictable test/jmp pair, if your compiler is half-decent at optimising) on platforms that could support something but don't. You pay no extra cost on platforms where the first-choice implementation runs. The details of the runtime tests vary with the technology in question.


If the virtual function overhead is acceptable, option 3 plus a few ifdefs seems a good compromise IMO. There are two variations you could consider: one with an abstract base class, and the other with the plain C++ implementation as the base class.

Having the plain C++ implementation as the base class lets you gradually add the vector-optimized versions, falling back on the non-vectorized versions as you please; using an abstract interface would be a little cleaner to read.

Also, having separate C++ and vectorized versions of your class lets you easily write unit tests that:

  • Ensure that the vectorized code gives the right result (it's easy to mess this up, and vector floating-point registers can have different precision than the FPU, causing different results)
  • Compare the performance of the plain C++ versus the vectorized code. It's often good to make sure the vectorized code is actually doing you any good. Compilers can generate very tight C++ code that sometimes does as well as or better than vectorized code.

Here's one with the plain C++ implementation as the base class. Adding an abstract interface would just add a common base class to all three of these:

// Algo.h:

class Algo_Impl    // Default plain C++ implementation
{
public:
     virtual ~Algo_Impl();
     virtual void ComputeSome();
     virtual void ComputeSomeMore();
     ...
};

// Algo_SSE.h:
class Algo_Impl_SSE : public Algo_Impl   // SSE
{
public:
     virtual void ComputeSome();
     virtual void ComputeSomeMore();
     ...
};

// Algo_Altivec.h:
class Algo_Impl_Altivec : public Algo_Impl    // Altivec implementation
{
public:
     virtual void ComputeSome();
     virtual void ComputeSomeMore();
     ...
};

// Client.cpp:
Algo_Impl *myAlgo = 0;
#ifdef SSE
    myAlgo = new Algo_Impl_SSE;
#elif defined(ALTIVEC)
    myAlgo = new Algo_Impl_Altivec;
#else
    myAlgo = new Algo_Impl;    // plain C++ default
#endif
...


You may consider employing the adapter pattern. There are a few types of adapters, and it's quite an extensible concept. Here is an interesting article, Structural Patterns: Adapter and Façade, that discusses a matter very similar to the one in your question - the Accelerate framework as an example of the Adapter pattern.

I think it is a good idea to discuss a solution at the level of design patterns without focusing on implementation details like the C++ language. Once you decide that the adapter is the right solution for you, you can look for variants specific to your implementation. For example, in the C++ world there is a known adapter variant called the generic adapter pattern.


This isn't really a whole answer: just a variant on one of your existing options. In option 1 you've assumed that you include algo_altivec.cpp &c. into algo.cpp, but you don't have to do this. You could omit algo.cpp entirely and have your build system decide which of algo_altivec.cpp, algo_sse.cpp, &c. to build. You'd have to do something like this anyway whichever option you use, since each platform can't compile every implementation; my suggestion is only that, whichever option you choose, instead of having #if ALTIVEC_ENABLED everywhere in the source (where ALTIVEC_ENABLED is set from the build system), you just have the build system decide directly whether to compile algo_altivec.cpp. This is a bit trickier to achieve in MSVC than in make, scons, &c., but still possible. It's commonplace to switch in a whole directory rather than individual source files; that is, instead of algo_altivec.cpp and friends, you'd have platform/altivec/algo.cpp, platform/sse/algo.cpp, and so on. This way, when you have a second algorithm that needs platform-specific implementations, you can just add the extra source file to each directory.

Although my suggestion is mainly intended as a variant of option 1, you can combine it with any of your options to let you decide in the build system and at runtime which options to offer. In that case, though, you'll probably need implementation-specific header files too.


In order to hide the implementation details you may just use an abstract interface with a static creator and provide three implementation classes:

// --------------------- Algo.h ---------------------
#pragma once

#include <string>
#include <boost/shared_ptr.hpp>

typedef boost::shared_ptr<class Algo> AlgoPtr;

class Algo
{
public:
    static AlgoPtr Create(std::string type);
    virtual ~Algo();

    void process();

protected:
    virtual void computeSome() = 0;
    virtual void computeMore() = 0;
};

// --------------------- Algo.cpp --------------------- 
class PlainAlgo: public Algo { ... };
class AltivecAlgo: public Algo { ... };
class SSEAlgo: public Algo { ... };

AlgoPtr Algo::Create(std::string type) { /* Factory implementation */ }

Please note that since the PlainAlgo, AltivecAlgo and SSEAlgo classes are defined in Algo.cpp, they are only visible from that compilation unit, and the implementation details are therefore hidden from the outside world.

Here is how one can use your class then:

AlgoPtr algo = Algo::Create("SSE");
algo->process();


It seems to me that your first strategy, with separate C++ files and #including the specific implementation, is the simplest and cleanest. I would only add some comments to your Algo.cpp indicating which methods are in the #included files, e.g.:

// Algo.cpp
#include "Algo.h"
Algo::Algo() { }

Algo::~Algo() { }

void Algo::process()
{
    computeSome();
    computeMore();
}

// The following methods are implemented in separate, 
// platform-specific files.
// void Algo::computeSome()
// void Algo::computeMore()

#if defined(ALTIVEC)
    #include "Algo_Altivec.cpp" 
#elif defined(SSE)
    #include "Algo_SSE.cpp"
#else
    #include "Algo_Scalar.cpp"
#endif


Policy-like templates (mixins) are fine until you hit the requirement to fall back to the default implementation. That is a runtime operation and should be handled by runtime polymorphism. The Strategy pattern can handle this fine.

There's one drawback to this approach: a Strategy-like algorithm implementation cannot be inlined. Such inlining can provide a reasonable performance improvement in rare cases. If this is an issue, you'll need to cover the higher-level logic with the Strategy.
