
Initializing a C++ vector to random values... fast

Hey, I'd like to make this as fast as possible, because it gets called A LOT in a program I'm writing. Is there any faster way to initialize a C++ vector to random values than:

double range; // set to the range of a particular function I want to evaluate
std::vector<double> x(30, 0.0);
for (std::size_t i = 0; i < x.size(); i++) {
    x.at(i) = (rand() / (double)RAND_MAX) * range;
}

EDIT: Fixed x's initializer.


Right now, this should be really fast, since the loop won't execute at all (before the edit above, x was default-constructed and empty, so x.size() was 0).

Personally, I'd probably use something like this:

struct gen_rand {
    double range;

    gen_rand(double r = 1.0) : range(r) {}
    double operator()() {
        return (rand() / (double)RAND_MAX) * range;
    }
};

std::vector<double> x(num_items);
std::generate_n(x.begin(), num_items, gen_rand());

Edit: It's purely a micro-optimization that might make no difference at all, but you might consider rearranging the computation to get something like:

struct gen_rand {
    double factor;

    gen_rand(double r = 1.0) : factor(r / RAND_MAX) {} // precompute the scale once
    double operator()() {
        return rand() * factor;
    }
};

Of course, there's a really good chance the compiler will already do this (or something equivalent) but it won't hurt to try it anyway (though it's really only likely to help with optimization turned off).

Edit2: "sbi" (as is usually the case) is right: you might gain a bit by initially reserving space, then using an insert iterator to put the data into place:

std::vector<double> x;
x.reserve(num_items);
std::generate_n(std::back_inserter(x), num_items, gen_rand());

As before, we're into such microscopic optimization that I'm not at all sure I'd really expect to see a difference. In particular, since this is all done with templates, there's a pretty good chance most (if not all) of the code will be generated inline. In that case, the optimizer is likely to notice that the initial data all gets overwritten, and skip initializing it.

In the end, however, nearly the only part that's really likely to make a significant difference is getting rid of the .at(i). The others might, but with optimizations turned on, I wouldn't really expect them to.


I have been using Jerry Coffin's functor method for some time, but with the arrival of C++11, we have loads of cool new random number functionality. To fill an array with random float values, we can now do something like the following:

#include <algorithm>
#include <functional>
#include <random>
#include <vector>

const size_t elements = 300;
std::vector<float> y(elements);
std::uniform_real_distribution<float> distribution(0.0f, 2.0f); // values between 0 and 2
std::mt19937 engine; // Mersenne Twister MT19937
auto generator = std::bind(distribution, engine);
std::generate_n(y.begin(), elements, generator);
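Note that a default-constructed engine is seeded with a fixed default, so it produces the same sequence on every run. If you want different values each run, you can seed it, for example from std::random_device (a minimal sketch):

std::random_device seed_source;      // nondeterministic seed source, where the platform provides one
std::mt19937 engine(seed_source());  // seeded Mersenne Twister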

See the relevant section of Wikipedia for more engines and distributions.


Yes: x.at(i) does bounds checking on every access, while x[i] does not. Also, the original code was incorrect because it failed to specify the size of x in advance. You need to use std::vector<double> x(n), where n is the number of elements you want; otherwise, the loop never executes.
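For example, here is the question's loop with a presized vector and unchecked indexing (a minimal sketch; range is assumed to be set elsewhere):

std::vector<double> x(30);  // size specified up front
for (std::size_t i = 0; i < x.size(); ++i) {
    x[i] = (rand() / (double)RAND_MAX) * range;  // operator[]: no bounds check
}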

Alternatively, you may want to make a custom iterator that generates random values and fill the vector from it. Because the std::vector range constructor initializes its elements from the iterators anyway, a custom iterator that produces random values can eliminate a separate pass over the items.

In terms of implementing an iterator of your own, here is my untested code:

#include <cstdlib>   // rand, RAND_MAX
#include <iterator>  // std::input_iterator_tag

class random_iterator
{
    public:
        typedef std::input_iterator_tag iterator_category;
        typedef double value_type;
        typedef int difference_type;
        typedef double* pointer;
        typedef double& reference;

        random_iterator() : _range(1.0), _count(0) {}
        random_iterator(double range, int count) :
                                        _range(range), _count(count) {}
        random_iterator(const random_iterator& o) :
                                        _range(o._range), _count(o._count) {}
        ~random_iterator() {}

        double operator*() const { return (rand() / (double)RAND_MAX) * _range; }
        int operator-(const random_iterator& o) const { return o._count - _count; }
        random_iterator& operator++() { --_count; return *this; }
        random_iterator operator++(int) { random_iterator cpy(*this); --_count; return cpy; }
        bool operator==(const random_iterator& o) const { return _count == o._count; }
        bool operator!=(const random_iterator& o) const { return _count != o._count; }

    private:
        double _range;
        int _count;
};

With the code above, it should be possible to use:

std::vector<double> x((random_iterator(range, number)), random_iterator()); // extra parentheses avoid the "most vexing parse"

That said, the generate_n code in the other solutions is simpler, and frankly, I would just explicitly fill the vector without resorting to anything fancy like this... but it is kind of cool to think about.


#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>  // std::ostream_iterator
#include <cstdlib>   // rand, RAND_MAX

struct functor {
   functor(double v):val(v) {}
   double operator()() const {
      return (rand()/(double)RAND_MAX)*val;
   }
private:
   double val;
};

int main(int argc, const char** argv) {
   const int size = 10;
   const double range = 3.0;

   std::vector<double> dvec;
   std::generate_n(std::back_inserter(dvec), size, functor(range));

   // print all
   std::copy(dvec.begin(), dvec.end(), (std::ostream_iterator<double>(std::cout, "\n")));

   return 0;
}

Too late :(


You may consider using a pseudo-random number generator that produces its output as a sequence. Since most PRNGs just step through a sequence anyway, that will be a lot more efficient than calling rand() over and over again.
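For instance, here is a minimal sketch of the idea using a simple inline xorshift-style generator, so the whole fill loop can be inlined instead of calling out to the C library for every element (the function name, seed, and scaling are illustrative, not from the original post):

#include <cstdint>
#include <vector>

// Marsaglia's xorshift64: a few shifts and XORs per number, trivially inlinable.
inline std::uint64_t xorshift64(std::uint64_t& state) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

void fill_random(std::vector<double>& x, double range) {
    std::uint64_t state = 88172645463325252ull;      // any nonzero seed works
    const double scale = range / double(UINT64_MAX); // map to [0, range]
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] = double(xorshift64(state)) * scale;    // one multiply per element
}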

But then, I think I really need to know more about your situation.

  • Why does this piece of code execute so much? Can you restructure your code to avoid re-generating random data so frequently?
  • How big are your vectors?
  • How "good" does your random number generator need to be? High-quality distributions tend to be more expensive to calculate.
  • If your vectors are large, are you reusing their buffer space, or are you throwing it away and reallocating it elsewhere? Creating new vectors willy-nilly is a great way to destroy your cache (see the sketch after this list).
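To illustrate that last point, a minimal sketch of reusing one buffer across iterations (num_iterations, num_items, and fill_values are hypothetical stand-ins for your own loop and fill code):

std::vector<double> x;           // created once, outside the hot loop
for (int iter = 0; iter < num_iterations; ++iter) {
    x.resize(num_items);         // after the first pass this reuses the same allocation
    fill_values(x);              // overwrite elements in place; the cache stays warm
}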


@Jerry Coffin's answer looks very good. Two other thoughts, though:

  1. Inlining - All of your vector accesses will be very fast, but if the call to rand() is out-of-line, the function call overhead might dominate. If that's the case, you may need to roll your own pseudorandom number generator.

  2. SIMD - If you're going to roll your own PRNG, you might as well make it compute 2 doubles (or 4 floats) at once. This will reduce the number of int-to-float conversions as well as the multiplications. I've never tried it, but apparently there's a SIMD version of the Mersenne Twister that's quite good. A simple linear congruential generator might be good enough too (and that's probably what rand() is using already); a sketch of the two-at-a-time idea follows this list.
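Here is a minimal sketch of that two-at-a-time idea, using two interleaved 64-bit LCG streams (Knuth's MMIX constants; the seeds and function name are illustrative). Processing two elements per iteration halves the loop overhead and gives the compiler an easy auto-vectorization target:

#include <cstdint>
#include <vector>

void fill_random_x2(std::vector<double>& x, double range) {
    const std::uint64_t a = 6364136223846793005ull, c = 1442695040888963407ull;
    std::uint64_t s0 = 0x123456789abcdef0ull, s1 = 0x0fedcba987654321ull;
    const double scale = range / 9007199254740992.0;   // 2^53
    std::size_t i = 0;
    for (; i + 1 < x.size(); i += 2) {
        s0 = a * s0 + c;                               // two independent LCG steps
        s1 = a * s1 + c;
        x[i]     = double(s0 >> 11) * scale;           // top 53 bits -> [0, range)
        x[i + 1] = double(s1 >> 11) * scale;
    }
    if (i < x.size()) {                                // odd-length tail
        s0 = a * s0 + c;
        x[i] = double(s0 >> 11) * scale;
    }
}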


#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <vector>

int main() {
  int size = 10;
  srand(time(NULL));

  std::vector<int> vec(size);
  std::generate(vec.begin(), vec.end(), rand);

  std::vector<int> vec_2(size);
  std::generate(vec_2.begin(), vec_2.end(), []() { return rand() % 50; });
}

The headers you need are <vector>, <algorithm>, <ctime>, and <cstdlib>, as included above.


The way I think about these is a rubber-meets-the-road approach.
In other words, there are certain minimal things that have to happen, no getting around it, such as:

  • the rand() function has to be called N times.

  • the result of rand() has to be converted to double and then multiplied by something.

  • the resulting numbers have to get stored in consecutive elements of an array.

The object is, at a minimum, to get those things done.

Other concerns, like whether or not to use an std::vector and iterators are fine as long as they don't add any extra cycles. The easiest way to see if they add significant extra cycles is to single-step the code at the assembly language level.
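The other easy way to check is to time the alternatives directly. A minimal sketch of a timing harness using std::chrono (fill is a stand-in for whichever variant you're measuring):

#include <chrono>
#include <vector>

template <typename Fill>
double seconds_per_fill(Fill fill, std::vector<double>& x, int reps = 1000) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        fill(x);                                  // the code under test
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / reps;                // average seconds per call
}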

