开发者

Improving performance of raytracing hit function

I have a simple raytracer in python. rendering an image 200x200 takes 4 minutes, which is definitely too much for my taste. I want to improve the situation.

Some points: I shoot multiple rays per each pixel (to provide antialiasing) for a grand total of 16 rays per pixel. 200x200x16 is a grand total of 640000 rays. Each ray must be tested for impact on multiple Sphere objects in the scene. Ray is also a rather trivial object

class Ray(object):
    def __init__(self, origin, direction):
        self.origin = numpy.array(origin)
        self.direction = numpy.array(direction)

Sphere is slightly more complex, and carries the logic for hit/nohit:

class Sphere(object):
    def __init__(self, center, radius, color):
        self.center = numpy.array(center)
        self.radius = numpy.array(radius)
        self.color = color

    @profile 
    def hit(self, ray):
        temp = ray.origin - self.center
        a = numpy.dot(ray.direction, ray.direction)
        b = 2.0 * numpy.dot(temp, ray.direction)
        c = numpy.dot(temp, temp) - self.radius * self.radius
        disc = b * b - 4.0 * a * c

        if (disc < 0.0):
            return None
        else:
            e = math.sqrt(disc)
            denom = 2.0 * a
            t = (-b - e) / denom 
            if (t > 1.0e-7):
                normal = (temp + t * ray.direction) / self.radius
                hit_point = ray.origin + t * ray.direction
                return ShadeRecord.ShadeRecord(normal=normal, 
                                               hit_point=hit_point, 
                                               parameter=t, 
                                               color=self.color)           

            t = (-b + e) / denom

            if (t > 1.0e-7):
                normal = (temp + t * ray.direction) / self.radius                hit_point = ray.origin + t * ray.direction
                return ShadeRecord.ShadeRecord(normal=normal, 
                                               hit_point=hit_point, 
                                               parameter=t, 
                                               color=self.color)       

        return None    

Now, I ran some profiling, and it appears that the longest processing time is in the hit() function

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  2560000  118.831    0.000  152.701    0.000 raytrace/objects/Sphere.py:12(hit)
  1960020   42.989    0.000   42.989    0.000 {numpy.core.multiarray.array}
        1   34.566   34.566  285.829  285.829 raytrace/World.py:25(render)
  7680000   33.796    0.000   33.796    0.000 {numpy.core._dotblas.dot}
  2560000   11.124    0.000  163.825    0.000 raytrace/World.py:63(f)
   640000   10.132    0.000  189.411    0.000 raytrace/World.py:62(hit_bare_bones_object)
   640023    6.556    0.000  170.388    0.000 {map}

This does not surprise me, and I want to reduce this value as much as possible. I pass to line profiling, and the result is

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                               @profile
    13                                               def hit(self, ray):
    14   2560000     27956358     10.9     19.2          temp = ray.origin - self.center
    15   2560000     17944912      7.0     12.3          a = numpy.dot(ray.direction, ray.direction)
    16   2560000     24132737      9.4     16.5          b = 2.0 * numpy.dot(temp, ray.direction)
    17   2560000     37113811     14.5     25.4          c = numpy.dot(temp, temp) - self.radius * self.radius
    18   2560000     20808930      8.1     14.3          disc = b * b - 4.0 * a * c
    19                                                   
    20   2560000     10963318      4.3      7.5          if (disc < 0.0):
    21   2539908      5403624      2.1      3.7              return None
    22                                                   else:
    23     20092        75076      3.7      0.1              e = math.sqrt(disc)
    24     20092       104950      5.2      0.1              denom = 2.0 * a
    25     20092       115956      5.8      0.1              t = (-b - e) / denom
    26     20092        83382      4.2      0.1              if (t > 1.0e-7):
    27     20092       525272     26.1      0.4                  normal = (temp + t * ray.direction) / self.radius
    28     20092       333879     16.6      0.2                  hit_point = ray.origin + t * ray.direction
    29     20092       299494     14.9      0.2                  return ShadeRecord.ShadeRecord(normal=normal, hit_point=hit_point, parameter=t, color=self.color)

So, it appears that most of the time is spent in this chunk of code:

        temp = ray.origin - self.center
        a = numpy.dot(ray.direction, ray.direction)
        b = 2.0 * numpy.dot(temp, ray.direction)
        c = numpy.dot(temp, temp) - self.radius * self.radius
        disc = b * b - 4.0 * a * c

Wher开发者_如何学运维e I don't really see a lot to optimize. Do you have any idea how to make this code more performant without going C ?


Looking at your code, it looks like your main problem is that you have lines of code that are being called 2560000 times. That will tend to take a lot of time regardless what kind of work you are doing in that code. However, using numpy, you can aggregate alot of this work into a small number of numpy calls.

The first thing to do is to combine your rays together into large arrays. Instead of using a Ray object that has 1x3 vectors for origin and direction use Nx3 arrays that have all of the rays you need for the hit detection. The top of your hit function will end up looking like this:

temp = rays.origin - self.center
b = 2.0 * numpy.sum(temp * rays.direction,1)
c = numpy.sum(numpy.square(temp), 1) - self.radius * self.radius
disc = b * b - 4.0 * c

For the next part you can use

possible_hits = numpy.where(disc >= 0.0)
a = a[possible_hits]
disc = disc[possible_hits]
...

to continue with just the values that pass the discriminant test. You can easily get orders of magnitude performance improvements this way.


1) Ray tracing is fun but if you care at all about performance, dump python and switch to C. Not C++ unless you are some kind of super expert, just C.

2) The big win in scenes with multiple (20 or more) objects is to use a spatial index to reduce the number of intersection tests. Popular options are kD-trees, OctTrees, AABB.

3) If you're serious, check out ompf.org - it is the resource for this.

4) Don't go to ompf with python asking about optimization - most people there can shoot 1 Million to 2 Million rays per second through an indoor scene with 100 thousand triangles... Per core.

I love Python and ray tracing, but would never consider putting them together. In this case, the correct optimization is to switch languages.


With code like this you may benefit from memoizing common subexpressions like self.radius * self.radius as self.radius2 and 1 / self.radius as self.one_over_radius. The overhead of the python interpreter may dominate such trivial improvements.


One minor optimization is: a and b * b are always positive, so disc < 0.0 is true if (c > 0 && b < min(a, c)). In this case you can avoid calculating b * b - 4.0 * a * c. Given the profile you did this will at most shave 7% of the run time of hit I guess.


Your best bet will be to use lookup tables and pre-calculated values as much as possible.

As your response to my comment indicates that your ray direction vectors are unit vectors, in the critical section you listed you can make at least one optimization right off the bat. Any vector dot itself is length squared so a unit vector dot itself will always be 1.

Also, pre-calculate radius squared (in your sphere's __init__ function).

Then you've got:

temp = ray.origin - self.center
a = 1 # or skip this and optimize out later
b = 2.0 * numpy.dot(temp, ray.direction)
c = numpy.dot(temp, temp) - self.radius_squared
disc = b * b - 4.0 * c

temp dot temp is going to give you the equivalent of sum( map( lambda component: component*component, temp ) ) ... i'm not sure which is faster though.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜