Improving performance of raytracing hit function
I have a simple raytracer in Python. Rendering a 200x200 image takes 4 minutes, which is definitely too much for my taste. I want to improve the situation.
Some points: I shoot multiple rays per pixel (to provide antialiasing), 16 rays per pixel, for a grand total of 200x200x16 = 640000 rays. Each ray must be tested for impact against multiple Sphere objects in the scene. Ray is a rather trivial object:
class Ray(object):
    def __init__(self, origin, direction):
        self.origin = numpy.array(origin)
        self.direction = numpy.array(direction)
Sphere is slightly more complex, and carries the logic for hit/nohit:
class Sphere(object):
    def __init__(self, center, radius, color):
        self.center = numpy.array(center)
        self.radius = numpy.array(radius)
        self.color = color

    @profile
    def hit(self, ray):
        temp = ray.origin - self.center
        a = numpy.dot(ray.direction, ray.direction)
        b = 2.0 * numpy.dot(temp, ray.direction)
        c = numpy.dot(temp, temp) - self.radius * self.radius
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        else:
            e = math.sqrt(disc)
            denom = 2.0 * a
            t = (-b - e) / denom
            if t > 1.0e-7:
                normal = (temp + t * ray.direction) / self.radius
                hit_point = ray.origin + t * ray.direction
                return ShadeRecord.ShadeRecord(normal=normal,
                                               hit_point=hit_point,
                                               parameter=t,
                                               color=self.color)
            t = (-b + e) / denom
            if t > 1.0e-7:
                normal = (temp + t * ray.direction) / self.radius
                hit_point = ray.origin + t * ray.direction
                return ShadeRecord.ShadeRecord(normal=normal,
                                               hit_point=hit_point,
                                               parameter=t,
                                               color=self.color)
        return None
Now, I ran some profiling, and it appears that the longest processing time is in the hit() function
ncalls tottime percall cumtime percall filename:lineno(function)
2560000 118.831 0.000 152.701 0.000 raytrace/objects/Sphere.py:12(hit)
1960020 42.989 0.000 42.989 0.000 {numpy.core.multiarray.array}
1 34.566 34.566 285.829 285.829 raytrace/World.py:25(render)
7680000 33.796 0.000 33.796 0.000 {numpy.core._dotblas.dot}
2560000 11.124 0.000 163.825 0.000 raytrace/World.py:63(f)
640000 10.132 0.000 189.411 0.000 raytrace/World.py:62(hit_bare_bones_object)
640023 6.556 0.000 170.388 0.000 {map}
This does not surprise me, and I want to reduce this value as much as possible. Moving on to line profiling, the result is:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
12 @profile
13 def hit(self, ray):
14 2560000 27956358 10.9 19.2 temp = ray.origin - self.center
15 2560000 17944912 7.0 12.3 a = numpy.dot(ray.direction, ray.direction)
16 2560000 24132737 9.4 16.5 b = 2.0 * numpy.dot(temp, ray.direction)
17 2560000 37113811 14.5 25.4 c = numpy.dot(temp, temp) - self.radius * self.radius
18 2560000 20808930 8.1 14.3 disc = b * b - 4.0 * a * c
19
20 2560000 10963318 4.3 7.5 if (disc < 0.0):
21 2539908 5403624 2.1 3.7 return None
22 else:
23 20092 75076 3.7 0.1 e = math.sqrt(disc)
24 20092 104950 5.2 0.1 denom = 2.0 * a
25 20092 115956 5.8 0.1 t = (-b - e) / denom
26 20092 83382 4.2 0.1 if (t > 1.0e-7):
27 20092 525272 26.1 0.4 normal = (temp + t * ray.direction) / self.radius
28 20092 333879 16.6 0.2 hit_point = ray.origin + t * ray.direction
29 20092 299494 14.9 0.2 return ShadeRecord.ShadeRecord(normal=normal, hit_point=hit_point, parameter=t, color=self.color)
So, it appears that most of the time is spent in this chunk of code:
temp = ray.origin - self.center
a = numpy.dot(ray.direction, ray.direction)
b = 2.0 * numpy.dot(temp, ray.direction)
c = numpy.dot(temp, temp) - self.radius * self.radius
disc = b * b - 4.0 * a * c
Where I don't really see a lot to optimize. Do you have any idea how to make this code more performant without going to C?
Looking at your code, your main problem is that you have lines of code that are being called 2560000 times. That will tend to take a lot of time regardless of what kind of work you are doing in that code. However, using numpy, you can aggregate a lot of this work into a small number of numpy calls.
The first thing to do is to combine your rays into large arrays. Instead of using a Ray object that holds 1x3 vectors for origin and direction, use Nx3 arrays that hold all of the rays you need for the hit detection. The top of your hit function will end up looking like this:
temp = rays.origin - self.center
b = 2.0 * numpy.sum(temp * rays.direction,1)
c = numpy.sum(numpy.square(temp), 1) - self.radius * self.radius
disc = b * b - 4.0 * c
For the next part you can use
possible_hits = numpy.where(disc >= 0.0)
a = a[possible_hits]
disc = disc[possible_hits]
...
to continue with just the values that pass the discriminant test. You can easily get orders of magnitude performance improvements this way.
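The batched idea above can be fleshed out into a self-contained sketch. The function name `hit_many` and its return shape are mine, not from the original code, and it assumes unit-length direction vectors (so the quadratic's `a` coefficient is 1, matching the `disc = b * b - 4.0 * c` line above):

```python
import numpy

def hit_many(origins, directions, center, radius):
    """Test N rays (Nx3 origins/directions) against one sphere.

    Returns (indices, t): the row indices of rays that hit, and the
    nearer intersection parameter for each. Assumes unit directions.
    """
    temp = origins - center                        # Nx3
    b = 2.0 * numpy.sum(temp * directions, 1)      # N row-wise dot products
    c = numpy.sum(numpy.square(temp), 1) - radius * radius
    disc = b * b - 4.0 * c
    possible_hits = numpy.where(disc >= 0.0)[0]    # discriminant test
    b = b[possible_hits]
    e = numpy.sqrt(disc[possible_hits])
    t = (-b - e) / 2.0                             # nearer root (a == 1)
    visible = t > 1.0e-7
    return possible_hits[visible], t[visible]
```

One call now replaces 640000 Python-level calls to `hit`, which is where the orders-of-magnitude improvement comes from.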
1) Ray tracing is fun, but if you care at all about performance, dump Python and switch to C. Not C++ unless you are some kind of super expert, just C.
2) The big win in scenes with multiple (20 or more) objects is to use a spatial index to reduce the number of intersection tests. Popular options are kd-trees, octrees, and AABB hierarchies.
3) If you're serious, check out ompf.org - it is the resource for this.
4) Don't go to ompf with Python asking about optimization - most people there can shoot 1-2 million rays per second through an indoor scene with 100 thousand triangles... per core.
I love Python and ray tracing, but would never consider putting them together. In this case, the correct optimization is to switch languages.
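The spatial-index suggestion in point 2 can be illustrated with the simplest building block, a ray-vs-AABB pre-test using the slab method (function name and box layout are illustrative; a real kd-tree or octree goes further and partitions the whole scene):

```python
import numpy

def ray_hits_aabb(origin, inv_dir, box_min, box_max):
    """Slab test: clip the ray against each pair of box planes.

    inv_dir is 1/direction, precomputed once per ray. Only rays that
    pass this cheap test need the exact per-sphere quadratic.
    """
    t1 = (box_min - origin) * inv_dir      # entry/exit per axis
    t2 = (box_max - origin) * inv_dir
    t_near = numpy.max(numpy.minimum(t1, t2))   # latest entry
    t_far = numpy.min(numpy.maximum(t1, t2))    # earliest exit
    return t_far >= max(t_near, 0.0)
```

Grouping spheres under shared bounding boxes means one cheap rejection can skip many quadratic tests at once.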
With code like this you may benefit from memoizing common subexpressions, like `self.radius * self.radius` as `self.radius2` and `1 / self.radius` as `self.one_over_radius`. The overhead of the Python interpreter may dominate such trivial improvements, though.
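A minimal sketch of that idea, precomputing the derived values once in `__init__` (the attribute names `radius2` and `one_over_radius` are illustrative, and the trimmed `hit` returns a bare tuple rather than a ShadeRecord):

```python
import numpy

class Sphere(object):
    def __init__(self, center, radius, color):
        self.center = numpy.array(center)
        self.radius = radius
        self.color = color
        # Memoized subexpressions: computed once per sphere
        # instead of once per ray.
        self.radius2 = radius * radius
        self.one_over_radius = 1.0 / radius

    def hit(self, ray):
        temp = ray.origin - self.center
        a = numpy.dot(ray.direction, ray.direction)
        b = 2.0 * numpy.dot(temp, ray.direction)
        c = numpy.dot(temp, temp) - self.radius2        # reuse r*r
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        t = (-b - numpy.sqrt(disc)) / (2.0 * a)
        if t > 1.0e-7:
            # multiply by 1/r instead of dividing by r
            normal = (temp + t * ray.direction) * self.one_over_radius
            return t, normal
        return None
```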
One minor optimization: `a` and `b * b` are always positive, so `disc < 0.0` is guaranteed when `c > 0` and `0 <= b < min(a, c)` (then `b * b < a * c <= 4.0 * a * c`; note the `b >= 0` guard is needed, since a large negative `b` can still give a positive discriminant). In that case you can avoid calculating `b * b - 4.0 * a * c` at all. Given the profile you did, this will at most shave 7% off the run time of `hit`, I guess.
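That early exit can be sketched as a small predicate (hypothetical function name):

```python
def discriminant_is_negative(a, b, c):
    # Early exit: with a > 0 and c > 0, 0 <= b < min(a, c) implies
    # b*b < a*c <= 4*a*c, so disc = b*b - 4*a*c must be negative
    # and the multiply can be skipped entirely.
    if c > 0.0 and 0.0 <= b < min(a, c):
        return True
    return (b * b - 4.0 * a * c) < 0.0
```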
Your best bet will be to use lookup tables and pre-calculated values as much as possible.
As your response to my comment indicates that your ray direction vectors are unit vectors, you can make at least one optimization in the critical section right off the bat: any vector dotted with itself is its length squared, so a unit vector dotted with itself is always 1.
Also, pre-calculate the radius squared (in your sphere's __init__ method).
Then you've got:
temp = ray.origin - self.center
a = 1 # or skip this and optimize out later
b = 2.0 * numpy.dot(temp, ray.direction)
c = numpy.dot(temp, temp) - self.radius_squared
disc = b * b - 4.0 * c
temp dot temp is going to give you the equivalent of sum(map(lambda component: component * component, temp))... I'm not sure which is faster, though.
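Putting those two simplifications together, here is a sketch of the trimmed hit function (assuming unit-length ray directions and a `self.radius_squared` attribute set in `__init__`; the bare-tuple return is mine, standing in for the ShadeRecord):

```python
import math
import numpy

def hit(self, ray):
    # Assumes ray.direction is a unit vector, so a == 1 and the
    # quadratic simplifies: t = (-b +/- sqrt(b*b - 4c)) / 2.
    temp = ray.origin - self.center
    b = 2.0 * numpy.dot(temp, ray.direction)
    c = numpy.dot(temp, temp) - self.radius_squared  # precomputed r*r
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    e = math.sqrt(disc)
    for t in ((-b - e) / 2.0, (-b + e) / 2.0):       # nearer root first
        if t > 1.0e-7:
            normal = (temp + t * ray.direction) / self.radius
            hit_point = ray.origin + t * ray.direction
            return t, normal, hit_point
    return None
```

This removes one of the three dot products and one multiply per ray, though as noted above, interpreter overhead may still dominate.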