开发者

Speed up NumPy loop

I'm running a model in Python and I'm trying to speed up the execution time. Through profiling the code I've found that a huge amount of the total processing time is spent in the cell_in_shadow function below. I'm wondering if there is any way to speed it up?

The aim of the function is to provide a boolean response stating whether the specified cell in the NumPy array is shadowed by another cell (in the x direction only). It does this by stepping backwards along the row checking each cell against the height it must be to make the given cell in shadow. The values in shadow_map are calculated by another function not shown here - for this example, take shadow_map to be an array with values similar to:

[0] = 0 (not used)
[1] = 3
[2] = 7
[3] = 18

The add_x function is used to ensure that the array indices loop around (using clock-face arithmetic), as the grid has periodic boundaries (anything going off one side will re-appear on the other side).

def cell_in_shadow(x, y):
   """Returns True if the specified cell is in shadow, False if not."""

   # Get the global variables we need
   global grid
   global shadow_map
   global x_len

   # Record the original length and move to the left
   orig_x = x
   x = add_x(x, -1)

   while x != orig_x:
    # Gets the height that's needed from the shadow_map (the array index is the distance using clock-face arithmetic)
      height_needed = shadow_map[( (x - orig_x) % x_len)]
      if grid[y, x] - grid[y, orig_x] >= height_needed:
          return True

    # Go to the cell to the left
    x = add_x(x, -1)

def add_x(a,开发者_如何学C b):
   """Adds the two numbers using clockface arithmetic with the x_len"""
   global x_len

   return (a + b) % x_len


I do agree with Sancho that Cython will probably be the way to go, but here are a couple of small speed-ups:

A. Store grid[y, orig_x] in some variable before you start the while loop and use that variable instead. This will save a bunch of look-up calls to the grid array.

B. Since you are basically just starting at x_len - 1 in shadow_map and working down to 1, you can avoid using the modulus so much. Basically, change:

while x != orig_x:
    height_needed = shadow_map[( (x - orig_x) % x_len)]

to

for i in xrange(x_len-1,0,-1):
    height_needed = shadow_map[i]

or just get rid of the height_needed variable all together with:

    if grid[y, x] - grid[y, orig_x] >= shadow_map[i]:

These are small changes, but they might help a little bit.

Also, if you plan on going the Cython route, I would consider having your function do this process for the whole grid, or at least a row at a time. That will save a lot of the function call overhead. However, you might not be able to really do this depending on how you are using the results.

Lastly, have you tried using Psyco? It takes less work than Cython though it probably won't give you quite as big of a speed boost. I would certainly try it first.


If you're not limited to strict Python, I'd suggest using Cython for this. It can allow static typing of the indices and efficient, direct access to a numpy array's underlying data buffer at c speed.

Check out a short tutorial/example at http://wiki.cython.org/tutorials/numpy

In that example, which is doing operations very similar to what you're doing (incrementing indices, accessing individual elements of numpy arrays), adding type information to the index variables cut the time in half compared to the original. Adding efficient indexing into the numpy arrays by giving them type information cut the time to about 1% of the original.

Most Python code is already valid Cython, so you can just use what you have and add annotations and type information where needed to give you some speed-ups.

I suspect you'd get the most out of adding type information your indices x, y, orig_x and the numpy arrays.


The following guide compares several different approaches to optimising numerical code in python:

Scipy PerformancePython

It is a bit out of date, but still helpful. Note that it refers to pyrex, which has since been forked to create the Cython project, as mentioned by Sancho.

Personally I prefer f2py, because I think that fortran 90 has many of the nice features of numpy (e.g. adding two arrays together with one operation), but has the full speed of compiled code. On the other hand if you don't know fortran then this may not be the way to go.

I briefly experimented with cython, and the trouble I found was that by default cython generates code which can handle arbitrary python types, but which is still very slow. You then have to spend time adding all the necessary cython declarations to get it to be more specific and fast, whereas if you go with C or fortran then you will tend to get fast code straight out of the box. Again this is biased by me already being familiar with these languages, whereas Cython may be more appropriate if Python is the only language you know.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜