开发者

Python Algorithm Challenge?

I have a python function (call it myFunction) that gets as input a list of numbers, and, following a complex calculation, returns back the result of the calculation (which is a number).

The function looks like this:

def myFunction( listNumbers ):
    # initialize the result of the calculation
    calcResult = 0

    # looping through all indices, from 0 to the last one
    for i in xrange(0, len(listNumbers), 1):
        # some complex calculation goes here, changing the value of 'calcResult'

    # let us now return the result of the calculation
    return calcResult

I tested the function, and it works as expected.

Normally, myFunction is provided a listNumbers argument that contains 5,000,000 elements in it. As you may expect, the calculation takes time. I need this function to run as fast as possible

Here comes the challenge: assume that the time now is 5am, and that listNumbers contains just 4,999,999 values in it. Meaning, its LAST VALUE is not yet available. This value will only be available at 6am.

Obviously, we can do the following (1st mode): wait until 6am. Then, append the last value into listNumbers, and then, run myFunction. This solution works, BUT it will take a while before myFunction returns our calculated result (as we need to process the entire list of numbers, from the first element on). Remember, our goal is to get the results as soon as possible past 6am.

I was thinking about a more efficient way to solve this (2nd mode): since (at 5am) we have listNumbers with 4,999,999 values in it, let us immediately start running myFunction. Let us process whatever we can (remember, we don't have the last piece of data yet), and then -- exactly at 6am -- 'plug in' the new data piece -- and generate the computed result. This should be significantly faster, as most of the processing will be done BEFORE 6am, hence, we will only have to deal with the new data -- which means the computed result should be available immediately after 6am.

Let's suppose that there's no way for us to inspect the code of myFunction or modify it. Is there ANY programming technique / design idea that will allow us to take myFunction AS IS, and do something with it (without changing its code) so that we can have it operate in the 2nd mode, rather than the 1st one?

Please do not suggest using c++ / numpy + cython / parallel computing etc to solve this problem. The goal here is to see if there's any programming technique or design pattern that开发者_运维知识库 can be easily used to solve such problems.


You could use a generator as an input. The generator will only return when there is data available to process.

Update: thanks for the brilliant comment, I wanted to remove this entry :)

class lazylist(object):
    def __init__(self):
        self.cnt = 0
        self.length = 5000000

    def __iter__(self):
        return self

    def __len__(self):
        return self.length

    def next(self):
        if self.cnt < self.length:
            self.cnt += 1
            #return data here or wait for it
            return self.cnt #just return a counter for this example
        else:
            raise StopIteration()

    def __getitem__(self, i):
        #again, block till you have data.
        return i+1 #simple counter

myFunction(lazylist())

Update: As you can see from the comments and other solutions your loop construct and len call causes a lot of headaches, if you can eliminate it you can use a lot more elegant solution. for e in li or enumerate is the pythonic way to go.


By "list of numbers", do you mean an actual built-in list type?

  • If not, it's simple. Python uses duck-typing, so passing any sequence that supports iteration will do. Use the yield keyword to pass a generator.

    def delayed_list():
        for val in numpy_array[:4999999]:
            yield val
        wait_until_6am()
        yield numpy_array[4999999]
    

and then,

    myFunction(delayed_list())
  • If yes, then it's trickier :)

Also, check out PEP8 for recommended Python code style:

  • no spaces around brackets
  • my_function instead of myFunction
  • for i, val in enumerate(numbers): instead of for i in xrange(0, len(listNumbers), 1): etc.


subclass list so that when the function tries to read the last value it blocks until another thread provides the value.

import threading
import time

class lastblocks(list):
    def __init__(self,*args,**kwargs):
        list.__init__(self,*args,**kwargs)
        self.e = threading.Event()
    def __getitem__(self, index):
        v1 = list.__getitem__(self,index)
        if index == len(self)-1:
            self.e.wait()
            v2 = list.__getitem__(self,index)
            return v2
        else:
            return v1


l = lastblocks(range(5000000-1)+[None])

def reader(l):
    s = 0
    for i in xrange(len(l)):
        s += l[i]
    print s

def writer(l):
    time.sleep(10)
    l[5000000-1]=5000000-1
    l.e.set()
    print "written"

reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()

prints:

written
12499997500000

for numpy:

import threading
import time

import numpy as np

class lastblocks(np.ndarray):
    def __new__(cls, arry):
        obj = np.asarray(arry).view(cls)
        obj.e = threading.Event()
        return obj
    def __array_finalize__(self, obj):
        if obj is None: return
        self.e = getattr(obj, 'e', None)

    def __getitem__(self, index):
        v1 = np.ndarray.__getitem__(self,index)
        if index == len(self)-1:
            self.e.wait()
            v2 = np.ndarray.__getitem__(self,index)
            return v2
        else:
            return v1


l = lastblocks(np.asarray(range(5000000-1)+[None]))

def reader(l):
    s = 0
    for i in xrange(len(l)):
        s += l[i]
    print s

def writer(l):
    time.sleep(10)
    l[5000000-1]=5000000-1
    l.e.set()
    print "written"

reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()


Memory protection barriers are a general way to solve this type of problem when the techniques suggested in the other answers (generators and mock objects) are unavailable.

A memory barrier is a hardware feature that causes an interrupt when a program tries to access a forbidden area of memory (usually controllable at the page level). The interrupt handler can then take appropriate action, for example suspending the program until the data is ready.

So in this case you'd set up a barrier on the last page of the list, and the interrupt handler would wait until 06:00 before allowing the program to continue.


You could just create your own iterator to iterate over the 5,000,000 elements. This would do whatever you need to do to wait around for the final element (can't be specific since the example in the question is rather abstract). I'm assuming you don't care about the code hanging until 6:00, or know how to do it in a background thread.

More information about writing your own iterator is at http://docs.python.org/library/stdtypes.html#iterator-types


There is a simpler generator solution:

def fnc(lst):
    result = 0
    index = 0
    while index < len(lst):
        while index < len(lst):
            ... do some manipulations here ...
            index += 1
        yield result

lst = [1, 2, 3]
gen = fnc(lst)
print gen.next()

lst.append(4)
print gen.next()


I'm a little bit confused about not being able to investigate myFunction. At least you have to know if your list is being iterated or accessed by index. Your example might suggest an index is used. If you want to take advantage of iterators/generators, you have to iterate. I know you said myFunction is unchangeable, but just want to point out, that most pythonic version would be:

def myFunction( listNumbers ):
    calcResult = 0

    # enumerate if you really need an index of element in array
    for n,v in enumerate(listNumbers):
        # some complex calculation goes here, changing the value of 'calcResult'

    return calcResult

And now you can start introducing nice ideas. One is probably wrapping list with your own type and provide __iter__ method (as a generator); you could return value if accessible, wait for more data if you expect any or return after yielding last element.

If you have to access list by index, you can use __getitem__ as in Dan D's example. It'll have a limitation though, and you'll have to know the size of array in advance.


Couldn't you simply do something like this:

processedBefore6 = myFunction([1,2,3]) # the first 4,999,999 vals.

while lastVal.notavailable:
  sleep(1)

processedAfter6 = myFunction([processedBefore6, lastVal])

If the effects are linear (step 1 -> step 2 -> step 3, etc) this should allow you to do as much work as possible up front, then catch the final value when it's available and finish up.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜