Appending strings to a list

Novice question here.

I'm working on finding every single combination of various string elements, and then want to place them into a list.

import itertools 

mydata = [ ]

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
      ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]

def twofunc(one, two):
    for a, b in itertools.product(main[one], main[two]):
        print a, b

def threefunc(one, two, three):
    for a, b, c in itertools.product(main[one], main[two], main[three]):
        print a, b, c

twofunc(2, 0)  #extremely inefficient to just run these functions over and over. alternative?
twofunc(3, 0)
twofunc(0, 4)
twofunc(0, 5)
twofunc(0, 6)
threefunc(2, 0, 4)
threefunc(3, 0, 4)
threefunc(2, 0, 5)
threefunc(3, 0, 5)
threefunc(2, 0, 6)
threefunc(3, 0, 6)

The above code just prints each combination, but doesn't append the values to the list. I've tried several variations of the append method and still have no luck.

Can anyone help me place these values into the mydata list? I assume that each string will have to be a separate list, so it will end up being a list of lists. It should look like the following, but I'll also need some way to include "tags" in the future, or just indicator values for whether the string contains a given word.

[["cheap car insurance"],
["cheap auto insurance"],
["budget car insurance"],
["budget auto insurance"],
["low cost car insurance"],
["low cost auto insurance"],
...

Eventually it will end up like this, where 1 means that the string contains the word (car or cheap) and 0 means that it doesn't. I mention this just to ask whether a list is the proper data structure for this task.

                         car        cheap 
cheap car insurance       1            1
cheap auto insurance      0            1 
budget car insurance      1            0
budget auto insurance     0            0

Can anyone help?

I've completed this task in R, which is pretty amenable to this task, and just wanted to reproduce it in Python.


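To get the results of twofunc and threefunc into a list, replace the print statements with code that builds and returns a list, then add the result to mydata. Note that a return placed directly inside the for loop would exit after the first combination, so build the whole list before returning. An example follows for twofunc:

def twofunc(one, two):
    # collect every combination instead of returning on the first iteration
    return [[a, b] for a, b in itertools.product(main[one], main[two])]

mydata.extend(twofunc(2, 0))  # extend adds each [a, b] pair to mydata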

That said, I'm not familiar with R, so obviously don't know what data structure you might be using there. For your stated goal, keeping this in a list might get complicated. However, creating a simple class to do this shouldn't be too difficult. You could then have a list made up of instantiations of this class.
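As a rough sketch of the class idea (certainly not the only way to do it), each phrase could be stored together with 0/1 flags for the words you care about. The class name Phrase, its flags attribute and the TAG_WORDS tuple are made up for illustration, and the code is written for Python 3:

```python
import itertools

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
        ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]

# Hypothetical choice of words to flag; taken from the table in the question.
TAG_WORDS = ("car", "cheap")


class Phrase(object):
    """One generated phrase plus a 0/1 flag per tag word (illustrative only)."""

    def __init__(self, text, tag_words=TAG_WORDS):
        self.text = text
        words = text.split()
        # 1 if the tag word occurs in the phrase as a whole word, otherwise 0.
        self.flags = {w: int(w in words) for w in tag_words}


def twofunc(one, two):
    # Return the phrases instead of printing them.
    return [Phrase(" ".join(pair))
            for pair in itertools.product(main[one], main[two])]


mydata = []
mydata.extend(twofunc(2, 0))

# One (text, car, cheap) row per phrase, like the table in the question.
rows = [(p.text, p.flags["car"], p.flags["cheap"]) for p in mydata]
for row in rows:
    print(row)
```

Matching on whole words (text.split()) rather than substrings is a design assumption here; a plain substring test would also flag words that merely contain "car".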


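import itertools
from functools import reduce  # reduce is a builtin on Python 2; this import is needed on Python 3

main = [["car", "auto"], ["insurance"],["cheap", "budget"]]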

reduce(lambda x,y: [e+' '+f for e,f in itertools.product(x,y)],main)

or

reduce(lambda x,y: [' '.join([e,f]) for e,f in itertools.product(x,y)],main)

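The latter is sometimes suggested as the faster option, but when only two strings are joined at a time the difference is likely small; measure with timeit if it matters. Either way, the result is: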

['car insurance cheap', 'car insurance budget', 'auto insurance cheap', 'auto insurance budget']
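For comparison, the same strings can be built without reduce by handing all the sublists to itertools.product at once and joining each resulting tuple. This is only a sketch (the name phrases is mine), written for Python 3:

```python
import itertools

main = [["car", "auto"], ["insurance"], ["cheap", "budget"]]

# product(*main) yields one tuple per combination,
# e.g. ('car', 'insurance', 'cheap'); join turns it into a single string.
phrases = [" ".join(combo) for combo in itertools.product(*main)]
print(phrases)
```

This takes one element from every sublist in main, in order, which is the same thing the reduce version does.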


First, you don't need to write separate twofunc and threefunc:

def nfunc(strings, indices):
    """call as nfunc(main,(2,0)) or nfunc(main, (2,0,4))"""
    selected_strings = [strings[index] for index in indices]
    return itertools.product(*selected_strings)

Note that this just returns the iterable object, so it'll be computed lazily.
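One consequence of the laziness is worth seeing once: the returned object can only be walked a single time. A small sketch, assuming the main list from the question and Python 3 (the names first_pass and second_pass are just for illustration):

```python
import itertools

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
        ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]


def nfunc(strings, indices):
    """call as nfunc(main,(2,0)) or nfunc(main, (2,0,4))"""
    selected_strings = [strings[index] for index in indices]
    return itertools.product(*selected_strings)


lazy = nfunc(main, (2, 0))
first_pass = list(lazy)    # walking the iterator consumes it
second_pass = list(lazy)   # already exhausted, so this is an empty list

print(first_pass)
print(second_pass)
```

So if you need the combinations more than once, convert to a list first, or call nfunc again.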

Now we can pull the indices out of the imperative code and into a list, just like you did with the strings, so that changing the indices doesn't require rewriting any function calls:

index_list=[(2, 0),(3, 0),...,(2, 0, 4),(3, 0, 4),...,(3, 0, 6)]

To just get a list of lazy results, you can now write:

lazy_results=[nfunc(main, indices) for indices in index_list]
-> [itertools.product at 0xblahblah, itertools.product at 0xblahblah, ...]

To force evaluation of the results:

eager_results=[list(lazy) for lazy in lazy_results]
-> [[("cheap","car insurance"),("cheap","auto insurance"), ...

To flatten the tuples into strings:

str_results=[[' '.join(rtuple) for rtuple in result] for result in eager_results]
-> [["cheap car insurance", "cheap auto insurance", ...
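Putting these steps together, here is one way the mydata list from the question could be filled, with every string wrapped in its own one-element list as asked. It is a sketch for Python 3; the index tuples are simply the arguments of the twofunc/threefunc calls in the question:

```python
import itertools

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
        ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]


def nfunc(strings, indices):
    selected_strings = [strings[index] for index in indices]
    return itertools.product(*selected_strings)


# Index tuples copied from the function calls in the question.
index_list = [(2, 0), (3, 0), (0, 4), (0, 5), (0, 6),
              (2, 0, 4), (3, 0, 4), (2, 0, 5), (3, 0, 5), (2, 0, 6), (3, 0, 6)]

# One flat list of one-element lists, in the order the calls were made.
mydata = [[" ".join(combo)]
          for indices in index_list
          for combo in nfunc(main, indices)]

print(mydata[:6])
```

Unlike str_results, which keeps one sublist per index tuple, this flattens everything into a single list.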

And you can add arbitrary tags, if you need them:

tag_index_list=[(tag_1,(2,0)), ... (tag_n, (3, 0, 6))]
tag_results=[(tag, nfunc(main, indices)) for (tag, indices) in tag_index_list]
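To make that concrete, here is a small sketch of how the tagged pairs might be consumed. The tag names "two_part" and "three_part" are placeholders I invented (tag_1 ... tag_n above are left unspecified), and the code assumes Python 3:

```python
import itertools

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
        ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]


def nfunc(strings, indices):
    selected_strings = [strings[index] for index in indices]
    return itertools.product(*selected_strings)


# Hypothetical tags paired with index tuples.
tag_index_list = [("two_part", (2, 0)), ("three_part", (3, 0, 6))]
tag_results = [(tag, nfunc(main, indices)) for (tag, indices) in tag_index_list]

# Force evaluation and attach the tag to every generated string.
tagged = [(tag, " ".join(combo))
          for (tag, lazy) in tag_results
          for combo in lazy]

for pair in tagged:
    print(pair)
```

Each entry of tagged is a (tag, string) tuple, so it can be filtered or grouped by tag later.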

Depending on what you need to do, it might be better to write a class for your result set instead; if you step through the code above though, you'll get a feel for how it would look to use nested lists and tuples.
