How to optimize this Python code (from ThinkPython, Exercise 10.10)
I'm working through Allen Downey's How To Think Like A Computer Scientist, and I've written what I believe to be a functionally correct solution to Exercise 10.10. But it took just over 10 hours (!) to run, so I'm wondering if I'm missing some really obvious and helpful optimization.
Here's the Exercise:
"Two words 'interlock' if taking alternating letters from each forms a new word. For example, 'shoe' and 'cold' interlock to form 'schooled'. Write a program that finds all pairs of words that interlock. Hint: Don't enumerate all pairs!"
(For these word-list problems, Downey has supplied a file with 113809 words. We may assume that these words are in a list, one word per item in the list.)
Here's my solution:
from bisect import bisect_left
def index(lst, target):
    """If target is in lst, returns the index of target; otherwise returns None"""
    i = bisect_left(lst, target)
    if i != len(lst) and lst[i] == target:
        return i
    else:
        return None
def interlock(str1, str2):
    "Takes two strings of equal length and 'interlocks' them."
    if len(str1) == len(str2):
        lst1 = list(str1)
        lst2 = list(str2)
        result = []
        for i in range(len(lst1)):
            result.append(lst1[i])
            result.append(lst2[i])
        return ''.join(result)
    else:
        return None
def interlockings(word_lst):
    """Checks each pair of equal-length words to see if their interlocking is a word;
    prints each successful pair and the total number of successful pairs."""
    total = 0
    for i in range(1, 12):  # 12 because the longest word has 22 letters, so each half has at most 11
        # to shorten the loops, get a sublist of words of equal length
        sub_lst = filter(lambda x: len(x) == i, word_lst)
        for word1 in sub_lst[:-1]:
            for word2 in sub_lst[sub_lst.index(word1)+1:]:  # pair word1 only with words that come after word1
                word1word2 = interlock(word1, word2)  # interlock word1 with word2
                word2word1 = interlock(word2, word1)  # interlock word2 with word1
                if index(word_lst, word1word2) is not None:  # check to see if word1word2 is actually a word
                    total += 1
                    print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)
                if index(word_lst, word2word1) is not None:  # check to see if word2word1 is actually a word
                    total += 1
                    print "Word 2: %s, Word 1: %s, Interlock: %s" % (word2, word1, word2word1)
    print "Total interlockings: ", total
The print statements are not the problem; my program found only 652 such pairs. The problem is the nested loops, right? I mean, even though I'm looping over lists that contain only words of the same length, there are (for example) 21727 words of length 7, which means my program has to check over 400 million "interlockings" to see if they're actual words---and that's just for the length-7 words.
So again, this code took 10 hours to run (and found no pairs involving words of length 5 or greater, in case you were curious). Is there a better way to solve this problem?
Thanks in advance for any and all insight. I'm aware that "premature optimization is the root of all evil"---and perhaps I've fallen into that trap already---but in general, while I can usually write code that runs correctly, I often struggle with writing code that runs well.
Do it the other way around: Iterate through all words and split them into two words by taking the odd and even letters. Then look up those two words in the dictionary.
As a side note, the two words that interlock need not have the same length -- the lengths may also differ by 1.
Some (untested) code:
words = set(line.strip() for line in open("words"))

for w in words:
    even, odd = w[::2], w[1::2]
    if even in words and odd in words:
        print even, odd
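The slicing trick above can be sanity-checked against the exercise's own example (a standalone check, separate from the answer's code):

```python
# Splitting 'schooled' at even/odd positions recovers the original pair
word = 'schooled'
even, odd = word[::2], word[1::2]
assert (even, odd) == ('shoe', 'cold')

# An odd-length word splits into halves whose lengths differ by one,
# so pairs of unequal length are covered automatically
assert ('abcde'[::2], 'abcde'[1::2]) == ('ace', 'bd')
```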
Alternative definition for interlock:
import itertools

def interlock(str1, str2):
    "Takes two strings of equal length and 'interlocks' them."
    return ''.join(itertools.chain(*zip(str1, str2)))
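One caveat not stated above: zip truncates to the shorter input, so this version silently drops a letter when the lengths differ by 1. A sketch using zip_longest instead (zip_longest is the Python 3 name; Python 2 calls it izip_longest):

```python
from itertools import chain, zip_longest

def interlock(str1, str2):
    # zip_longest pads the shorter string with '', so the trailing
    # letter of str1 survives when len(str1) == len(str2) + 1
    return ''.join(chain.from_iterable(zip_longest(str1, str2, fillvalue='')))

assert interlock('shoe', 'cold') == 'schooled'
assert interlock('abc', 'xy') == 'axbyc'   # unequal lengths now work
```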
An alternate version:
with open('words.txt') as inf:
    words = set(wd.strip() for wd in inf)

word_gen = ((word, word[::2], word[1::2]) for word in words)
interlocked = [word for word, a, b in word_gen if a in words and b in words]
On my machine this runs in 0.16 seconds and returns 1254 words.
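If you want the interlocking pairs rather than just the combined words, the same comprehension can carry both halves along. A minimal sketch with a stand-in word set (the real code would build `words` from words.txt as above):

```python
# Stand-in for the full word list
words = {'shoe', 'cold', 'schooled', 'warm'}

# Keep the two halves alongside the interlocked word
pairs = [(w[::2], w[1::2], w) for w in words
         if w[::2] in words and w[1::2] in words]

assert pairs == [('shoe', 'cold', 'schooled')]
```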
Edit: as pointed out by @John Machin at "Why is this program faster in Python than Objective-C?", this can be further improved by lazy evaluation (only perform the second slice if the first slice yields a valid word):
with open('words.txt') as inf:
    words = set(wd.strip() for wd in inf)

interlocked = [word for word in words if word[::2] in words and word[1::2] in words]
This drops execution time by a third, to 0.104 seconds.
An important thing is your index function: it runs more often than any other function here. When you don't need the index of the found word, why define a function to find that index?

if word1word2 in word_lst:

is enough instead of

if index(word_lst, word1word2) is not None:

The same goes for the word2word1 check.
OK, bisection really is faster than the in operator on a list. To improve the speed a bit more, I suggest using the bisect_left function directly in your interlockings function.

For example, instead of:
if index(word_lst, word1word2) is not None:  # check to see if word1word2 is actually a word
    total += 1
    print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)
Use:
q = bisect_left(word_lst, word1word2)
if q != len(word_lst) and word_lst[q] == word1word2:
    total += 1
    print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)
A very slight improvement in speed.
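For comparison, a set avoids both the bisection and the linear scan entirely, since membership tests on a set are O(1) on average versus O(log n) for bisect on a sorted list. A minimal sketch (the three-word list is a stand-in for the real sorted word list):

```python
from bisect import bisect_left

lst = sorted(['cold', 'schooled', 'shoe'])  # stand-in for the sorted word list
word_set = set(lst)

def found_bisect(target):
    # O(log n) lookup in the sorted list, as in the answer above
    i = bisect_left(lst, target)
    return i != len(lst) and lst[i] == target

# Both lookups agree; the set version needs no sorting at all
assert found_bisect('schooled') == ('schooled' in word_set) == True
assert found_bisect('warm') == ('warm' in word_set) == False
```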