Index by word length
My aim was simply to make a hangman game. However, I have been slightly over-ambitious. I want to ask the user how long the word should be, then choose a random word of that length. Scanning the entire dictionary for words of that length would take far too long on each iteration. So: I have a dictionary, formatted like so:
zymosans
zymoscope
zymoses
...
I would like to be able to output a file for each word length automatically using this program. Like this:
1letterwords.txt
2letterwords.txt
and so forth.
I started Python... yesterday. I searched both the web and this site and came up with nothing. I would like some pointers on how to start with this specific programming problem. Thanks in advance! (To clarify, the hangman game would open a random line in the file for the requested word length, reducing the performance impact... fairly dramatically.)
It's really not that big of a deal to load an entire dictionary into memory. You can try something like this:
import random
from collections import defaultdict
# load words, grouped by length
index = defaultdict(list)
with open('words.txt') as file:
    for line in file:
        word = line.strip().lower()
        if word:  # skip blank lines
            index[len(word)].append(word)
# pick a random word (input() is Python 3; use raw_input() on Python 2)
length = int(input('Enter word length: '))
word = random.choice(index[length])
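A slightly more defensive sketch of the same idea, wrapped in functions so it can be reused and tested. It assumes Python 3 and a one-word-per-line file; the names `build_index` and `pick_word` are made up for illustration. It skips blank lines and raises a clear error when no word has the requested length, instead of the bare `IndexError` that `random.choice` raises on an empty list.

```python
import random
from collections import defaultdict


def build_index(lines):
    """Group words by length; `lines` is any iterable of strings (e.g. an open file)."""
    index = defaultdict(list)
    for line in lines:
        word = line.strip().lower()
        if word:  # ignore blank lines
            index[len(word)].append(word)
    return index


def pick_word(index, length, rng=random):
    """Return a random word of the given length, or raise ValueError if there is none."""
    # .get() avoids inserting an empty list into the defaultdict for unknown lengths
    words = index.get(length)
    if not words:
        raise ValueError('no words of length %d' % length)
    return rng.choice(words)
```

Usage, with the hypothetical `words.txt` from the answer: open the file, call `index = build_index(f)` once, then call `pick_word(index, int(input('Enter word length: ')))` for each new game.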
And if you insist on generating separate files by word length, run this code after loading the index as shown above:
for length in sorted(index):
    path = 'words%d.txt' % length
    with open(path, 'w') as file:
        for word in index[length]:
            file.write('%s\n' % word)
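If you do go the one-file-per-length route, the hangman game still needs a random line from the chosen file. One option (a standard technique swapped in here, not something from the answer above) is reservoir sampling: it reads the file once and keeps only one candidate line in memory, so the file never has to be loaded whole. A minimal sketch, assuming the `words%d.txt` naming used above; `random_line` is a hypothetical helper name.

```python
import random


def random_line(path, rng=random):
    """Return a uniformly random non-blank line from `path`, reading it in one pass.

    Reservoir sampling with a reservoir of size 1: the k-th non-blank line
    replaces the current choice with probability 1/k, which leaves every
    line equally likely once the whole file has been read.
    """
    choice = None
    count = 0
    with open(path) as f:
        for line in f:
            word = line.strip()
            if not word:
                continue  # skip blank lines
            count += 1
            if rng.randrange(count) == 0:  # true with probability 1/count
                choice = word
    if choice is None:
        raise ValueError('%s contains no words' % path)
    return choice
```

Usage: `word = random_line('words%d.txt' % length)`. Note that this still reads the whole file, so it saves memory rather than time; for small per-length files, simply reading all lines and calling `random.choice` works just as well.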
Picking random lines from files is probably not what you want to do either... keeping the words in a list and/or dict should be fine even for millions of words.
You can store lists of words by length by iterating over all your words and appending each one to a defaultdict of lists:
from collections import defaultdict
import random
wordsByLength = defaultdict(list)
for word in allWords:
    wordsByLength[len(word)].append(word)
Then whenever you need a random word you can do:
randomLen = random.choice(list(wordsByLength))  # list() needed on Python 3, where keys() is not indexable
randomWord = random.choice(wordsByLength[randomLen])
Or you can replace randomLen with the length the user asked for.
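One caveat with combining `defaultdict` and a random length: looking up a length that has no words (for example `wordsByLength[30]`) silently inserts an empty list under that key, and a later `random.choice` over the keys can then pick it and fail with `IndexError`. A small sketch that avoids this, assuming Python 3; `random_word` and the sample `allWords` list are made up for illustration.

```python
import random
from collections import defaultdict


def random_word(wordsByLength, length=None, rng=random):
    """Pick a random word, of a given length if one is specified.

    Only lengths that actually have words are candidates, and .get() is used
    for lookups so that asking for a missing length does not insert an empty
    list into the defaultdict.
    """
    if length is None:
        length = rng.choice([n for n, words in wordsByLength.items() if words])
    words = wordsByLength.get(length)
    if not words:
        raise ValueError('no words of length %d' % length)
    return rng.choice(words)


allWords = ['cat', 'dog', 'napkin', 'zymoses']  # sample data, for illustration only
wordsByLength = defaultdict(list)
for word in allWords:
    wordsByLength[len(word)].append(word)
```

Usage: `random_word(wordsByLength)` for any length, or `random_word(wordsByLength, 6)` for a requested one.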
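e.g. (the .strip() matters: without it, the trailing newline is counted as part of the word's length):
import random
import urllib.request
url = urllib.request.urlopen('http://download.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt')
random.choice([word for word in (line.decode().strip() for line in url) if len(word) == 8])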
Sure, the simple way isn't that efficient, but is it really too slow?
In [1]: import random
In [2]: timeit words = list(open("sowpods.txt"))
10 loops, best of 3: 48.4 ms per loop
In [3]: words = list(open("sowpods.txt"))
In [4]: len(words)
Out[4]: 267751
In [5]: timeit random.choice([w for w in words if len(w.strip())==6])
10 loops, best of 3: 62.5 ms per loop
In [6]: random.choice([w for w in words if len(w.strip())==6])
Out[6]: 'NAPKIN\r\n'
The one-liner version takes less than a tenth of a second on this computer:
In [7]: timeit random.choice([w for w in open("sowpods.txt") if len(w.strip())==6])
10 loops, best of 3: 91.2 ms per loop
In [8]: random.choice([w for w in open("sowpods.txt") if len(w.strip())==6])
Out[8]: 'REVEUR\r\n'
You can add a .strip() to that to get rid of the '\r\n' on the end.
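To check the "is it really too slow?" question on your own machine without IPython, the standard-library `timeit` module can compare a full scan per pick against a prebuilt length index. A minimal sketch; the synthetic word list stands in for a real dictionary such as `sowpods.txt`, and `make_words` and `compare` are hypothetical helpers, so the absolute numbers will differ from the ones above.

```python
import random
import string
import timeit
from collections import defaultdict


def make_words(n, rng):
    """Generate n random lowercase 'words' of length 2 to 15 (synthetic data)."""
    return [''.join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(2, 15)))
            for _ in range(n)]


def compare(words, length=6, number=10):
    """Return average seconds per pick for a linear scan and for an indexed lookup."""
    # Build the index once, outside the timed code (a one-time cost at startup).
    index = defaultdict(list)
    for w in words:
        index[len(w)].append(w)
    scan = timeit.timeit(lambda: random.choice([w for w in words if len(w) == length]),
                         number=number) / number
    lookup = timeit.timeit(lambda: random.choice(index[length]), number=number) / number
    return {'scan': scan, 'lookup': lookup}
```

Usage: `compare(make_words(100000, random.Random(0)))`. The scan cost grows with the size of the dictionary on every pick, while the lookup cost stays roughly constant once the index exists, which is the trade-off the answers above are weighing.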