Is there a better/more pythonified way to do this?
I've been teaching myself Python at my new job, and really enjoying the language. I've written a short class to do some basic data manipulation, and I'm pretty confident about it.
But old habits from my structured/modular programming days are hard to break, and I know there must be a better way to write this. So, I was wondering if anyone would like to take a look at the following, and suggest some possible improvements, or put me on to a resource that could help me discover those for myself.
A quick note: The RandomItems root class was written by someone else, and I'm still wrapping my head around the itertools library. Also, this isn't the entire module - just the class I'm working on, and it's prerequisites.
What do you think?
import itertools
import urllib2
import random
import string
class RandomItems(object):
"""This is the root class for the randomizer subclasses. These
are used to generate arbitrary content for each of the fields
in a csv file data row. The purpose is to automatically generate
content that can be used as functional testing fixture data.
"""
def __iter__(self):
while True:
yield self.next()
def slice(self, times):
return itertools.islice(self, times)
class RandomWords(RandomItems):
"""Obtain a list of random real words from the internet, place them
in an iterable list object, and provide a method for retrieving
a subset of length 1-n, of random words from the root list.
"""
def __init__(self):
urls = [
"http://dictionary-thesaurus.com/wordlists/Nouns%285,449%29.txt",
"http://dictionary-thesaurus.com/wordlists/Verbs%284,874%29.txt",
"http://dictionary-thesaurus.com/wordlists/Adjectives%2850%29.txt",
"http://dictionary-thesaurus.com/wordlists/Adjectives%28929%29.txt",
"http://dictionary-thesaurus.com/wordlists/DescriptiveActionWords%2835%29.txt",
"http://dictionary-thesaurus.com/wordlists/WordsThatDescribe%2886%29.txt",
"http://dictionary-thesaurus.com/wordlists/DescriptiveWords%2886%29.txt",
"http://dictionary-thesaurus.com/wordlists/WordsFunToUse%28100%29.txt",
"http://dictionary-thesaurus.com/wordlists/Materials%2847%29.txt",
"http://dictionary-thesaurus.com/wordlists/NewsSubjects%28197%29.txt",
"http://dictionary-thesaurus.com/wordlists/Skills%28341%29.txt",
"http://dictionary-thesaurus.com/wordlists/TechnicalManualWords%281495%29.txt",
"http://dictionary-thesaurus.com/wordlists/GRE_WordList%281264%29.txt"
]
self._words = []
for url in urls:
urlresp = urllib2.urlopen(urllib2.Request(url))
self._words.extend([word for word in urlresp.read().split("\r\n")])
self._words = list(set(self._words)) # Removes duplicates
self._words.sort() # sorts the list
def next(self):
"""Return a single random word from the list
"""
return random.choice(self._words)
def get(self):
"""Return the entire list, if needed.
"""
return self._words
def wordcount(self):
"""Return the total number of words in the list
"""
return len(self._words)
def sublist(self,size=3):
"""Return a random segment of _size_ length. The default is 3 words.
"""
segment = []
for i in range(size):
segment.append(self.next())
#printable = " ".join(segment)
return segment
def random_name(self):
"""Return a string-formatted list of 3 random words.
"""
words = self.sublist()
return "%s %s %s" % (words[0], words[1], words[2])
def main():
"""Just to see it work...
"""
wl = RandomWords()
print wl.wordcount()
print wl.next(开发者_高级运维)
print wl.sublist()
print 'Three Word Name = %s' % wl.random_name()
#print wl.get()
if __name__ == "__main__":
main()
Here are my five cents:
- Constructor should be called
__init__
. - You could abolish some code by using
random.sample
, it does what yournext()
andsublist()
does but it's prepackaged. - Override
__iter__
(define the method in your class) and you can get rid ofRandomIter
. You can read more about at it in the docs (note Py3K, some stuff may not be relevant for lower version). You could useyield
for this which as you may or may not know creates a generator, thus wasting little to no memory. random_name
could usestr.join
instead. Note that you may need to convert the values if they are not guaranteed to be strings. This can be done through[str(x) for x in iterable]
or in-builtmap
.
First knee-jerk reaction: I would offload your hard-coded URLs into a constructor parameter passed to the class and perhaps read from configuration somewhere; this will allow for easier change without necessitating a redeploy.
The drawback of this is that consumers of the class have to know where those URLs are stored... so you could create a companion class whose only job is to know what the URLs are (i.e. in configuration, or even hard-coded) and how to get them. You could allow the consumer of your class to provide the URLs, or if they are not provided, the class could hit up the companion class for the URLs.
精彩评论