开发者

Map reduce problem in python

I am currently struggling with an assignment. The solution would input a txt file and run through counting the number of palindromes and their frequency. I need to use Map reduce to create to do so

For example: the string "bab bab bab cab cac dad" would output:

bab 3
cab 1
dad 1

He开发者_Python百科re is what I have so far

def palindrome(string):
    palindromes = []
    for word in string.split(" "):
        if (word == word[::-1]):
            palindromes.append(word)
    return palindromes 

string = "abc abd bab tab cab tat yay uaefdfdu"
print map(lambda x: palindrome(x), ["bab abc dab bab bab dad crap pap pap "])

Currently prints

[['bab', 'bab', 'bab', 'dad', 'pap', 'pap', '']]

Here is my attempt so far at the reduce section

def p(lists):
for list in lists:

set_h = set(list) 

return set_h

with the p function I want to create a set of all palindromes found. Then run a count of the palindromes on the list and make a dict out of this

print reduce(p, [['bab', 'bab', 'bab', 'dad', 'pap', 'pap', '']])

Am I on the right track?


I think it would be much easier for you if your map() and reduce() input was an actual list of words. To achieve that, .split() the string before passing it to map(). Then map() a word either to itself (if your mapper encounters a palindrome) or None. You can then filter() the results to discard None values, sort it and pass it to reduce(). reduce() would then reduce it to a dict mapping words to their total count.

I will not provide you with a working solution not to take away from the learning factor.


Split your string into a list before you map it. map() is for lists, sets, and dicts, not strings.

word_list = words_str.split(" ")

Avoid using map-filter-reduce unless your assignment dictates it; GVR says so. The proper solution uses Python's list comprehension syntax. In fact, you can do it with a pretty nasty one-liner:

pal_count = {
    x: word_list.count(x)  # reduce-ish
    for x in word_list     # map-ish
    if x == x[::-1]        # filter-ish
    }
for x, y in pal_count.iteritems():
    print x, y             # print the results!

Breaking it down...

  1. Catch this in a dictionary object to print it later: pal_count = {
  2. Define the return objects: x: word_list.count(x) We use key:value syntax to associate the palindrome, x, with its number of occurrences. count() is like a built-in reduce function for lists.
  3. Iterate through our list with a for loop, assigning the current value to 'x': for x in word_list
  4. We only want to return palindromes, so we add a comparison operator to filter out bad values: if x == x[::-1] # cool logic, btw
  5. Hurray! }

By the way, I'm only doing your homework because I never did mine.

The slower, less flexible, less portable, less awesome equivalent uses nested for loops:

pal_count = dict()
for x in word_list:                     # same loop
    if x == x[::-1]                     # is this a palindrome?
        if x in pal_count:              # have we seen before?
            pal_count[x] += 1
        else:                           # this one is new!
            pal_count.setdefault(x, 1)


For your reduce function, you should start off with an empty dict and update/populate your counts. Reduce functions require 2 parameters, so one can be your dict, and the other, your palindrome. You can feed in an initial value in reduce like so:

reduce(lambda x, y: x+y, some_list, initial_value_for_x)

Take a look at dict's get for how to set default values, which should help you simplify your reduce function a lot.


It would be very simple if we decompose the problem into small challenges. In our case this could be:

  1. Filter out all the palindromes from the list of words.
  2. Get unique words to find the count.
  3. Map all the unique words to their appropriate count.

Code:

words =  "bab bab bab cab cac dad"
is_palindrome = lambda word : word == word[::-1]
palindromes = filter(is_palindrome,words.split(" "))
get_count = lambda word : (word , palindromes.count(word))
unique = set(palindromes)
dict(map(get_count,unique))
Out[1]: {'bab': 3, 'cac': 1, 'dad': 1}

Here is the short explanation:

#Input:
    words =  "bab bab bab cab cac dad"

#Step 1: Filter out the palindromes.
    is_palindrome = lambda word : word == word[::-1]
    palindromes = filter(is_palindrome,words.split(" "))

#Step 2: Get the unique set of string to find their counts.
    unique = set(palindromes)

#Step 3: Map every unique palindrome to their respective count.
    get_count = lambda word : (word , palindromes.count(word))
    dict(map(get_count,unique))

#Output:
    Out[1]: {'bab': 3, 'cac': 1, 'dad': 1}

NOTE: map in python can accept any sequence, not just list, set or dict. Strings in python are also sequence, hence not satisfied with Cody Hess's statement: map cannot accept strings.

To demonstrate here is the very simple demo:

In [10]: map(echo, "python")
Out[10]: ['p', 'y', 't', 'h', 'o', 'n']


For map/reduce, using Counter object is pretty straight forward.

from collections import Counter 
words = "bab bab bab cab cac dad"
words_list = words.split(" ") 
cont = Counter()
for word in words_list:
    cont[word] += 1
print(cont)
# or if you want dict
print(dict(cont))

https://docs.python.org/3/library/collections.html

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜