Map reduce problem in python

2023-04-08 01:30 问答作者：

I am currently struggling with an assignment. The solution would input a txt file and run through counting the number of palindromes and their frequency. I need to use Map reduce to create to do so

For example: the string "bab bab bab cab cac dad" would output:

bab 3
cab 1
dad 1

He开发者_Python百科re is what I have so far

def palindrome(string):
    palindromes = []
    for word in string.split(" "):
        if (word == word[::-1]):
            palindromes.append(word)
    return palindromes 

string = "abc abd bab tab cab tat yay uaefdfdu"
print map(lambda x: palindrome(x), ["bab abc dab bab bab dad crap pap pap "])

Currently prints

[['bab', 'bab', 'bab', 'dad', 'pap', 'pap', '']]

Here is my attempt so far at the reduce section

def p(lists):
for list in lists:

set_h = set(list) 

return set_h

with the p function I want to create a set of all palindromes found. Then run a count of the palindromes on the list and make a dict out of this

print reduce(p, [['bab', 'bab', 'bab', 'dad', 'pap', 'pap', '']])

Am I on the right track?

I think it would be much easier for you if your map() and reduce() input was an actual list of words. To achieve that, .split() the string before passing it to map(). Then map() a word either to itself (if your mapper encounters a palindrome) or None. You can then filter() the results to discard None values, sort it and pass it to reduce(). reduce() would then reduce it to a dict mapping words to their total count.

I will not provide you with a working solution not to take away from the learning factor.

Split your string into a list before you map it. map() is for lists, sets, and dicts, not strings.

word_list = words_str.split(" ")

Avoid using map-filter-reduce unless your assignment dictates it; GVR says so. The proper solution uses Python's list comprehension syntax. In fact, you can do it with a pretty nasty one-liner:

pal_count = {
    x: word_list.count(x)  # reduce-ish
    for x in word_list     # map-ish
    if x == x[::-1]        # filter-ish
    }
for x, y in pal_count.iteritems():
    print x, y             # print the results!

Breaking it down...

Catch this in a dictionary object to print it later: pal_count = {
Define the return objects: x: word_list.count(x) We use key:value syntax to associate the palindrome, x, with its number of occurrences. count() is like a built-in reduce function for lists.
Iterate through our list with a for loop, assigning the current value to 'x': for x in word_list
We only want to return palindromes, so we add a comparison operator to filter out bad values: if x == x[::-1] # cool logic, btw
Hurray! }

By the way, I'm only doing your homework because I never did mine.

The slower, less flexible, less portable, less awesome equivalent uses nested for loops:

pal_count = dict()
for x in word_list:                     # same loop
    if x == x[::-1]                     # is this a palindrome?
        if x in pal_count:              # have we seen before?
            pal_count[x] += 1
        else:                           # this one is new!
            pal_count.setdefault(x, 1)

For your reduce function, you should start off with an empty dict and update/populate your counts. Reduce functions require 2 parameters, so one can be your dict, and the other, your palindrome. You can feed in an initial value in reduce like so:

reduce(lambda x, y: x+y, some_list, initial_value_for_x)

Take a look at dict's get for how to set default values, which should help you simplify your reduce function a lot.

It would be very simple if we decompose the problem into small challenges. In our case this could be:

Filter out all the palindromes from the list of words.
Get unique words to find the count.
Map all the unique words to their appropriate count.

Code:

words =  "bab bab bab cab cac dad"
is_palindrome = lambda word : word == word[::-1]
palindromes = filter(is_palindrome,words.split(" "))
get_count = lambda word : (word , palindromes.count(word))
unique = set(palindromes)
dict(map(get_count,unique))
Out[1]: {'bab': 3, 'cac': 1, 'dad': 1}

Here is the short explanation:

#Input:
    words =  "bab bab bab cab cac dad"

#Step 1: Filter out the palindromes.
    is_palindrome = lambda word : word == word[::-1]
    palindromes = filter(is_palindrome,words.split(" "))

#Step 2: Get the unique set of string to find their counts.
    unique = set(palindromes)

#Step 3: Map every unique palindrome to their respective count.
    get_count = lambda word : (word , palindromes.count(word))
    dict(map(get_count,unique))

#Output:
    Out[1]: {'bab': 3, 'cac': 1, 'dad': 1}

NOTE: map in python can accept any sequence, not just list, set or dict. Strings in python are also sequence, hence not satisfied with Cody Hess's statement: map cannot accept strings.

To demonstrate here is the very simple demo:

In [10]: map(echo, "python")
Out[10]: ['p', 'y', 't', 'h', 'o', 'n']

For map/reduce, using Counter object is pretty straight forward.

from collections import Counter 
words = "bab bab bab cab cac dad"
words_list = words.split(" ") 
cont = Counter()
for word in words_list:
    cont[word] += 1
print(cont)
# or if you want dict
print(dict(cont))

https://docs.python.org/3/library/collections.html

继续阅读：python reduce

Map reduce problem in python

Code:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

Code:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生 新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？