开发者

How to think in Python after working in C++?

I'm brand new to Python and trying to learn it by replicating the following C++ function into python

// determines which words in a vector consist of the same letters
// outputs the words with the same letters on the same line
void equivalentWords(vector <string> words, ofstream & outFile) {
    outFile << "Equivalent words\n";

    // checkedWord is parallel to the words vector. It is
    // used to make sure each word is only displayed once.
    vector <bool> checkedWord (words.size(), false);

    for(int i = 0; i < words.size(); i++) {
        if (!checkedWord[i]){
            outFile << "  ";
            for(int j = i; j < words.size(); j++){
                if(equivalentWords(words[i], words[j], outFile)) {
                    outFile << words[j] << " ";
                    checkedWord[j] = true;
                }
            }
            outFile << "\n";    
        }   
    }
}

In my python code (below), rather than having a second vector, I have a list ("words") of lists of a string, a sorted list of the chars in the former string (because strings are immutable), and a bool (that tells if the word has been checked yet). However, I can't figure out how to change a value as you iterate through a list.

    for开发者_开发问答 word, s_word, checked in words:
    if not checked:
        for word1, s_word1, checked1 in words:
            if s_word1 == s_word:
                checked1 = True # this doesn't work
                print word1,
        print ""

Any help on doing this or thinking more "Pythony" is appreciated.


Keeping things simple, this is O(N) complexity and should be sufficient if you don't have GBs of word data. Note that set() and dict() basically is a hashed index (free and builtin!).

index = {}
for word, s_word in words:
    index[s_word] = index.get(s_word, []) + [word]

for similar_words in index.values():
    print ' '.join(similar_words)        

Don't know what you are using it for, but it might be of interest to you that in python 2.7 a Counter class was introduced in the collections module.

If you really want to keep your algorithm and update a boolean list (which you don't because that algorithm would do inefficient double loops), you would do it like this:

checked = [False] * len(words)
for i, (word, word_s) in enumerate(words):
    if checked[i]:
       continue
    for j, (other, other_s) in enumerate(words[i:]):
        if word_s == other_s:
            print other,
            checked[i+j] = True
    print


I think the word you're looking for is Pythonic, here's a pythonic code sample for what you're tying to do, determine words that are equivalent, where equivalence is determined by having the same set of letters

import collections

def print_equivalent_words(words):
    eq_words = defaultdict(list)
    for word in words:
        eq_words["".join(sorted(set(word)))].append(word)

    for k,v in eq_words.items():
        print(v)


I generally like catchmeifyoutry's answer, but I would personally tighten it up a bit further as

for word in set(words):
    print word

Edit: My answer is a shorter but functionally equivalent form of catchmeifyoutry's original, pre-edited answer.


This is not the best algorithm to solve this problem (it's O(N^2) instead of O(N)), but here's a pythonic version of it. The method I've used is to replace your array of bits with a set that contains words you've already seen.

checked = set()
for i, word in enumerate(words):
    if word in checked:
        continue
    to_output = [word]
    for word2 in words[i + 1:]:
        if equivalentWords(word, word2):
            to_output.append(word2)
            checked.add(word2)
    print ' '.join(to_output)


Make words a list of objects:

class Word(object):
  def __init__(self, word, s_word, checked=False):
    self.word = word
    self.s_word = s_word
    self.checked = checked

 ....
    for word1 in words:
      if word1.s_word == word.s_word:
        word1.checked = True
        print word1.word
    print


Based on the comment:

// determines which words in a vector consist of the same letters
// outputs the words with the same letters on the same line

I'm not quite sure that the original code works, and even if it does, I can't say I like it much. First of all, based on the loop nesting, it looks like the complexity is O(N2). Second, I can't figure out what it's doing well enough to be sure it really does what's stated above (it uses a three-parameter overload of equivalentWords, which seems to be missing, which makes it hard to say though).

Some of the Python solutions are a lot shorter and simpler -- to the point that I feel reasonably certain they simply don't work. A couple seem to simply print out unique words, which (at least as I interpret it) is not the intent at all.

Here's a version in C++ that does what I interpret the requirements to be:

#include <string>
#include <set>
#include <vector>
#include <algorithm>
#include <iostream>
#include <map>

std::string 
sort_word(std::string word) { 
    std::sort(word.begin(), word.end());
    return word;
}

namespace std { 
    std::ostream &
    operator<<(std::ostream &os, 
               std::pair<std::string, std::set<std::string> >const &words) 
    { 
        std::copy(words.second.begin(), words.second.end(), 
            std::ostream_iterator<std::string>(os, "\t"));
        return os;
    }
}

void 
equivalentWords(std::vector<std::string> const &words, std::ostream &os) { 
    typedef std::map<std::string, std::set<std::string> > word_list_t;
    word_list_t word_list;

    for (int i=0; i<words.size(); i++)
        word_list[sort_word(words[i])].insert(words[i]);

    std::copy(word_list.begin(), word_list.end(),
              std::ostream_iterator<word_list_t::value_type>(os, "\n"));
}

int 
main() { 
    std::vector<std::string> input;

    std::copy(std::istream_iterator<std::string>(std::cin),
        std::istream_iterator<std::string>(),
        std::back_inserter(input));
    equivalentWords(input, std::cout);
    return 0;
}

I think using that as a starting point for a Python version is more likely to produce a clean, working result.


I wouldn't say this is pythonic, but I'm quite proud of it.

import itertools

for _, to_output in itertools.groupby(sorted(words, key=sorted), sorted):
    print ' '.join(to_output)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜