Is there a Pythonic way to make this logic more elegant?

2023-01-17 22:47 Q&A author:

I'm new to Python, and I've been playing around with it for simple tasks. I have a bunch of CSVs which I need to manipulate in complex ways, but I'm breaking this up into smaller tasks for the sake of learning Python.

For now, given a list of strings, I want to remove user-defined title prefixes of any names in the strings. Any string which contains a name will contain only a name, with or without a title prefix. I have the following, and it works, but it just feels unnecessarily complicated. Is there a more Pythonic way to do this? Thanks!

# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
            else:
                if title_prefix == title_prefixes[len(title_prefixes)-1]:
                    new_csv_line.append(item)
                else:
                    continue
    return new_csv_line

if __name__ == "__main__":
    test_csv_line = ['Mr. Richard Stallman', 'I like cake', 'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme']
    test_prefixes = ['Mr.', 'Ms.', 'Mrs.']
    print(strip_titles(test_csv_line, test_prefixes))

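If the set of prefixes is fixed, a single regular expression in a list comprehension does the job:

import re
[re.sub(r'^(Mr|Ms|Mrs)\.\s+', '', s) for s in test_csv_line]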

A more Pythonic approach would be to replace the "end of list" check with an else: clause on the inner for title_prefix in title_prefixes: loop. The else block runs only if the for loop completes without hitting a break:

# Return new list without title prefixes for strings in a list of strings.    
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
        else:
            new_csv_line.append(item)
    return new_csv_line

The logic is otherwise the same as yours.
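To see the for/else behaviour in isolation, here is a minimal sketch. The helper name first_matching_prefix is made up for illustration and is not part of the question's code:

```python
def first_matching_prefix(item, prefixes):
    """Return the first prefix that item starts with, or None.

    Hypothetical helper, used only to illustrate for/else.
    """
    for prefix in prefixes:
        if item.startswith(prefix):
            found = prefix
            break  # leaving via break skips the else clause
    else:
        # Reached only when the loop ran to completion without a break,
        # including the case where prefixes is empty.
        found = None
    return found


print(first_matching_prefix('Mrs. Margaret Thatcher', ['Mr.', 'Ms.', 'Mrs.']))
print(first_matching_prefix('I like cake', ['Mr.', 'Ms.', 'Mrs.']))
```

The first call prints Mrs. and the second prints None, because only the second loop finishes without a break.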

Assuming the set of prefixes is variable, perhaps as an aspect of localization, or you prefer not to use a regular expression for some other reason, you could do something like this (untested code):

def strip_title(string, prefixes):
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

stripped = (list(strip_title(cell, prefixes) for cell in line)
            for line in lines)
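Since the snippet above is marked untested, here is a small runnable check of the same function. The contents of lines are an assumption, built from the sample data in the question:

```python
def strip_title(string, prefixes):
    # Same logic as above: the prefix must be followed by a space.
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string


prefixes = ['Mr.', 'Ms.', 'Mrs.']
# Assumed input: each inner list stands for one parsed CSV row.
lines = [
    ['Mr. Richard Stallman', 'I like cake'],
    ['Mrs. Margaret Thatcher', 'Jean-Claude Van Damme'],
]

stripped = [[strip_title(cell, prefixes) for cell in line] for line in lines]
print(stripped)
# [['Richard Stallman', 'I like cake'], ['Margaret Thatcher', 'Jean-Claude Van Damme']]
```

Because the check includes the trailing space, a cell such as 'Mr.X' is left alone, whereas the version in the question would slice it down to an empty string.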

This is not particularly efficient, since the algorithm ends up doing a lot of redundant checking (e.g. checking three times if the line starts with M). This sort of thing is a big reason to use regular expressions.

Alternatively, you could dynamically build a regular expression, by escaping each prefix and joining them with | branches:

def TitleStripper(prefixes):
    import re
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))
    def strip_title(string):
        return prefix_re.sub('', string, 1)
    return strip_title

The function TitleStripper creates a closure function strip_title that works like the previous one but is built for a particular set of prefixes. After you call strip_title = TitleStripper(prefixes) you can just call strip_title(string).
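A short usage sketch of that pattern follows. TitleStripper is restated so the snippet runs on its own, and the sample strings come from the question; 'Mrs Jones' is an extra, made-up input to show what the escaping buys you:

```python
import re


def TitleStripper(prefixes):
    # Escape each prefix so that '.' matches a literal dot, not any character.
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))

    def strip_title(string):
        # Remove at most one leading title.
        return prefix_re.sub('', string, count=1)

    return strip_title


# Build the stripper once, then reuse it for every cell.
strip_title = TitleStripper(['Mr.', 'Ms.', 'Mrs.'])

cells = ['Mr. Richard Stallman', 'I like cake', 'Mrs. Margaret Thatcher', 'Mrs Jones']
print([strip_title(cell) for cell in cells])
# ['Richard Stallman', 'I like cake', 'Margaret Thatcher', 'Mrs Jones']
```

Without re.escape, the pattern 'Mr.' would also match the 'Mrs' in 'Mrs Jones' and strip it.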

Mostly due to the use of regular expressions, this will be a bit faster than the first method, perhaps at the expense of clarity.

If you really only ever need to check for three prefixes, either of these methods is overkill, and you should just use a static RE as explained in another answer.
