Pythonic way to rewrite the following C++ string processing code

2023-01-24 06:27

Previously, I had some C++ string-processing code that does the following.

input -> Hello 12
output-> Hello

input -> Hello 12 World
output-> Hello World

input -> Hello12 World
output-> Hello World

input -> Hello12World
output-> HelloWorld

The following is the C++ code.

std::string Utils::toStringWithoutNumerical(const std::string& str) {
    std::string result;

    bool alreadyAppendSpace = false;
    for (int i = 0, length = str.length(); i < length; i++) {
        const char c = str.at(i);
        if (isdigit(c)) {
            continue;
        }
        if (isspace(c)) {
            if (false == alreadyAppendSpace) {
                result.append(1, c);
                alreadyAppendSpace = true;
            }
            continue;
        }
        result.append(1, c);
        alreadyAppendSpace = false;
    }

    return trim(result);
}

What is the Pythonic way to implement this functionality in Python? Can a regular expression do it?

Thanks.

Edit: This reproduces more accurately what the C++ code does than the previous version.

import re
s = re.sub(r"\d+", "", s)
s = re.sub(r"(\s)\s*", r"\1", s)  # raw string, so \1 is a backreference and not chr(1)

In particular, if the first whitespace in a run of several whitespaces is a tab, it will preserve the tab.
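
To illustrate the tab case, here is a minimal sketch of the two substitutions wrapped in a function. The name `collapse_keep_first` is made up for this example and is not part of the original answer.

```python
import re

def collapse_keep_first(s):
    # Drop every run of digits first.
    s = re.sub(r"\d+", "", s)
    # Then shrink each whitespace run to its first character only.
    return re.sub(r"(\s)\s*", r"\1", s)

# The run "tab, space, space, space" left after removing "12" collapses to the tab.
print(repr(collapse_keep_first("Hello\t 12  World")))  # 'Hello\tWorld'
```

Like the two-line version, this does not trim leading or trailing whitespace.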

Further Edit: To replace by a space anyway, this works:

s = re.sub(r"\d+", "", s)
s = re.sub(r"\s+", " ", s).strip()  # strip() mirrors the trim() call in the C++ code
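
Put together as a function, this approach could look like the sketch below. The name `remove_numerals` is hypothetical, and the final `strip()` is an assumption standing in for the `trim` helper used by the C++ code.

```python
import re

def remove_numerals(s):
    # Remove digits, collapse each whitespace run to one space, then trim the ends.
    s = re.sub(r"\d+", "", s)
    s = re.sub(r"\s+", " ", s)
    return s.strip()

# The four inputs from the question.
for text in ["Hello 12", "Hello 12 World", "Hello12 World", "Hello12World"]:
    print(repr(remove_numerals(text)))
# 'Hello'
# 'Hello World'
# 'Hello World'
# 'HelloWorld'
```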

Python has a lot of built-in functions that can be very powerful when used together.

def RemoveNumeric(s):
    # Python 3: str.maketrans('', '', digits) builds a table that deletes the digits
    return ' '.join(s.translate(str.maketrans('', '', '0123456789')).split())

>>> RemoveNumeric('Hello 12')
'Hello'
>>> RemoveNumeric('Hello 12 World')
'Hello World'
>>> RemoveNumeric('Hello12 World')
'Hello World'
>>> RemoveNumeric('Hello12World')
'HelloWorld'

import re
re.sub(r'[0-9]+', "", string)

import re
re.sub(r"(\s*)\d+(\s*)", lambda m: m.group(1) or m.group(2), string)

Breakdown:

\s* matches zero or more whitespace.
\d+ matches one or more digits.
The parentheses are used to capture the whitespace.
The replacement parameter is normally a string, but it can alternatively be a function which constructs the replacement dynamically.
lambda is used to create an inline function which returns the first capture group if it is non-empty and the second one otherwise. This preserves the whitespace if there was any next to the digits and returns an empty string if there wasn't any.
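
As a quick check of this breakdown, the sketch below runs the substitution on the four inputs from the question. The function name `strip_digits` is hypothetical. Note that in the first case the space before the digits survives, so a final `.strip()` would be needed to match the C++ `trim`.

```python
import re

def strip_digits(s):
    # Replace each "optional whitespace, digits, optional whitespace" match
    # with the leading whitespace if present, else the trailing whitespace.
    return re.sub(r"(\s*)\d+(\s*)", lambda m: m.group(1) or m.group(2), s)

print(repr(strip_digits("Hello 12")))        # 'Hello '
print(repr(strip_digits("Hello 12 World")))  # 'Hello World'
print(repr(strip_digits("Hello12 World")))   # 'Hello World'
print(repr(strip_digits("Hello12World")))    # 'HelloWorld'
```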

The regular expression answers are clearly the right way to do this. But if you're interested in how to do it without a regex engine, here's one way:

class filterstate(object):
    def __init__(self):
        self.seenspace = False
    def include(self, c):
        isspace = c.isspace()
        if (not c.isdigit()) and (not (self.seenspace and isspace)):
            self.seenspace = isspace
            return True
        else:
            return False

def toStringWithoutNumerical(s):
    fs = filterstate()
    return ''.join((c for c in s if fs.include(c)))
