Pythonic way to rewrite the following C++ string processing code

Previously, I wrote some C++ string-processing code that does the following:

input -> Hello 12
output-> Hello

input -> Hello 12 World
output-> Hello World

input -> Hello12 World
output-> Hello World

input -> Hello12World
output-> HelloWorld

The following is the C++ code.

std::string Utils::toStringWithoutNumerical(const std::string& str) {
    std::string result;

    bool alreadyAppendSpace = false;
    for (int i = 0, length = str.length(); i < length; i++) {
        const char c = str.at(i);
        if (isdigit(c)) {
            continue;
        }
        if (isspace(c)) {
            if (false == alreadyAppendSpace) {
                result.append(1, c);
                alreadyAppendSpace = true;
            }
            continue;
        }
        result.append(1, c);
        alreadyAppendSpace = false;
    }

    return trim(result);
}

In Python, what is the Pythonic way to implement this? Can a regular expression do it?

Thanks.


Edit: this version reproduces the behaviour of the C++ code more accurately than the previous one:

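import re
s = re.sub(r"\d+", "", s)          # drop every run of digits
s = re.sub(r"(\s)\s*", r"\1", s)   # keep the first whitespace char of each run (r"\1" is a backreference)
s = s.strip()                      # assumed equivalent of the C++ trim()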

In particular, if the first whitespace character in a run is a tab, the tab is preserved.
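A small sketch contrasting the two whitespace substitutions on a made-up sample string whose whitespace run starts with a tab. Note that the replacement is written as the raw string `r"\1"`; a plain `"\1"` is the control character `chr(1)`, not a backreference.

```python
import re

sample = "Hello\t 12 World"  # hypothetical input: tab + space before the digits

# Shared first step: drop every run of digits.
no_digits = re.sub(r"\d+", "", sample)  # "Hello\t  World"

# Keep the first whitespace character of each run (here, the tab).
keep_first = re.sub(r"(\s)\s*", r"\1", no_digits)

# Replace every whitespace run with one plain space.
single_space = re.sub(r"\s+", " ", no_digits)

print(repr(keep_first))    # 'Hello\tWorld'
print(repr(single_space))  # 'Hello World'
```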

Further edit: to always replace each whitespace run with a single plain space, this works:

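import re
s = re.sub(r"\d+", "", s)   # drop every run of digits
s = re.sub(r"\s+", " ", s)  # collapse each whitespace run to one space
s = s.strip()               # assumed equivalent of the C++ trim()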
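Wrapped in a function (the name `without_numerical` is just for illustration), the substitutions can be checked against the four examples from the question. The final `.strip()` stands in for the C++ `trim()`, whose definition isn't shown, so this assumes it removes leading and trailing whitespace; without it, "Hello 12" would come back as "Hello " with a trailing space.

```python
import re

def without_numerical(s: str) -> str:
    """Drop digits, collapse whitespace runs to one space, then trim."""
    s = re.sub(r"\d+", "", s)
    s = re.sub(r"\s+", " ", s)
    return s.strip()  # stand-in for the C++ trim() (assumed whitespace trim)

# The four input/output pairs from the question.
cases = {
    "Hello 12": "Hello",
    "Hello 12 World": "Hello World",
    "Hello12 World": "Hello World",
    "Hello12World": "HelloWorld",
}

for given, expected in cases.items():
    got = without_numerical(given)
    print(f"{given!r} -> {got!r}")
    assert got == expected, (given, got)
```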


Python has a lot of built-in functions that can be very powerful when used together.

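def RemoveNumeric(s):
    # Delete every ASCII digit, then split on whitespace runs and rejoin with single spaces.
    # (Python 3: translate() takes a table built with str.maketrans.)
    return ' '.join(s.translate(str.maketrans('', '', '0123456789')).split())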

>>> RemoveNumeric('Hello 12')
'Hello'
>>> RemoveNumeric('Hello 12 World')
'Hello World'
>>> RemoveNumeric('Hello12 World')
'Hello World'
>>> RemoveNumeric('Hello12World')
'HelloWorld'
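One behavioural difference worth knowing about: a `translate()` table built from `'0123456789'` removes only the ten ASCII digits, while `\d` in a Python 3 `str` pattern matches any Unicode decimal digit. C's `isdigit` only recognises '0' to '9', so the table (or a regex using the `re.ASCII` flag) is arguably the closer match to the C++. A quick sketch with a made-up sample string containing Arabic-Indic digits:

```python
import re

ASCII_DIGITS = str.maketrans("", "", "0123456789")

text = "Hello\u0661\u0662 World"  # U+0661/U+0662: ARABIC-INDIC DIGIT ONE/TWO

# \d in a str pattern matches any Unicode decimal digit, so these are removed.
print(re.sub(r"\d+", "", text))  # -> Hello World

# re.ASCII restricts \d to [0-9], so the Arabic-Indic digits are kept.
print(re.sub(r"\d+", "", text, flags=re.ASCII) == text)  # -> True

# The translate() table lists only ASCII digits, so they survive here too.
print(text.translate(ASCII_DIGITS) == text)  # -> True
```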


import re
re.sub(r'[0-9]+', "", string)
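On its own, this only deletes the digits; the whitespace on either side of them is left alone, so "Hello 12 World" comes back with two spaces. A quick check, with a `split()`/`join()` pass added afterwards (an addition, not part of this answer) to collapse the gap:

```python
import re

string = "Hello 12 World"  # sample input from the question

digits_removed = re.sub(r'[0-9]+', "", string)
print(repr(digits_removed))  # 'Hello  World' (two spaces)

# Collapsing whitespace afterwards gives the requested output.
collapsed = " ".join(digits_removed.split())
print(repr(collapsed))  # 'Hello World'
```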


import re
re.sub(r"(\s*)\d+(\s*)", lambda m: m.group(1) or m.group(2), string)

Breakdown:

  • \s* matches zero or more whitespace.
  • \d+ matches one or more digits.
  • The parentheses are used to capture the whitespace.
  • The replacement parameter is normally a string, but it can alternatively be a function which constructs the replacement dynamically.
  • lambda creates an inline function that returns whichever capture group is non-empty, checking group 1 first. This keeps the captured whitespace if there was any around the digits, and returns an empty string if there wasn't.
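To make the breakdown concrete, here is the same pattern with the lambda swapped for a named function (`keep_whitespace` is just an illustrative name), run over the question's inputs. It returns the whole captured whitespace run rather than a single space, and for "Hello 12" it leaves a trailing space, so a final `.strip()` would be needed to match the C++ `trim()`.

```python
import re

# Optional whitespace, a run of digits, optional whitespace (both runs captured).
PATTERN = re.compile(r"(\s*)\d+(\s*)")

def keep_whitespace(m: re.Match) -> str:
    # Same logic as the lambda: whitespace before the digits if any,
    # otherwise whitespace after them, otherwise "".
    return m.group(1) or m.group(2)

for text in ["Hello 12", "Hello 12 World", "Hello12 World", "Hello12World"]:
    print(repr(PATTERN.sub(keep_whitespace, text)))
# 'Hello '   <- trailing space kept; add .strip() to match trim()
# 'Hello World'
# 'Hello World'
# 'HelloWorld'
```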


The regular-expression answers are clearly the right way to do this. But if you're interested in how to do it without a regex engine, here's one way:

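class filterstate(object):
    """Remembers whether the last character kept was whitespace."""
    def __init__(self):
        self.seenspace = False

    def include(self, c):
        isspace = c.isspace()
        # Keep c unless it is a digit or a second whitespace character in a row.
        if (not c.isdigit()) and (not (self.seenspace and isspace)):
            self.seenspace = isspace
            return True
        else:
            return False

def toStringWithoutNumerical(s):
    fs = filterstate()
    # strip() plays the role of the trim() call at the end of the C++ version.
    return ''.join(c for c in s if fs.include(c)).strip()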
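For comparison with the class-based filter above, here is a sketch of a more literal port of the C++ loop. The function name `to_string_without_numerical` is made up for illustration, and `str.strip()` stands in for the `trim()` helper, which isn't shown, so this assumes `trim()` removes leading and trailing whitespace. Membership in `string.digits` is used rather than `str.isdigit()`, because C's `isdigit` only accepts '0' to '9' whereas `str.isdigit()` also accepts characters such as '²'.

```python
import string

def to_string_without_numerical(s: str) -> str:
    """Literal port of the C++ loop, using a list as the output buffer."""
    result = []
    already_appended_space = False
    for c in s:
        if c in string.digits:
            continue  # ASCII digits are dropped without touching the space flag
        if c.isspace():
            if not already_appended_space:
                result.append(c)  # keep only the first whitespace char of a run
                already_appended_space = True
            continue
        result.append(c)
        already_appended_space = False
    # Assumed stand-in for the C++ trim() helper.
    return "".join(result).strip()
```

Building a list and joining once at the end is the usual Python idiom for accumulating characters, in place of repeated `result.append(1, c)` on a `std::string`.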