Pythonic way to rewrite the following C++ string processing code
Previously, I had some C++ string processing code that does the following.
input -> Hello 12
output-> Hello
input -> Hello 12 World
output-> Hello World
input -> Hello12 World
output-> Hello World
input -> Hello12World
output-> HelloWorld
The following is the C++ code.
std::string Utils::toStringWithoutNumerical(const std::string& str) {
    std::string result;
    bool alreadyAppendSpace = false;
    for (int i = 0, length = str.length(); i < length; i++) {
        const char c = str.at(i);
        if (isdigit(c)) {
            continue;
        }
        if (isspace(c)) {
            if (false == alreadyAppendSpace) {
                result.append(1, c);
                alreadyAppendSpace = true;
            }
            continue;
        }
        result.append(1, c);
        alreadyAppendSpace = false;
    }
    return trim(result);
}
What is the Pythonic way to implement this functionality in Python? Can a regular expression do it?
Thanks.
Edit: This reproduces what the C++ code does more accurately than the previous version.
import re
s = re.sub(r"\d+", "", s)
s = re.sub(r"(\s)\s*", r"\1", s)
In particular, if the first whitespace character in a run of several is a tab, the tab is preserved.
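A quick check of that tab-preserving behaviour, as a sketch only. The helper name collapse_keep_first is hypothetical, and the backreference is written as the raw string r"\1" so that Python hands it to the regex engine instead of treating it as an escape character.

```python
import re

def collapse_keep_first(s):
    # Hypothetical helper: drop digit runs, then shrink each
    # whitespace run to its first character.
    s = re.sub(r"\d+", "", s)
    return re.sub(r"(\s)\s*", r"\1", s)

# The whitespace run left after removing "12" starts with a tab, so the tab survives.
print(repr(collapse_keep_first("Hello\t 12  World")))  # 'Hello\tWorld'
```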
Further Edit: To replace each whitespace run with a single space regardless, this works:
s = re.sub(r"\d+", "", s)
s = re.sub(r"\s+", " ", s)
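The C++ function also calls trim on its result, which the two substitutions alone do not do. Here is a minimal sketch that adds that step, assuming trim strips leading and trailing whitespace. The function name remove_digits is made up for illustration.

```python
import re

def remove_digits(s):
    # Hypothetical wrapper around the two substitutions.
    s = re.sub(r"\d+", "", s)   # delete every run of digits
    s = re.sub(r"\s+", " ", s)  # collapse each whitespace run to one space
    return s.strip()            # assumed equivalent of the C++ trim()

for text in ("Hello 12", "Hello 12 World", "Hello12 World", "Hello12World"):
    print(repr(remove_digits(text)))
# 'Hello'
# 'Hello World'
# 'Hello World'
# 'HelloWorld'
```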
Python has a lot of built-in functions that can be very powerful when used together.
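def RemoveNumeric(s):
    # Python 3: str.maketrans('', '', digits) builds a table that deletes the digits.
    # (The Python 2 spelling was s.translate(None, '0123456789').)
    return ' '.join(s.translate(str.maketrans('', '', '0123456789')).split())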
>>> RemoveNumeric('Hello 12')
'Hello'
>>> RemoveNumeric('Hello 12 World')
'Hello World'
>>> RemoveNumeric('Hello12 World')
'Hello World'
>>> RemoveNumeric('Hello12World')
'HelloWorld'
import re
re.sub(r'[0-9]+', "", string)
import re
re.sub(r"(\s*)\d+(\s*)", lambda m: m.group(1) or m.group(2), string)
Breakdown:
- \s* matches zero or more whitespace characters.
- \d+ matches one or more digits.
- The parentheses are used to capture the whitespace.
- The replacement parameter is normally a string, but it can alternatively be a function which constructs the replacement dynamically.
- lambda is used to create an inline function which returns the first capture group if it is non-empty, and the second otherwise. This preserves the whitespace if there was any next to the digits and returns an empty string if there wasn't.
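The breakdown above can be tried out with a small sketch. The wrapper name strip_numbers is hypothetical. Note that this version does not trim the result, and that a captured whitespace run is kept whole rather than collapsed to a single character.

```python
import re

def strip_numbers(text):
    # Hypothetical wrapper: replace "optional whitespace, digits, optional
    # whitespace" with the leading whitespace if any, else the trailing whitespace.
    return re.sub(r"(\s*)\d+(\s*)", lambda m: m.group(1) or m.group(2), text)

print(repr(strip_numbers("Hello 12 World")))  # 'Hello World'
print(repr(strip_numbers("Hello12 World")))   # 'Hello World'
print(repr(strip_numbers("Hello12World")))    # 'HelloWorld'
print(repr(strip_numbers("Hello 12")))        # 'Hello ' (trailing space is kept)
```

Calling .strip() on the result would remove the trailing space in the last case.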
The regular expression answers are clearly the right way to do this. But if you're interested in a way to do it without a regex engine, here's how:
class filterstate(object):
    def __init__(self):
        self.seenspace = False
    def include(self, c):
        isspace = c.isspace()
        if (not c.isdigit()) and (not (self.seenspace and isspace)):
            self.seenspace = isspace
            return True
        else:
            return False
def toStringWithoutNumerical(s):
    fs = filterstate()
    return ''.join((c for c in s if fs.include(c)))
 