How to do conditional character replacement within a string

I have a unicode string in Python and basically need to go through it character by character and replace certain characters based on a list of rules. One such rule is that a is changed to ö if it comes after n. Also, if there are two vowel characters in a row, they get replaced by one vowel character and :. So if I have the string "natarook", what is the easiest and most efficient way of getting "nötaro:k"? Using Python 2.6 and CherryPy 3.1, if that matters.

Edit: "two vowels in a row" does mean the same vowel twice (oo, aa, ii).


# -*- coding: utf-8 -*-

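def subpairs(s, prefix, suffix):
    # prefix maps a pair to the replacement for its first character,
    # suffix maps a pair to the replacement for its second character.
    def sub(i, sentinel=object()):
        r = prefix.get(s[i:i+2], sentinel)
        if r is not sentinel: return r

        r = suffix.get(s[i-1:i+1], sentinel)
        if r is not sentinel: return r
        return s[i]

    # pad both ends so every real character has a neighbour on each side
    s = '\0'+s+'\0'
    # stop before the trailing pad so it is not copied into the result
    return ''.join(sub(i) for i in xrange(1, len(s)-1))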

vowels = [(v+v, u':') for v in 'aeiou']

prefix = {}
suffix = {'na':u'ö'}
suffix.update(vowels)
print subpairs('natarook', prefix, suffix)
# prints: nötaro:k

prefix = {'na':u'ö'}
suffix = dict(vowels)
print subpairs('natarook', prefix, suffix)
# prints: öataro:k


Focus on easy and correct first, then consider efficiency if profiling indicates it's a bottleneck.

The simple approach is:

prev = None
for ch in string:
  if ch == 'a':
    if prev == 'n':
      ...
  prev = ch


"I know, I'll use regular expressions!"

But seriously, regexes are really good for string manipulation.

You could write one per rule, like so:

s/na/nö/g
s/([aeiou])\1/$1:/g

Or you could generate them at runtime from some other source which lists them all.
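
To make the "generate them at runtime" idea concrete, here is a minimal sketch. It assumes the rules arrive as text, one "pattern => replacement" per line; that line format, and the names RULES_TEXT, load_rules and apply_rules, are made up for illustration and are not anything from the question. It is written to run on Python 2.6 as well as Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re

# Hypothetical rule source: one "pattern => replacement" per line.
# In practice this could come from a file or a database.
RULES_TEXT = r"""
na => nö
([aeiou])\1 => \1:
"""

def load_rules(text):
    """Compile each non-empty line into a (regex, replacement) pair."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pattern, repl = line.split(' => ', 1)
        rules.append((re.compile(pattern), repl))
    return rules

def apply_rules(s, rules):
    """Apply the rules one after another, in the order they were listed."""
    for regex, repl in rules:
        s = regex.sub(repl, s)
    return s

RULES = load_rules(RULES_TEXT)
result = apply_rules('natarook', RULES)  # u'nötaro:k'
```

The rules are kept in a list so that they are applied in a fixed order, which matters as soon as one rule's output can match another rule's pattern.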


Given your rules, I'd say you really want a simple state machine. Hmm, on second thought, maybe not; you can just look back in the string as you go.

# -*- coding: utf-8 -*-
vowel_set = frozenset(u'aeiouö')

def fix_the_string(s):
    lst = []
    for ch in s:
        if ch == u'a' and lst and lst[-1] == u'n':
            lst.append(u'ö')
        elif ch in vowel_set and lst and lst[-1] == ch:
            # same vowel twice in a row: keep the first one, then add ':'
            lst.append(u':')
        else:
            lst.append(ch)
    return u"".join(lst)

print fix_the_string(u"natarook") # prints "nötaro:k"

EDIT: Now that I have seen the answer by @Anon, I think that's the simplest approach. The loop above might actually be faster once you have a whole bunch of rules in play, since it makes one pass over the string; but maybe not, because the regexp engine in Python is fast C code.

But simpler is better. Here is actual Python code for the regexp approach:

# -*- coding: utf-8 -*-
import re
pat_na = re.compile(r'na')
# \1 matches the same vowel again, so "oo" matches but "oa" does not
pat_double_vowel = re.compile(r'([aeiou])\1')

def fix_the_string(s):
    s = pat_na.sub(u'nö', s)
    s = pat_double_vowel.sub(r'\1:', s)
    return s

print fix_the_string(u"natarook") # prints "nötaro:k"
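
On the speed question: whether one pass over the string beats a couple of re.sub calls is easier to measure than to guess. Below is a rough timeit sketch. Both variants are restated under the made-up names loop_version and regex_version so the snippet runs on its own (Python 2.6 or Python 3), and the sample string and repeat count are arbitrary choices. Timings depend on your machine and on how many rules you have, so none are quoted here.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
import timeit

VOWELS = frozenset('aeiou')
PAT_NA = re.compile(r'na')
PAT_DOUBLE = re.compile(r'([aeiou])\1')

def loop_version(s):
    """One pass, looking back at the last character emitted."""
    out = []
    for ch in s:
        if ch == 'a' and out and out[-1] == 'n':
            out.append('ö')
        elif ch in VOWELS and out and out[-1] == ch:
            out.append(':')
        else:
            out.append(ch)
    return ''.join(out)

def regex_version(s):
    """One re.sub pass per rule."""
    return PAT_DOUBLE.sub(r'\1:', PAT_NA.sub('nö', s))

# Arbitrary test input; make sure both variants agree before timing them.
sample = 'natarook' * 1000
assert loop_version(sample) == regex_version(sample)

for fn in (loop_version, regex_version):
    seconds = timeit.timeit(lambda: fn(sample), number=100)
    print('%s: %.4f s for 100 runs' % (fn.__name__, seconds))
```

If the numbers are close, the regexp version is still the one I would keep, since adding a rule is a one-line change.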


It might be simpler to use a handmade list of regular expressions rather than programmatically generating them. I recommend the following code.

# -*- coding: utf-8 -*-
import re
# regsubs is a list of (pattern, replacement) pairs, applied in order;
# a list is used because a dict would not guarantee the order
regsubs = [(r'na', u'nö'),
           (r'([aeiou])\1', r'\1:')]

def makesubs(s):
    for pattern, repl in regsubs:
        s = re.sub(pattern, repl, s)
    return s

print makesubs(u'natarook')
# prints: nötaro:k