Remove letter duplicates that are in a row

2023-03-20 21:07 Q&A author:

Looking for a fast way to limit duplicates to a max of 2 when they occur next to each other.

For example: jeeeeeeeep => ['jep','jeep']

Looking for suggestions in python but happy to see an example in anything - not difficult to switch.

Thanks for any assistance!

EDIT: ~~English doesn't have any (or many) doubled consonants (the same letter twice in a row), right? Let's limit this to no duplicate consonants in a row and up to two vowels in a row.~~

EDIT2: I'm silly (hey, that word has two consonants). Just check all letters, limiting duplicate letters that are next to each other to two.

Here's a recursive solution using groupby. I've left it up to you which characters you want to be able to repeat (rec_dubz defaults to vowels only, though find_dub_strs passes 'aeioupt'):

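from itertools import groupby

def find_dub_strs(mystring):
    # Collapse each run to (letter, was_repeated),
    # e.g. 'jeep' -> [('j', False), ('e', True), ('p', False)]
    seq = [(k, len(list(g)) >= 2) for k, g in groupby(mystring)]
    allowed = 'aeioupt'  # letters that may appear doubled in the output
    return rec_dubz('', seq, allowed=allowed)

def rec_dubz(prev, seq, allowed='aeiou'):
    # Base case: every run has been consumed, so prev is a complete candidate.
    if not seq:
        return [prev]
    letter, repeated = seq[0]
    # Always try the run collapsed to a single letter.
    solutions = rec_dubz(prev + letter, seq[1:], allowed=allowed)
    # Also try the doubled letter, but only if the input repeated it and it is allowed.
    if letter in allowed and repeated:
        solutions += rec_dubz(prev + letter * 2, seq[1:], allowed=allowed)
    return solutions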

This is really just a heuristically pruned depth-first search of the "solution space" of possible words. The heuristic is that each run is kept as either one letter or two, and two only if it is a valid repeatable letter. You should end up with 2**n words at the end, where n is the number of times an "allowed" character was repeated in your string.

>>> find_dub_strs('jeeeeeep')
['jep', 'jeep']
>>> find_dub_strs('jeeeeeeppp')
['jep', 'jepp', 'jeep', 'jeepp']
>>> find_dub_strs('jeeeeeeppphhhht')
['jepht', 'jeppht', 'jeepht', 'jeeppht']
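The 2**n count is easier to see when the same search is written as a Cartesian product: every run contributes one option (the single letter), or two options if it was repeated and its letter is allowed. The following is only a sketch of that idea, not the answer's own code; the name candidate_words is made up for illustration.

```python
from itertools import groupby, product

def candidate_words(text, allowed='aeiou'):
    """Hypothetical helper: list every spelling with runs cut to 1 (or optionally 2) letters."""
    choices = []
    for letter, run in groupby(text):
        repeated = len(list(run)) >= 2
        if repeated and letter in allowed:
            # A repeated, allowed letter may stay single or doubled: two options.
            choices.append((letter, letter * 2))
        else:
            # Everything else collapses to a single letter: one option.
            choices.append((letter,))
    # product() picks one option per run, giving 2**n combinations in total.
    return [''.join(parts) for parts in product(*choices)]

print(candidate_words('jeeeeeeppp', allowed='aeioupt'))
```

With allowed='aeioupt' this should produce the same words, in the same order, as find_dub_strs above, without the recursion depth growing with the length of the input.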

Use a regular expression that matches a run of three or more identical characters and replaces it with two:

>>> import re
>>> re.sub(r'(.)\1\1+', r'\1\1', 'jeeeep')
'jeep'
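If the limit needs to be configurable (the question wants both 'jep' and 'jeep'), the same pattern can be built from a parameter. This is a minimal sketch; limit_runs and max_run are illustrative names, not part of the answer above.

```python
import re

def limit_runs(text, max_run=2):
    """Hypothetical helper: shorten every run of one character to at most max_run."""
    if max_run < 1:
        raise ValueError('max_run must be at least 1')
    # (.)\1{max_run,} matches a character followed by max_run or more copies
    # of itself, i.e. a run that is longer than max_run.
    pattern = re.compile(r'(.)\1{%d,}' % max_run, re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * max_run, text)

print([limit_runs('jeeeeeeeep', n) for n in (1, 2)])
```

re.DOTALL is there only so that runs of newlines are treated like any other character; drop it if that is not wanted.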

The solution that collapses each run to a single character, using groupby:

>>> from itertools import groupby
>>> s = 'jeeeeeeeep'
>>> ''.join(c for c, unused in groupby(s))
'jep'

And the one for a maximum of two characters:

>>> ''.join(''.join(list(group)[:2]) for unused, group in groupby(s))
'jeep'
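Both groupby one-liners are the same operation with a different limit, so they can be folded into one function. The sketch below uses islice so that a long run is not copied into a list first; squeeze and limit are hypothetical names chosen for this example.

```python
from itertools import groupby, islice

def squeeze(text, limit):
    """Hypothetical helper: keep at most `limit` characters from each run."""
    # islice takes the first `limit` items of each group; groupby skips
    # whatever is left of the run when it moves on to the next group.
    return ''.join(''.join(islice(run, limit)) for _, run in groupby(text))

print([squeeze('jeeeeeeeep', n) for n in (1, 2)])
```

Calling it once per limit gives the pair of candidates asked for in the question.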

Here is a shell + Perl solution (I'm afraid I don't know Python):

echo jjjjeeeeeeeeppppp | perl -ne 's/(.)\1+/\1\1/g; print $_;'

The key is the regex, which finds (.)\1+ (any run of two or more identical characters) and replaces it with \1\1 (exactly two), globally.

Use regular expressions along with a key press event!
