Regex in Python to find words that follow pattern: vowel, consonant, vowel, consonant

2023-03-07 22:17 问答作者：

Trying to learn Regex in Python to find words that have consecutive vowel-开发者_运维技巧consonant or consonant-vowel combinations. How would I do this in regex? If it can't be done in Regex, is there an efficient way of doing this in Python?

I believe you should be able to use a regular expression like this:

r"([aeiou][bcdfghjklmnpqrstvwxz])+"

for matching vowel followed by consonant and:

r"([bcdfghjklmnpqrstvwxz][aeiou])+"

for matching consonant followed by vowel. For reference, the + means it will match the largest repetition of this pattern that it can find. For example, applying the first pattern to "ababab" would return the whole string, rather than single occurences of "ab".

If you want to match one or more vowels followed by one or more consonants it might look like this:

r"([aeiou]+[bcdfghjklmnpqrstvwxz]+)+"

Hope this helps.

^(([aeiou][^aeiou])+|([^aeiou][aeiou])+)$

>>> import re
>>> consec_re = re.compile(r'^(([aeiou][^aeiou])+|([^aeiou][aeiou])+)$')
>>> consec_re.match('bale')
<_sre.SRE_Match object at 0x01DBD1D0>
>>> consec_re.match('bail')
>>>

If you map consonantal digraphs into single consonants, the longest such word is anatomicopathological as a 10*VC string.

If you correctly map y, then you get complete strings like acetylacetonates as 8*VC and hypocotyledonary as 8*CV.

If you don’t need the string to be whole, you get a 9*CV pattern in chemicomineralogical and a 9*VC pattern in overimaginativeness.

There are many 10* words if runs of consecutive consonants or vowels are allowed to alternate, as in (C+V+)+. These include laparocolpohysterotomy and ureterocystanastomosis.

The main trick is to first map all consonants to C and all vowels to V, then do a VC or CV match. For Y, you have to do lookaheads and/or lookbehinds to determine whether it maps to C or V in that position.

I could show you the patterns I used, but you probably won’t be pleased with me. :) For example:

 (?<= \p{IsVowel} )     [yY] (?= \p{IsVowel} )  # counts as a C
 (?<= \p{IsConsonant} ) [yY]                    # counts as a V
                        [yY] (?= \p{IsVowel} )  # counts as a C

The main trick then becomes one of looking for overlapping matches of VC or CV alternations via

 (?= ( (?:  \p{IsVowel}       \p{IsConsonant} )  )+ ) )

and

 (?= ( (?:  \p{IsConsonant}   \p{IsVowel}     )  )+ ) )

Then you count all those up and see which ones are the longest.

However, since Python support doesn’t (by default/directly) support properties in regexes the way I just used them for my own program, this makes it even more important to first preprocess the string into nothing but C’s and V’s. Otherwise your patterns look really ugly.

继续阅读：python regex

Regex in Python to find words that follow pattern: vowel, consonant, vowel, consonant

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？