Regular expression to ignore a certain number of character repetitions

2022-12-21 05:42 Author:

I'm trying to write a parser that uses two pipe characters (||) as a token boundary, but I can't figure out the regular expression that will let me leave those pairs alone when I regex-escape the whole string.

Given a string like:

This | is || token || some ||| text

I would like to end up with:

This \| is || token || some \|\|\| text

where every | is escaped unless it is part of a pair (exactly two together).

Is there a regular expression that will allow me to escape every | that isn't in a pair?

No need for a regex. You're using Python, after all. :)

>>> s = "This | is || token || some ||| text"
>>> items = s.split()
>>> items
['This', '|', 'is', '||', 'token', '||', 'some', '|||', 'text']
>>> for n, i in enumerate(items):
...     # escape every pipe unless the token holds exactly two
...     if "|" in i and i.count("|") != 2:
...         items[n] = i.replace("|", r"\|")
...
>>> print(' '.join(items))
This \| is || token || some \|\|\| text
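The loop above works because every pipe in the sample string is surrounded by spaces, so `str.split()` hands each run of pipes over as its own token (and `' '.join` rebuilds the spacing). A minimal sketch of the same count-based test that does not depend on whitespace, letting `re.sub` find each run of pipes instead; `escape_unpaired_pipes` is a hypothetical name, not something from the original answer:

```python
import re

def escape_unpaired_pipes(s):
    """Escape every run of '|' whose length is not exactly 2 (hypothetical helper)."""
    def fix(match):
        run = match.group(0)
        # Same test as the loop above: keep pairs, escape every pipe in any other run.
        return run if len(run) == 2 else run.replace("|", r"\|")
    # \|+ matches each maximal run of pipes; fix() decides what replaces it.
    return re.sub(r"\|+", fix, s)

print(escape_unpaired_pipes("This | is || token || some ||| text"))
# This \| is || token || some \|\|\| text
print(escape_unpaired_pipes("This|is||token||some|||text"))
# This\|is||token||some\|\|\|text
```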

I don't see why you would need to regex-escape the tokens, but why not split up the string first and then escape them? This regex splits on two pipes that aren't preceded or followed by more pipes:

>>> import re
>>> re.split(r'(?<!\|)\|\|(?!\|)', 'This | is || token || some ||| text')
['This | is ', ' token ', ' some ||| text']
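Following that suggestion through, each piece can be escaped on its own and the pieces rejoined with the `||` boundary that `re.split` removed. A sketch, assuming only the pipes need escaping (as in the question's desired output); `BOUNDARY` and `escape_between_boundaries` are hypothetical names:

```python
import re

# Boundary: exactly two pipes, not part of a longer run (same pattern as the re.split call above).
BOUNDARY = re.compile(r"(?<!\|)\|\|(?!\|)")

def escape_between_boundaries(s):
    pieces = BOUNDARY.split(s)
    # Escape the pipes left inside each piece, then put the '||' boundaries back.
    return "||".join(piece.replace("|", r"\|") for piece in pieces)

print(escape_between_boundaries("This | is || token || some ||| text"))
# This \| is || token || some \|\|\| text
```

If each piece really has to be fully regex-escaped, `re.escape(piece)` can stand in for the `.replace()` call; it also escapes other regex metacharacters, so the output will no longer match the question's example exactly.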

By the way, there are online testers for all the common regex flavors; a quick search will turn them up. Here's one for Python: http://re.dabase.com/

Here's a way to do it with regular expressions in Perl, if anyone's interested. I used two separate substitutions: one for a lone pipe and one for runs of three or more. I'm sure they could be combined into one, but regular expressions are hard enough to read without adding needless complexity.

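#!/usr/bin/perl
use strict;
use warnings;

#my $s = "This | is || token || some ||| text";
my $s = "| This |||| is || more | evil |";

# Escape a lone pipe: one with no pipe on either side.
$s =~ s/(?<!\|)\|(?!\|)/\\|/g;
# Escape every pipe in a run of three or more.
$s =~ s{(\|{3,})}
{
   (my $run = $1) =~ s{\|}{\\|}g;
   $run;
}eg;

print $s . "\n";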

Outputs:

\| This \|\|\|\| is || more \| evil \|
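On the remark that the two substitutions could be combined: in Python, one pattern is enough if every match is a single pipe that should be escaped. A minimal sketch; `UNPAIRED_PIPE` and `escape_pipes` are hypothetical names, and whether this reads better than two passes is debatable, as the answer above suggests:

```python
import re

# One pattern, four alternatives, each matching a single pipe that should be escaped:
#   1. a lone pipe (no pipe on either side)
#   2. a pipe followed by two more pipes
#   3. a pipe with a pipe on both sides
#   4. a pipe preceded by two pipes
# Alternatives 2-4 together cover every pipe in a run of three or more,
# while neither pipe of an exact pair matches any of them.
UNPAIRED_PIPE = re.compile(
    r"(?<!\|)\|(?!\|)"
    r"|\|(?=\|\|)"
    r"|(?<=\|)\|(?=\|)"
    r"|(?<=\|\|)\|"
)

def escape_pipes(s):
    # re.sub matches against the original string, so the lookarounds
    # never see backslashes inserted by earlier replacements.
    return UNPAIRED_PIPE.sub(r"\\|", s)

print(escape_pipes("| This |||| is || more | evil |"))
# \| This \|\|\|\| is || more \| evil \|
```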

Tags: python regex regex-negation