regex to find instance of a word or phrase -- except if that word or phrase is in braces

2022-12-18 17:09

First, a disclaimer: I know a little about regexes, but I'm no expert. I seem to need them only about twice a year, so they just don't stay "on top" of my brain.

The situation: I'd like to write a regex to match a certain word, let's call it "Ostrich". Easy. Except Ostrich can sometimes appear inside curly braces, and if it's inside curly braces it's not a match. The trick here is that there can be spaces inside the curly braces. Also, the text is typically inside a paragraph.

This should match: I have an Ostrich.

This should not match: My Emu went to the {Ostrich Race Name}.

This should be a match: My Ostrich went to the {Ostrich Race Name}.

This should not be a match: My Emu went to the {Race Ostrich Place}. My Emu went to the {Race Place Ostrich}.

It seems like this is possible with a regex, but I sure don't see it.

I'll offer an alternative solution that is a bit more robust and doesn't rely on regex assertions.

First, remove all the braced items, using a regex like {[^}]+} (use replace to change each match to an empty string).

Now you can just search for Ostrich (using regex or simple string matching, depending on your needs).
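The two steps above can be sketched in Python as follows. This is only an illustration: the function name contains_outside_braces is made up for the example, and the braces are escaped (\{ and \}) to keep the pattern unambiguous.

```python
import re

# One braced item: '{', one or more characters that are not '}', then '}'.
BRACED = re.compile(r"\{[^}]+\}")

def contains_outside_braces(text: str, word: str) -> bool:
    """Return True if `word` occurs in `text` outside every {...} item."""
    stripped = BRACED.sub("", text)  # step 1: remove the braced items
    return word in stripped          # step 2: plain substring search
```

Calling contains_outside_braces(text, "Ostrich") on the four sample sentences from the question gives True, False, True, False. One caveat: replacing with an empty string can glue the text on either side of a braced item together, so substituting a single space instead may be safer if that matters for your data.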

While regular expressions can certainly be written to do what you ask, they're probably not the best tool for this particular type of thing.

One major problem with regular expressions is that they're very good at matching patterns that are present, but much less so once you start adding "except" conditions into the mix.

Regular expressions are not stateful enough to handle this properly without a lot of work, so I would try to find a different path towards a solution.

A character tokenizer that handles the braces would be easy enough to write.
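A character tokenizer along those lines might look like the sketch below, written in Python. The function name find_outside_braces, the depth counter for nested braces, and the decision to ignore a stray closing brace are all assumptions made for the example, not something specified in the question.

```python
def find_outside_braces(text: str, word: str) -> list[int]:
    """Return the start index of each `word` in `text` that is not inside {...}."""
    if not word:
        return []            # an empty word would never advance the scan
    hits = []
    depth = 0                # number of '{' seen that are still unclosed
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)   # a stray '}' is ignored
        elif depth == 0 and text.startswith(word, i):
            hits.append(i)
            i += len(word)   # skip past the word just found
            continue
        i += 1
    return hits
```

With this sketch, a word following an unclosed "{" is treated as being inside braces, and a "}" with no matching "{" has no effect.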

I believe this will work, using lookahead and lookbehind assertions:

(?<!{[^}]*)Ostrich(?![^{]*})

I also tested the case My {Ostrich} went to the Ostrich Race. (where the second "Ostrich" does match)

Note that the lookahead assertion (?![^{]*}) is optional, but without it:

My {Ostrich has a missing bracket won't match
My Ostrich also} has a missing bracket will match

which may or may not be desirable.

This works in the .NET regex engine. However, it is not PCRE-compatible, because it uses a variable-length lookbehind assertion, which PCRE does not support.
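Python's built-in re module has the same restriction: it only accepts fixed-width lookbehinds, so the pattern above is rejected there too. For such engines, one possible workaround is a different technique, sketched below: an alternation that first consumes whole {...} groups and captures the word only when it is found elsewhere. The function name find_unbraced is hypothetical, and the word is assumed to be non-empty.

```python
import re

def find_unbraced(text: str, word: str) -> list[int]:
    """Return the start index of each `word` that is outside a closed {...} group."""
    # Alternative 1 swallows an entire braced group, so nothing inside it
    # can be matched again. Alternative 2 captures the word in group 1.
    pattern = re.compile(r"\{[^}]*\}|(" + re.escape(word) + r")")
    return [m.start(1) for m in pattern.finditer(text) if m.group(1) is not None]
```

Unlike the lookbehind version, this sketch only skips braces that are properly closed, so in My {Ostrich has a missing bracket the word is reported as a match.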

Here's a very large regex that almost works.

It will return each "raw" occurrence of the word in a group.
However, the group for the last one will be empty; I'm not sure why. (Tested with .NET.)

Parse it with pattern whitespace ignored (RegexOptions.IgnorePatternWhitespace in .NET):

^(?:

    (?:
        [^{]
        |
        (?:\{.*?\})
    )*?

    (?:\W(Ostrich)\W)?
)*$

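Using a positive lookahead containing a negated character class appears to match all the test cases correctly, including multiple Ostriches:

(?<!{[^}]*)Ostrich(?=[^}]*)

Note that [^}]* can match zero characters, so the lookahead (?=[^}]*) always succeeds; in practice this behaves the same as (?<!{[^}]*)Ostrich, with the lookbehind doing all the work.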
