
Anything wrong with this RegEx?

I'm using a RegEx on an XML dump of a Wikipedia article.

The regex is: {{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+}}

I want to detect all the text wrapped in {{ and }}. But instead of the 56 matches I get from a simple search for {{, it only finds 45.

A sample block it doesn't detect is: {{cite journal | last = Heeks | first = Richard | year = 2008 | title = Meet Marty Cooper - the inventor of the mobile phone | journal = BBC | volume = 41 | issue = 6 | url = http://news.bbc.co.uk/2/hi/programmes/click_online/8639590.stm | pages = 26–33 | doi = 10.1109/MC.2008.192 }}

But it does detect: {{cite web | title = Of Cigarettes and Cellphones | last = Ulyseas | first = Mark | date = 2008-01-18 | url = http://www.thebalitimes.com/2008/01/18/of-cigarettes-and-cellphones/ | publisher = The Bali Times | accessdate = 2008-02-24 }}

Can anyone please help me spot the problem?


Some of the escaping is superfluous, but I don't think that's the real problem.

I recommend trying \w instead of a-zA-Z0-9_, especially because in .NET regex \w also recognizes Unicode letters (unless it's in ECMAScript-compliant mode).
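As a rough illustration of the same idea outside .NET, here is a minimal Python sketch. It assumes Python 3, where \w on str patterns is Unicode-aware by default and the re.ASCII flag restricts it to [a-zA-Z0-9_], which is loosely comparable to the ECMAScript-mode behaviour mentioned above. The sample strings are made up. Note that \w still does not cover punctuation such as the en dash in "26–33" from the question's failing sample.

```python
import re

# Unicode-aware \w (Python 3 default for str patterns).
unicode_word = re.compile(r"\w+")
# ASCII-only \w, roughly what a-zA-Z0-9_ covers.
ascii_word = re.compile(r"\w+", re.ASCII)

name = "Müller"  # made-up sample containing a non-ASCII letter

matches_unicode = unicode_word.fullmatch(name) is not None
matches_ascii = ascii_word.fullmatch(name) is not None

# An en dash is punctuation, so \w does not match it in either mode.
en_dash_is_word_char = re.fullmatch(r"\w", "–") is not None

print(matches_unicode, matches_ascii, en_dash_is_word_char)
```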

Another alternative: if the text part cannot contain } (which right now it can't anyway), you can simply use {{[^}]+}}.

The [^...] is a negated character class. [^}] matches anything but }.
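To see the difference, here is a small Python sketch comparing the question's pattern with the negated-class version. The two templates are shortened copies of the samples from the question. In the failing sample, the page range "26–33" appears to use an en dash rather than an ASCII hyphen, and that character is not in the original character class, which is one likely reason the block is skipped. The negated class accepts it because it is simply "not }".

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"{{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+}}")
# The negated-class alternative: anything except a closing brace.
negated = re.compile(r"{{[^}]+}}")

# Shortened versions of the two samples from the question.
text = (
    "{{cite journal | last = Heeks | pages = 26–33 | doi = 10.1109/MC.2008.192 }} "
    "{{cite web | title = Of Cigarettes and Cellphones | accessdate = 2008-02-24 }}"
)

original_hits = original.findall(text)
negated_hits = negated.findall(text)

print(len(original_hits))  # 1: only the "cite web" template
print(len(negated_hits))   # 2: both templates
```

The trade-off is that the negated class stops at the first }, so it would not handle a template that contains a single closing brace or another nested template.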

References

  • regular-expressions.info/Character Class

Related questions

  • .Net regex: what is the word character \w?


Your character class is...special. For starters, everything you're matching is covered by the . at the end. Also, curly braces ({}) are special characters, so they should be escaped. Finally, you'll want to force it not to be greedy by adding a ? after that +, otherwise it will match curly braces.

EDIT: I won't try to go back on what I said, but I would like to note that I was mistaken about pretty much everything in this post (other than that braces should be escaped, which is just a matter of good practice).


The regex {{(.*?)}} works well for me in Perl. It captures everything between the opening {{ and the closing }}.
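The same pattern behaves similarly in Python; the sketch below contrasts the lazy quantifier with a greedy one on a made-up string. One caveat to keep in mind: in Python, . does not match newlines unless re.DOTALL is passed, so templates spanning several lines would need that flag.

```python
import re

text = "a {{x | y}} b {{z}}"  # made-up sample with two templates

# Lazy: stops at the first }} after each {{.
lazy = re.findall(r"{{(.*?)}}", text)
# Greedy: runs to the last }} in the string, swallowing both templates.
greedy = re.findall(r"{{(.*)}}", text)

print(lazy)    # ['x | y', 'z']
print(greedy)  # ['x | y}} b {{z']
```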
