How to extract regex comment

2023-02-12 06:22 Q&A author:

(?<!(\w/))$#Cannot end with a word and slash

I would like to extract the comment from the end. The example above does not show this, but the regex itself could match literal hashes, for example:

\##value must be a hash

What regex would extract the comment safely when applied to patterns that may contain #'s that are not comments?

Here's a .NET-flavored regex that partially parses .NET-flavored patterns. It should get pretty close:

\A
(?>
    \\.         # an escaped character
    |           # OR
    \[\^?       # a character class
        (?:\\.|[^\]])*    # which may also contain escaped characters
    \]
    |           # OR
    \(\?(?# inline comment!)\#    # an inline comment, (?#...)
        (?<Comment>[^)]*)
    \)
    |           # OR
    \#(?<Comment>.*$)   # a common end-of-line comment
    |           # OR
    [^\[\\#]    # any other character (not #, [ or \)
)*
\z

Luckily, in .NET each capturing group remembers all of its captures, not just the last one, so we can find every capture of the Comment group in a single match. The regex only partially parses a regular expression: it parses just enough to find comments.
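Here's how to use it, where regexToInspect holds the pattern whose comments you want and parserPattern holds the regex above:

using System;
using System.Text.RegularExpressions;

// regexToInspect: the pattern whose comments we want
// parserPattern: the comment-finding regex above
Match parsed = Regex.Match(regexToInspect, parserPattern,
                           RegexOptions.IgnorePatternWhitespace |
                           RegexOptions.Multiline);
if (parsed.Success)
{
    // Every capture made by a Comment group during the match is kept, in order.
    foreach (Capture capture in parsed.Groups["Comment"].Captures)
    {
        Console.WriteLine(capture.Value);
    }
}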
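Putting the two pieces together, here is a minimal self-contained sketch as a C# top-level program. Like the regex above, it assumes the inspected pattern is written entirely in IgnorePatternWhitespace mode. The inline-comment branch is shortened to \(\?\#, the helper ExtractComments is hypothetical, and the sample input is only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The comment-finding pattern from above, with the inline-comment branch
// written as \(\?\# for brevity. Assumes the inspected pattern is entirely
// in IgnorePatternWhitespace mode.
const string parserPattern = @"
    \A
    (?>
        \\.                        # an escaped character
        |
        \[\^?(?:\\.|[^\]])*\]      # a character class, possibly with escapes
        |
        \(\?\#(?<Comment>[^)]*)\)  # an inline comment, (?#...)
        |
        \#(?<Comment>.*$)          # an end-of-line comment
        |
        [^\[\\#]                   # any other character
    )*
    \z";

// Hypothetical helper: returns every comment in regexToInspect, in order.
List<string> ExtractComments(string regexToInspect)
{
    var comments = new List<string>();
    Match parsed = Regex.Match(regexToInspect, parserPattern,
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
    if (parsed.Success)
    {
        // Both branches use the group name Comment, so all captures collect here.
        foreach (Capture capture in parsed.Groups["Comment"].Captures)
            comments.Add(capture.Value.Trim());
    }
    return comments;
}

// Illustrative input: two end-of-line comments, a # inside a character class,
// an escaped #, and an inline comment.
string sample =
    "\\d+      # one or more digits\n" +
    "[#]+      # a run of literal hashes\n" +
    "\\#x(?#inline note)y";

List<string> found = ExtractComments(sample);
foreach (string comment in found)
    Console.WriteLine(comment);
```

With this input the program should print one or more digits, a run of literal hashes and inline note, in that order. The # inside [#] and the escaped \# are not reported, because character classes and escapes are consumed as a unit before the comment branch is tried.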

Working example: http://ideone.com/YP3yt

One last word of caution: this regex assumes the whole pattern is in IgnorePatternWhitespace mode. When that option isn't set, # is matched literally (except as part of a (?#...) inline comment). Keep in mind that the flag can change several times within a single pattern. In (?-x)#(?x)#comment, for example, regardless of IgnorePatternWhitespace, the first # is matched literally, (?x) turns IgnorePatternWhitespace back on, and the second # starts a comment that is ignored.
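The flag-switching case can be checked directly against the .NET engine. This small sketch, with an arbitrary input string, shows that (?-x)#(?x)#comment matches a single # whether or not IgnorePatternWhitespace is passed, while #comment with x never switched on is plain literal text.

```csharp
using System;
using System.Text.RegularExpressions;

// The flag-switching pattern from the text.
const string flagSwitching = "(?-x)#(?x)#comment";

// Whether or not IgnorePatternWhitespace is passed, (?-x) makes the first '#'
// literal and (?x) makes the second '#' start a comment, so only "#" matches.
string withX = Regex.Match("a#comment", flagSwitching, RegexOptions.IgnorePatternWhitespace).Value;
string withoutX = Regex.Match("a#comment", flagSwitching).Value;

// With x never switched on, "#comment" is ordinary literal text.
string literal = Regex.Match("a#comment", "#comment").Value;

Console.WriteLine($"with x: '{withX}', without x: '{withoutX}', no x at all: '{literal}'");
// prints: with x: '#', without x: '#', no x at all: '#comment'
```

A comment extractor that assumes x mode everywhere, like the regex above, would instead treat the first # as the start of a comment.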

If you want a robust solution, use a proper parser for the regex language.
You could probably adapt the .NET source code and extract its parser:

Reference Source - RegexParser.cs
GitHub - RegexParser.cs

Something like this should work (if you run it separately on each line of the regex). The comment itself (if it exists) will be in the third capturing group.

/^((\\.)|[^\\\#])*\#(.*)/

(\\.) matches an escaped character and [^\\\#] matches any character that is neither a backslash nor a hash; together with the * quantifier they match the entire line before the comment. The rest of the regex then matches the comment marker and captures the text after it.
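A minimal C# sketch of this per-line approach, assuming the pattern is used without its /.../ delimiters and applied to one line at a time. CommentOf is a hypothetical helper name, and the sample lines are the two examples from the question plus one line without a comment.

```csharp
using System;
using System.Text.RegularExpressions;

// The per-line pattern from the answer, without the /.../ delimiters.
// Group 1: one escaped character or one non-backslash, non-hash character (repeated)
// Group 2: the last escaped character seen
// Group 3: the comment text after the first unescaped '#'
var lineComment = new Regex(@"^((\\.)|[^\\\#])*\#(.*)");

// Hypothetical helper: the comment on one line, or null if there is none.
string? CommentOf(string line)
{
    Match m = lineComment.Match(line);
    return m.Success ? m.Groups[3].Value : null;
}

string[] lines =
{
    @"(?<!(\w/))$#Cannot end with a word and slash",
    @"\##value must be a hash",
    @"\d+",
};

foreach (string line in lines)
    Console.WriteLine($"{line} -> {CommentOf(line) ?? "(no comment)"}");
```

Like the pattern itself, this does not understand character classes: for a line such as [#]+ # hashes, the # inside the class is taken as the comment marker, so group 3 would be ]+ # hashes.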

One often-overlooked option in regex parsing is RightToLeft mode.

Since the goal is to extract the comment from the end, the pattern can be simplified by working from the end of the line back to the beginning, for example:

^
  .+?            # the regex itself, everything before the comment
  (?<Comment>    # comment group
    (?<!\\)      # not a comment if the # is escaped
    \#           # the comment marker
    [^#\n]+      # the comment text, stopping at a # or at the end of the line
  )?             # the line may have no comment
$

Use the pattern above in C# with the options RegexOptions.RightToLeft | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline.
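Here is a minimal sketch of the right-to-left approach applied to a multi-line pattern. It assumes [^#\n]+ for the comment text: with RegexOptions.Multiline, [^#] also matches a line break, so a comment could otherwise run on into the following lines. The sample input is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The right-to-left pattern from the answer, with [^#\n]+ as the comment text
// (an adjustment: with Multiline, [^#] would also match line breaks).
const string rtlPattern = @"
    ^
    .+?               # the regex itself, everything before the comment
    (?<Comment>       # comment group
        (?<!\\)       # not a comment if the # is escaped
        \#            # the comment marker
        [^#\n]+       # the comment text: no # and no line break
    )?                # the line may have no comment
    $";

var rtl = new Regex(rtlPattern,
    RegexOptions.RightToLeft | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

// Illustrative input: the two examples from the question plus a line without a comment.
string patternToInspect =
    "(?<!(\\w/))$#Cannot end with a word and slash\n" +
    "\\##value must be a hash\n" +
    "\\d+";

var comments = new List<string>();
foreach (Match m in rtl.Matches(patternToInspect))
{
    // Lines without a comment still match, just without a Comment capture.
    if (m.Groups["Comment"].Success)
        comments.Add(m.Groups["Comment"].Value);
}
comments.Reverse(); // RightToLeft finds the last line first

foreach (string comment in comments)
    Console.WriteLine(comment);
```

Because RightToLeft scans from the end of the input, Matches returns the last line first, hence the Reverse call. Note that each captured comment includes its leading #.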

For a regex that itself matches hashes, the (?<!\\) negative lookbehind handles the situation: if the # is preceded by a \, it is not treated as a comment.
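A short sketch of that escape check on single lines, reusing the right-to-left pattern in compact form (with [^#\n]+ for the comment text). CommentOf is a hypothetical helper. The last call shows a limitation: the lookbehind inspects only the one character before the #, so an escaped backslash followed by a real comment is not recognized.

```csharp
using System;
using System.Text.RegularExpressions;

// The right-to-left pattern in compact form (with [^#\n]+ for the comment text).
var rtl = new Regex(@"^ .+? (?<Comment> (?<!\\) \# [^#\n]+ )? $",
    RegexOptions.RightToLeft | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

// Hypothetical helper: the comment on a single line, or null if there is none.
string? CommentOf(string line)
{
    Group comment = rtl.Match(line).Groups["Comment"];
    return comment.Success ? comment.Value : null;
}

Console.WriteLine(CommentOf(@"a\#b") ?? "(no comment)");       // escaped #: no comment
Console.WriteLine(CommentOf(@"a\#b #real") ?? "(no comment)"); // the unescaped # starts "#real"
Console.WriteLine(CommentOf(@"x\\#c") ?? "(no comment)");      // limitation: \\ then a real comment is missed
```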
