RegEx a RegEx match

2023-01-15 21:47

I'm having trouble building the correct regex for my string. What I want to do is get all entities from my string that start and end with '. An entity is identifiable by a # followed by one or more digits. However, entities that don't start or end with ' (in this case a phone number starting with #) should not be matched at all.

I hope someone can help me, or at least tell me that what I want to do isn't possible in one regex. Thanks :)

String:

'Blaa lablalbl balbla balb lbal '#39'blaaaaaaaa'#39' ('#39#226#8218#172#39') blaaaaaaaa #7478347878347834 blaaaa blaaaa'

RegEx:

'[#[0-9]+]*'

Wanted matches:

'#39'
'#39'
'#39'
'#226'
'#8218'
'#172'
'#39'

Found matches:

'#39'
'#39'
'#39#226#8218#172#39' <- Needs to be split (if possible in the same RegEx)

Another RegEx:

#[0-9]+

Found matches:

'#39'
'#39'
'#39'
'#226'
'#8218'
'#172'
'#39'
'#7478347878347834' <- Should not be here :(

Language: C# .NET (4.0)

This is easiest with two regexes rather than one:

First take all matches that are between single quotes:

'[\d#]+'

Then over all those matches, do this:

#\d+

So you'll end up with something like (in C#):

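// Needs: using System.Text.RegularExpressions;
// Must sit inside an iterator method returning IEnumerable<string>.
foreach (Match m in Regex.Matches(inputString, @"'[\d#]+'"))
{
    // MatchCollection is non-generic in .NET 4.0, so declare Match explicitly;
    // with var, m would be typed as object and m.Value would not compile.
    foreach (Match m2 in Regex.Matches(m.Value, @"#\d+"))
    {
        yield return m2.Value;
    }
}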
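
For a version that compiles on its own, the two-pass loop above can be wrapped in an iterator method. This is only a sketch: the class name TwoPassExtractor and the method name Extract are made up for illustration and are not part of the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper (names are illustrative, not from the original answer).
static class TwoPassExtractor
{
    public static IEnumerable<string> Extract(string input)
    {
        // Pass 1: quoted runs made up only of '#' and digits, e.g. '#39#226'.
        foreach (Match quoted in Regex.Matches(input, @"'[\d#]+'"))
        {
            // Pass 2: split each quoted run into its individual #digits tokens.
            foreach (Match entity in Regex.Matches(quoted.Value, @"#\d+"))
            {
                yield return entity.Value;
            }
        }
    }
}
```

Calling TwoPassExtractor.Extract on the string from the question should yield the seven wanted entities in order and skip the unquoted phone number.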

Assuming you can use lookbehind/lookahead and that your regex engine supports variable-length lookbehind (JGSoft / .NET only), this should work:

(?<='[#0-9]*)#\d+(?=[#0-9]*')

I tested it with an online regex tester.

Breaking it down is pretty simple:

(?<=        # Start positive lookbehind group - ensure that the text before the cursor
            # matches the following pattern: 
  '         # Match the literal '
  [#0-9]*   # Matches #, 0-9, zero or more times
)           # End lookbehind...
#\d+        # Match literal #, followed by one or more digits
(?=         # Start lookahead -- Ensures text after cursor matches (without advancing)
  [#0-9]*   # Allow #, 0-9, zero or more times
  '         # Match a literal '
)

So, this pattern will match #\d+ if the text before it is '[#0-9]* and the text after is [#0-9]*'
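
Since the question targets C#, here is a minimal sketch of how the single-pattern approach might be used there. It relies on .NET accepting a variable-length lookbehind; the class name LookaroundExtractor and the method name Extract are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper (names are illustrative, not from the original answer).
static class LookaroundExtractor
{
    // Variable-length lookbehind: accepted by .NET, rejected by many other engines.
    static readonly Regex Entity = new Regex(@"(?<='[#0-9]*)#\d+(?=[#0-9]*')");

    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        // MatchCollection is non-generic in .NET 4.0, so name the Match type explicitly.
        foreach (Match m in Entity.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Because the lookarounds consume nothing, each match value is just the #digits token itself, so no second pass is needed to split a run like '#39#226#8218#172#39'.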

Here is a solution in Perl; the pattern uses only fixed-width lookarounds, so it does not depend on variable-length lookbehind support:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $s = qq!Blaa lablalbl balbla balb lbal '#39'blaaaaaaaa'#39' ('#39#226#8218#172#39') blaaaaaaaa #7478347878347834 blaaaa blaaaa!;

my @n = $s =~ /(?<=['#\d])(#\d+)(?=[#'\d])/g;

print Dumper(\@n);

Output:

$VAR1 = [
          '#39',
          '#39',
          '#39',
          '#226',
          '#8218',
          '#172',
          '#39'
        ];
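
The same fixed-width pattern can also be tried from C#. The sketch below is an assumption-laden port, not part of the Perl answer: FixedWidthExtractor and Extract are invented names. Note that this pattern inspects only one character on each side of the token, so it is looser than a check for enclosing quotes.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical port of the Perl pattern (names are illustrative).
static class FixedWidthExtractor
{
    // One-character lookbehind and lookahead: the neighbour on each side
    // must be a quote, a '#', or a digit.
    static readonly Regex Entity = new Regex(@"(?<=['#\d])(#\d+)(?=[#'\d])");

    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        foreach (Match m in Entity.Matches(input))
        {
            // Group 1 holds the #digits token, mirroring the Perl capture.
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

On the string from the question this should give the same seven entities. The looseness shows up on other inputs: for the unquoted text 5#123 4, the pattern matches #12, because a digit sits on both sides once the engine backtracks off the final 3.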
