Please help clarify my regex pattern
I have the following string:
<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>
N.B. Messages are separated by <<.
I need to extract the following parts from each message:
1. Time
2. Sender
3. Recipient
4. Text
The recipient may or may not be present; this field is optional.
I do this with the following pattern:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(.+?)))<<
But I cannot extract the recipient separately from the message text. This is what I tried:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(((?<recipient>.+?):){0,1}(?<messageText>.+?))))<<
N.B. The first message has no recipient.
Please help me correct my pattern.
The <recipient> group pattern needs to exclude < and :. Otherwise, when the recipient is omitted (as in the first message of your example), it will match everything between *> and the first colon of the next message's timestamp.
A simple tweak to that group pattern should fix it:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(((?<recipient>[^<:]+):)?(?<messageText>.+?))))<<
Note I replaced {0,1} with the optional quantifier (?). It's just shorthand to improve readability (a little goes a long way). :-)
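To see the difference on the sample string, here is a minimal Python sketch. It is not the asker's environment: Python's re module writes named groups as (?P<name>...), so the pattern is transcribed accordingly, and the helpers build and extract are hypothetical names introduced only for this illustration. The re.IGNORECASE flag is an assumption; the second timestamp contains an uppercase N, which [0-9a-z]+ cannot match without it.

```python
import re

# Sample string from the question.
SAMPLE = (
    "<script>m('02:29:1467301/>Sender1*>some text message?<<"
    "02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<"
    "02:29:1393100=>User1*|0User2*|%></B><<','');</script>"
)


def build(recipient_pattern):
    """Compile the message pattern with the given recipient sub-pattern.

    Hypothetical helper: only the recipient group differs between the
    question's pattern and the corrected one.
    """
    return re.compile(
        r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
        r"(?P<messageData>(?P<sender>.+?)\*>"
        r"((?P<recipient>" + recipient_pattern + r"):)?"
        r"(?P<messageText>.+?)))<<",
        re.IGNORECASE,  # assumption: case-insensitive matching is intended
    )


def extract(pattern, text):
    """Return (time, sender, recipient, text) for every match."""
    return [
        (m.group("time"), m.group("sender"),
         m.group("recipient"), m.group("messageText"))
        for m in pattern.finditer(text)
    ]


print("question's recipient group (.+?):")
for row in extract(build(r".+?"), SAMPLE):
    print("  ", row)

print("corrected recipient group ([^<:]+):")
for row in extract(build(r"[^<:]+"), SAMPLE):
    print("  ", row)
```

With .+? the recipient of the first message comes out as "some text message?<<02", so one match swallows the second message too. With [^<:]+ the first message has no recipient (None in Python) and the second yields Recipient2. The third message in the sample uses => and *| instead of /> and *>, so neither variant matches it.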
Speaking of readability, here it is in multi-line form:
(?<message>
    (?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
    (?<messageData>
        (?<sender>.+?)\*>
        (
          ((?<recipient>[^<:]+):)?
          (?<messageText>.+?)
        )
    )
)<<
I don't know if the unnamed group containing <recipient> and <messageText> was intentional, but it's unnecessary. You can simplify it to this:
(?<message>
    (?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>
    (?<messageData>
        (?<sender>.+?)\*>
        ((?<recipient>[^<:]+):)?
        (?<messageText>.+?)
    )
)<<
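A multi-line pattern like this only works when the engine is told to ignore whitespace inside the pattern. As a rough sketch, assuming Python rather than the asker's environment, the re.VERBOSE flag does that and also allows inline comments. Named groups are written (?P<name>...) in Python, and re.IGNORECASE is again an assumption so that the uppercase N in the timestamp is accepted.

```python
import re

# Verbose-mode transcription of the simplified pattern above.
MESSAGE = re.compile(
    r"""
    (?P<message>
        (?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>   # timestamp, then the "/>" delimiter
        (?P<messageData>
            (?P<sender>.+?)\*>                  # sender, terminated by "*>"
            ((?P<recipient>[^<:]+):)?           # optional "recipient:" prefix
            (?P<messageText>.+?)                # message body, as short as possible
        )
    )<<                                         # message terminator
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Second message from the question's sample string.
m = MESSAGE.search("02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<")
for name, value in m.groupdict().items():
    print(f"{name}: {value!r}")
```

Note that the message text keeps the space that follows the colon; trim it afterwards if that matters.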
Check this out; it may fit a little better:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]*).+?>(?<messageData>(?<sender>.*?)>(((?<recipient>[^<:]+):)?(?<messageText>.*?))))<<
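For comparison, here is a quick way to try this looser pattern against the sample string. This is only a sketch in Python (named groups transcribed to (?P<name>...)), run without any flags; the variable names are made up for the illustration.

```python
import re

# Sample string from the question.
SAMPLE = (
    "<script>m('02:29:1467301/>Sender1*>some text message?<<"
    "02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<"
    "02:29:1393100=>User1*|0User2*|%></B><<','');</script>"
)

# Looser pattern: any characters may sit between the time and the first ">",
# and the sender simply runs up to the next ">".
LOOSE = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]*).+?>"
    r"(?P<messageData>(?P<sender>.*?)>"
    r"(((?P<recipient>[^<:]+):)?(?P<messageText>.*?))))<<"
)

rows = [
    (m.group("time"), m.group("sender"),
     m.group("recipient"), m.group("messageText"))
    for m in LOOSE.finditer(SAMPLE)
]
for row in rows:
    print(row)
```

On this sample it matches all three messages, including the third one that uses => and *| as delimiters. The trade-off is that the captures are less tidy: the sender keeps its trailing * (for example "Sender1*"), and the time of the second message stops at "02:29:13625" because the N1 suffix is consumed by .+? instead.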
P.S. Hi there ;)
 