Regular expression to select text after n occurrences of a certain character

I'm writing some regular expressions and would like pointers on how to select the text that comes after n occurrences of one character and before the next occurrence of another character.

For instance:

xyz|yui|i want to select this.

In this example I want to select the text after the 2nd "|" and before the next ".", so the text I want to match is "i want to select this".

I appreciate any pointers, thanks.

UPDATE

To be more specific about why I need this: there is more text after the period at the end of "i want to select this.". Basically this is undelimited content which I am trying to delimit. Thus far I have been able to delimit the first two fields; now I need to select only the text after the last "|" and before the next period, and add a "|" character to the end. So the desired result would be

xyz|yui|i want to select this.|

Sorry for not being more specific about the outcome; I hope this clears it up a bit. Thanks for the info, it's super.


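Your regex would look like this. The negative lookbehind (?<!Co) rejects a period that directly follows "Co", so an abbreviation such as "Co." does not end the match:

/^(?:.+?\|){2}(.+?(?<!Co)\.)/

PHP

<?php
    // $out[1] holds the text after the 2nd "|" up to and including the first period not preceded by "Co".
    preg_match('/^(?:.+?\|){2}(.+?(?<!Co)\.)/','xyz|yui|This is a Co. sentence. Ending before this clause.',$out);
    echo $out[1];
?>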

HOWEVER

If the fields are already separated by pipes, it is simpler to explode on the pipe character and read the field you need by index:

$stuff = explode('|','xyz|yui|i want to select this.');
echo $stuff[2];
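
The same idea can be extended to the updated question (appending a "|" after the first period of the third field) without a regex. The following is only a minimal sketch: the helper name delimit_third_field is hypothetical, and it assumes the first period after the second pipe is the one that ends the field (it does not handle abbreviations such as "Co.").

```php
<?php
// Hypothetical helper, not part of the original answer.
function delimit_third_field(string $text): string
{
    // A limit of 3 keeps any later "|" characters inside the third piece.
    $parts = explode('|', $text, 3);
    if (count($parts) < 3) {
        return $text; // fewer than two pipes: leave the input unchanged
    }
    $dot = strpos($parts[2], '.');
    if ($dot === false) {
        return $text; // no period after the second pipe
    }
    // Length 0 makes substr_replace insert "|" right after the period.
    $parts[2] = substr_replace($parts[2], '|', $dot + 1, 0);
    return implode('|', $parts);
}

echo delimit_third_field('xyz|yui|i want to select this. More text.');
```

Passing the limit to explode matters here because the remaining content is still undelimited and may contain anything, including further pipes.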


First, create a group containing the repeating part, ([^|]+\|), and require it to appear exactly twice with {2}. Then lazily capture the text up to the next period with (.*?)\. :

^([^|]+\|){2}(.*?)\.

Update

You can make the first group non-capturing with ?:, as @Karolis mentioned:

^(?:[^|]+\|){2}(.*?)\.

With the first regexp, the text you want is in the second capture group; with the second regexp, it is in the first.
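
To see the difference in group numbering, here is a small sketch that runs both patterns through preg_match. The sample string and variable names are assumptions made for illustration.

```php
<?php
$text = 'xyz|yui|i want to select this. More text.';

// Capturing group for the repeated part: the wanted text is group 2
// (group 1 holds only the last repetition, "yui|").
preg_match('/^([^|]+\|){2}(.*?)\./', $text, $m);
$withCapturing = $m[2];

// Non-capturing (?:...) group: the wanted text is group 1.
preg_match('/^(?:[^|]+\|){2}(.*?)\./', $text, $m);
$withNonCapturing = $m[1];

echo $withCapturing, "\n", $withNonCapturing, "\n";
```

Both variables end up holding the same text, without the trailing period, because the period sits outside the capture group.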


This will do it:

$text = 'xyz|yui|i want to select Co. this. But not this.';
$re = '/# Match stuff after nth occurrence of a char.
    ^               # Anchor to start of string.
    (?:[^|]*\|){2}  # Everything up through 2nd "|"
    (               # $1: Capture up through next "."
      [^.]*         # Zero or more non-dot.
      (?:           # Allow dot if in "Co.".
        (?<=Co)     # If dot is preceded by "Co",
        \.          # then allow this dot.
        [^.]*       # Zero or more non-dot.
      )*            # Zero or more "Co." dots allowed.
      \.            # First dot that is not "Co."
    )               # End $1: Capture up through next "."
    /ix';
$text = preg_replace($re, '$0|', $text);
echo $text;

Edit 2011-09-28 10:00 MDT: Added ability to skip over dots in: "Co."
Edit 2011-09-28 10:30 MDT: Changed to use preg_replace() to insert | after dot.
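
For an arbitrary n, the pattern can be built at run time. This is a rough sketch rather than part of the answer above: the function name append_after_nth and its parameters are hypothetical, it assumes a single-character separator and PHP 7.4 or later (for the arrow function), and it leaves out the "Co." handling.

```php
<?php
// Hypothetical helper; the name and parameters are illustrative only.
function append_after_nth(string $text, string $sep, int $n, string $end, string $insert): string
{
    // Escape both characters so they are treated literally in the pattern.
    $s = preg_quote($sep, '/');
    $e = preg_quote($end, '/');
    // Skip $n separator-terminated fields, then match lazily up to the first $end.
    $re = '/^(?:[^' . $s . ']*' . $s . '){' . $n . '}.*?' . $e . '/s';
    // A callback avoids any special meaning of "$" or "\" in $insert.
    $result = preg_replace_callback(
        $re,
        fn(array $m): string => $m[0] . $insert,
        $text,
        1
    );
    return $result ?? $text; // preg_replace_callback returns null on error
}

echo append_after_nth('a|b|c|take this. Not this.', '|', 3, '.', '|');
```

If the input has fewer than n separators, the pattern does not match and the text is returned unchanged.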
