Regular expression to select text after n occurrences of a certain character
I'm building some regular expressions and was wondering if I could get some pointers on how to select a string after n occurrences of one character and before the next occurrence of another character.
For instance:
xyz|yui|i want to select this.
In this example I want to select the text after the 2nd "|" and before the next ".". So the text I want to match is "i want to select this".
I appreciate any pointers, thanks.
UPDATE
To be more specific about why I need to do this: there is more text after the period at the end of "i want to select this.". Basically this is undelimited content which I am trying to delimit. Thus far I have been able to delimit the first two fields; now I need to select only the text after the last "|" and before the next period, and add a "|" character after that period. So the desired result would be:
xyz|yui|i want to select this.|
Sorry for not being more specific about the outcome; I hope this clears it up a bit. Thanks for the info, it's super.
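Your regex would look like this:
/^(?:.+?\|){2}(.+?(?<!Co)\.)/
The negative lookbehind (?<!Co) keeps the period in "Co." from ending the match.
PHP
<?php
// Capture group 1 holds the text after the 2nd "|" up to and including the first period that does not follow "Co".
preg_match('/^(?:.+?\|){2}(.+?(?<!Co)\.)/','xyz|yui|This is a Co. sentence. Ending before this clause.',$out);
echo $out[1];
?>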
HOWEVER
It is simpler to explode on the pipe character and access the field you need by index:
$stuff = explode('|','xyz|yui|i want to select this.');
echo $stuff[2];
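Since the question's update says more text follows the period, the explode() idea can be taken one step further. The following is only a sketch under assumptions: the sample line with trailing text is made up, and the variable names are illustrative. It splits into at most three fields, cuts the third at its first period, and re-inserts a "|" after that period.

```php
<?php
// Hypothetical input: the sample line plus made-up trailing text after the period.
$line = 'xyz|yui|i want to select this. more text follows';

// A limit of 3 keeps any later "|" characters inside the third element.
$parts = explode('|', $line, 3);
$rest = $parts[2];

$selected = null;
$dot = strpos($rest, '.');   // position of the first period in the third field
if ($dot !== false) {
    $selected = substr($rest, 0, $dot);   // text before the period
    // Put a "|" right after that period, keeping the remaining text.
    $parts[2] = substr($rest, 0, $dot + 1) . '|' . substr($rest, $dot + 1);
}
$delimited = implode('|', $parts);

echo $selected, "\n";
echo $delimited, "\n";
```

This version does not treat abbreviations such as "Co." specially; the first period always ends the field.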
First you need to create a group which contains the repeating part ([^|]+\|), which can be set to appear exactly two times with {2}; then you need to match the rest (.*):
^([^|]+\|){2}(.*?)\.
Update
As @Karolis mentioned, you can make the first group non-capturing with ?:
^(?:[^|]+\|){2}(.*?)\.
With the first regexp your text is in the second capture group; with the second regexp it is in the first.
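To make the difference in group numbering concrete, here is a small PHP sketch using preg_match(). The input string (with some made-up trailing text) and the variable names are assumptions for illustration.

```php
<?php
// Hypothetical input: the sample line with extra text after the period.
$text = 'xyz|yui|i want to select this. more text';

// Capturing variant: group 1 holds the last repeated "field|", group 2 the wanted text.
preg_match('/^([^|]+\|){2}(.*?)\./', $text, $withGroup);

// Non-capturing variant: the wanted text moves to group 1.
preg_match('/^(?:[^|]+\|){2}(.*?)\./', $text, $noGroup);

echo $withGroup[2], "\n";
echo $noGroup[1], "\n";
```

Both patterns stop at the first period, so the period itself is matched but not captured.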
This will do it:
$text = 'xyz|yui|i want to select Co. this. But not this.';
$re = '/# Match stuff after nth occurrence of a char.
    ^                 # Anchor to start of string.
    (?:[^|]*\|){2}    # Everything up through 2nd "|"
    (                 # $1: Capture up through next "."
      [^.]*           # Zero or more non-dot.
      (?:             # Allow dot if in "Co.".
        (?<=Co)       # If dot is preceded by "Co",
        \.            # then allow this dot.
        [^.]*         # Zero or more non-dot.
      )*              # Zero or more "Co." dots allowed.
      \.              # First dot that is not "Co."
    )                 # End $1: Capture up through next "."
    /ix';
$text = preg_replace($re, '$0|', $text);
echo $text;
Edit 2011-09-28 10:00 MDT: Added ability to skip over dots in: "Co."
Edit 2011-09-28 10:30 MDT: Changed to use preg_replace() to insert | after dot.
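If the count and the characters vary, the same idea can be wrapped in a function. This is a rough sketch, not part of the answer above: insertAfterNth() and its parameters are hypothetical names, it assumes single-character delimiters, and it leaves out the "Co." handling to stay short.

```php
<?php
// Hypothetical helper: insert $delim after the first $end that follows the $n-th $delim.
// Assumes $delim and $end are single characters.
function insertAfterNth(string $text, int $n, string $delim = '|', string $end = '.'): string
{
    $d = preg_quote($delim, '/');
    $e = preg_quote($end, '/');
    // ^(?:[^d]*d){n} skips n delimited fields; [^e]*e then runs up to the next terminator.
    $re = '/^(?:[^' . $d . ']*' . $d . '){' . $n . '}[^' . $e . ']*' . $e . '/';
    // A callback avoids any special meaning of "$" or "\" in the replacement text.
    $out = preg_replace_callback($re, function ($m) use ($delim) {
        return $m[0] . $delim;
    }, $text, 1);
    return $out === null ? $text : $out;   // fall back to the input on a regex error
}

echo insertAfterNth('xyz|yui|i want to select this. But not this.', 2), "\n";
```

When the input has fewer than n delimiters the pattern does not match and the text is returned unchanged.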