Select Text After Pattern
I'm trying to select all of the text that follows a specific pattern:
Sample Text:
"by thatonekid (Posted Mon Jan 12, 2009 7:17 pm)
fell onto the trail right below one of the most traveled walls at the point! yikes!
"
Every text I work on will start with: "by USERNAME (Posted DATE) <br />
theTextIWant"
I thought about exploding on the parentheses, but that could obviously break up the text if it contains another parenthesis.
Secondly, some of the texts end in "<br /><br />". I need to remove the trailing <br /> tags if there is no text after them.
I apologize if this looks like I'm asking for someone to do my homework -- I honestly have no idea where to begin here.
If you only want the text after the username/date, you can simply remove everything before the first <br />, assuming your formatting is consistent.
$text = preg_replace("/^.*?<br(\s\/)?>/si", "", $string);
This would replace everything before and including the first <br> or <br />, case-insensitive, with an empty string, leaving you with just the text. The .*? at the beginning is a non-greedy match, meaning it will capture as little as possible. In other words, it won't grab past the first break.
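As a quick check of the behaviour described above, here is a minimal sketch that runs the same pattern on input modeled on the question. The `<br />` after the header and the trailing `<br /><br />` are assumptions taken from the asker's description, not from the raw sample.

```php
<?php
// Sample input modeled on the question. The <br /> placement is assumed.
$string = "by thatonekid (Posted Mon Jan 12, 2009 7:17 pm)<br />\n"
        . "fell onto the trail right below one of the most traveled walls at the point! yikes!<br /><br />\n";

// Drop everything up to and including the first <br> or <br />.
// The lazy .*? stops at the first break, so later breaks are untouched.
$text = preg_replace('/^.*?<br(\s\/)?>/si', '', $string);

// The header is gone; the trailing <br /> tags are still present at this point.
echo $text;
```

Note that the pattern only recognises `<br>` and `<br />`; a tag written as `<br/>` would not match.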
You can then follow this with:
$text = preg_replace("/(?:\s*<br(?:\s\/)?>)+\s*$/si", "", $text);
This removes any run of <br>/<br /> tags, together with the whitespace around them, from the end of the string.
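You could also do all of this with a single preg_replace:
$text = preg_replace("/^.*?<br(?:\s\/)?>(.*?)(?:\s*<br(?:\s\/)?>)*\s*$/si", "$1", $string);
I made all of the () captures (?:) non-captures, except the one containing the text. That capture is non-greedy so that it stops before the trailing <br /> tags; a greedy (.*) would swallow all but the last of them.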
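The single-pass idea can be wrapped in a small function. This is only a sketch: `extractPostBody` is a hypothetical name, and the pattern assumes the header always ends at the first `<br>` or `<br />`. It uses a lazy capture followed by an optional run of trailing breaks, so text with no trailing `<br />` is handled too.

```php
<?php
// Hypothetical helper, not part of the original answer.
function extractPostBody(string $post): string
{
    // ^.*?<br ...>   skip the "by USER (Posted DATE)" header and the first break
    // \s*            skip whitespace after that break
    // (.*?)          lazily capture the body
    // (?:...)*\s*$   allow any number of trailing breaks and whitespace
    $pattern = '/^.*?<br(?:\s\/)?>\s*(.*?)(?:\s*<br(?:\s\/)?>)*\s*$/si';
    $body = preg_replace($pattern, '$1', $post);

    // preg_replace returns null on a regex error; fall back to the input.
    return $body === null ? $post : $body;
}

echo extractPostBody("by someone (Posted today)<br />\nhello (world)<br /><br />\n");
```

Breaks inside the body are kept; only the run at the very end is dropped. If the input contains no `<br>` at all, the pattern does not match and the string comes back unchanged.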
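(I don't use PHP regularly, so I am assuming that its Perl-compatible regex functions behave as the name suggests.)
For instance, you could try these regexes with preg_match; see the online documentation.
username: [_a-zA-Z0-9]+
date: [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4} (this matches numeric dates such as 1/12/2009, not the "Mon Jan 12, 2009 7:17 pm" form in the sample)
(Sorry, I have to go; I'll help more later if this is still unsolved.)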
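If the username and date are wanted as well, preg_match with named groups is one way to do it. The sketch below assumes the header has the shape "by USERNAME (Posted DATE)" as in the question, that the username contains no whitespace, and that the date contains no closing parenthesis. The group names `user` and `date` are made up for illustration.

```php
<?php
$header = "by thatonekid (Posted Mon Jan 12, 2009 7:17 pm)<br />\nfell onto the trail";

// \S+ takes the username up to the first space.
// [^)]+ takes everything inside "(Posted ...)" up to the closing parenthesis.
$pattern = '/^by\s+(?<user>\S+)\s+\(Posted\s+(?<date>[^)]+)\)/i';

$m = [];
if (preg_match($pattern, $header, $m)) {
    echo $m['user'], "\n";
    echo $m['date'], "\n";
}
```

Because the pattern is anchored at the start and stops at the first closing parenthesis, parentheses later in the post text do not affect the match.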