Select Text After Pattern

I'm trying to select all of the text that follows a specific pattern:

Sample Text:

"by thatonekid (Posted Mon Jan 12, 2009 7:17 pm)
fell onto the trail right below one of the most traveled walls at the point! yikes!

"

Every text I work on will start with: "by USERNAME (Posted DATE) <br /> theTextIWant"

I thought about exploding on the parentheses, but that would break up the text if it contains another parenthesis.

Secondly, some of the texts end in "<br /><br />". I need to remove the trailing <br />'s if there is no text afterwards.

I apologize if this looks like I'm asking for someone to do my homework -- I honestly have no idea where to begin here.


If you only want the text after the username/date, you can simply remove everything before the first <br />, assuming your formatting is consistent.

$text = preg_replace("/^.*?<br(\s\/)?>/si", "", $string);

This would replace everything before and including the first <br> or <br />, case-insensitive, with an empty string, leaving you with just the text. The .*? at the beginning is a non-greedy match, meaning it will capture as little as possible. In other words, it won't grab past the first break.
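As a quick check, here is a minimal sketch of that call run against the sample from the question. The $string value is an assumption: the sample as posted lost its markup, so the <br /> after the date and the two trailing breaks are reconstructed from the format the question describes.

```php
<?php
// Sample rebuilt from the question; the <br /> placement is assumed.
$string = "by thatonekid (Posted Mon Jan 12, 2009 7:17 pm)<br />"
        . "fell onto the trail right below one of the most traveled walls at the point! yikes!<br /><br />";

// Drop everything up to and including the first <br> or <br />.
$text = preg_replace('/^.*?<br(\s\/)?>/si', '', $string);

echo $text, "\n";
```

The "by ... (Posted ...)" header is gone at this point, but the trailing breaks are still attached to the text.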

You can then follow this with:

$text = preg_replace("/(?:\s*<br(?:\s\/)?>)+\s*$/i", "", $text);

This removes all trailing <br>/<br /> tags, along with any whitespace around them.

You could also do all of this with a single preg_replace:

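$text = preg_replace("/^.*?<br(?:\s\/)?>(.*?)(?:\s*<br(?:\s\/)?>)*\s*$/si", "$1", $string);

I made all of the () captures (?:) non-captures, except the one containing the text. That group is non-greedy so it stops before the trailing breaks instead of swallowing all but the last one, and the trailing group uses * so the pattern still matches when the text has no trailing <br />.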
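Here is a rough sketch of the single-call approach wrapped in a function so it can be tried on a few inputs. The function name extract_post_body is made up for illustration, and the pattern is one possible variation: it assumes the text group should be lazy and that trailing breaks are optional, so that posts without a trailing <br /> still get their header stripped.

```php
<?php
// Hypothetical helper name; not from the original answer.
function extract_post_body(string $string): string
{
    $pattern = '/^.*?<br(?:\s\/)?>'      // header, up to the first <br> or <br />
             . '(.*?)'                    // the text we want, matched lazily
             . '(?:\s*<br(?:\s\/)?>)*'    // zero or more trailing breaks
             . '\s*$/si';                 // optional trailing whitespace
    // preg_replace returns the input unchanged when nothing matches,
    // and null on a regex error; fall back to the input in that case.
    return preg_replace($pattern, '$1', $string) ?? $string;
}

echo extract_post_body("by a (Posted x)<br />hello<br /><br />\n"), "\n";
echo extract_post_body("by a (Posted x)<br />hello (again)"), "\n";
```

Breaks in the middle of the text are kept; only the ones at the very end are dropped.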

(I don't use PHP regularly, so I am assuming that a Perl-compatible regex is what it says it is.)


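For instance, you could try these regexes with preg_match, I guess. See the online docs.

username: [_a-zA-Z0-9]+
date: [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4} (this only matches numeric dates like 12/01/2009; the "Mon Jan 12, 2009 7:17 pm" format in your sample needs a different pattern)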
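A rough sketch of how that could look with preg_match, under a few assumptions: the function name parse_post is made up, the username class is written as [_a-zA-Z0-9]+, and the date is matched loosely as "anything up to the first closing paren" rather than with a strict date pattern. Because only the first pair of parens is consumed, extra parens in the body text are left alone.

```php
<?php
// Hypothetical helper; returns null when the header does not match.
function parse_post(string $string): ?array
{
    $pattern = '/^by\s+([_a-zA-Z0-9]+)'   // username
             . '\s+\(Posted\s+([^)]+)\)'  // date: anything up to the first ")"
             . '\s*<br(?:\s\/)?>\s*'      // the break after the header
             . '(.*)$/si';                // the rest is the text
    if (!preg_match($pattern, $string, $m)) {
        return null;
    }
    return ['username' => $m[1], 'date' => $m[2], 'text' => $m[3]];
}

// Made-up input in the question's format, with an extra paren in the body.
$post = parse_post("by thatonekid (Posted Mon Jan 12, 2009 7:17 pm)<br />fell (hard) onto the trail");
print_r($post);
```

Trailing <br /> tags would still need to be stripped from the text afterwards.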

(sorry gtg, helping you more later if unsolved)
