Regex find the first word

I'm trying to use a regex to wrap the first word of a page's content in a span. The content contains HTML, so I need to make sure an actual word gets chosen rather than part of a tag. The content changes for every page.

Current script is:

preg_match('/(<(.*?)>)*/i',$page_content,$matches);
$stripped = substr($page_content,strlen($matches[0]));
preg_match('/\b[a-z]* \b/i',$stripped,$strippedmatch);
echo substr($page_content, 0, strlen($matches[0])).'<span class="h1">'.$strippedmatch[0].'</span>'.substr($stripped, strlen($strippedmatch[0]));

However, if $page_content is <p><span class="title">This is </span> my title!</p>, then my regex thinks the first word is "span" and adds the tags around that.

Is there any way to fix this? (or a better way to do it).


This seems to work...

(?<=\>)\b\w*\b|^\w*\b

If you wanna allow spaces in front also (remember to trim the resulting string):

(?<=>)\s*\b\w*\b|^\s*\w*\b
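To apply the second pattern directly to the markup, one option is preg_replace_callback with a limit of 1, which keeps any leading whitespace outside the span instead of trimming afterwards. A minimal sketch, assuming the h1 class from the question; wrapFirstWord is a hypothetical helper name, not something from the thread:

```php
<?php
// Hypothetical helper: wraps the first word found at the start of the
// string or directly after a ">" in <span class="h1">.
function wrapFirstWord(string $html): string
{
    // The whitespace-tolerant pattern from above.
    $pattern = '/(?<=>)\s*\b\w*\b|^\s*\w*\b/';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Split the match into leading whitespace and the word itself.
            $word = ltrim($m[0]);
            $lead = substr($m[0], 0, strlen($m[0]) - strlen($word));
            return $lead . '<span class="h1">' . $word . '</span>';
        },
        $html,
        1 // only the first match is replaced
    );
}

echo wrapFirstWord('<p><span class="title">This is </span> my title!</p>'), "\n";
// <p><span class="title"><span class="h1">This</span> is </span> my title!</p>
```

Like the pattern itself, this treats any ">" as the end of a tag, so a ">" inside an attribute value would confuse it.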


If I understand you correctly, you want a tag around the first word that is not itself part of a tag. You could get that with this regex:

$code = preg_replace('/^((?:<.+?>\s*)+?)(\w+)/i', '\1<span class="h1">\2</span>', $code);

This pattern steps over the leading tags one by one until it reaches the first word outside a tag.
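One detail worth checking with this kind of pattern: when a capturing group is repeated, PCRE keeps only the last iteration in the backreference, so all leading tags have to sit inside one outer group for \1 to restore them. A small sketch comparing the two forms; the variable names are illustrative, and [^>]+ is swapped in for .+? to keep each tag match from running past its own ">":

```php
<?php
$html = '<p><span class="title">This is </span> my title!</p>';

// Repeated capturing group: \1 holds only the last tag matched,
// so the leading <p> is lost in the replacement.
$lossy = preg_replace('/^(<[^>]+>\s*)+(\w+)/', '\1<span class="h1">\2</span>', $html);

// Outer group around a non-capturing repetition: \1 holds every leading tag.
$kept = preg_replace('/^((?:<[^>]+>\s*)*)(\w+)/', '\1<span class="h1">\2</span>', $html);

echo $lossy, "\n";
// <span class="title"><span class="h1">This</span> is </span> my title!</p>
echo $kept, "\n";
// <p><span class="title"><span class="h1">This</span> is </span> my title!</p>
```

Using * instead of + in the second pattern also lets it handle content that starts with plain text and no tag at all.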


You shouldn't be using regex for this, but if you insist, you can try something like this:

<?php

$texts = array(
  '<p><span class="title">This is </span> my title!</p>',
  '<1>   <2>   <3>   blah   blah   <4> <5> blah',
  'garbage <1> <2> real stuff begins <3> <4>',
);

foreach ($texts as $text) {
  print preg_replace('/(>\s*)(\w+)/', '\1{{\2}}', $text, 1)."\n";
}

?>

This prints:

<p><span class="title">{{This}} is </span> my title!</p>
<1>   <2>   <3>   {{blah}}   blah   <4> <5> blah
garbage <1> <2> {{real}} stuff begins <3> <4>
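For the non-regex route hinted at above, PHP's DOM extension can locate the first text node that contains a word and wrap just that word. This is only a sketch under some assumptions: wrapFirstWordDom is a hypothetical helper, the input is a well-formed HTML fragment, and the text is ASCII (byte offsets from preg_match are passed to splitText, and loadHTML's default encoding handling is not addressed here).

```php
<?php
// Hypothetical helper: wraps the first word in the first non-empty text node.
function wrapFirstWordDom(string $html): string
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output

    $doc = new DOMDocument();
    // A wrapper <div> gives the fragment a known root to read back from.
    $doc->loadHTML('<div>' . $html . '</div>');
    $root = $doc->getElementsByTagName('div')->item(0);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('.//text()', $root) as $node) {
        if (!preg_match('/\w+/', $node->nodeValue, $m, PREG_OFFSET_CAPTURE)) {
            continue; // whitespace or punctuation only, keep looking
        }
        [$word, $offset] = $m[0];

        // Isolate the word in its own text node (ASCII assumed for offsets).
        $wordNode = $node->splitText($offset);
        $wordNode->splitText(strlen($word));

        $span = $doc->createElement('span');
        $span->setAttribute('class', 'h1');
        $wordNode->parentNode->replaceChild($span, $wordNode);
        $span->appendChild($wordNode);
        break; // only the first word is wrapped
    }

    // Serialize the wrapper's children, leaving out the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapFirstWordDom('<p><span class="title">This is </span> my title!</p>'), "\n";
```

Because the word is found in text nodes only, tag names and attribute values can never be picked up by mistake, which is the failure the question started with.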