Using regex in PHP to match words that have HTML tags between them

2023-01-09 04:28 Q&A author:

I am trying to use regular expressions to match sentences on a web page where HTML tags might occur between the words.

For example, if the sentence that I want to match is "This is a word", I need a pattern that will also match it when HTML tags appear between the words, such as "This is a <b>word</b>".

I've tried using the code below to build the regex pattern:

$pattern = "/" . str_replace(" ", ".{0,100}", $sentence) . "/si";

This replaces every space with .{0,100} and uses the s modifier so that . also matches newlines. However, this gives me undesired results.

Thanks in advance for any help with this!
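The pattern above misbehaves because .{0,100} matches any text, not only tags: built from "This is a word", it also matches "This is not a word". A minimal sketch of a tighter pattern, assuming that only whitespace and tags may appear between the words and that no attribute value contains a > character, might look like this (sentencePattern() is a hypothetical helper name):

```php
<?php
// Hypothetical helper: builds a pattern that allows only whitespace and
// HTML tags (not arbitrary text) between the words of $sentence.
// Assumes no '>' appears inside attribute values.
function sentencePattern(string $sentence): string
{
    // Split the sentence on runs of whitespace and escape each word
    // so that regex metacharacters in it are matched literally.
    $words = preg_split('/\s+/', trim($sentence));
    $escaped = array_map(fn(string $w): string => preg_quote($w, '/'), $words);

    // Between two words: one or more whitespace characters or tags.
    $gap = '(?:\s|<[^>]*>)+';

    return '/' . implode($gap, $escaped) . '/i';
}

$pattern = sentencePattern('This is a word');

var_dump(preg_match($pattern, 'This is a <b>word</b>')); // int(1)
var_dump(preg_match($pattern, 'This is not a word'));    // int(0)
```

The s modifier is no longer needed because the pattern has no dot, and \s already matches newlines. Entities such as &nbsp; and tags in the middle of a word (wo<b>rd</b>) are not handled by this sketch.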

Try to use preg_replace() when you want to perform a regular expression search and replace. (ereg_replace() was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so it is no longer an option.)
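A rough sketch of the preg_replace() route: replace every tag with a space, collapse whitespace, then search the plain text. containsSentence() is a hypothetical helper, and the simple <[^>]*> tag pattern assumes no > inside attribute values.

```php
<?php
// Hypothetical helper: true if $sentence occurs in $html once tags are
// removed and whitespace is normalized (case-insensitive).
function containsSentence(string $html, string $sentence): bool
{
    // Replace every tag with a space so tags like <br> still separate words.
    $text = preg_replace('/<[^>]*>/', ' ', $html);
    // Collapse runs of whitespace (including newlines) into single spaces.
    $text = preg_replace('/\s+/', ' ', $text);
    return stripos($text, $sentence) !== false;
}

var_dump(containsSentence('<p>This is a <b>word</b>.</p>', 'This is a word')); // bool(true)
var_dump(containsSentence('<p>This is not a word.</p>', 'This is a word'));    // bool(false)
```

Replacing tags with a space keeps words apart across tags such as <br>, but splits a word that has a tag in its middle. PHP's built-in strip_tags() is an alternative for the first step, though it removes tags without inserting a space.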

I put this together very quickly, so it probably doesn't cover all edge cases, but I think it at least partially matches your requirements. Also, I haven't tried it in PHP.

/[^\s>]+[\s]*(<([^>]+)>)(.*)(</\2>)[\s]*[^\s<]+/g

In the following example:

<p>This is a <b><i>nice</i> sentence</b>.</p> <p>Here's another sentence.</p>

It only matches the first sentence, in the following groups:

Group 1: <b>
Group 2: b
Group 3: <i>nice</i> sentence
Group 4: </b>
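For reference, a possible PHP port of that regex, not tested by the answerer. Two changes are needed: the / in (</\2>) would end a /-delimited pattern, so this sketch uses ~ as the delimiter, and PHP has no g modifier, so preg_match_all() collects all matches instead.

```php
<?php
// The answer's regex with '~' as delimiter (because of the '/' in </\2>)
// and without the JavaScript-style 'g' flag, which PHP does not support.
$regex = '~[^\s>]+[\s]*(<([^>]+)>)(.*)(</\2>)[\s]*[^\s<]+~';

$html = '<p>This is a <b><i>nice</i> sentence</b>.</p> <p>Here\'s another sentence.</p>';

// preg_match_all() plays the role of 'g': it returns every match.
$count = preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "full match: {$m[0]}\n";
    // Groups 1-4: opening tag, tag name, inner content, closing tag.
    for ($i = 1; $i <= 4; $i++) {
        echo "group $i: {$m[$i]}\n";
    }
}
```

With the example input this reports one match, a <b><i>nice</i> sentence</b>., whose groups are <b>, b, <i>nice</i> sentence and </b>. Because (.*) is greedy, on a line with several elements of the same tag it would run to the last matching closing tag.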

What are you actually trying to achieve? Parsing an HTML document with regular expressions might not be the best solution. You can use XPath for what you've described (so far).
E.g. to find all rows in a table that contain the text "this is a word":

<?php
$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>...</title></head><body>
  <table>
    <tr><td>1</td><td>lalala</td></tr>
    <tr><td>2</td><td>this is a <b>word</b></td></tr>
    <tr><td>3</td><td>lalala</td></tr>
    <tr><td>4</td><td><b>And this is a</b> word, too</td></tr>
  </table>
</body></html>');

$xpath = new DOMXPath($doc);
// Select every <tr> with a <td> whose string value (the text of all its
// descendants, with tags ignored) contains the phrase.
foreach($xpath->query('/html/body/table/tr[./td[contains(., "this is a word")]]') as $tr) {
  // Print the text of each cell in the matching row.
  foreach($tr->childNodes as $td) {
    echo $td->nodeValue, ' ';
  }
  echo "\n";
}

prints

2 this is a word 
4 And this is a word, too
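The same idea extends beyond tables. A sketch, assuming the phrase contains no double quotes (it is dropped straight into an XPath string literal): normalize-space() collapses whitespace and line breaks, and the not(*[...]) condition keeps only the innermost elements whose text contains the phrase. The sample HTML is made up for illustration.

```php
<?php
$phrase = 'this is a word';

$doc = new DOMDocument();
$doc->loadHTML('<html><body>
  <p>Intro text.</p>
  <div><p>this is a
     <b>word</b></p></div>
  <p>this is not a word</p>
</body></html>');

$xpath = new DOMXPath($doc);
// normalize-space() turns runs of whitespace (including newlines) into single spaces.
$cond = sprintf('contains(normalize-space(.), "%s")', $phrase);
// Keep an element only if none of its child elements also contains the phrase.
$query = sprintf('//*[%s and not(*[%s])]', $cond, $cond);

$found = [];
foreach ($xpath->query($query) as $node) {
    // Report the element name and its whitespace-normalized text.
    $found[] = $node->nodeName . ': ' . trim(preg_replace('/\s+/', ' ', $node->textContent));
}
echo implode("\n", $found), "\n";
```

This should print p: this is a word. Without normalize-space(), contains() would miss that paragraph, because its text has a line break and extra spaces between "a" and "word".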

The regular expression

%(<[^>]+?>)\s*?((?:\w+\s*)*)\s*?(</[^>]+?>)%im

will grab basic words, including simple multiple-word phrases, that sit between an opening and a closing tag (it does not check that the two tags belong together). It groups the full match, the opening tag, the word/phrase and the closing tag so you can access each easily.

So let's say your input is HTML source code. Run preg_match_all() with the PREG_SET_ORDER flag; this fills $matches with one array per match, which is easy to loop through with foreach().

In the code below, $html is the source page that you want to search, and $matches is passed by reference: preg_match_all() fills it with your results.

<?php
$html='
This is a <b>word</b>.
This is not a word.
This is a <span>three word phrase</span>.
';

$regex ='%(<[^>]+?>)\s*?((?:\w+\s*)*)\s*?(</[^>]+?>)%im';

preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

foreach($matches as $val) {
    //$val[0] will always be the entire match with the tags
    echo "full match: " . $val[0] . "\n";

    //$val[1] will always be the opening tag
    echo "opening tag: " . $val[1] . "\n";

    //$val[2] will always be the word or words, if separated by spaces
    echo "word: " . $val[2] . "\n";

    //$val[3] will always be the closing tag
    echo "closing tag: " . $val[3] . "\n\n";
}
?>

The above script will output:

full match: <b>word</b>
opening tag: <b>
word: word
closing tag: </b>

full match: <span>three word phrase</span>
opening tag: <span>
word: three word phrase
closing tag: </span>
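Because that regex does not tie the closing tag to the opening one, it would also match something like <b>word</i>. A possible variant, offered as a sketch rather than a tested drop-in: capture the tag name and require the same name in the closing tag through the backreference \1. It assumes no > inside attribute values.

```php
<?php
// Variant: (\w+) captures the tag name and </\1> requires the same name
// in the closing tag, so mismatched pairs are skipped.
$regex = '%<(\w+)[^>]*>\s*((?:\w+\s*)*?)\s*</\1>%i';

$html = 'This is a <b>word</b>. Mismatched: <b>oops</i>. <span class="x">three word phrase</span>';

preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the tag name, $m[2] the word or phrase inside it.
    echo "tag: {$m[1]}, words: {$m[2]}\n";
}
```

This should print tag: b, words: word and tag: span, words: three word phrase; the <b>oops</i> pair is not matched.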
