regular expression breaking on new line

2023-01-01 16:58 问答作者：

I'm trying to use a regular expression as below:

preg_match_all('|<table.*</table>|',$html,$matches, PREG_SET_ORDER);

But this is not working, and I think the problem is the new line inside the string $html.

Could someone tell me a work around?

EDIT: I've realized that it'开发者_开发问答s not right to use regex to parse HTML. Thanks to those who told me. :)

The dot does not match newlines unless the s pattern modifier is used.

preg_match_all('|<table.*?</table>|s',$html,$matches, PREG_SET_ORDER);

(Be aware that using regex to parse HTML ranks among the worst capital sins here in SO).

Before making a decision on what to do next, I'd read this first: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

In general, it's not a good idea to parse HTMl with RegEx.

I recommend using DOM

You can check out the PHP Simple HTML DOM Parser as an alternative.

Main Features:

A HTML DOM parser written in PHP5+ let you manipulate HTML in a very easy way!

Require PHP 5+.

Supports invalid HTML.

Find tags on an HTML page with selectors just like jQuery.

Extract contents from HTML in a single line.

preg_match_all('|<table.*?</table>|ms',$html,$matches, PREG_SET_ORDER);

Did you try the multiline modifier m?

preg_match_all('|<table.*</table>|m',$html,$matches, PREG_SET_ORDER);

Use the /s flag to have the '.' also apply to new line characters, or just check for new line characters explicitly - usually '[\n\r]'. I haven't yet read it myself, but do check out more info on the PCRE library at http://www.pcre.org/pcre.txt

Careful how you form your pattern though - long input strings with newlines mixed with misunderstood patterns can cause unexplained script failures and connection resets.

In your case, PCRE functions don't seem to be needed here, and could cause unexpected results anyway. If you're just looking to extract contents of a single table on a page, why not just do the most basic...

$start = stripos($input, "<table>");
$end = stripos($input, "</table>", $start);
$my_table = substr($input, $start, $end);

EDIT: I've realized that it's not right to use regex to parse HTML.

Better: You can read $html into a SimpleXML object and parse it with SimpleXML's Xpath. (Powerful and much easier to use than the DOM extension IMHO.)

Like this:

$html = "<html><body><table id=\"mytbl\"><tr><td>ABC</td></tr><tr><td>DEF</td></tr></table></body></html>";

$xml = simplexml_load_string($html);

if($xml)
foreach($xml->xpath("/html/body/*") as $item) {
    echo $item["id"] . "<br>"; // mytbl
    foreach($item->tr as $tr) {
        echo $tr->td . "<br>"; // 1:ABC, 2:DEF
    }
}

继续阅读：pcre php regex

regular expression breaking on new line

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？