PHP regex running well, now I need some tailoring

Good evening dear community,

I need some help with preg_match. I want to optimize code that already runs very well: I want to get only the results, without the overhead of HTML tags in them. That means I have to tailor the regex a bit. How can I improve the (already very nice) code?

<?php

$content = file_get_contents("< - URL - >");

var_dump($content);

$pattern = '/<td>(.*?)<\/td>/si';
preg_match_all($pattern,$content,$matches);

foreach ($matches[1] as $match) {
    $match = strip_tags($match);
    $match = trim($match);
    var_dump($match);
}

?>

See here the url: link text

Hmm, I need to tailor the regex a bit... Can anybody give me a hint?

Every idea and tip will be greatly appreciated. Regards, zero


It appears that you are trying to scrape data from HTML pages. If this is the case, then you really should not use regular expressions to extract information. Take a look instead at the DOMDocument class.

Note that DOMDocument::loadXML requires well-formed XML, while DOMDocument::loadHTML tolerates most real-world HTML. For badly broken markup, a "tidying" pass before parsing can still help. One convenient way to do this is to use the "tidy" extension. See "Tidying up your HTML with PHP 5" for an introduction to its use.
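
As a rough illustration of the DOMDocument approach, here is a minimal sketch that pulls the text of every <td> cell without any regex. It uses DOMDocument::loadHTML, which accepts HTML directly. The function name extractCellTexts and the sample markup are assumptions made up for this example; they are not from the question, and the real page may need extra handling (character encoding, nested tables).

```php
<?php
// Hypothetical sample markup standing in for the page fetched in the question.
$html = '<table><tr><td> <b>Alpha</b> </td><td>Beta<br></td></tr></table>';

// Hypothetical helper: returns the trimmed text of every <td> in $html.
function extractCellTexts(string $html): array
{
    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous error setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//td') as $td) {
        // textContent is the cell's text with all nested tags already removed.
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

print_r(extractCellTexts($html));
```

With the real page, the string returned by file_get_contents would be passed in place of the sample. Because textContent already drops nested markup, the strip_tags call from the original loop is no longer needed.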

EDIT: How can I scrape a website with invalid HTML
