Regular expression to get values of HTML tag attributes
<li class="zk_list_c2 f_l"><a title="abc" target="_blank" href="link">
abc
</a> </li>
How would I extract abc and link?
$pattern="/<li class=\"zk_list_c2 f_l\"><a title=\"(.*)\" target=\"_blank\" href=\"(.*)\">\s*(.*)\s*<\/a> <\/li>/m";
preg_match_all($pattern, $content, $matches);
The one I have right now doesn't seem to work.
Since you are trying to extract data from an HTML string, regular expressions are generally not the best tool for the job.
Instead, why not use a DOM parser, such as the DOMDocument class provided with PHP, and its DOMDocument::loadHTML method?
Then you can navigate through your HTML document using DOM methods, which is much easier than using regex, especially since HTML is not a regular language, and a pattern written for one exact layout breaks as soon as the attribute order or the whitespace changes.
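To make that brittleness concrete, here is a minimal sketch that reuses the pattern from the question unchanged and feeds it a hypothetical variant of the same markup in which the attributes of the <a> tag appear in a different order. The regex finds nothing, while a DOM lookup still returns the href. The reordered snippet and the $hrefs variable are illustrative assumptions, not part of the original question.

```php
<?php
// The pattern from the question, unchanged.
$pattern = "/<li class=\"zk_list_c2 f_l\"><a title=\"(.*)\" target=\"_blank\" href=\"(.*)\">\s*(.*)\s*<\/a> <\/li>/m";

// Hypothetical variant: same link, but href comes before title.
$html = '<li class="zk_list_c2 f_l"><a href="link" title="abc" target="_blank">abc</a> </li>';

// The regex expects `<a title="` right after the <li>, so nothing matches.
$regexMatches = preg_match_all($pattern, $html, $matches);
echo "regex matches: $regexMatches\n";

// The DOM approach does not care about attribute order.
$dom = new DOMDocument();
$dom->loadHTML($html);
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
var_dump($hrefs);
```

The same would happen with extra attributes, single quotes instead of double quotes, or a line break inside the tag: each variation needs a new regex, whereas the DOM code stays the same.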
For example, you could use something like this:
$html = <<<HTML
<li class="zk_list_c2 f_l"><a title="abc" target="_blank" href="link">
abc
</a> </li>
HTML;
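$dom = new DOMDocument();
$dom->loadHTML($html);                  // parse the HTML fragment
$as = $dom->getElementsByTagName('a');  // every <a> element in the document
foreach ($as as $a) {
    var_dump($a->getAttribute('href')); // value of the href attribute
    var_dump(trim($a->nodeValue));      // text content, without the surrounding newlines
}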
And you would get the following output:
string(4) "link"
string(3) "abc"
The code is not very complicated, but in a few words, here is what it does:
- Load the HTML string: DOMDocument::loadHTML
- Extract all <a> tags: DOMDocument::getElementsByTagName
- For each tag found:
  - get the href attribute: DOMElement::getAttribute
  - and the value of the node: DOMNode::$nodeValue
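Just a note: you might want to check whether the href attribute exists, with DOMElement::hasAttribute, before trying to use its value. DOMElement::getAttribute returns an empty string when the attribute is missing, so on its own it cannot tell a missing href apart from an empty one.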
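A minimal sketch of that check, assuming some of the <a> tags on the page may have no href at all (the two-item list below is a hypothetical example). It also reads the title attribute, since in the question's markup "abc" appears both as the title and as the link text.

```php
<?php
// Hypothetical markup: the second <a> has no href attribute.
$html = '<ul><li><a title="abc" href="link">abc</a></li><li><a title="def">def</a></li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    // getAttribute() would return '' for a missing attribute;
    // hasAttribute() tells "missing" apart from "present but empty".
    if (!$a->hasAttribute('href')) {
        continue; // skip anchors without an href
    }
    $links[] = [
        'title' => $a->getAttribute('title'),
        'href'  => $a->getAttribute('href'),
        'text'  => trim($a->nodeValue),
    ];
}
print_r($links);
```

Only the first anchor ends up in $links; the one without an href is skipped instead of producing an empty string.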
EDIT after the comments: here's a quick example using DOMXPath to get to the links; I assumed you want the link that's inside the <li> tag with class="zk_list_c2 f_l":
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// <a> elements that are direct children of an <li> whose class attribute is exactly "zk_list_c2 f_l"
$as = $xpath->query('//li[@class="zk_list_c2 f_l"]/a');
foreach ($as as $a) {
    var_dump($a->getAttribute('href'));
    var_dump(trim($a->nodeValue));
}
And, again, you get:
string(4) "link"
string(3) "abc"
As you can see, the only thing that changes is how you get to the right <a> tag: instead of DOMDocument::getElementsByTagName, it's just a matter of:
- instantiating the DOMXPath class
- and calling DOMXPath::query with the right XPath query.
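One caveat: @class="zk_list_c2 f_l" compares the whole attribute as a string, so it stops matching if the classes appear in another order, with extra spaces, or alongside another class. XPath 1.0 (which DOMXPath implements) has no class selector; a common workaround is to pad the normalized class list with spaces and look for each class as a whole word. This is a sketch under that assumption; the markup and the $hasClass helper are hypothetical.

```php
<?php
// Hypothetical markup: same classes, different order, extra whitespace and an extra class.
$html = '<ul><li class="f_l  zk_list_c2 extra"><a title="abc" href="link">abc</a></li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact string comparison: finds nothing for this markup.
$exact = $xpath->query('//li[@class="zk_list_c2 f_l"]/a');
echo 'exact match: ', $exact->length, "\n";

// Hypothetical helper: builds an XPath 1.0 test for one whole class name.
$hasClass = function (string $class): string {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
};

$query = '//li[' . $hasClass('zk_list_c2') . ' and ' . $hasClass('f_l') . ']/a';
$as = $xpath->query($query);
foreach ($as as $a) {
    var_dump($a->getAttribute('href'));
    var_dump(trim($a->nodeValue));
}
```

The surrounding spaces in ' f_l ' keep the test from matching a different class that merely contains the same letters, such as f_list.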