Which regular expression do I need for this?
I'm kinda stuck here.
I have this pattern: <a class="title" href="showthread.php?t=XXXXX" id="thread_title_XXX">DATADATA</a>
I know that in my string (a web page) all my data is stored in this format, with the unique signature I just wrote. The number of X's is dynamic, probably somewhere between 2 and 12 digits (each X is a digit).
I can write a long expression to find the whole line, but I want to extract the data, not the whole thing.
How can I do it? An example would be appreciated.
Thank you!
Forget about regular expressions; they are not meant to parse formats like HTML, especially when an actual parser for it already exists.
Find the nodes using XPath:
$html = <<<EOT
<html>
Some html
<a class="title" href="showthread.php?t=XXXXX" id="thread_title_XXX">DATADATA</a>
</html>
EOT;
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[starts-with(@href, "showthread.php")]') as $node) {
// ...
}
Then extract the data using substr, strpos and parse_str:
$href = $node->getAttribute('href');
parse_str(substr($href, strpos($href, '?')+1), $query);
$t = $query['t'];
$id = $node->getAttribute('id');
$title = substr($id, strlen('thread_title_'));
$data = $node->nodeValue;
var_dump($t, $title, $data);
You get:
string(5) "XXXXX"
string(3) "XXX"
string(8) "DATADATA"
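For a page with several such links, the two steps above can be folded into one function that returns every match. This is only a sketch under some assumptions: the function name threads_from_html and the sample ids are made up for illustration, and parse_url is swapped in for the substr/strpos pair to isolate the query string.

```php
<?php
// Hypothetical helper combining the XPath query and the attribute parsing.
function threads_from_html(string $html): array
{
    $dom = new DOMDocument;
    // Collect parser warnings internally; real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $threads = [];
    foreach ($xpath->query('//a[starts-with(@href, "showthread.php")]') as $node) {
        // parse_url isolates the query string; parse_str turns it into an array.
        parse_str((string) parse_url($node->getAttribute('href'), PHP_URL_QUERY), $query);
        $threads[] = [
            't'    => $query['t'] ?? null,
            'id'   => substr($node->getAttribute('id'), strlen('thread_title_')),
            'data' => $node->nodeValue,
        ];
    }
    return $threads;
}

// Sample markup with made-up ids, modelled on the question's pattern.
$html = '<html><body>'
      . '<a class="title" href="showthread.php?t=45343" id="thread_title_45343">First</a>'
      . '<a class="title" href="showthread.php?t=7777" id="thread_title_7777">Second</a>'
      . '</body></html>';
print_r(threads_from_html($html));
```

Each entry of the returned array carries the t parameter, the numeric suffix of the id attribute, and the link text.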
Try this:
$parsed_str = '<a class="title" href="showthread.php?t=45343" id="thread_title_XXX">DATADATA</a><a class="title" href="showthread.php?t=466666" id="thread_title_XXX">DATADATA</a> fasdfasdfsdfasd gfgfkgbc 04034kgs <fdfd> dfs</fdfa> <a class="title" href="showthread.php?t=7777" id="thread_title_XXX">DATADATA</a>';
// The leading and trailing .*? are not needed; $result[1] holds the captured thread ids.
preg_match_all('/\?t=(\d{2,12})/', $parsed_str, $result);
print_r($result);
What do you actually want to do? Get the XXXXX signature, or all links?
Try this; it gets both the signature and the data:
<?php
$S = '<a class="title" href="showthread.php?t=1234567" id="thread_title_XXX">DATADATA</a>';
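// Escape the dot in "showthread.php" and avoid greedy .* so that each match stays within a single <a> tag.
$pattern = '!<a[^>]*href="showthread\.php\?t=(\d+)"[^>]*id="[^"]*">(.*?)</a>!';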
echo "<pre>";
print_r(preg_match($pattern, $S, $res));
print_r($res);
echo "</pre>";
?>
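If regular expressions are used anyway and the page contains several such links, preg_match_all with named groups can collect every thread id and title in one pass. This is a sketch, not a robust HTML parser: the function name extract_threads and the sample ids are hypothetical, and the pattern assumes href always comes before id, as in the question's markup.

```php
<?php
// Hypothetical helper: returns one [t, id, data] entry per matching link.
function extract_threads(string $html): array
{
    // Assumes href precedes id inside the tag, as in the question's sample.
    $pattern = '~<a\s[^>]*href="showthread\.php\?t=(?<t>\d{2,12})"[^>]*'
             . 'id="thread_title_(?<id>\d+)"[^>]*>(?<data>.*?)</a>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $threads = [];
    foreach ($matches as $m) {
        $threads[] = ['t' => $m['t'], 'id' => $m['id'], 'data' => $m['data']];
    }
    return $threads;
}

// Sample input with made-up ids and some unrelated text between the links.
$html = '<a class="title" href="showthread.php?t=45343" id="thread_title_45343">First</a> junk '
      . '<a class="title" href="showthread.php?t=7777" id="thread_title_7777">Second</a>';
print_r(extract_threads($html));
```

PREG_SET_ORDER groups the captures per link, so each loop iteration sees the id and the text of the same anchor.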