Find and replace all links in a web page using PHP/JavaScript
I need to find the links in part of some HTML and rewrite each one so that it starts with one of two absolute base URLs, followed by the link as it appears on the page.
I have found a lot of ideas and tried a lot of different solutions, but none has worked so far. Please help me out! Thank you!
This is my code:
<?php
$url = "http://www.oxfordreference.com/views/SEARCH_RESULTS.html?&q=android";
$raw = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));
$start = strpos($content,'<table class="short_results_summary_table">');
$end = strpos($content,'</table>',$start) + 8;
$table = substr($content,$start,$end-$start);
echo "{$table}";
$dom = new DOMDocument();
$dom->loadHTML($table);
$dom->strictErrorChecking = FALSE;
// Get all the links
$links = $dom->getElementsByTagName("a");
foreach($links as $link) {
$href = $link->getAttribute("href");
echo "{$href}";
// strpos() takes (haystack, needle) and returns false when there is no match
if (strpos($href, "http://oxfordreference.com") === false) {
if (strpos($href, "/views/") === false) {
$ref = "http://oxfordreference.com/views/" . $href; // "." concatenates strings in PHP
}
else {
$ref = "http://oxfordreference.com" . $href;
}
$link->setAttribute("href", $ref);
echo $link->getAttribute("href");
}
}
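$table12 = $dom->saveHTML(); // saveHTML is a method, so it needs parentheses
preg_match_all("|<tr(.*)</tr>|U",$table12,$rows);
print_r($rows[0]); // $rows[0] is an array, so echo would only print "Array"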
foreach ($rows[0] as $row){
if ((strpos($row,'<th')===false)){
preg_match_all("|<td(.*)</td>|U",$row,$cells);
print_r($cells); // $cells is an array of matches, so echo would only print "Array"
}
}
?>
When I run this code, I get a "htmlParseEntityRef: expecting ';'" warning on the line where I load the HTML.
var links = document.getElementsByTagName("a");
will get you all the links.
And this will loop through them:
for(var i = 0; i < links.length; i++)
{
links[i].href = "newURLHERE";
}
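Setting every href to one fixed string throws away the original target. If the goal is to turn relative links into absolute ones, a minimal sketch could resolve each href against a base URL instead. It assumes the built-in URL class is available; absolutizeLinks is a hypothetical helper name, and the base URL in the usage note is only taken from the question as an example.

```javascript
// Hypothetical helper: rewrite each link's href to an absolute URL.
// `links` is any array-like of elements, e.g. document.getElementsByTagName("a").
function absolutizeLinks(links, base) {
  for (var i = 0; i < links.length; i++) {
    var href = links[i].getAttribute("href");
    if (!href) continue; // skip anchors that have no href
    try {
      // new URL(href, base) resolves a relative href against base;
      // an href that is already absolute is not rebased.
      links[i].setAttribute("href", new URL(href, base).href);
    } catch (e) {
      // leave an href that cannot be parsed as it is
    }
  }
}
```

In a browser this might be called as absolutizeLinks(document.getElementsByTagName("a"), "http://www.oxfordreference.com/views/"). With that base, a bare file name ends up under /views/, while an href that already starts with / is resolved against the host only.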
You could use jQuery, which handles link replacement well. Rather than explain it here, I'll point you to this answer:
How to change the href for a hyperlink using jQuery
I recommend scrappedcola's answer, but if you don't want to do it on the client side, you can use a regex to do the replacement:
ob_start();
//your HTML
//end of the page
$body=ob_get_clean();
// Keep the tag up to href= via $1, swap only the quoted URL, and assign the result
$body = preg_replace('/(<a\b[^>]*\bhref=)"[^"]*"/', '$1"NewURL"', $body);
echo $body;
You can use a backreference ($1) in the replacement, or the callback version (preg_replace_callback), to modify the output as you like.
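For comparison, the callback idea can be sketched in JavaScript too (for example under Node), where String.prototype.replace accepts a function. This is an analogue of the PHP approach, not the code above; replaceHrefs and its rewrite parameter are hypothetical names, and the pattern assumes double-quoted href values only.

```javascript
// Hypothetical helper: rewrite the href of every <a> tag in an HTML string.
// `rewrite` receives the old href and returns the new one.
function replaceHrefs(html, rewrite) {
  // Group 1: the tag up to and including href=" ; group 2: the URL itself.
  var pattern = /(<a\b[^>]*?\bhref=")([^"]*)"/gi;
  return html.replace(pattern, function (match, prefix, url) {
    // Rebuild the attribute around whatever the callback returns.
    return prefix + rewrite(url) + '"';
  });
}
```

A call such as replaceHrefs(body, function (url) { return "http://oxfordreference.com/views/" + url; }) would prefix every matched link. A regex like this is brittle on real-world HTML, so a DOM parser is usually the safer choice when one is available.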