
How to make all src strings global in PHP?

I am writing a web browser in PHP, for devices (such as the Kindle) which do not support multi-tab browsing. Currently I am reading the page source with file_get_contents(), and then echoing it into the page. My problem is that many pages use relative references (such as <img src='image.png'>), so they all point to pages that don't exist on my server. What I want to do is locate all src and href attributes and prepend the full web address to any that do not start with "http://" or "https://". How would I do this?


Add

<base href="http://example.com/" />

at the head of the page, that is, inside the <head></head> section. Relative URLs in the page are then resolved against that address.
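
If you don't want to parse the page at all, a plain string approach could look like the sketch below. injectBaseTag is a made-up helper name, and the regular expressions are deliberately naive: they assume an ordinary opening <head> tag and treat any "<base" in the source as an existing base tag. If no <head> tag is found, the page is returned unchanged.

```php
<?php
// Hypothetical helper (not part of PHP): insert a <base> tag right after
// the opening <head> tag of an HTML string.
function injectBaseTag(string $html, string $baseUrl): string
{
    // Leave the page alone if it already seems to declare a base.
    if (preg_match('/<base\b/i', $html)) {
        return $html;
    }

    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '" />';

    // Append the tag to the first <head ...> match only (limit = 1).
    // A callback is used so that "$" or "\" in the URL is not treated
    // as a back-reference in the replacement.
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function ($m) use ($tag) {
            return $m[0] . $tag;
        },
        $html,
        1
    );

    // preg_replace_callback() returns null on a regex error.
    return $result ?? $html;
}

// Usage with the same kind of page as in the question.
$page = '<html><head><title>t</title></head><body><img src="image.png"></body></html>';
echo injectBaseTag($page, 'http://example.com/');
```

Putting the tag directly after <head> keeps it ahead of any <link> or <script> elements that use relative URLs.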


Like elibyy suggested, I too would recommend using the base tag. Here's a way to do it with PHP's native DOMDocument:

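// example url
$url = 'http://example.com';

// real-world HTML is rarely valid, so collect parser warnings instead of printing them
libxml_use_internal_errors( true );

$doc = new DOMDocument();
// loading a remote url requires allow_url_fopen to be enabled
$doc->loadHTMLFile( $url );
libxml_clear_errors();

// first let's find out if there is a base tag already
$baseElements = $doc->getElementsByTagName( 'base' );

// if so, skip this block
if( $baseElements->length < 1 )
{
    // no base tag found? let's create one
    $baseElement = $doc->createElement( 'base' );
    $baseElement->setAttribute( 'href', $url );
    $headElement = $doc->getElementsByTagName( 'head' )->item( 0 );

    // a page without a <head> would otherwise cause a fatal error here
    if( $headElement !== null )
    {
        // put the base tag first, so it precedes any element with a relative url
        $headElement->insertBefore( $baseElement, $headElement->firstChild );
    }
}

echo $doc->saveHTML();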

Having said this, however: are you sure you are aware of how ambitious your goal is?

For instance, I don't think a base tag is exactly what you need, as your application is basically acting as a proxy. Therefore you will probably want to route at least all user-clickable links through your application rather than pointing them directly at the original addresses, because I presume you want to keep the user in your tabbed application and not break out of it.

Something like:

http://yourapplication.com/resource.php?resource=http://example.com/some/path/

Now, this could of course be achieved by basically doing what you requested, but instead of prepending http:// or https:// and the site address, you prepend something that results in the above example url.

However, how are you going to discern which resources to do this with, and which not? If you take this approach for all resources in the page, your application will quickly become a full-fledged proxy, and thereby very resource-intensive.
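
One possible middle ground, sketched below, is to send only clickable <a> links through the application and leave images, scripts and stylesheets to the base tag. The function names resolveUrl and rewriteLinks are made up for this example, and resolveUrl is a simplified resolver: it does not normalise "../" or "./" segments and does not handle query-only references. The url is passed through urlencode() so it survives as a single query parameter, which means resource.php would read it back from $_GET['resource'].

```php
<?php
// Hypothetical, simplified resolver: turns a reference found in a page
// into an absolute url. Does NOT normalise "../" or "./" segments.
function resolveUrl(string $base, string $ref): string
{
    // Already absolute (http:, https:, mailto:, javascript:, ...).
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (strncmp($ref, '//', 2) === 0) {
        return $scheme . ':' . $ref;        // protocol-relative
    }
    if (strncmp($ref, '/', 1) === 0) {
        return $origin . $ref;              // root-relative
    }

    // Plain relative: resolve against the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $ref;
}

// Hypothetical helper: point every http(s) <a href> at the proxy script.
function rewriteLinks(string $html, string $pageUrl, string $proxy): string
{
    libxml_use_internal_errors(true);       // tolerate invalid markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Skip missing hrefs and in-page anchors such as "#top".
        if ($href === '' || $href[0] === '#') {
            continue;
        }
        $absolute = resolveUrl($pageUrl, $href);
        // Only proxy web links; leave mailto:, javascript: etc. untouched.
        if (preg_match('#^https?://#i', $absolute)) {
            $a->setAttribute('href', $proxy . '?resource=' . urlencode($absolute));
        }
    }

    return $doc->saveHTML();
}

// Usage: only the link is rewritten, the image keeps its original src.
$html = '<a href="/some/path/">internal</a> <img src="image.png">';
echo rewriteLinks($html, 'http://example.com/index.html', 'http://yourapplication.com/resource.php');
```

Forms, redirects and links created by JavaScript are not covered by this, so some clicks will still leave the application.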

Hopefully I've given you a brief starter for some things to take into consideration.
