PHP 5.3 smart search and replace with regular expressions

OK, I need to scan many HTML / XHTML documents to see whether a particular file has been embedded with SWFObject. If so, I need to replace the call with something else.

So far I have extracted the <script> contents where the calls can be made. Now I need to scan this string to check whether the call is there and, if it is, replace it.

I know this is a bit odd, but the content comes from a third party that we have no control over.

Since the call can be written in many different ways, I will need a regular expression to find and replace it.

OK imagine the following scenario:

I want to check whether the file test.swf is embedded with SWFObject in the document.

The <script> content looks like this:

alert('test.swf');
//some other random stuff here
swfobject.embedSWF("test.swf",
"The alternative content can screw the regexp with );", "300", "120",
"9.0.0", false, flashvars, params, attributes);

Now I would like to replace the swfobject.embedSWF call (and all its parameters) with something else.

Is there a not too horrible way to do this? Don't forget that the call can be on one or many lines, that the parameters can be wrapped with single quotes (') or double quotes ("), that whitespace can be all around...

EDIT: OK, since catching all kinds of JS syntax is a bit overkill, I will simplify the requirement.

The regular expression can assume only the following:

  1. The call is always on a single line.
  2. It always starts with swfobject.embedSWF (case sensitive).
  3. That is followed by optional whitespace and then a (
  4. That is followed by optional whitespace and then a " or a ' (either one, but one of the two is required).
  5. That is followed by the filename.
  6. That is followed by a " or a ' (ideally the same character as in 4, but it's not a big deal if we can't enforce that).
  7. That is followed by optional whitespace and then a ,
  8. That is followed by anything.
  9. That is followed by a ), then optional whitespace, then a ; and then the end of the line.

It should be much simpler to parse this way (I guess).

EDIT 2: I've cooked up a solution. I think I'm close, but it's not working. Can anyone help? Test case 0 should match, but it doesn't...

<?php

$myFilename = 'test.swf';
$testCases = array();
$testCases[] = 'swfobject.embedSWF("test.swf", "The alternative content can screw the regexp with );", "300", "120", "9.0.0", false, flashvars, params, attributes);';

foreach ($testCases as $i => $currTest)
{
    $currResult = preg_match('/\s*swfobject\.embedSWF\s*\(\s*(["\'])(' . preg_quote($myFilename)  . ')[^"\']+\1\s*,[\s\S]+?\)\s*;\s*$/', $currTest);
    if ($currResult === false || $currResult < 1)
        echo $i, ' Not matching', PHP_EOL;
    else
        echo $i, ' Matching', PHP_EOL;
}

?>


Well, somebody had the time to write a basic JavaScript parser in PHP. I'd give the tokenizer a try (possibly using an HTML parser to first find the <script> nodes).


Regarding your EDIT 2...

I'm not the best with regular expressions, but you can try:

$currResult = preg_match('/\s*swfobject\.embedSWF\s*\(\s*(["\'])(' . preg_quote($myFilename)  . ')\1\s*,[\s\S]+?\)\s*;\s*$/', $currTest);

Seems to work OK for me.
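
The likely reason the pattern from EDIT 2 fails is the `[^"\']+` that sits between the filename and the closing quote: it demands at least one extra non-quote character, so `"test.swf"` can never match. The sketch below spells out the same idea rule by rule, following rules 1 to 9 from the question, and then performs the replacement with preg_replace. The helper name buildEmbedPattern and the replacement call myEmbed() are made up for illustration, and the greedy `.*` relies on the one-call-per-line assumption.

```php
<?php
// Hypothetical helper (not from the original thread): builds a pattern
// from the simplified rules 1-9 listed in the question.
function buildEmbedPattern($filename)
{
    return '/swfobject\.embedSWF'      // rule 2: literal, case-sensitive name
         . '\s*\('                     // rule 3: optional whitespace, then (
         . '\s*(["\'])'                // rule 4: opening quote, captured as \1
         . preg_quote($filename, '/')  // rule 5: the filename, escaped
         . '\1'                        // rule 6: the same quote character again
         . '\s*,'                      // rule 7: optional whitespace, then ,
         . '.*'                        // rule 8: anything; "." stops at a newline (rule 1)
         . '\)\s*;[ \t]*$/m';          // rule 9: ) ; then end of line (m = per line)
}

$script = "alert('test.swf');\n"
        . 'swfobject.embedSWF("test.swf", "The alternative content can screw the regexp with );", "300", "120", "9.0.0", false, flashvars, params, attributes);'
        . "\n";

// myEmbed() is only a placeholder for whatever the call should become.
$fixed = preg_replace(buildEmbedPattern('test.swf'), 'myEmbed();', $script);

echo $fixed;
```

Because `.*` is greedy, it runs to the end of the line and backtracks to the last `);`, so a `);` inside the alternative-content string does not end the match early. Trailing whitespace is limited to spaces and tabs here so that the newline after the call is left in place; input with Windows line endings would need `\r` added to that character class.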


Use 'grep' or similar on the command line to get a list of files that contain the .swf/script/object strings you need. That'll whittle down the number of files you need to process.

Then, use a PHP script to slurp each of those files into the DOM parser of your choice and do the replacing/fixing-up there.
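
As a rough sketch of that second step, assuming PHP's bundled DOM extension is the parser of choice: the function below loads a document, walks its <script> nodes, and returns the ones that mention both swfobject.embedSWF and the filename. The function name findEmbedScripts is hypothetical, and the plain strpos checks are only a cheap pre-filter, not a substitute for matching the actual call.

```php
<?php
// Hypothetical helper: returns the text of every <script> block that
// mentions both swfobject.embedSWF and the given filename.
function findEmbedScripts($html, $filename)
{
    $doc = new DOMDocument();

    // Third-party markup is often invalid; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hits = array();
    foreach ($doc->getElementsByTagName('script') as $script) {
        $code = $script->textContent;
        if (strpos($code, 'swfobject.embedSWF') !== false
            && strpos($code, $filename) !== false) {
            $hits[] = $code;
        }
    }
    return $hits;
}

$html = '<html><head><script>swfobject.embedSWF("test.swf", "alt", "300", "120");</script></head>'
      . '<body><script>alert(1);</script></body></html>';

foreach (findEmbedScripts($html, 'test.swf') as $code) {
    echo $code, PHP_EOL;
}
```

Each returned string can then be run through a regular expression for the actual replacement, and the result written back by setting the node's content.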
