
Modifying PHP files using Perl (possibly using HTML::TreeBuilder)

I am trying to rework many pages across many sites. The pages may contain JavaScript, PHP, or ASP code in addition to HTML. The problem I'm encountering is that HTML::TreeBuilder rewrites things I don't want rewritten. I've managed to handle most of the symbols (e.g., ", >) in HTML tags like script, but in the PHP sections they get changed into entities (e.g., &quot;, &gt;). Plus, the PHP tags are stripped out at the same time.

If I have a PHP file that looks like this:

<html>
  <head><title>My Page</title></head>
  <body>
    <p>Some cruft &nbsp; which I want to repeat</p>
    <form name="foo"> (form content to be replaced)
    </form>
    <script type="JavaScript">
       <!--
       Some javaScript to be left alone
       -->
    </script>
    <a href="somepage.php">Link to be removed</a>
    <?php
       if (strlen($txtKeyword) > 2)
         {
           echo " or <a href=\"database_search_keyword.htm\">Search again?</a></p>";
           if(isset($_REQUEST['nr']))
         {
           $numRows = $_REQUEST['nr'];
           ....
    ?>
  </body>
</html>

I want the final result to look like:

<html>
  <head><title>My Page</title></head>
  <body>
    <p>Some cruft &nbsp; which I want to repeat</p>
    <ul><li>List replacing form</li>
    </ul>
    <script type="JavaScript">
       <!--
       Some javaScript to be left alone
       -->
    </script>
    <?php
       if (strlen($txtKeyword) > 2)
         {
           echo " or <a href=\"database_search_keyword.htm\">Search again?</a></p>";
           if(isset($_REQUEST['nr']))
         {
           $numRows = $_REQUEST['nr'];
           ....
    ?>
  </body>
</html>

As I said, I'm able to get everything working except the PHP. It gets mangled, so the result is:

<html>
  <head><title>My Page</title></head>
  <body>
    <p>Some cruft &nbsp; which I want to repeat</p>
    <ul><li>List replacing form</li>
    </ul>
    <script type="JavaScript">
       <!--
       Some javaScript to be left alone
       -->
    </script>
    <?php
      if (strlen($txtKeyword) &gt; 2)
        {
          echo &quot; or &quot;;
          if(isset($_REQUEST[&#39;nr&#39;]))
        {
          $numRows = $_REQUEST[&#39;nr&#39;];
          ....
    ?>
  </body>
</html>

I have been working with HTML::TreeBuilder 3.23. I've tried the developer release 3.23_3, but it fails with an error caused by the PHP code (e.g., an <a> element has an invalid attribute name '"&section_id' ' . $section_id . ').

Here is the code I have so far (with the filesystem walking, etc., removed):

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

# Set up replacement forms
my $artistSearch = HTML::Element->new('~literal', 'text' => <<EOF);
<p>Please select from the list below.</p>
<ul>
  <li><a href="http://firstlink.com/">item 1</a></li>
  <li><a href="http://secondlink.com/">item 1</a></li>
</ul>
EOF

my $filename = "AFA.php";
my $file = HTML::TreeBuilder->new();
$file->store_comments(1);
$file->ignore_ignorable_whitespace(1);
$file->no_space_compacting(1);
my $tree = $file->parse_file($filename)
    or die "Cannot parse $filename: $!";

# find_by_tag_name returns the first <form>, or undef if there is none
my $form = $tree->find_by_tag_name('form');
my $fname = $form ? ($form->attr('name') // '') : '';
if ($fname eq 'mainform') {
  $form->delete;
} elsif ($fname eq 'artist_search') {
  $form->replace_with($artistSearch)->delete;
} else {
  # No form, or a form we're not changing
}

my $printout = $file->as_HTML("", "  ", {});
open(my $page, '>', $filename) or die "Cannot write $filename: $!";
print $page $printout;
close($page);
$file->delete;

I am open to any suggestions, examples, etc. I'm not necessarily tied to any particular module, but I'm not exactly an expert programmer.

Thank you!


The problem here is the <?php ... ?> tag. You can work around it with a preprocessing step: pull the PHP blocks out before HTML::TreeBuilder sees the file, and put them back afterwards. I'll use a simple regex for this:

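use strict;
use warnings;
use Data::Dumper;

local $/;    # slurp mode: read all of the input at once
$_ = <>;
my @phps;
# /s lets .*? cross newlines, so multi-line PHP blocks are caught too
push @phps, $1 while s/<\?php\s+(.*?)\s*\?>/__PHP_CODE__/s;

print Dumper [$_, \@phps];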

You can try it:

echo "foo<?php phpfoo ?> bar <?php phpbar ?> baz" | perl filter.pl


$VAR1 = [
          'foo__PHP_CODE__ bar __PHP_CODE__ baz',
          [
            'phpfoo',
            'phpbar'
          ]
        ];

Once you're done with the tree, you can do the reverse to take the PHP code out of the @phps array and put it back, in its original order, into the output:

my $count = 0;
s/__PHP_CODE__/<?php $phps[$count++] ?>/g;
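The counter assumes the placeholders come back out of the tree in the same number and order they went in. If your edits delete a placeholder (for example, because it sat inside an element you removed), every later block is restored one slot off. A minimal sketch of a variant, using only core Perl and hypothetical helper names (extract_php and restore_php are not from any module), numbers each token and keeps the whole tag, delimiters and whitespace included, verbatim:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every <?php ... ?> block with a numbered token
# and return the masked text plus a reference to the list of original blocks.
sub extract_php {
    my ($html) = @_;
    my @blocks;
    # /s lets .*? cross newlines; /e evaluates the replacement as Perl code.
    # The captured text includes the delimiters, so nothing is re-wrapped later.
    $html =~ s{(<\?php\b.*?\?>)}{push @blocks, $1; '__PHP_CODE_' . $#blocks . '__'}gse;
    return ($html, \@blocks);
}

# Hypothetical helper: put each block back by its number. Because each token
# names its own block, deleting or moving a token cannot shift the others.
sub restore_php {
    my ($html, $blocks) = @_;
    $html =~ s/__PHP_CODE_(\d+)__/$blocks->[$1]/g;
    return $html;
}

my $page = <<'END';
<p>Intro</p>
<?php
   if (strlen($txtKeyword) > 2) { echo " or <a href=\"x.htm\">again?</a>"; }
?>
<a href="somepage.php">Link to be removed</a>
<?php echo $_REQUEST['nr']; ?>
END

my ($masked, $blocks) = extract_php($page);
print $masked;
# ... parse $masked with HTML::TreeBuilder, edit it, call as_HTML (not shown) ...
print restore_php($masked, $blocks) eq $page ? "round trip ok\n" : "mismatch\n";
```

Since the tokens contain only letters, digits, and underscores, as_HTML has nothing in them to turn into entities.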

Make no mistake: this is a hack, but it will get the job done effectively without much thought, and it is simple to implement. There are cleaner ways to do this, such as extending HTML::Element with a pseudo-element for <?php ... ?> blocks. What you don't want is to try to undo HTML::Element's mangling (such as entity encoding) afterwards in TT (Template Toolkit); that sounds like a far worse idea to me. You could even implement the step that turns each __PHP_CODE__ token back into the real PHP code as a Template Toolkit filter.

Note that this doesn't handle short tags (<? ... ?>), though it easily could. I'm also not sure exactly what rules PHP uses to decide where its code starts and ends (for instance, how <?php or ?> would be escaped). And, to be explicit, this approach does not cope with PHP code that contains the closing delimiter inside a string, like this:

echo '?>';
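For the short tags mentioned above, one possible sketch (an assumption about what your pages contain, not tested against them) is to match any <? that does not begin an XML declaration. The pattern name $php_re is hypothetical. It captures the whole tag with its delimiters, so a restore step using it should insert the captured text as-is rather than re-wrapping it in <?php ... ?>. The last lines show the ?>-inside-a-string limitation in action:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Matches <?php ... ?>, short tags <? ... ?>, and echo tags <?= ... ?>,
# but skips XML declarations such as <?xml version="1.0"?>.
# Assumption: nothing else in these pages starts with "<?".
my $php_re = qr{<\?(?!xml\b).*?\?>}s;

my $page = qq{<?xml version="1.0"?>\n<p><?= \$title ?></p>\n<? echo 1; ?>\n<?php echo 2; ?>\n};

# List-context match with /g returns every captured tag in order
my @found = $page =~ /($php_re)/g;
print "$_\n" for @found;

# Limitation: the lazy .*? stops at the first ?>, even inside a PHP string
my ($cut) = q{<?php echo '?>'; ?>} =~ /($php_re)/;
print "$cut\n";    # only "<?php echo '?>" is matched; "'; ?>" is left behind
```

Handling that case properly would mean tokenizing PHP strings and comments, which is beyond what a single regex should attempt.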