Best way to parse a Javascript file in PHP to extract the array defined inside it
I've got a JavaScript file, automatically generated by a legacy app, which defines a huge array (plus a couple of functions and other bits). This JavaScript file performs searches against content, but over time it has grown to over 2 MB. That might not sound like much, but you have to download it every time you want to do a search with this particular web app. Needless to say, the performance is atrocious. I want a minimal-effort way of putting a wrapper around the JS so that, instead of running the JS on the client side, the page calls my new PHP script, which does the search on the content.
The layout of the generated JS file is the same each time it is generated, so I could write a bunch of specific trims and splits. Then I started thinking regexp might be the way to go, but to be honest I'm not sure, so I thought I would just ask you lovely people.
Sample source:
Page[0]=new Array("Some text1","More text1","Final Text1","abc.html");
Page[1]=new Array("Some text2","More text2","xyz.html");
As you can see, there is at least one entry in each array line, with the final entry being the name of the file being searched for.
Anyway, the question is whether regexp is best (and if so, some suggested patterns would be great), or whether I should be splitting this with split, etc.
Cheers
You are looking for something like this. Note that I had the .js file locally, so I used file() to load it into an array. For your actual script you'll probably need file_get_contents() if your PHP can't access the .js file locally.
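<?php
// Read the generated file into an array of lines.
$lines = file('test.js');
$pages = array();
foreach ($lines as $line) {
    // strpos() can return 0, so compare strictly against false.
    if (strpos($line, 'new Array') !== false) {
        // \d+ so that indexes above 9 (Page[10], ...) match as well.
        if (!preg_match('/Page\[\d+\]\s?\=\s?new Array\((\"(.*)",?\s?\n?)+\);/', $line, $matches)) {
            continue; // skip lines that do not have the expected layout
        }
        // Split on commas that sit outside double quotes.
        $values = preg_split('/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/', $matches[1]);
        $currNo = count($pages);
        $pages[$currNo] = array();
        for ($i = 0; $i < count($values); $i++) {
            // Drop the surrounding quotes from each value.
            array_push($pages[$currNo], trim($values[$i], '"'));
        }
    }
}
var_dump($pages);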
For your example the result will be the following:
array(2) {
[0]=>
array(4) {
[0]=>
string(10) "Some text1"
[1]=>
string(10) "More text1"
[2]=>
string(11) "Final Text1"
[3]=>
string(8) "abc.html"
}
[1]=>
array(3) {
[0]=>
string(10) "Some text2"
[1]=>
string(10) "More text2"
[2]=>
string(8) "xyz.html"
}
}
Enjoy!
What about using a PHP-based Javascript interpreter (like J4P5)?
I've never tried it myself, but the idea is to run the js file on server side and read that array from memory. This way you avoid both regexp and making users download the js file.
Use AJAX and avoid parsing JS altogether. With AJAX you can easily send those arrays to a PHP file, process the contents and return the results to JavaScript again.
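For what it's worth, here is a rough sketch of what the PHP end of such an AJAX call might look like. It assumes the client sends only the search term, and that the array is already available on the server (hard-coded below for illustration). The function name searchPages and the query parameter q are made up for this example; they are not from the original app.

```php
<?php
// Hypothetical helper: return the file names of pages whose text contains $term.
function searchPages(array $pages, string $term): array
{
    $hits = array();
    foreach ($pages as $entry) {
        // By the layout in the question, the last element is the file name.
        $file = array_pop($entry);
        foreach ($entry as $text) {
            // Case-insensitive substring match.
            if (stripos($text, $term) !== false) {
                $hits[] = $file;
                break; // one hit per page is enough
            }
        }
    }
    return $hits;
}

// In practice $pages would come from parsing the generated file;
// it is hard-coded here so the sketch runs on its own.
$pages = array(
    array('Some text1', 'More text1', 'Final Text1', 'abc.html'),
    array('Some text2', 'More text2', 'xyz.html'),
);

// 'q' is an assumed parameter name; fall back to a sample term on the CLI.
$term = isset($_GET['q']) ? $_GET['q'] : 'text2';
echo json_encode(searchPages($pages, $term));
```

The client then only downloads the matching file names as JSON instead of the whole array.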
My take on this would be to convert the JS into PHP and eval() it. (GASP)
Just kidding on that one. HOWEVER, you can convert to PHP and tokenize it. I think this may be better in cases where regex will get overly complicated.
I thought I had the right solution to this, but apparently it converted PHP to JS (meh ;P). I'll try my own little attempt at this here...
$js='Page[0]=new Array("Some text1","More text1","Final Text1","abc.html"); '.
'Page[1]=new Array("Some text2","More text2","xyz.html");';
// Convert JS variable names to PHP (this seems pretty consistent in your app)
$php='<?php '.str_replace('Page[','$Page[',$js);
// '---PHP tag, tells tokenizer this is PHP code
// Parse the PHP-JS thingy into a list of tokens
$tokens = token_get_all($php);
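To get from the token list to actual arrays, something along these lines might work. It is only a sketch: pagesFromTokens is a name I made up, and it assumes the input contains only the Page[...] lines, that the strings contain no $ signs (PHP would tokenize those as interpolated strings) and that the text "Page[" does not occur inside the strings themselves.

```php
<?php
// Hypothetical helper: collect the string literals of each Page[...] statement.
function pagesFromTokens(string $js): array
{
    // Same trick as above: make the JS look like PHP so the tokenizer accepts it.
    $php = '<?php ' . str_replace('Page[', '$Page[', $js);

    $pages = array();
    $current = array();
    foreach (token_get_all($php) as $token) {
        if (is_array($token)) {
            // Multi-character tokens arrive as array(id, text, line).
            if ($token[0] === T_CONSTANT_ENCAPSED_STRING) {
                // Strip the surrounding quotes; escape sequences are left as they are.
                $current[] = substr($token[1], 1, -1);
            }
        } elseif ($token === ';') {
            // A bare ';' token ends one statement, so store what was collected.
            if ($current) {
                $pages[] = $current;
            }
            $current = array();
        }
    }
    return $pages;
}

$js = 'Page[0]=new Array("Some text1","More text1","Final Text1","abc.html"); ' .
      'Page[1]=new Array("Some text2","More text2","xyz.html");';
print_r(pagesFromTokens($js));
```

One nice side effect is that commas and semicolons inside the quoted strings are not a problem, because the tokenizer hands back each string literal as a single token.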
Try
/Page\[\d+\]=new Array\((.*)\);/simU
Example:
$js = <<< JS
Page[0]=new Array("Some text1","More text1","Final Text1","abc.html");
Page[1]=new Array("Some text2","More text2","xyz.html");
JS;
// \d+ rather than \d, so that indexes above 9 match as well
preg_match_all('/Page\[\d+\]=new Array\((.*)\);/simU', $js, $matches);
print_r(array_map('str_getcsv', $matches[1]));
Output:
Array
(
[0] => Array
(
[0] => Some text1
[1] => More text1
[2] => Final Text1
[3] => abc.html
)
[1] => Array
(
[0] => Some text2
[1] => More text2
[2] => xyz.html
)
)