Create array from unformatted data with PHP

Our application receives log files via email, so the lines are often broken up by the email client. Once I've read in the body of the email, I have a string variable $log in the following format.

Fri Aug 26 11:52:30 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] 
PKCS11] built Fri Aug 26 11:52:30 2011 NOTE: OpenVPN 2.1 requires '--script-security 2' 
or higher to call user-defined scripts or executables Fri Aug 26 11:52:30 2011 
Control Channel Authentication: using 'ta.key' as a OpenVPN static key file 
Fri Aug 26 11:52:30 2011 Outgoing Control Channel Authentication: Using 160 
bit message hash 'SHA1' for HMAC authentication Fri Aug 26 11:52:30 
2011 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1'
for HMAC authentication Fri Aug 26 11:52:30 2011 LZO compression initialized 
Fri Aug 26 11:52:30 2011 Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0 
ET:0 EL:0 ] Fri Aug 26 11:52:30 2011 Socket Buffers: R=[8192->8192] S=[8192->8192]

As shown above, the date does not always start on a new line. I'd like to generate an array containing the dates and log messages so that I can output a table with these fields in their own columns. I understand that I would need a regex to match the date field, but how do I go about building the array?


I'm just going to replace my answer with an entirely new version, since the example log file has changed a lot. Since the log seems to be broken across lines just about anywhere, this approach, which now includes a bit of regexp, works:

$log="Fri Aug 26 11:52:30 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]  
PKCS11] built Fri Aug 26 11:52:30 2011 NOTE: OpenVPN 2.1 requires '--script-security 2'  
or higher to call user-defined scripts or executables Fri Aug 26 11:52:30 2011  
Control Channel Authentication: using 'ta.key' as a OpenVPN static key file  
Fri Aug 26 11:52:30 2011 Outgoing Control Channel Authentication: Using 160  
bit message hash 'SHA1' for HMAC authentication Fri Aug 26 11:52:30  
2011 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' 
for HMAC authentication Fri Aug 26 11:52:30 2011 LZO compression initialized  
Fri Aug 26 11:52:30 2011 Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0  
ET:0 EL:0 ] Fri Aug 26 11:52:30 2011 Socket Buffers: R=[8192->8192] S=[8192->8192] 
";
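// Join the wrapped lines into one string, dropping any spaces before each line break.
$str = implode(' ', preg_split("/[ ]*[\r\n]+/", $log));
// Split on the timestamp; PREG_SPLIT_DELIM_CAPTURE keeps each captured date in the result.
$arrLogLines = preg_split('/[ ]*([\w]{3} [\w]{3} [0-9]{2} [\d:]+ \d{4}) /', $str, -1, PREG_SPLIT_DELIM_CAPTURE); // Cred to Herbert for the regexp, seems to work fine..
array_shift($arrLogLines); // Discard the empty string that precedes the first date.
$arrLogMessages = array();
for ($i = 0; $i < count($arrLogLines); $i++) {
    // Even indexes hold a date, odd indexes hold the message that follows it.
    if ($i % 2 === 0) {
        $offset = 0;
        $strArrIdx = 'date';
    } else {
        $offset = 1;
        $strArrIdx = 'message';
    }
    $arrLogMessages[($i - $offset) / 2][$strArrIdx] = $arrLogLines[$i];
}
var_dump($arrLogMessages);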

It produces the expected output:

array(8) {
  [0]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(56) "OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] PKCS11] built"
  }
  [1]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(102) "NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables"
  }
  [2]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(75) "Control Channel Authentication: using 'ta.key' as a OpenVPN static key file"
  }
  [3]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(98) "Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication"
  }
  [4]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(98) "Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication"
  }
  [5]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(27) "LZO compression initialized"
  }
  [6]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(63) "Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0 ET:0 EL:0 ]"
  }
  [7]=>
  array(2) {
    ["date"]=>
    string(24) "Fri Aug 26 11:52:30 2011"
    ["message"]=>
    string(46) "Socket Buffers: R=[8192->8192] S=[8192->8192] "
  }
}


I'm not a regex pro and I'm sure there is an easier way, but this works:

$input = "Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]
[PKCS11] built on Mar 12 2011
Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security
2' or higher to call user-defined scripts or executables
Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key'
as a OpenVPN static key file";

preg_match_all('/([\w]{3} [\w]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}) (.*)/', $input, $matches, PREG_SET_ORDER);

var_dump($matches);

This results in:

array(3) {
    [0] =>
    array(3) {
        [0] =>
        string(67) "Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]"
        [1] =>
        string(24) "Wed Aug 03 13:56:31 2011"
        [2] =>
        string(42) "OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]"
    }
    [1] =>
    array(3) {
        [0] =>
        string(70) "Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security"
        [1] =>
        string(24) "Wed Aug 03 13:56:31 2011"
        [2] =>
        string(45) "NOTE: OpenVPN 2.1 requires '--script-security"
    }
    [2] =>
    array(3) {
        [0] =>
        string(71) "Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key'"
        [1] =>
        string(24) "Wed Aug 03 13:56:31 2011"
        [2] =>
        string(46) "Control Channel Authentication: using 'ta.key'"
    }
}
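
Note that in the result above each message stops at the end of the line holding the date, because `.` does not match a newline by default, so the wrapped text is lost. One possible variation, sketched here on the assumption that every entry starts with a timestamp in exactly this format, flattens the line breaks first and uses a lookahead so that each message runs up to the next timestamp. The function name `parseLog` is hypothetical.

```php
<?php
// Hypothetical helper: split a wrapped log into date/message pairs.
// Assumes every entry begins with a timestamp like "Wed Aug 03 13:56:31 2011".
function parseLog($log)
{
    // Collapse each line break (and the whitespace around it) into one space.
    $flat = trim(preg_replace('/\s*[\r\n]+\s*/', ' ', $log));

    $date = '\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}';
    // Capture a date, then lazily capture text up to the next date or the end.
    $pattern = '/(' . $date . ') (.*?)(?= ?' . $date . ' |$)/s';
    preg_match_all($pattern, $flat, $matches, PREG_SET_ORDER);

    $rows = array();
    foreach ($matches as $m) {
        $rows[] = array('date' => $m[1], 'message' => trim($m[2]));
    }
    return $rows;
}
```

Calling `parseLog($input)` on the sample above should give three entries whose messages include the wrapped second lines.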


I believe this is what you're looking for:

<?php

$log = <<<LOG
Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] 
[PKCS11] built on Mar 12 2011
Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security 
2' or higher to call user-defined scripts or executables
Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key' 
as a OpenVPN static key file
LOG;


function splitLog($log)
{
    $log = str_replace("\n",'~',$log);
    $log = str_replace("\r",'',$log);
    $log .= '~';
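    // Each entry is assumed to span exactly two lines: capture the date, then the text up to the second '~'.
    preg_match_all('/([\w]{3} [\w]{3} [0-9]{2} [\d:]+ \d{4})((?:.*?~){2})/', $log, $m);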

    $logArray = array();

    foreach($m[0] as $k=>$v)
    {
        $a['date'] = $m[1][$k];
        $a['message'] = trim(str_replace('~', '', $m[2][$k]));
        array_push($logArray, $a);
    }

    return $logArray;
}

$logArray = splitLog($log);
var_dump($logArray);

?>

Output

array
  0 => 
    array
      'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
      'message' => string 'OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] [PKCS11] built on Mar 12 2011' (length=72)
  1 => 
    array
      'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
      'message' => string 'NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables' (length=102)
  2 => 
    array
      'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
      'message' => string 'Control Channel Authentication: using 'ta.key' as a OpenVPN static key file' (length=75)
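
Since the goal in the question is a table with the date and the message in their own columns, an array of this shape can be rendered with a short loop. This is only a sketch: the function name `renderLogTable` and the markup are assumptions and not part of the answer above.

```php
<?php
// Hypothetical helper: render date/message pairs as an HTML table.
function renderLogTable(array $rows)
{
    $html = "<table>\n<tr><th>Date</th><th>Message</th></tr>\n";
    foreach ($rows as $row) {
        // Escape both fields, since log text may contain <, > or quotes.
        $html .= '<tr><td>' . htmlspecialchars($row['date'], ENT_QUOTES) . '</td><td>'
               . htmlspecialchars($row['message'], ENT_QUOTES) . "</td></tr>\n";
    }
    return $html . "</table>\n";
}
```

With the array built above, `echo renderLogTable($logArray);` would print one table row per log entry.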


If every line starts with a date like this, you can just use substr: the date is present on every line and always has the same length. Admittedly, the first line also ends with a date, but that one has a different meaning and a different notation, and a regex isn't going to help you with that either.
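
A minimal sketch of the substr idea follows. It assumes that every non-empty line begins with a 24-character date followed by one space, which does not hold once the email client has wrapped the lines as in the question. The function name `splitFixedWidth` is hypothetical.

```php
<?php
// Hypothetical helper: assumes every non-empty line begins with a 24-character date.
function splitFixedWidth($log)
{
    $rows = array();
    foreach (preg_split('/\r\n|\r|\n/', $log) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        $rows[] = array(
            'date'    => substr($line, 0, 24),              // e.g. "Wed Aug 03 13:56:31 2011"
            'message' => trim((string) substr($line, 25)),  // text after the date and one space
        );
    }
    return $rows;
}
```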
