Create array from unformatted data with PHP
Our application receives log files via email, so the lines are often broken up by the email client. Once I've read in the body of the email, I have a string variable $log in the following format:
Fri Aug 26 11:52:30 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]
PKCS11] built Fri Aug 26 11:52:30 2011 NOTE: OpenVPN 2.1 requires '--script-security 2'
or higher to call user-defined scripts or executables Fri Aug 26 11:52:30 2011
Control Channel Authentication: using 'ta.key' as a OpenVPN static key file
Fri Aug 26 11:52:30 2011 Outgoing Control Channel Authentication: Using 160
bit message hash 'SHA1' for HMAC authentication Fri Aug 26 11:52:30
2011 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1'
for HMAC authentication Fri Aug 26 11:52:30 2011 LZO compression initialized
Fri Aug 26 11:52:30 2011 Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0
ET:0 EL:0 ] Fri Aug 26 11:52:30 2011 Socket Buffers: R=[8192->8192] S=[8192->8192]
As shown above, the date does not always start on a new line. I'd like to generate an array containing the dates and log messages so that I can output a table with these fields in their own columns. I understand that I need a regex to match the date field, but how do I go about building the array?
I'm going to replace my answer with an entirely new version, since the example log file has changed a lot. Since the log seems to be line-broken just about anywhere, this approach, which now includes a bit of regexp, works:
$log="Fri Aug 26 11:52:30 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]
PKCS11] built Fri Aug 26 11:52:30 2011 NOTE: OpenVPN 2.1 requires '--script-security 2'
or higher to call user-defined scripts or executables Fri Aug 26 11:52:30 2011
Control Channel Authentication: using 'ta.key' as a OpenVPN static key file
Fri Aug 26 11:52:30 2011 Outgoing Control Channel Authentication: Using 160
bit message hash 'SHA1' for HMAC authentication Fri Aug 26 11:52:30
2011 Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1'
for HMAC authentication Fri Aug 26 11:52:30 2011 LZO compression initialized
Fri Aug 26 11:52:30 2011 Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0
ET:0 EL:0 ] Fri Aug 26 11:52:30 2011 Socket Buffers: R=[8192->8192] S=[8192->8192]
";
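// Join the wrapped lines back into one long string, separated by single spaces.
$str = implode(' ', preg_split("/[ ]*[\r\n]+/", $log));
// Split on the date and keep it as its own element (PREG_SPLIT_DELIM_CAPTURE).
// Cred to Herbert for the regexp, seems to work fine.
$arrLogLines = preg_split('/[ ]*([\w]{3} [\w]{3} [0-9]{2} [\d:]+ \d{4}) /', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
array_shift($arrLogLines); // drop the empty string before the first date
$arrLogMessages = array();
for ($i = 0; $i < count($arrLogLines); $i++) {
    // Even indexes hold a date, odd indexes hold the message that follows it.
    $strArrIdx = ($i % 2 == 0) ? 'date' : 'message';
    $arrLogMessages[(int) ($i / 2)][$strArrIdx] = $arrLogLines[$i];
}
var_dump($arrLogMessages);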
It produces the expected:
array(8) {
[0]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(56) "OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] PKCS11] built"
}
[1]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(102) "NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables"
}
[2]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(75) "Control Channel Authentication: using 'ta.key' as a OpenVPN static key file"
}
[3]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(98) "Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication"
}
[4]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(98) "Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication"
}
[5]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(27) "LZO compression initialized"
}
[6]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(63) "Control Channel MTU parms [ L:1558 D:166 EF:66 EB:0 ET:0 EL:0 ]"
}
[7]=>
array(2) {
["date"]=>
string(24) "Fri Aug 26 11:52:30 2011"
["message"]=>
string(46) "Socket Buffers: R=[8192->8192] S=[8192->8192] "
}
}
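For what it's worth, the same split-then-pair idea can be written a little more compactly. This is only a sketch: parseLog is a made-up name, it assumes the date always has the "Fri Aug 26 11:52:30 2011" shape, and unlike the code above it trims the trailing space from the last message.

```php
<?php
// Hypothetical helper; the date pattern is the one used in the answer above.
function parseLog(string $log): array
{
    // Collapse every line break (and the spaces around it) into a single space.
    $flat = trim(preg_replace('/\s*[\r\n]+\s*/', ' ', $log));

    // Split on the date and keep the date itself as a delimiter token.
    $parts = preg_split(
        '/\s*(\w{3} \w{3} \d{2} [\d:]+ \d{4})\s*/',
        $flat,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    array_shift($parts); // drop whatever precedes the first date

    // The tokens now alternate: date, message, date, message, ...
    $rows = [];
    foreach (array_chunk($parts, 2) as $pair) {
        $rows[] = ['date' => $pair[0], 'message' => $pair[1] ?? ''];
    }
    return $rows;
}
```

Calling parseLog($log) on the sample should give the same eight date/message pairs, ready to loop over for the table.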
I'm not a regex pro and I'm sure there is an easier way, but this works:
$input = "Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]
[PKCS11] built on Mar 12 2011
Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security
2' or higher to call user-defined scripts or executables
Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key'
as a OpenVPN static key file";
preg_match_all('/([\w]{3} [\w]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}) (.*)/', $input, $matches, PREG_SET_ORDER);
var_dump($matches);
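This results in the following. Note that (.*) stops at the end of each line, so the wrapped second line of each message is not captured: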
array(3) {
[0] =>
array(3) {
[0] =>
string(67) "Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]"
[1] =>
string(24) "Wed Aug 03 13:56:31 2011"
[2] =>
string(42) "OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]"
}
[1] =>
array(3) {
[0] =>
string(70) "Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security"
[1] =>
string(24) "Wed Aug 03 13:56:31 2011"
[2] =>
string(45) "NOTE: OpenVPN 2.1 requires '--script-security"
}
[2] =>
array(3) {
[0] =>
string(71) "Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key'"
[1] =>
string(24) "Wed Aug 03 13:56:31 2011"
[2] =>
string(46) "Control Channel Authentication: using 'ta.key'"
}
}
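If the wrapped continuation lines need to be kept, one option is to flatten the line breaks first and let a lookahead end each message just before the next date. A rough sketch, assuming the same date layout as the regex above; extractEntries and $datePattern are illustrative names, not part of the answer:

```php
<?php
// Sketch: capture each message up to the next date (or the end of the input).
function extractEntries(string $input): array
{
    // Same date shape as the answer's regex, e.g. "Wed Aug 03 13:56:31 2011".
    $datePattern = '\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}';

    // Treat line breaks as ordinary spaces so wrapped messages are rejoined.
    $flat = trim(preg_replace('/\s+/', ' ', $input));

    // The lazy (.*?) plus the lookahead stops each message just before the next date.
    preg_match_all(
        '/(' . $datePattern . ') (.*?)(?= ?' . $datePattern . '|$)/',
        $flat,
        $matches,
        PREG_SET_ORDER
    );

    $entries = [];
    foreach ($matches as $m) {
        $entries[] = ['date' => $m[1], 'message' => $m[2]];
    }
    return $entries;
}
```

A date such as "Mar 12 2011" inside a message is left alone, because it lacks the weekday and time that the pattern requires.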
I believe this is what you're looking for:
<?php
$log = <<<LOG
Wed Aug 03 13:56:31 2011 OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2]
[PKCS11] built on Mar 12 2011
Wed Aug 03 13:56:31 2011 NOTE: OpenVPN 2.1 requires '--script-security
2' or higher to call user-defined scripts or executables
Wed Aug 03 13:56:31 2011 Control Channel Authentication: using 'ta.key'
as a OpenVPN static key file
LOG;
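function splitLog($log)
{
    // Use '~' as a stand-in for line breaks (assumes the log itself never contains '~').
    $log = str_replace("\n", '~', $log);
    $log = str_replace("\r", '', $log);
    $log .= '~';
    // Capture the date, then exactly two '~'-terminated lines:
    // every entry is assumed to be wrapped onto two lines.
    preg_match_all('/([\w]{3} [\w]{3} [0-9]{2} [\d:]+ \d{4})((?:.*?~){2})/', $log, $m);
    $logArray = array();
    foreach ($m[0] as $k => $v)
    {
        $a['date'] = $m[1][$k];
        $a['message'] = trim(str_replace('~', '', $m[2][$k]));
        array_push($logArray, $a);
    }
    return $logArray;
}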
$logArray = splitLog($log);
var_dump($logArray);
?>
Output
array
0 =>
array
'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
'message' => string 'OpenVPN 2.1.4 i686-pc-mingw32 [SSL] [LZO2] [PKCS11] built on Mar 12 2011' (length=72)
1 =>
array
'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
'message' => string 'NOTE: OpenVPN 2.1 requires '--script-security 2' or higher to call user-defined scripts or executables' (length=102)
2 =>
array
'date' => string 'Wed Aug 03 13:56:31 2011' (length=24)
'message' => string 'Control Channel Authentication: using 'ta.key' as a OpenVPN static key file' (length=75)
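Since the question ultimately wants a table, here is a minimal sketch of rendering an array shaped like the one above as HTML. renderLogTable is a hypothetical helper, and the bare table markup is just an assumption about the desired output:

```php
<?php
// Hypothetical helper: turns [['date' => ..., 'message' => ...], ...] into an HTML table.
function renderLogTable(array $logArray): string
{
    $html = "<table>\n  <tr><th>Date</th><th>Message</th></tr>\n";
    foreach ($logArray as $entry) {
        // Escape both fields: log text may contain <, >, & or quotes.
        $date = htmlspecialchars($entry['date'], ENT_QUOTES, 'UTF-8');
        $message = htmlspecialchars($entry['message'], ENT_QUOTES, 'UTF-8');
        $html .= "  <tr><td>{$date}</td><td>{$message}</td></tr>\n";
    }
    return $html . "</table>\n";
}
```

With the function above, echo renderLogTable(splitLog($log)); would print one row per log entry. The escaping matters here because messages such as "R=[8192->8192]" contain characters that are special in HTML.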
If every line starts with a date like this, you can just use substr.
The date exists on every line and always has the same length. Admittedly, the first line ends with a date too, but that one has a different meaning and a different notation. Regex isn't gonna help you with that either.
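A minimal sketch of that substr idea, under the answer's assumption that every line begins with a 24-character date. splitFixedWidth and DATE_LENGTH are names invented for the example; lines that are wrapped mid-message would need to be rejoined first:

```php
<?php
// Length of a date such as 'Wed Aug 03 13:56:31 2011' (assumed fixed).
const DATE_LENGTH = 24;

// Hypothetical helper: cuts each line at a fixed offset instead of using a date regex.
function splitFixedWidth(string $log): array
{
    $rows = [];
    // \R matches any line-break sequence (\n, \r\n or \r).
    foreach (preg_split('/\R/', trim($log)) as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        $rows[] = [
            'date'    => substr($line, 0, DATE_LENGTH),
            'message' => ltrim((string) substr($line, DATE_LENGTH)),
        ];
    }
    return $rows;
}
```

This does no validation at all: whatever sits in the first 24 characters of a line is taken to be the date.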