PHP, Regular expression to parse data
I have data in the format:
Football - 101 Carolina Panthers +15 -110 for Game
Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game
Football - 102 Pittsburgh Steelers -9 -120 for 1st Half
How can I transform this into a PHP array like the following?
$game_data[] = array( 'sport_type' => 'Football',
'game_number' => 101,
'game_name' => 'Carolina Panthers',
'runline_odd' => '+15 -110',
'total_odd' => '',
'odd_type' => 'runline',
'period' => 'Game' );
$game_data[] = array( 'sport_type' => 'Football',
'game_number' => 101,
'game_name' => 'Carolina Panthers/Pittsburgh Steelers',
'runline_odd' => '',
'total_odd' => 'under 36½ -110',
'odd_type' => 'total_odd',
'period' => 'Game' );
$game_data[] = array( 'sport_type' => 'Football',
'game_number' => 102,
'game_name' => 'Pittsburgh Steelers',
'runline_odd' => '-9 -120',
'total_odd' => '',
'odd_type' => 'runline',
'period' => '1st Half' );
Normally I wouldn't solve the whole problem for someone, but the ½ character made it interesting enough. I'm no regex expert, so this might not be the most optimized or elegant solution, but it seems to get the job done, at least with the provided sample input.
EDIT: Oops. I didn't catch that under was actually part of the runline_odd data, so this does not currently get the job done. I'll be back.
EDIT2: Revised the regex slightly; it now correctly distinguishes between runline_odd and runline_total.
<?php
$input = array(
'Football - 101 Carolina Panthers +15 -110 for Game',
'Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game',
'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half'
);
$regex = '^(?<sport_type>[[:alpha:]]*) - '.
'(?<game_number>[0-9]*) '.
'('.
'(?<game_nameb>[[:alpha:]\/ ]*?) '.
'(?<runline_total>(under ([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
'|'.
'(?<game_namea>[[:alpha:]\/ ]*) '.
'(?<runline_odd>((-|\+)?([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
')'.
'(?<period>.*)$';
$game_data = array();
foreach ($input as $in) {
$matches = false;
$cnt = preg_match('/' . $regex . '/ui', $in, $matches);
if ($cnt && is_array($matches) && count($matches)) {
if (empty($matches['game_nameb'])) {
$game_name = $matches['game_namea'];
$runline_odd = $matches['runline_odd'];
$total_odd = '';
} else {
$game_name = $matches['game_nameb'];
$runline_odd = '';
$total_odd = $matches['runline_total'];
}
$result = array(
'sport_type' => $matches['sport_type'],
'game_number' => $matches['game_number'],
'game_name' => $game_name,
'runline_odd' => $runline_odd,
'total_odd' => $total_odd,
'period' => $matches['period']
);
array_push($game_data, $result);
}
}
var_dump($game_data);
This produces the following:
$ /usr/local/bin/php preg-match.php
array(3) {
[0]=>
array(6) {
["sport_type"]=>
string(8) "Football"
["game_number"]=>
string(3) "101"
["game_name"]=>
string(17) "Carolina Panthers"
["runline_odd"]=>
string(8) "+15 -110"
["total_odd"]=>
string(0) ""
["period"]=>
string(4) "Game"
}
[1]=>
array(6) {
["sport_type"]=>
string(8) "Football"
["game_number"]=>
string(3) "101"
["game_name"]=>
string(37) "Carolina Panthers/Pittsburgh Steelers"
["runline_odd"]=>
string(0) ""
["total_odd"]=>
string(15) "under 36½ -110"
["period"]=>
string(4) "Game"
}
[2]=>
array(6) {
["sport_type"]=>
string(8) "Football"
["game_number"]=>
string(3) "102"
["game_name"]=>
string(19) "Pittsburgh Steelers"
["runline_odd"]=>
string(7) "-9 -120"
["total_odd"]=>
string(0) ""
["period"]=>
string(8) "1st Half"
}
}
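A side note on the string(15) reported for "under 36½ -110" above: var_dump counts bytes, not characters, and ½ takes two bytes in UTF-8. The short sketch below illustrates this and why the u modifier matters for the \x{00BD} character class. It assumes the script file is saved as UTF-8; the variable names are only for illustration.

```php
<?php
$odd = 'under 36½ -110';

// strlen() counts bytes; in UTF-8 the ½ sign (U+00BD) is encoded as 0xC2 0xBD.
$bytes = strlen($odd);

// With the u modifier, "." matches one whole code point, so this counts characters.
$chars = preg_match_all('/./u', $odd);

// Without the u modifier the subject is compared byte by byte, so the 0xC2 byte breaks the match.
$without_u = preg_match('/^[0-9\x{00BD}]+$/', '36½');
$with_u    = preg_match('/^[0-9\x{00BD}]+$/u', '36½');

printf("bytes=%d chars=%d without_u=%d with_u=%d\n", $bytes, $chars, $without_u, $with_u);
// prints: bytes=15 chars=14 without_u=0 with_u=1
```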
The following works, except when there is an under after the game name:
/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/
Explanation:
([^-]+): Matches anything other than -, which separates the sport type from the other details.
\s*-\s*: A - surrounded by optional whitespace.
(\d+): The game number.
([^\d+-]+): Anything other than +, - or a digit. Matches the game name.
((?:under\s*)?[\d\s+-]+): The runline odd or total odd.
\s*for\s*(.+): The literal word for, followed by the period.
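To see what each group actually captures, here is a small sketch that runs the pattern above, unchanged, against the sample lines. The helper name capture_groups is made up for this illustration. Note that the groups include trailing spaces, so the sketch trims them.

```php
<?php
// The pattern from this answer, copied as-is.
$pattern = '/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/';

// Hypothetical helper: returns the trimmed capture groups,
// or null when the line does not match at all.
function capture_groups(string $pattern, string $line): ?array
{
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    array_shift($m);              // drop the full-match entry at index 0
    return array_map('trim', $m); // groups 1, 3 and 4 keep a trailing space
}

print_r(capture_groups($pattern, 'Football - 101 Carolina Panthers +15 -110 for Game'));

// The "under 36½" line does not match: ½ is not in [\d\s+-].
var_dump(capture_groups($pattern, 'Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game'));
```

The first call yields Football, 101, Carolina Panthers, +15 -110 and Game; the second yields NULL.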
PS:
- Take care of the cases where there is an 'under'. The regex above lumps it in with game_name.
- Take care of Unicode characters such as ½.
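One possible way to cover both points is sketched below. It is not a tested production pattern: it makes the game name lazy so that a leading under stays with the odds, and it uses the u modifier so ½ can sit in the character class. The names LINE_PATTERN and parse_line are made up here, and accepting over as well as under is an assumption, since only under appears in the sample data.

```php
<?php
// Sketch only. Assumptions: 'over' may appear like 'under', and the
// line (and this file) are UTF-8 encoded.
const LINE_PATTERN = '/^(?<sport_type>[^-]+?)\s+-\s+(?<game_number>\d+)\s+'
    . '(?<game_name>.+?)\s+'                                         // lazy: stops before the odds
    . '(?<odd>(?:(?:under|over)\s+)?[+-]?[\d\x{00BD}]+\s+[+-]?\d+)\s+'
    . 'for\s+(?<period>.+)$/ui';

// Hypothetical helper: returns one row in the format asked for in the
// question, or null when the line does not match.
function parse_line(string $line): ?array
{
    if (preg_match(LINE_PATTERN, $line, $m) !== 1) {
        return null;
    }
    // A leading under/over marks a total; anything else is treated as a runline.
    $is_total = preg_match('/^(?:under|over)\b/i', $m['odd']) === 1;

    return array(
        'sport_type'  => $m['sport_type'],
        'game_number' => (int) $m['game_number'],
        'game_name'   => $m['game_name'],
        'runline_odd' => $is_total ? '' : $m['odd'],
        'total_odd'   => $is_total ? $m['odd'] : '',
        'odd_type'    => $is_total ? 'total_odd' : 'runline',
        'period'      => $m['period'],
    );
}

var_dump(parse_line('Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game'));
```

Because game_name is lazy, the engine tries the shortest name first, so under is consumed by the odd group rather than the name.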