PHP, Regular expression to parse data

I have data in the format:

Football - 101 Carolina Panthers +15 -110 for Game

Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game

Football - 102 Pittsburgh Steelers -9 -120 for 1st Half


How to transform this into a PHP array:

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 101,
                      'game_name'   => 'Carolina Panthers',
                      'runline_odd' => '+15 -110',
                      'total_odd'   => '',
                      'odd_type'    => 'runline',
                      'period'      => 'Game' );

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 101,
                      'game_name'   => 'Carolina Panthers/Pittsburgh Steelers',
                      'runline_odd' => '',
                      'total_odd'   => 'under 36½ -110',
                      'odd_type'    => 'total_odd',
                      'period'      => 'Game' );

$game_data[] = array( 'sport_type'  => 'Football',
                      'game_number' => 102,
                      'game_name'   => 'Pittsburgh Steelers',
                      'runline_odd' => '-9 -120',
                      'total_odd'   => '',
                      'odd_type'    => 'runline',
                      'period'      => '1st Half' );


Normally I wouldn't solve the whole problem for someone, but the ½ character made it interesting enough. Now, I'm not a super expert on regexes so this might not be the most optimized or elegant solution, but it seems to get the job done. At least with the provided sample input.

EDIT: Oops. I didn't catch that "under" is actually part of the odds data (total_odd) rather than the game name, so this does not currently get the job done. I'll be back.

EDIT2: Revised the regex slightly. It now correctly distinguishes between runline_odd and total_odd (captured as runline_total in the regex).

<?php
$input = array(
'Football - 101 Carolina Panthers +15 -110 for Game',
'Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game',
'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half'
);

$regex = '^(?<sport_type>[[:alpha:]]*) - '.
         '(?<game_number>[0-9]*) '.
         '('.
            '(?<game_nameb>[[:alpha:]\/ ]*?) '.
            '(?<runline_total>(under ([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
         '|'.
            '(?<game_namea>[[:alpha:]\/ ]*) '.
            '(?<runline_odd>((-|\+)?([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
         ')'.
         '(?<period>.*)$';


$game_data = array();

foreach ($input as $in) {
    $matches = false;
    $cnt = preg_match('/' . $regex . '/ui', $in, $matches);

    if ($cnt && is_array($matches) && count($matches)) {
        if (empty($matches['game_nameb'])) {
            $game_name = $matches['game_namea'];
            $runline_odd = $matches['runline_odd'];
            $total_odd = '';
        } else {
            $game_name = $matches['game_nameb'];
            $runline_odd = '';
            $total_odd = $matches['runline_total'];
        }


        $result = array(
            'sport_type' => $matches['sport_type'],
            'game_number' => $matches['game_number'],
            'game_name' => $game_name,
            'runline_odd' => $runline_odd,
            'total_odd' => $total_odd,
            'period' => $matches['period']
        );

        array_push($game_data, $result);
    }
}

var_dump($game_data);

This produces the following:

$ /usr/local/bin/php preg-match.php 
array(3) {
  [0]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "101"
    ["game_name"]=>
    string(17) "Carolina Panthers"
    ["runline_odd"]=>
    string(8) "+15 -110"
    ["total_odd"]=>
    string(0) ""
    ["period"]=>
    string(4) "Game"
  }
  [1]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "101"
    ["game_name"]=>
    string(37) "Carolina Panthers/Pittsburgh Steelers"
    ["runline_odd"]=>
    string(0) ""
    ["total_odd"]=>
    string(15) "under 36½ -110"
    ["period"]=>
    string(4) "Game"
  }
  [2]=>
  array(6) {
    ["sport_type"]=>
    string(8) "Football"
    ["game_number"]=>
    string(3) "102"
    ["game_name"]=>
    string(19) "Pittsburgh Steelers"
    ["runline_odd"]=>
    string(7) "-9 -120"
    ["total_odd"]=>
    string(0) ""
    ["period"]=>
    string(8) "1st Half"
  }
}
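
Note that the question also asks for an odd_type key and an integer game_number, neither of which appears in the output above. One possible way to fill them in afterwards is sketched below. The add_odd_type() helper is a hypothetical name of mine, not part of the code above, and it assumes exactly one of runline_odd / total_odd is non-empty in each row.

```php
<?php
// Hypothetical helper: derives 'odd_type' from whichever odds field is filled
// and casts 'game_number' to an integer, as the question's target array shows.
function add_odd_type(array $row): array
{
    $row['odd_type']    = $row['total_odd'] !== '' ? 'total_odd' : 'runline';
    $row['game_number'] = (int) $row['game_number'];
    return $row;
}

// One row shaped like the var_dump output above.
$row = array(
    'sport_type'  => 'Football',
    'game_number' => '101',
    'game_name'   => 'Carolina Panthers',
    'runline_odd' => '+15 -110',
    'total_odd'   => '',
    'period'      => 'Game',
);

$row = add_odd_type($row);
var_dump($row['odd_type'], $row['game_number']);
```

Inside the loop this would amount to calling add_odd_type($result) before pushing the row onto $game_data.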


The following works, except for the case where there is an under after the game name:

/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/

Explanation:

([^-]+): Anything other than -. This captures the sport type, which the - separates from the other details.
\s*-\s*: A - surrounded by optional whitespace.
(\d+)  : Game number.
([^\d+-]+): Anything other than +, - or a digit. Matches the game name.
((?:under\s*)?[\d\s+-]+): Runline odd or total odd.
\s*for\s*(.+): The literal for, followed by the period.
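
As a rough usage sketch, this is how the pattern might be applied to one of the sample lines. The greedy character classes also pick up the trailing space of each field, so the captures are trimmed here. The $fields variable is just an illustrative name.

```php
<?php
// Sketch: applying the pattern above to a single sample line.
$pattern = '/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/';
$line    = 'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half';

$fields = array();
if (preg_match($pattern, $line, $m)) {
    // $m[0] is the full match; the five capture groups start at index 1.
    // Several captures end with a space, so trim each of them.
    $fields = array_map('trim', array_slice($m, 1));
}

// Order: sport type, game number, game name, odds, period.
var_dump($fields);
```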

PS:

  1. Take care of the cases where there is 'under'. The regex above is dumping it with game_name.
  2. Take care of unicode chars.
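
A minimal sketch of how both points might be handled, assuming PHP 7.1+ and UTF-8 input. The parse_line() function and the combined odds group are my own hypothetical names, and only the under keyword from the sample data is recognised; any other keyword would have to be added to the alternation. The /u flag makes PCRE treat ½ (U+00BD) as one character instead of two bytes.

```php
<?php
// Hypothetical helper: parses one line into the array shape from the question.
// Returns null when the line does not match.
function parse_line(string $line): ?array
{
    $pattern = '/^(?<sport_type>[^-]+?)\s*-\s*'
             . '(?<game_number>\d+)\s+'
             . '(?<game_name>.+?)\s+'
             // Odds start either with "under" or with a sign; \x{00BD} is the ½ character.
             . '(?<odds>(?:under\s+|[+-])[\d\x{00BD}]+\s+[+-]\d+)\s+for\s+'
             . '(?<period>.+)$/u';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    // A leading "under" marks a total; everything else is treated as a runline.
    $is_total = strpos($m['odds'], 'under') === 0;

    return array(
        'sport_type'  => $m['sport_type'],
        'game_number' => (int) $m['game_number'],
        'game_name'   => $m['game_name'],
        'runline_odd' => $is_total ? '' : $m['odds'],
        'total_odd'   => $is_total ? $m['odds'] : '',
        'odd_type'    => $is_total ? 'total_odd' : 'runline',
        'period'      => $m['period'],
    );
}

// "\u{00BD}" is the ½ character written as an escape.
var_dump(parse_line("Football - 101 Carolina Panthers/Pittsburgh Steelers under 36\u{00BD} -110 for Game"));
```

Because the game name is matched lazily and the odds group is spelled out explicitly, under no longer ends up inside game_name.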