PHP Regular expression stopped working although unchanged

//START GET DATES
$regexp = '/[0-9]{2,4}[-\/ ]{1}([A-Za-z]{3}|[0-9]{2})[-\/ ]{1}[0-9]{2,4}/i';

preg_match_all($regexp, $output, $dates);

//Dec 05, 1995 + December 5, 1995
$regexp = '/\b[[A-Za-z]{3,9}\b[ 0-9\,]{2,5}[0-9]{4}/i';
preg_match_all($regexp, $output, $dates);

//09 Aug 2012
$regexp = '/[0-9]{2}[ ]{1}[A-Za-z]{3}[ ]{1}[0-9]{4}/i';
preg_match_all($regexp, $output, $dates);
print_r($dates);

The above are my regular expressions for extracting dates in different formats from a chunk of text.

The expressions were working perfectly, and as far as I can remember absolutely nothing has changed.

Can anyone tell me if anything is wrong with the expressions, and if not, what else could have caused this sudden break?

Cheers


Without some more information it's hard to give a precise answer, but a few things come to mind:

  • These are some sloppy regexes.
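    • [A-Za-z] together with the case-insensitive /i modifier (one of the two is redundant).
    • [[A-Za-z] (the extra [ is a literal inside the class, so it also matches a [ character).
    • {1}, used repeatedly (it is a no-op).
    • Needless escapes such as \, inside a character class, and more. I wouldn't be surprised if there were errors in them, too.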
  • You're applying the regexes in sequence, and each preg_match_all call overwrites $dates with its own results. Perhaps the earlier regexes do find matches, but their results are replaced by those of the last one, which happens not to find any?
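If you do want to keep three separate patterns, one way around the overwriting is to copy each call's matches into a running list before the next call. This is only a sketch, not the asker's actual code: the sample string stands in for $output, and $patterns and $allDates are hypothetical names. The patterns are the question's, with the sloppiness listed above cleaned up.

```php
<?php
// Hypothetical sample text standing in for $output from the question.
$output = 'Filed 2012-08-09, revised Dec 05, 1995 and again on 09 Aug 2012.';

// The question's three patterns, tidied up: no stray "[", no "{1}", no "\,",
// and [a-z] is enough because the /i modifier already ignores case.
$patterns = [
    '/[0-9]{2,4}[-\/ ](?:[a-z]{3}|[0-9]{2})[-\/ ][0-9]{2,4}/i', // e.g. 2012-08-09
    '/\b[a-z]{3,9}\b[ 0-9,]{2,5}[0-9]{4}/i',                    // e.g. Dec 05, 1995
    '/[0-9]{2} [a-z]{3} [0-9]{4}/i',                            // e.g. 09 Aug 2012
];

$allDates = [];
foreach ($patterns as $regexp) {
    // preg_match_all() refills $matches on every call, so copy this
    // pattern's full matches ($matches[0]) out before running the next one.
    if (preg_match_all($regexp, $output, $matches)) {
        $allDates = array_merge($allDates, $matches[0]);
    }
}

// The patterns overlap (the first one also matches "09 Aug 2012"), so drop duplicates.
$allDates = array_values(array_unique($allDates));

print_r($allDates);
```

Note that the overlap between the patterns is one more argument for replacing them with a single regex.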

So let's try to find a better regex for you, a single one. How about this:

preg_match_all(
    '%\b                  # Start at a word boundary
    (?:                   # Match the following:
     (?:                  # either
      \d+                 # a number,
      (?:\.|st|nd|rd|th)? # optionally followed by a dot, st, nd, rd, or th,
      \b                  # then a word boundary (after the suffix, so 9th can match)
      |                   # or a month name
      (?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)\b
     )
     [\s.,/-]*            # followed by a date separator, comma or whitespace (opt.)
    ){3}                  # Do this three times
    (?<!\s)               # Don\'t match trailing whitespace
    %ix', 
    $output, $dates, PREG_PATTERN_ORDER);
$dates = $dates[0];
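Here is a small self-contained run of the single regex against a made-up sample string (the text and the $pattern/$matches names are just for illustration). Note that \b comes after the optional suffix here: there is no word boundary between a digit and a letter, so \d+\b on its own would never match an ordinal such as 9th.

```php
<?php
// Hypothetical sample text standing in for $output from the question.
$output = 'Released Dec 05, 1995; updated 2012-08-09 and again on 9th August 2012 for good.';

$pattern = '%\b                 # start at a word boundary
    (?:                         # match the following three times:
     (?:                        # either
      \d+(?:\.|st|nd|rd|th)?\b  # a number with an optional dot or ordinal suffix,
      |                         # or a (possibly abbreviated) month name
      (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\b
     )
     [\s.,/-]*                  # then an optional date separator, comma or whitespace
    ){3}
    (?<!\s)                     # do not end the match on whitespace
    %ix';

preg_match_all($pattern, $output, $matches);
$dates = $matches[0]; // full matches only

print_r($dates);
```

With this input the three dates come back in order of appearance: "Dec 05, 1995", "2012-08-09" and "9th August 2012". Keep in mind that the pattern is still loose; for example, a word like "Decided" starts with "Dec" and counts as a month name.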