Regular Expression to extract timestamp and comment

I have a number of exported text fields from an old Access database that are being ported over into a new MySQL structure. There are various field inputs in the format:

10/06/2010 09:10:40 Work not yet started

I would like to take that string and use some sort of regular expression to extract the date/time information and then the comment afterwards.

Is there a simple regular expression syntax for matching this information?


You can use this instead of a regex:

$parts = explode(" ", $string, 3);
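
To make the result concrete, here is a minimal sketch of what explode() with a limit of 3 returns for the sample string from the question. The names $date, $time, and $comment are illustrative, and the count() check is an added guard for rows that do not follow the expected layout.

```php
<?php
// Sample value from the question.
$string = "10/06/2010 09:10:40 Work not yet started";

// A limit of 3 stops splitting after the second space,
// so the comment keeps its internal spaces.
$parts = explode(" ", $string, 3);

// Guard (an addition for illustration): rows with fewer than
// two spaces would yield fewer than three pieces.
if (count($parts) === 3) {
    [$date, $time, $comment] = $parts;
    echo $date, "\n";    // 10/06/2010
    echo $time, "\n";    // 09:10:40
    echo $comment, "\n"; // Work not yet started
}
```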


I think I'll have a go at this

preg_match('|^([0-9]{2})/([0-9]{2})/([0-9]{4})\s([0-9]{2}):([0-9]{2}):([0-9]{2})\s(.*)$|',$str,$matches);
// $mo is the month and $mi the minutes; the leading empty slot skips the full match
list(,$d,$mo,$y,$h,$mi,$s,$comment)=$matches;

you then have the necessary values to reconstruct the time in any format you wish.
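
As one possible way to reconstruct the time, the sketch below turns the captured pieces into a MySQL-style DATETIME string. It assumes the source dates are day-first (dd/mm/yyyy), which the question does not confirm, and the variable names are illustrative.

```php
<?php
$str = '10/06/2010 09:10:40 Work not yet started';
$pattern = '|^([0-9]{2})/([0-9]{2})/([0-9]{4})\s([0-9]{2}):([0-9]{2}):([0-9]{2})\s(.*)$|';

$mysqlDatetime = null;
$comment = null;

if (preg_match($pattern, $str, $matches)) {
    // Assumption: the first number is the day (dd/mm/yyyy).
    // The leading empty slot skips the full match in $matches[0].
    [, $day, $month, $year, $hour, $minute, $second, $comment] = $matches;

    // checkdate() rejects impossible dates such as 31/02/2010.
    if (checkdate((int) $month, (int) $day, (int) $year)) {
        $mysqlDatetime = "$year-$month-$day $hour:$minute:$second";
    }
}

echo $mysqlDatetime, "\n"; // 2010-06-10 09:10:40
echo $comment, "\n";       // Work not yet started
```

If the data turns out to be month-first, swapping $day and $month in the destructuring line is the only change needed.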


As I see it, you can just use the existing spaces as delimiters, yielding the following expression:

/([^ ]+) ([^ ]+) (.+)/

That is: three groups separated by spaces, of which the first two groups don’t contain any spaces (but the third may).
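
A short sketch of that expression in use with preg_match() follows. The group names (date, time, comment) and the ^ and $ anchors are additions for readability, not part of the expression above.

```php
<?php
$str = '10/06/2010 09:10:40 Work not yet started';

// The same three space-delimited groups, with hypothetical names attached.
$pattern = '/^(?<date>[^ ]+) (?<time>[^ ]+) (?<comment>.+)$/';

if (preg_match($pattern, $str, $m)) {
    echo $m['date'], "\n";    // 10/06/2010
    echo $m['time'], "\n";    // 09:10:40
    echo $m['comment'], "\n"; // Work not yet started
}
```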


A regex is heavier than this task needs. If this format is guaranteed, you can split the string on its first two spaces and use the first two slices as follows:

$str = "10/06/2010 09:10:40 Work not yet started";
$slices = explode(" ", $str, 3);
// Rejoin date and time with a space; strtotime() reads slash dates as mm/dd/yyyy
$timestamp = strtotime($slices[0] . " " . $slices[1]);
echo "String is $str\n";
echo "Timestamp is $timestamp\n";
// date() stands in for strftime(), which is deprecated as of PHP 8.1
echo "Timestamp to date is " . date("d.m.Y H:i:s", $timestamp) . "\n";


Well, if your date/time is stored as type datetime, then you can use something like

preg_match("/^([0-9\\/]{10} [0-9:]{8}) (.*)$/",$str,$matches);
$datetime = $matches[1];
$description = $matches[2];

If you're storing the date and time separately, you can use

preg_match("/^([0-9\\/]{10}) ([0-9:]{8}) (.*)$/",$str,$matches);
$date = $matches[1];
$time = $matches[2];
$description = $matches[3];

Of course, an alternative to regular expressions is to explode the string:

list($date,$time,$description) = explode(' ',$str,3);

And another option, assuming the dates and times are always the same length:

$date = substr($str,0,10);
// the third argument of substr() is a length, so the 8-character time needs 8
$time = substr($str,11,8);
$description = substr($str,20);


// Bare parentheses would be read as the pattern delimiters, so ~ is used to keep group 1
if(preg_match('~([0-9/]+ [0-9:]+)~', $myString, $regs)) {
  $myTime = strtotime($regs[1]);
}


If you just want to extract it to 2 strings, you can use:

([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}\s[0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})\s(.*)
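
The sketch below applies this pattern to a small batch of rows. Only the first row comes from the question; the other two are hypothetical, added to show that single-digit fields match and that rows without a timestamp can be kept as plain comments. The ^ and $ anchors and the array keys are assumptions for illustration.

```php
<?php
$pattern = '/^([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}\s[0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})\s(.*)$/';

// Hypothetical rows: only the first comes from the question.
$rows = [
    '10/06/2010 09:10:40 Work not yet started',
    '1/6/2010 9:5:3 Single-digit fields also match',
    'A comment with no timestamp',
];

$parsed = [];
foreach ($rows as $row) {
    if (preg_match($pattern, $row, $m)) {
        $parsed[] = ['datetime' => $m[1], 'comment' => $m[2]];
    } else {
        // Keep the whole row as the comment when no timestamp is found.
        $parsed[] = ['datetime' => null, 'comment' => $row];
    }
}

print_r($parsed);
```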


You can extract the information with the below code:

// sample string you provided
$string = "10/06/2010 09:10:40 Work not yet started";

// regular expression to use
$regex = "/^(\d+)\/(\d+)\/(\d+) (\d+)\:(\d+)\:(\d+) (.+?)$/";

preg_match() places all the fields you want into the array $matches:

// method 1: just extract
preg_match($regex, $string, $matches);

// method 2: to check if the string matches the format you provided first
//           then do something with the extracted text
if (preg_match($regex, $string, $matches) > 0) {
   // do something
}

To further use the information you've got:

// to get a Unix timestamp out of the matches
// you may use mktime()

// method 1: assuming your date format above is dd/mm/yyyy
$timestamp = mktime($matches[4], $matches[5], $matches[6], 
  $matches[2], $matches[1], $matches[3]);

// method 2: or if your date format above is mm/dd/yyyy
$timestamp = mktime($matches[4], $matches[5], $matches[6], 
  $matches[1], $matches[2], $matches[3]);

Then you may want to see if the time is correctly parsed:

print date('r', $timestamp);

Finally, get the comment like this:

$comment = $matches[7];

Be aware of time zone issues: mktime() interprets its arguments in the server's default time zone. If you're parsing this data on the same server that generated it, you will most likely be fine. Otherwise, you might need to add or subtract time from the timestamp above.
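
One way to avoid adjusting the timestamp by hand is to state the source time zone explicitly when parsing. The sketch below uses PHP's DateTimeImmutable for this; the zone Europe/London and the day-first format are assumptions chosen for illustration, not facts from the question.

```php
<?php
$raw = '10/06/2010 09:10:40';

// Assumption: the old database recorded local UK time, day-first.
$sourceZone = new DateTimeZone('Europe/London');

$local = DateTimeImmutable::createFromFormat('d/m/Y H:i:s', $raw, $sourceZone);
if ($local === false) {
    throw new RuntimeException("Could not parse: $raw");
}

// Convert to UTC for storage; the underlying instant is unchanged.
$utc = $local->setTimezone(new DateTimeZone('UTC'));

echo $utc->format('Y-m-d H:i:s'), "\n"; // 2010-06-10 08:10:40
echo $local->getTimestamp(), "\n";      // Unix timestamp of the same instant
```

Because the zone is passed in explicitly, the result does not depend on the default time zone of the server running the import.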


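$s = '10/06/2010 09:10:40 Work not yet started';
$dateString = substr($s, 0, 19);
$msg = substr($s, 20);

// strtotime() reads slash-separated dates as mm/dd/yyyy
$date = strtotime($dateString);
// or, with an explicit format: DateTime::createFromFormat() is used here in place of
// strptime(), which returns an array and is deprecated as of PHP 8.1
$date = DateTime::createFromFormat('m/d/Y H:i:s', $dateString);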


Is there a simple regular expression syntax for matching this information?

Yes. Yes there is. This is an exercise in "extraction" not "validation". You want to split the string only once on the space that immediately trails the datetime expression to form exactly two elements. Match the date, then the space, then the time, then forget everything that was matched (\K metacharacter -- restarts the fullstring match), then match the space to be used as the delimiter.

Limit the explosions so that only two elements are generated even if the comment has spaces in it.

Code:

$string = '10/06/2010 09:10:40 Work not yet started';
var_export(preg_split('/\S+ \S+\K /', $string, 2));

Output:

array (
  0 => '10/06/2010 09:10:40',
  1 => 'Work not yet started',
)

No capture groups are necessary and preg_match() is less ideal because it creates excess data in its output. preg_split() is the single-function technique that most directly provides the desired output. If this were my project, I wouldn't do it any other way.
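
For rows that may not follow the expected layout, a small wrapper can report failure instead of returning a single-element array. This is a sketch only; the function name splitEntry() is hypothetical and not part of the answer above.

```php
<?php
/**
 * Split "date time comment" into [datetime, comment].
 * Returns null when the entry has no "word word " prefix to split after.
 */
function splitEntry(string $entry): ?array
{
    // \K drops the matched date and time, leaving only the space as the delimiter.
    $parts = preg_split('/\S+ \S+\K /', $entry, 2);

    return ($parts !== false && count($parts) === 2) ? $parts : null;
}

var_export(splitEntry('10/06/2010 09:10:40 Work not yet started'));
echo "\n";
var_export(splitEntry('nospaces')); // NULL
echo "\n";
```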
