perl - regex help parsing hostname from log

I need help with my regex to grab my host information from this logfile:

Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO:    host=test1.dom.colo.name.com
Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables

My regex also matches the 2nd line, capturing the hostname in double quotes along with other pieces of data on that line, which I am not interested in. Only the first line is what I want. So, I'm just interested in test1.dom.colo.name.com and nothing else.

My regex so far is this:

if ($line =~ m/(host=)(.+)/){

Thanks!


It'll work better if you exclude spaces and quotes from the match:

host=([^\s"]+)

By excluding quotes this will match the host=... in the first line while ignoring the --host="..." in the second line.

Edit: This simple test script works for me on your sample input. What happens if you run this?

#!/usr/bin/env perl
use strict;
use warnings;

# Print the captured hostname for every input line that matches.
while (my $line = <>) {
    if ($line =~ /host=([^\s"]+)/) {
        print "$1\n";
    }
}


Here is a regex to do that:

/host="?([^\s"]+)"?/m

Your first line does not have quotes around the data; the second line does. Hence the "? construct. Presumably the value cannot contain a space (or a closing quote), so grab everything other than those. Hence ([^\s"]+)

Cheers!

Edit: This works:

use strict; use warnings;
my $i=1;
while (<DATA>) {
    print "match on line $i: $1\n" if /host="?([^\s"]+)"?/;
    $i++;
}

__DATA__
Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO:    host=test1.dom.colo.name.com
Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables

Output:

match on line 1: test1.dom.colo.name.com
match on line 2: test1.dom.colo.name.com


If the hostname cannot contain whitespace, then I'd do: /(host=)(\S+)/
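
One thing worth checking with \S+ is how it behaves on the quoted value in the second sample line, since a double quote is not whitespace. Here is a minimal sketch using the two lines from the question; capture_host is a hypothetical helper name used only for illustration, and the capture group around host= is dropped because only the value is inspected:

```perl
use strict;
use warnings;

# The two sample lines from the question.
my @lines = (
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO:    host=test1.dom.colo.name.com',
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables',
);

# capture_host (hypothetical name): returns whatever \S+ captures after host=,
# or undef when the line has no host= at all.
sub capture_host {
    my ($line) = @_;
    return $line =~ /host=(\S+)/ ? $1 : undef;
}

for my $line (@lines) {
    my $host = capture_host($line);
    print defined $host ? "captured: $host\n" : "no match\n";
}
```

On the first line this captures test1.dom.colo.name.com; on the second it captures "test1.dom.colo.name.com" with the surrounding quotes included, so the quotes would still need to be stripped or excluded if that line matters.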


Try this:

$line =~ m/host="?([^"\s]+)/

You don't need parens around host= unless you actually want to capture it as data (and since it is a fixed string that you always match, it doesn't seem you need to). Using [^"\s]+ gives you a string that contains no " or whitespace characters, which prevents the match from running past the field boundary.

The "? bit before the capture will allow the value to be quoted (or not) while keeping any quote marks out of the actual matched data, so you don't have to worry about stripping them out in your data processing.
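
To see the pattern above in context, here is a small sketch that wraps it in a helper and runs it over the two sample lines. The names extract_host and %seen are hypothetical; the %seen hash is an illustrative addition (not part of the answer) so that a hostname appearing on both lines is printed only once.

```perl
use strict;
use warnings;

# extract_host (hypothetical name): returns the host value without quotes,
# or undef when the line contains no host= field.
sub extract_host {
    my ($line) = @_;
    # "? skips an optional opening quote; [^"\s]+ stops at a quote or whitespace.
    return $line =~ /host="?([^"\s]+)/ ? $1 : undef;
}

# The two sample lines from the question.
my @lines = (
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO:    host=test1.dom.colo.name.com',
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables',
);

my %seen;    # illustrative addition: hosts already printed
for my $line (@lines) {
    my $host = extract_host($line);
    next unless defined $host;
    print "$host\n" unless $seen{$host}++;
}
```

Both lines yield the same unquoted value, so this prints test1.dom.colo.name.com a single time.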
