perl - regex help parsing hostname from log
I need help with my regex to grab my host information from this logfile:
Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: host=test1.dom.colo.name.com
Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables
My regex also matches the second line, where it grabs the hostname in double quotes along with other data on that line that I am not interested in. The result from the first line is fine. So I'm just interested in
test1.dom.colo.name.com
and nothing else.
My regex so far is this:
if ($line =~ m/(host=)(.+)/){
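To see why the second line misbehaves, here is a minimal sketch that runs the regex above over both sample lines. The helper name greedy_host is hypothetical. .+ matches any run of characters up to the end of the line, so on the second line the match starts at the host= inside --host= and the second capture keeps the opening quote and every option after it.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the original regex and return the second capture group
sub greedy_host {
    my ($line) = @_;
    return $line =~ m/(host=)(.+)/ ? $2 : undef;
}

# The two sample log lines from the question
my @lines = (
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: host=test1.dom.colo.name.com',
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables',
);

# .+ is greedy: on line 2 it runs from the opening quote to the end of the line
for my $line (@lines) {
    my $host = greedy_host($line);
    print "$host\n";
}
```

On the sample input this prints the bare hostname for the first line and "test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables for the second.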
Thanks!
It'll work better if you exclude spaces and quotes from the match:
host=([^\s"]+)
By excluding quotes, this will match the host=... in the first line while ignoring the --host="..." in the second line.
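A self-contained sketch of that pattern, wrapped in a hypothetical unquoted_host helper and run over the two sample lines (inlined here rather than read from a file). On the second line the only host= is followed directly by a double quote, which [^\s"] cannot match, so the regex fails there and the helper returns undef.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the host= value up to the first whitespace or double quote.
# A host= followed directly by a quote (as in --host="...") cannot match.
sub unquoted_host {
    my ($line) = @_;
    return $line =~ /host=([^\s"]+)/ ? $1 : undef;
}

# The two sample log lines from the question
my @lines = (
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: host=test1.dom.colo.name.com',
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables',
);

for my $i (0 .. $#lines) {
    my $host = unquoted_host($lines[$i]);
    printf "line %d: %s\n", $i + 1, $host // 'no match';
}
```

This should report the hostname for line 1 and "no match" for line 2, which fits the reading that only the unquoted host= entry is wanted.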
Edit: This simple test script works for me on your sample input. What happens if you run this?
#!/usr/bin/env perl
use strict;
use warnings;

# Print the unquoted host= value from each input line
while (my $line = <>) {
    if ($line =~ /host=([^\s"]+)/) {
        print "$1\n";
    }
}
Here is a regex to do that:
/host="?([^\s"]+)"?/m
Your first line does not have quotes around the data; the second line does. Hence the "? construct. Presumably the hostname cannot contain a space (or a closing quote), so grab everything other than those. Hence ([^\s"]+).
Cheers!
Edit: This works:
use strict;
use warnings;

my $i = 1;
while (<DATA>) {
    # Optional quote before the hostname; the capture stops at whitespace or a quote
    print "match on line $i: $1\n" if /host="?([^\s"]+)"?/;
    $i++;
}
__DATA__
Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: host=test1.dom.colo.name.com
Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables
Output:
match on line 1: test1.dom.colo.name.com
match on line 2: test1.dom.colo.name.com
If the hostname cannot contain whitespace, then I'd do: /(host=)(\S+)/
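A quick sketch of this pattern against the two sample lines. \S+ stops only at whitespace, so on the second line the capture keeps the surrounding double quotes; whether that is acceptable depends on whether you want the quoted --host= value at all. The helper name nonspace_host is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: the host= value up to the next whitespace character (second capture group)
sub nonspace_host {
    my ($line) = @_;
    return $line =~ /(host=)(\S+)/ ? $2 : undef;
}

# The two sample log lines from the question
my @lines = (
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: host=test1.dom.colo.name.com',
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables',
);

# Line 1 yields the bare hostname; line 2 yields it with its double quotes still attached
for my $line (@lines) {
    my $host = nonspace_host($line);
    print "$host\n";
}
```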
Try this:
$line =~ m/host="?([^"\s]+)/
You don't need parens around the host= if you don't actually want to parse that out as data (which, since you're always matching it, it doesn't seem you need to).

Using [^"\s]+ will give you a string that doesn't have a " or whitespace characters in it, which will prevent it from running beyond the field boundaries.
The "? bit before the capture will allow the value to be quoted (or not) while keeping any quote marks out of the actual matched data, so you don't have to worry about stripping them out in your data processing.
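Because the capture group already excludes quotes, the trailing "? in the earlier /host="?([^\s"]+)"?/ answer does not change what ends up in $1. Here is a sketch, with hypothetical helper names, that runs both optional-quote patterns over the sample lines and checks that they capture the same hostname.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two optional-quote patterns from this thread
sub with_trailing_quote    { return $_[0] =~ /host="?([^\s"]+)"?/ ? $1 : undef }
sub without_trailing_quote { return $_[0] =~ /host="?([^"\s]+)/   ? $1 : undef }

# The two sample log lines from the question
my @lines = (
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: host=test1.dom.colo.name.com',
    'Tue Aug 24 10:22:14 2010: test1.colo_lvm:check:INFO: "/home/bin64"/admin --user="foo-bar" --password="*****" --host="test1.dom.colo.name.com" --port="9999" --socket="/tmp" variables',
);

for my $line (@lines) {
    my $host      = without_trailing_quote($line);
    my $same_host = with_trailing_quote($line);
    # The capture never includes a quote, so both patterns agree
    die "patterns disagree on: $line\n" unless $host eq $same_host;
    print "$host\n";
}
```

Both lines should print test1.dom.colo.name.com, with no quote marks to strip afterwards.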