awk file manipulation
I have the following lines in my text file and want to extract data from them as follows.
device1 te rfe3 -1 10.1.2.3 device1 te rfe3
device2 cdr thr 10.2.5.3 device2 cdr thr
device4 10.6.0.8 device4
device3 hrdnsrc dhe 10.8.3.6 device3 hrdnsrc dhe
My objective is to extract the device name and the IP address and strip everything else away. There is no pattern after the device name: some lines have 2-3 words there and some have nothing at all. Also, I don't need the 3rd column. I am looking for a result like this:
device1 10.1.2.3
device2 10.2.5.3
device3 10.8.3.6
device3 10.8.9.4
Is this possible? Thanks in advance.
sed -r 's/^([^ ]*) .* (([0-9]{1,3}\.){3}[0-9]{1,3}).*$/\1 \2/'
Proof of Concept
$ sed -r 's/^([^ ]*) .* (([0-9]{1,3}\.){3}[0-9]{1,3}).*$/\1 \2/' ./infile
device1 10.1.2.3
device2 10.2.5.3
device4 10.6.0.8
device3 10.8.3.6
In awk, this is something like
$ awk '{
    for (f = 2; f <= NF; f++) {
        if ($f ~ /^([0-9]+\.){3}[0-9]+$/) {
            print $1, $f
            break
        }
    }
}' file
Here's a transcript:
mress:10192 Z$ cat pffft.awk
{
    for (f = 2; f <= NF; f++) {
        if ($f ~ /^([0-9]+\.){3}[0-9]+$/) {
            print $1, $f
            break
        }
    }
}
mress:10193 Z$ cat pfft.in
device1 te rfe3 -1 10.1.2.3 device1 te rfe3
device2 cdr thr 10.2.5.3 device2 cdr thr
device4 10.6.0.8 device4
device3 hrdnsrc dhe 10.8.3.6 device3 hrdnsrc dhe
mress:10194 Z$ awk -f pffft.awk pfft.in
device1 10.1.2.3
device2 10.2.5.3
device4 10.6.0.8
device3 10.8.3.6
mress:10195 Z$ _
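For comparison, the same field-scanning idea can be sketched in Python. This is only a sketch under assumptions: the function name extract_device_ip is made up for illustration, and it validates each field with the standard ipaddress module, which is stricter than the awk regex above (it also rejects octets over 255).

```python
import ipaddress


def extract_device_ip(line):
    """Return (device, ip) for the first IPv4 field after the name, or None.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    fields = line.split()
    if not fields:
        return None
    for token in fields[1:]:  # field 1 is the device name, so start at field 2
        try:
            ipaddress.IPv4Address(token)
        except ValueError:
            continue  # not an IPv4 address, keep scanning
        return fields[0], token
    return None


if __name__ == "__main__":
    sample = [
        "device1 te rfe3 -1 10.1.2.3 device1 te rfe3",
        "device4 10.6.0.8 device4",
    ]
    for line in sample:
        result = extract_device_ip(line)
        if result:
            print(*result)
```

Returning None for lines without an address leaves it to the caller to skip or report them.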
In Perl:
perl -ne 'next if /^\s*$/ ; /^(\w+).*?(\d+(\.\d+){3})/; print "$1\t$2\n"' test_file
For sorted results, you can pipe the output to the sort command:
perl -ne 'next if /^\s*$/ ; /^(\w+).*?(\d+(\.\d+){3})/; print "$1\t$2\n"' test_file | sort
Updated, script-like version:
my $test_file = shift or die "no input file provided\n";
# open a filehandle to your test file
open my $fh, '<', $test_file or die "could not open $test_file: $!\n";
while (<$fh>) {
    # ignore the blank lines
    next if /^\s*$/;
    # regex matching; skip lines that contain no IP address
    next unless
        /                    # regex starts
          ^                  # beginning of the string
          (\w+)              # store the first word in capture 1
          \s+                # followed by whitespace
          .*?                # match anything but don't be greedy until...
          (\d+(\.\d+){3})    # four dot-separated numbers, stored in capture 2
        /x;                  # regex ends
    # print first and second match
    print "$1\t$2\n";
}
Python's not on your list, but something like this might work.
import re
import sys

# Raw string so the backslashes reach the regex engine unchanged.
pattern = re.compile(r"^(\w+)\s+(?:.*?\s)?(\d+\.\d+\.\d+\.\d+)(?:\s|$)")
for line in sys.stdin:
    match = pattern.match(line)
    if match:  # skip blank lines and lines without an IP address
        sys.stdout.write("{0} {1}\n".format(match.group(1), match.group(2)))
It should work on most Linux platforms, since Python is usually installed already.
Assuming the input file has the fields always aligned to the same columns, the shortest POSIX solution would be
$ cut -c1-8,23-33 x
device1 10.1.2.3
device2 10.2.5.3
device4 10.6.0.8
device3 10.8.3.6
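If you would rather do the same fixed-column extraction in a script, a minimal Python sketch follows. It assumes the same layout as the cut command (name in columns 1-8, IP in columns 23-33); the function name slice_columns and the padded sample row are made up for illustration, since the padding of the real file is not shown here.

```python
def slice_columns(line, name_cols=(0, 8), ip_cols=(22, 33)):
    """Cut two fixed-width fields out of a line (0-based, end-exclusive).

    The default offsets mirror `cut -c1-8,23-33`; they are an assumption
    about the file layout, not something detected from the data.
    """
    name = line[name_cols[0]:name_cols[1]].strip()
    ip = line[ip_cols[0]:ip_cols[1]].strip()
    return name, ip


if __name__ == "__main__":
    # Hypothetical aligned row: 8-char name, 14-char filler, 11-char IP column.
    row = "{:<8}{:<14}{:<11}{}".format(
        "device1", "te rfe3 -1", "10.1.2.3", "device1 te rfe3"
    )
    print(*slice_columns(row))
```

Like cut, this breaks as soon as a row is not padded to the same columns, so the field-scanning approaches are safer for irregular input.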
Depending on how close to an IP number the cruft gets, this may or may not cat your cake:
sed -re 's/^([^ ]*).* ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\1 \2/g'
À la the cut solution, with Perl you could use "unpack" if the file always has the same column layout (the join puts a space between the two fields, since "A" strips trailing padding):
perl -nE 'say join " ", unpack("A8 x14 A9", $_)' data.txt
Or use a regular expression to get the first word followed by a space, ^(\w+\s), and then one or more digits followed by three groups of a . and more digits, (\d+(\.\d+){3}):
perl -nE '/^(?<name>\w+\s).*?(?<ip>\d+(\.\d+){3})/;
say "$+{name} $+{ip}" ' data.txt
The named captures ($+{name} $+{ip}) are just for fun :-)