How to extract values from text using multiple (nested) delimiters
On a day-to-day basis I need to extract bits of text from logs and other text data in various mixed formats. Is there a utility (like awk, grep, etc.) I could use to quickly perform the task without having to resort to writing long bash/perl/python scripts?
Example 1: For the input text below
mylog user=UserName;password=Password;other=information
I would like to extract the user name and password values. The pseudo-utility would preferably look like this (à la awk):
cat input-text.txt | magic --delimit-by=";" --then-by="="
'{print "The username is $values[0][1] and password is $values[1][1]"}'
Where the input string, delimited by ;, is placed in the $values array, and each value in that array is further delimited by = to form a nested array.
Even better, it would be nice to have something like this:
cat input-text.txt | magic --map-entry-sep=";" --map-key-val-sep="="
'{print "The username is $[user] and password is $[password]"}'
Where the result of parsing is converted into a map for easy lookup by key.
Example 2: It would be nice to parse triple-nested elements too. Consider input text like
mylog mylist=one,two,three;other=information
I would now like to extract the second element of the list mylist using something like:
cat input-text.txt | magic --delimit-by=";" --then-by="=" --and-then-by=","
'{print "The second element of mylist is: $values[0][1][1]"}'
Of course, I would rather use some kind of JSON parser and convert the input data into its respective object/map/list format for easier extraction, but that is not possible because I am working with data in different formats.
I usually chain awk, grep, cut and sed through several pipes and extract one value (column) of interest at a time, but that is tedious and requires merging the separate columns back together later. Usually, I need all extracted columns in CSV format for further processing in Excel.
I would be grateful for any suggestions or comments.
$ echo 'mylog user=UserName;password=Password;other=information' |
awk -F '[ ;]' -v keysep="=" \
'{
for (i=1; i<=NF; i++) {
split($i, t, keysep);
a[t[1]] = t[2]
};
print "The username is " a["user"] " and password is " a["password"]
}'
The username is UserName and password is Password
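Since the question asks for all extracted columns as CSV, the same map-building idea can be wrapped in a small shell function that emits one row per input line. This is only a sketch under stated assumptions: the function name kv_to_csv and its keys argument are made up for illustration, each line is assumed to have the "tag key=val;key=val" shape shown above, and values are assumed not to contain commas or quotes (no CSV quoting is done).

```shell
# kv_to_csv KEYS: read "tag key=val;key=val" lines on stdin and print the
# values of the requested keys (comma-separated list) as one CSV row per line.
# The function name and its argument are hypothetical, for illustration only.
kv_to_csv() {
  awk -F '[ ;]' -v keysep="=" -v keys="$1" '
  BEGIN { n = split(keys, want, ",") }
  {
    split("", a)                       # clear the map left over from the previous line
    for (i = 1; i <= NF; i++) {
      p = index($i, keysep)
      if (p > 0)                       # cut at the first separator so values may contain "="
        a[substr($i, 1, p - 1)] = substr($i, p + length(keysep))
    }
    row = ""
    for (j = 1; j <= n; j++)
      row = row (j > 1 ? "," : "") a[want[j]]
    print row
  }'
}

printf '%s\n' \
  'mylog user=UserName;password=Password;other=information' \
  'mylog user=Other;password=Secret;other=x' | kv_to_csv user,password
```

Fields without a separator (such as the leading mylog tag) are skipped, and a requested key that is missing from a line yields an empty column rather than an error.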
$ echo 'mylog mylist=one,two,three;other=information' | awk -F "[ =,;]" '{print $4}'
two
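The one-liner above relies on the list happening to start at field 3, so it breaks if the key order changes. A possible alternative, again only a sketch, is to look the list up by key first and then split its value on the inner delimiter. The function name list_elem and its two arguments are hypothetical, and the delimiters (space/semicolon, then =, then comma) are assumed to match the example input.

```shell
# list_elem KEY N: print the N-th (1-based) comma-separated element of the
# value stored under KEY. Name and arguments are hypothetical, for illustration.
list_elem() {
  awk -F '[ ;]' -v key="$1" -v n="$2" '
  {
    for (i = 1; i <= NF; i++) {
      p = index($i, "=")
      if (p > 0 && substr($i, 1, p - 1) == key) {
        m = split(substr($i, p + 1), items, ",")   # third nesting level
        if (n >= 1 && n <= m) print items[n]
      }
    }
  }'
}

echo 'mylog mylist=one,two,three;other=information' | list_elem mylist 2
```

If the key is absent or the index is out of range, nothing is printed for that line.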