How to extract values from text using multiple (nested) delimiters

On a day-to-day basis I need to extract bits of text from logs and other text data in various mixed formats. Is there a utility (like awk, grep, etc.) I could use to quickly perform the task without having to resort to writing long bash/perl/python scripts?

Example 1: For the input text below

mylog user=UserName;password=Password;other=information

I would like to extract the user name and password values. The pseudo-utility would preferably look like this (a la awk):

cat input-text.txt | magic --delimit-by=";" --then-by="=" \
  '{print "The username is $values[0][1] and password is $values[1][1]"}'

Here the input string is split on ; into the $values array, and each element of that array is split again on = to form a nested array.

Even better, it would be nice to have something like this:

cat input-text.txt | magic --map-entry-sep=";" --map-key-val-sep="=" \
  '{print "The username is $[user] and password is $[password]"}'

Here the result of parsing is converted into a map for easy lookup by key.

Example 2: It would be nice to parse triply nested elements too. Consider input text like

mylog mylist=one,two,three;other=information

I would now like to extract the second element of the list mylist using something like:

cat input-text.txt | magic --delimit-by=";" --then-by="=" --and-then-by="," \
  '{print "The second element of mylist is: $values[0][1][1]"}'

Of course, I would rather use some kind of JSON parser and convert the input data into its respective object/map/list format for easier extraction, but that is not possible because I am working with data in different formats.

I usually use a combination of awk, grep, cut and sed chained through several pipes and extract each value (column) of interest one at a time, but that is tedious and requires merging the different columns into one later. Usually, I need all extracted columns in CSV format for further processing in Excel.

I would be grateful for any suggestions or comments.


$ echo 'mylog user=UserName;password=Password;other=information' |
    awk -F '[ ;]' -v keysep="=" \
        '{
              # split each field on keysep and store it as a[key] = value
              for (i=1; i<=NF; i++) {
                  split($i, t, keysep)
                  a[t[1]] = t[2]
              }
              print "The username is " a["user"] " and password is " a["password"]
         }'
The username is UserName and password is Password
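
Since the question mentions needing the extracted columns as CSV, the same key/value split can be wrapped in a small shell function that prints one CSV row per input line. This is only a sketch: the name kv_to_csv and its comma-separated key argument are made up for illustration, and it assumes the values contain no commas or quotes that would need CSV escaping.

```shell
# Hypothetical helper (not an existing utility): print the chosen keys of
# each input line as one CSV row.
# Usage: kv_to_csv "user,password" < input-text.txt
kv_to_csv() {
    awk -F '[ ;]' -v keys="$1" '
    BEGIN { n = split(keys, want, ",") }     # keys to output, in order
    {
        split("", a)                         # clear the map for each line
        for (i = 1; i <= NF; i++) {
            # split on the first "=" only, so values may contain "="
            p = index($i, "=")
            if (p > 0) a[substr($i, 1, p - 1)] = substr($i, p + 1)
        }
        row = ""; sep = ""
        for (k = 1; k <= n; k++) { row = row sep a[want[k]]; sep = "," }
        print row
    }'
}
```

Piping the first example line through kv_to_csv "user,password" should print UserName,Password. A key that is missing from a line comes out as an empty column.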

$ echo 'mylog mylist=one,two,three;other=information' | awk -F "[ =,;]" '{print $4}'
two
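
The one-liner above treats every delimiter as equal, so the field number depends on everything that comes before the list. A variant closer to the nested behaviour asked for in Example 2 splits in stages: entries on ;, then key from value on =, then the value on ,. This is a rough sketch, where nth_of_list is a made-up name and the log prefix is assumed to be a single word followed by a space:

```shell
# Hypothetical helper (not an existing utility): print the Nth (1-based)
# element of a comma-separated list stored under a given key.
# Usage: nth_of_list mylist 2 < input-text.txt
nth_of_list() {
    awk -v key="$1" -v idx="$2" '
    {
        sub(/^[^ ]* /, "")                   # drop the leading "mylog " prefix
        n = split($0, entry, ";")            # level 1: entries
        for (i = 1; i <= n; i++) {
            p = index(entry[i], "=")         # level 2: key and value
            if (p > 0 && substr(entry[i], 1, p - 1) == key) {
                m = split(substr(entry[i], p + 1), item, ",")   # level 3: list
                if (idx >= 1 && idx <= m) print item[idx + 0]
            }
        }
    }'
}
```

With the second example line as input, nth_of_list mylist 2 should print two, and it keeps doing so if more key=value pairs are added in front of mylist.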