
Awk: treat a double-quoted string as one token and ignore the spaces inside it

Data file - data.txt:

ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC

cat data.txt | awk '{print $2}'

will print "I (the second whitespace-separated token) instead of the whole quoted string.

How can I make awk ignore the spaces within the quotes and treat the quoted string as one single token?


Another alternative is to use the FPAT variable (a gawk extension, available since gawk 4.0), which defines a regular expression describing the contents of each field rather than the separators between fields.

Save this AWK script as parse.awk:

#!/bin/awk -f

BEGIN {
  FPAT = "([^ ]+)|(\"[^\"]+\")"
}
{
  print $2
}

Make it executable with chmod +x ./parse.awk and parse your data file as ./parse.awk data.txt:

"I am ABC"
"I am not ABC"


Yes, this can be done nicely in awk. It's easy to get all the fields without any serious hacks.

(This example works in both The One True Awk and in gawk.)

{
  split($0, a, "\"")
  $2 = a[2]
  $3 = $(NF - 1)
  $4 = $NF
  print "and the fields are ", $1, "+", $2, "+", $3, "+", $4
}
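For example, the same split-on-quotes logic can be run inline on the sample data (a sketch with a simpler print statement; note that assigning to $2 rebuilds $0 but does not re-split the other fields, so $(NF - 1) and $NF still refer to the original last two fields):

```shell
# Inline version of the split-on-quotes idea above: a[2] holds the text
# between the first pair of double quotes.
printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' |
awk '{
  split($0, a, "\"")      # a[2] is the quoted string, without the quotes
  $2 = a[2]               # overwrite field 2 with the full quoted string
  print $1 "|" $2 "|" $(NF - 1) "|" $NF
}'
```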


Try this:

$ cat data.txt | awk -F\" '{print $2}'
I am ABC
I am not ABC


The top answer for this question only works for lines with a single quoted field. When I found this question I needed something that could work for an arbitrary number of quoted fields.

Eventually I came upon an answer by Wintermute in another thread, and he provided a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the below program.

BEGIN { OFS = "" } {
    for (i = 1; i <= NF; i += 2) {
        gsub(/[ \t]+/, ",", $i)
    }
    print
}

This works because, when you split on the " character, every even-numbered field lies inside the quotes; the loop therefore visits only the odd-numbered (unquoted) fields and replaces the whitespace dividing them with commas.

You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).
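For example, on the sample data above (a sketch; the comma-joined intermediate form is re-split with -F, to pick out the quoted field):

```shell
# First pass: turn unquoted runs of whitespace into commas (OFS="" so the
# rebuilt line has no extra separators). Second pass: re-split on the comma.
printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' |
awk -F'"' 'BEGIN { OFS = "" } {
  for (i = 1; i <= NF; i += 2)
    gsub(/[ \t]+/, ",", $i)   # only odd fields are outside the quotes
  print
}' |
awk -F, '{print $2}'
```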

Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".


I've put together a function that re-splits $0 into an array called B. Spaces between double quotes do not act as field separators. It works with any number of fields, both quoted and unquoted. Here goes:

#!/usr/bin/gawk -f

# Resplit $0 into array B. Spaces between double quotes are not separators.
# Single quotes not handled. No escaping of double quotes.
function resplit(       a, l, i, j, b, k, BNF) # all are local variables
{
  l=split($0, a, "\"")
  BNF=0
  delete B
  for (i=1;i<=l;++i)
  {
    if (i % 2)
    {
      k=split(a[i], b)
      for (j=1;j<=k;++j)
        B[++BNF] = b[j]
    }
    else
    {
      B[++BNF] = "\""a[i]"\""
    }
  }
}

{
  resplit()

  for (i=1;i<=length(B);++i)
    print i ": " B[i]
}
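The script above relies on gawk (length(B) on an array and delete B on a whole array are gawk features). A portable sketch of the same odd/even re-split idea, runnable with any POSIX awk, keeps its own counter instead:

```shell
# Portable re-split: odd pieces (outside quotes) are split on whitespace,
# even pieces (inside quotes) are kept whole; BNF counts the fields and
# split("", B) empties the array portably.
printf '%s\n' 'ABC "I am ABC" 35 DESC' |
awk '{
  l = split($0, a, "\"")
  BNF = 0
  split("", B)                    # portable way to empty an array
  for (i = 1; i <= l; ++i) {
    if (i % 2) {                  # odd pieces are outside the quotes
      k = split(a[i], b)
      for (j = 1; j <= k; ++j)
        B[++BNF] = b[j]
    } else {                      # even pieces were inside quotes
      B[++BNF] = "\"" a[i] "\""
    }
  }
  for (i = 1; i <= BNF; ++i)
    print i ": " B[i]
}'
```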

Hope it helps.


Okay, if you really want all three fields, you can get them, but it takes a lot of piping:

$ cat data.txt | awk -F\" '{print $1 "," $2 "," $3}' | awk -F' ,' '{print $1 "," $2}' | awk -F', ' '{print $1 "," $2}' | awk -F, '{print $1 "," $2 "," $3}'
ABC,I am ABC,35
DEF,I am not ABC,42

By the last pipe you've got all three fields to do whatever you'd like with.
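The same three comma-separated fields can also be produced in a single awk invocation (a sketch; split($3, rest, " ") uses default whitespace splitting, so leading blanks are ignored):

```shell
# Split on the quote character: $1 is the text before the quote, $2 the
# quoted string, $3 the text after it; then trim/split the outer pieces.
printf '%s\n' 'ABC "I am ABC" 35 DESC' 'DEF "I am not ABC" 42 DESC' |
awk -F'"' '{
  sub(/ +$/, "", $1)       # drop the trailing space before the quote
  split($3, rest, " ")     # whitespace-split the text after the quote
  print $1 "," $2 "," rest[1]
}'
```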


Here is something like what I finally got working; it is more generic for my project. Note that it doesn't use awk.

someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"
putItemsInLines() {
    local items=""
    local firstItem="true"
    while test $# -gt 0; do
        if [ "$firstItem" == "true" ]; then
            items="$1"
            firstItem="false"
        else
            items="$items
$1"
        fi
        shift
    done
    echo "$items"
}

count=0
while read -r valueLine; do
    echo "$count: $valueLine"
    count=$(( $count + 1 ))
done <<< "$(eval putItemsInLines $someText)"

Which outputs:

0: ABC
1: I am ABC
2: 35
3: DESC
4: 1 23
5: testing
6: 456
