开发者

Compare/Difference of two arrays in Bash

Is it possible to take the difference of two arrays in Bash. What is a good way to do it?

Code:

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" ) 

Array3 =diff(Array1, Array2)

Array3 ideally should be :
Array3=( "key7" "开发者_开发知识库key8" "key9" "key10" )


echo ${Array1[@]} ${Array2[@]} | tr ' ' '\n' | sort | uniq -u

Output

key10
key7
key8
key9

You can add sorting if you need


If you strictly want Array1 - Array2, then

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

Array3=()
for i in "${Array1[@]}"; do
    skip=
    for j in "${Array2[@]}"; do
        [[ $i == $j ]] && { skip=1; break; }
    done
    [[ -n $skip ]] || Array3+=("$i")
done
declare -p Array3

Runtime might be improved with associative arrays, but I personally wouldn't bother. If you're manipulating enough data for that to matter, shell is the wrong tool.


For a symmetric difference like Dennis's answer, existing tools like comm work, as long as we massage the input and output a bit (since they work on line-based files, not shell variables).

Here, we tell the shell to use newlines to join the array into a single string, and discard tabs when reading lines from comm back into an array.

$ oldIFS=$IFS IFS=$'\n\t'
$ Array3=($(comm -3 <(echo "${Array1[*]}") <(echo "${Array2[*]}")))
comm: file 1 is not in sorted order
$ IFS=$oldIFS
$ declare -p Array3
declare -a Array3='([0]="key7" [1]="key8" [2]="key9" [3]="key10")'

It complains because, by lexographical sorting, key1 < … < key9 > key10. But since both input arrays are sorted similarly, it's fine to ignore that warning. You can use --nocheck-order to get rid of the warning, or add a | sort -u inside the <(…) process substitution if you can't guarantee order&uniqueness of the input arrays.


Anytime a question pops up dealing with unique values that may not be sorted, my mind immediately goes to awk. Here is my take on it.

Code

#!/bin/bash

diff(){
  awk 'BEGIN{RS=ORS=" "}
       {NR==FNR?a[$0]++:a[$0]--}
       END{for(k in a)if(a[k])print k}' <(echo -n "${!1}") <(echo -n "${!2}")
}

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=($(diff Array1[@] Array2[@]))
echo ${Array3[@]}

Output

$ ./diffArray.sh
key10 key7 key8 key9

*Note**: Like other answers given, if there are duplicate keys in an array they will only be reported once; this may or may not be the behavior you are looking for. The awk code to handle that is messier and not as clean.


Having ARR1 and ARR2 as arguments, use comm to do the job and mapfile to put it back into RESULT array:

ARR1=("key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10")
ARR2=("key1" "key2" "key3" "key4" "key5" "key6")

mapfile -t RESULT < \
    <(comm -23 \
        <(IFS=$'\n'; echo "${ARR1[*]}" | sort) \
        <(IFS=$'\n'; echo "${ARR2[*]}" | sort) \
    )

echo "${RESULT[@]}" # outputs "key10 key7 key8 key9"

Note that result may not meet source order.

Bonus aka "that's what you are here for":

function array_diff {
    eval local ARR1=\(\"\${$2[@]}\"\)
    eval local ARR2=\(\"\${$3[@]}\"\)
    local IFS=$'\n'
    mapfile -t $1 < <(comm -23 <(echo "${ARR1[*]}" | sort) <(echo "${ARR2[*]}" | sort))
}

# usage:
array_diff RESULT ARR1 ARR2
echo "${RESULT[@]}" # outputs "key10 key7 key8 key9"

Using those tricky evals is the least worst option among others dealing with array parameters passing in bash.

Also, take a look at comm manpage; based on this code it's very easy to implement, for example, array_intersect: just use -12 as comm options.


In Bash 4:

declare -A temp    # associative array
for element in "${Array1[@]}" "${Array2[@]}"
do
    ((temp[$element]++))
done
for element in "${!temp[@]}"
do
    if (( ${temp[$element]} > 1 ))
    then
        unset "temp[$element]"
    fi
done
Array3=(${!temp[@]})    # retrieve the keys as values

Edit:

ephemient pointed out a potentially serious bug. If an element exists in one array with one or more duplicates and doesn't exist at all in the other array, it will be incorrectly removed from the list of unique values. The version below attempts to handle that situation.

declare -A temp1 temp2    # associative arrays
for element in "${Array1[@]}"
do
    ((temp1[$element]++))
done

for element in "${Array2[@]}"
do
    ((temp2[$element]++))
done

for element in "${!temp1[@]}"
do
    if (( ${temp1[$element]} >= 1 && ${temp2[$element]-0} >= 1 ))
    then
        unset "temp1[$element]" "temp2[$element]"
    fi
done
Array3=(${!temp1[@]} ${!temp2[@]})


It is possible to use regex too (based on another answer: Array intersection in bash):

list1=( 1 2 3 4   6 7 8 9 10 11 12)
list2=( 1 2 3   5 6   8 9    11 )

l2=" ${list2[*]} "                    # add framing blanks
for item in ${list1[@]}; do
  if ! [[ $l2 =~ " $item " ]] ; then    # use $item as regexp
    result+=($item)
  fi
done
echo  ${result[@]}:

Result:

$ bash diff-arrays.sh 
4 7 10 12


Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=( "key1" "key2" "key3" "key4" "key5" "key6" "key11" )
a1=${Array1[@]};a2=${Array2[@]}; a3=${Array3[@]}
diff(){
    a1="$1"
    a2="$2"
    awk -va1="$a1" -va2="$a2" '
     BEGIN{
       m= split(a1, A1," ")
       n= split(a2, t," ")
       for(i=1;i<=n;i++) { A2[t[i]] }
       for (i=1;i<=m;i++){
            if( ! (A1[i] in A2)  ){
                printf A1[i]" "
            }
        }
    }'
}
Array4=( $(diff "$a1" "$a2") )  #compare a1 against a2
echo "Array4: ${Array4[@]}"
Array4=( $(diff "$a3" "$a1") )  #compare a3 against a1
echo "Array4: ${Array4[@]}"

output

$ ./shell.sh
Array4: key7 key8 key9 key10
Array4: key11


@ilya-bystrov's most upvoted answer calculates the difference of Array1 and Array2. Please note that this is not the same as removing items from Array1 that are also in Array2. @ilya-bystrov's solution rather concatenates both lists and removes non-unique values. This is a huge difference when Array2 includes items that are not in Array1: Array3 will contain values that are in Array2, but not in Array1.

Here's a pure Bash solution for removing items from Array1 that are also in Array2 (note the additional "key11" in Array2):

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" "key11" )
Array3=( $(printf "%s\n" "${Array1[@]}" "${Array2[@]}" "${Array2[@]}" | sort | uniq -u) )

Array3 will consist of "key7" "key8" "key9" "key10" and exclude the unexpected "key11" when trying to remove items from Array1.

Please note: This assumes that all values in Array1 are unique. Otherwise they won't show up in Array3. If Array1 contains duplicate values, you must remove the duplicates first (note the duplicate "key10" in Array1):

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" "key11" )
Array3=( $({ printf "%s\n" "${Array1[@]} | sort -u; printf "%s\n" "${Array2[@]}" "${Array2[@]}"; } | sort | uniq -u) )

If you want to replicate the duplicates in Array1 to Array2, go with @ephemient' accepted answer. The same is true if Array1 and Array2 are huge: this is a very inefficient solution for a lot of items, even though it's negligible for a few items (<100). If you need to process huge arrays don't use Bash.


this code replace with diff

echo ${test1[@]} ${test2[@]} | sed 's/ /\n/g' | sort | uniq -u

for result reverse use uniq -d

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜