Removing duplicates on a variable without sorting
I have a variable that contains the following space-separated entries.
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
How do I remove the duplicates without sorting?
#Something like this.
new_variable="apple lemon papaya avocado grapes mango banana"
I found a script somewhere that removes the duplicates from a variable, but it also sorts the contents.
#Not something like this.
new_variable=$(echo "$variable"|tr " " "\n"|sort|uniq|tr "\n" " ")
echo $new_variable
apple avocado banana grapes lemon mango papaya
new_variable=$( awk 'BEGIN{RS=ORS=" "}!a[$0]++' <<<$variable );
Here's how it works:
RS (Input Record Separator) is set to a white space so that it treats each fruit in $variable as a record instead of a field. The non-sorting unique magic happens with !a[$0]++. Since awk supports associative arrays, it uses the current record ($0) as the key to the array a[]. If that key has not been seen before, a[$0] evaluates to '0' (awk's default value for unset indices) which is then negated to return TRUE. I then exploit the fact that awk will default to 'print $0' if an expression returns TRUE and no '{ commands }' are given. Finally, a[$0] is then incremented such that this key can no longer return TRUE and thus repeat values are never printed. ORS (Output Record Separator) is set to a space as well to mimic the input format.
A less terse version of this command which produces the same output would be the following:
awk 'BEGIN{RS=ORS=" "}{ if (a[$0] == 0){ a[$0] += 1; print $0}}'
Gotta love awk =)
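One caveat with the here-string form: <<< appends a newline, so with RS=" " the last record is "banana" plus a newline, which would not match an earlier "banana". Below is a minimal sketch of a wrapper that feeds awk with printf instead. The function name dedupe_words is made up for illustration, and the sketch assumes the words are separated by single spaces.

```shell
#!/bin/bash
# dedupe_words is a hypothetical helper name, not part of the answer above.
# Assumes words are separated by single spaces.
dedupe_words() {
    local out
    # printf '%s ' ends the input with a space rather than a newline,
    # so every record, including the last, is a bare word.
    out=$(printf '%s ' "$1" | awk 'BEGIN{RS=ORS=" "} !a[$0]++')
    # awk prints ORS (a space) after every record; strip the final one.
    printf '%s\n' "${out% }"
}

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
new_variable=$(dedupe_words "$variable")
echo "$new_variable"
```

The same awk program is used unchanged; only the way the input reaches it differs.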
EDIT
If you needed to do this in pure Bash 2.1+, I would suggest this (it assumes the last word is not a duplicate and that no word is a suffix of another word):
#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
temp="$variable"
new_variable="${temp%% *}"
while [[ "$temp" != ${new_variable##* } ]]; do
temp=${temp//${temp%% *} /}
new_variable="$new_variable ${temp%% *}"
done
echo $new_variable;
This pipeline version works by preserving the original order:
variable=$(echo "$variable" | tr ' ' '\n' | nl | sort -u -k2 | sort -n | cut -f2-)
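The pipeline relies on sort -u keeping the first of several equal lines, which GNU sort does but which, as far as I can tell, POSIX does not guarantee. Here is a sketch of the same number, sort, restore-order idea that instead sorts by word and then by line number and lets uniq -f1 keep the first line of each group. It also joins the result back into one space-separated line with paste. The name new_variable is just for illustration.

```shell
#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

new_variable=$(
    printf '%s\n' "$variable" |
    tr ' ' '\n' |                  # one word per line
    nl |                           # prefix each line with its position and a tab
    LC_ALL=C sort -k2,2 -k1,1n |   # group equal words, earliest position first
    uniq -f1 |                     # skip the number field, keep first line per word
    sort -n |                      # restore the original order by position
    cut -f2- |                     # drop the position column
    paste -sd' ' -                 # join the lines back with single spaces
)
echo "$new_variable"
```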
Pure Bash:
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
declare new_value=''
for item in $variable; do
if [[ " $new_value " != *" $item "* ]] ; then # first time? (whole-word match, so "apple" is not found inside "pineapple")
new_value="$new_value $item"
fi
done
new_value=${new_value:1} # remove leading blank
In pure, portable sh:
words="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
seen=
for word in $words; do
case $seen in
$word\ * | *\ $word | *\ $word\ * | $word)
# already seen
;;
*)
seen="$seen $word"
;;
esac
done
echo $seen
Bash, using an array:
declare -a arr
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
set -- $variable
count=0
for c in "$@"
do
flag=0
for((i=0;i<=${#arr[@]}-1;i++))
do
if [ "${arr[$i]}" == "$c" ] ;then
flag=1
break
fi
done
if [ "$flag" -eq 0 ] ; then
arr[$count]="$c"
count=$((count+1))
fi
done
for((i=0;i<=${#arr[@]}-1;i++))
do
echo "result: ${arr[$i]}"
done
Result when run:
linux# ./myscript.sh
result: apple
result: lemon
result: papaya
result: avocado
result: grapes
result: mango
result: banana
OR if you want to use gawk
awk 'BEGIN{RS=ORS=" "} (!($0 in a) ){a[$0];print}'
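The nested-loop Bash script above rescans the result array for every word. If Bash 4 or later is available (an assumption; older Bash has no associative arrays), the same membership test that awk does with a[$0] can be sketched with declare -A. The names seen, result and new_value are only illustrative.

```shell
#!/bin/bash
# Requires Bash 4+ for associative arrays.
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

declare -A seen=()   # words already encountered (keys only matter)
result=()            # unique words, in first-seen order

for item in $variable; do
    # ${seen[$item]+x} expands to "x" only if the key is already set
    if [[ -z ${seen[$item]+x} ]]; then
        seen[$item]=1
        result+=("$item")
    fi
done

new_value="${result[*]}"   # join with the first character of IFS (a space)
echo "$new_value"
```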
Z Shell:
% variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
% print ${(zu)variable}
apple lemon papaya avocado grapes mango banana
Another awk solution (gawk only, since it relies on RT):
#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
variable=$(printf '%s\n' "$variable" | awk -v RS='[[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
variable="${variable% }" # strip the trailing space left behind when the last word is a duplicate
echo "$variable"
Output:
apple lemon papaya avocado grapes mango banana
Perl solution:
perl -le 'for (@ARGV){ $h{$_}++ }; for (keys %h){ print $_ }' $variable
@ARGV is the list of input parameters from $variable.
Loop through the list, populating the h hash with the loop variable $_.
Loop through the keys of the h hash, and print each one. Perl does not return hash keys in insertion order, so the output order is arbitrary:
grapes
avocado
apple
lemon
banana
mango
papaya
This variation prints the output sorted first by frequency ($h{$a} <=> $h{$b}) and then alphabetically ($a cmp $b):
perl -le 'for (@ARGV){ $h{$_}++ }; for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" }' $variable
1 banana
1 grapes
1 mango
2 apple
2 avocado
2 lemon
2 papaya
This variation produces the same output as the last one, but instead of a shell variable it reads an input file 'fruits' with one fruit per line:
perl -lne '$h{$_}++; END{ for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" } }' fruits