Finding unique values in a data file
I can do this in Python, but I was wondering whether I could do it in Linux.
I have a file like this:
name1 text text 123432re text
name2 text text 12344qp text
name3 text text 134234ts text
I want to find all the distinct values in the third data column (the fourth whitespace-separated field) for a particular username, let's say name1.
grep name1 filename gives me all the lines, but there must be some way to list just the distinct values? (I don't want to display duplicate values for the same username.)
grep name1 filename | cut -d ' ' -f 4 | sort -u
This will find all lines that have name1, then get just the fourth column of data and show only unique values.
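For the sample file above, where name1 appears on a single line, this would simply print:
$ grep name1 filename | cut -d ' ' -f 4 | sort -u
123432re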
I tried using cat. The file contains the following (the file here is foo.sh, but any file name works):
$ cat foo.sh
tar
world
class
zip
zip
zip
python
jin
jin
doo
doo
uniq
will print each word only once:
$ cat foo.sh | sort | uniq
class
doo
jin
python
tar
world
zip
uniq -u
will print only the words that appear exactly once in the file:
$ cat foo.sh | sort | uniq -u
class
python
tar
world
uniq -d
will print only the duplicated words, each printed once:
$ cat foo.sh | sort | uniq -d
doo
jin
zip
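As a related aside (not shown in the original answer), uniq -c prefixes each word with its number of occurrences; for the same foo.sh this gives:
$ sort foo.sh | uniq -c
      1 class
      2 doo
      2 jin
      1 python
      1 tar
      1 world
      3 zip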
You can let sort look only at the 4th field, and then ask only for records with unique keys:
grep name1 filename | sort -k4 -u
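One caveat (my note, not part of the original answer): -k4 tells sort to compare from the 4th field to the end of the line. If uniqueness should be decided by the 4th field alone, restrict the key to that field:
grep name1 filename | sort -k4,4 -u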
As an all-in-one awk solution:
awk '$1 == "name1" && ! seen[$1" "$4]++ {print $4}' filename
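A sketch of the same idea with the username passed in as an awk variable instead of being hard-coded (-v is standard awk; "user" is just an illustrative variable name):
awk -v user=name1 '$1 == user && !seen[$4]++ {print $4}' filename
Since $1 is fixed to one user here, keying the seen array on $4 alone is enough.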
IMHO Michał Šrajer got the best answer (just note that grep needs a filename argument, as shown above). And I've got this fancy solution using an associative array:
user=name1
# split grep's output into one array element per line
IFSOLD=$IFS; IFS=$'\n'; lines=( $(grep "$user" filename) ); IFS=$IFSOLD
declare -A index                 # associative array, keyed by the 4th field (requires bash 4+)
for item in "${lines[@]}"; do
    sub=( $item )                # word-split the line into fields
    name=${sub[3]}               # 4th field (arrays are 0-indexed)
    index[$name]=$item           # later lines with the same key overwrite earlier ones
done
for item in "${index[@]}"; do echo "$item"; done
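Run against the sample file from the question, this keeps one line per distinct 4th field, so for name1 it would print:
name1 text text 123432re text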
In my opinion, you first need to select the field from which you want the unique values. I was trying to retrieve unique source IPs from an iptables log:
grep "May 5" /var/log/iptables.log | awk '{print $11}' | sort -u
Here is the output of the above command:
SRC=192.168.10.225
SRC=192.168.10.29
SRC=192.168.20.125
SRC=192.168.20.147
SRC=192.168.20.155
SRC=192.168.20.183
SRC=192.168.20.194
So, the best approach is to select the field first and then filter for the unique values.
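If you only want the addresses without the SRC= prefix, one possible extra step (my addition, assuming the field always has the form SRC=address) is to split on = as well:
grep "May 5" /var/log/iptables.log | awk '{print $11}' | cut -d= -f2 | sort -u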
The following command worked for me.
sudo cat AirtelFeb.txt | awk '{print $3}' | sort -u
This prints the unique values of the 3rd column.
I think you meant the fourth column. You can try:
cat Filename.txt | awk '{print $4}' | sort | uniq
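For what it's worth, sort | uniq is equivalent to sort -u here, so the same thing can be written more compactly as:
awk '{print $4}' Filename.txt | sort -u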