Finding unique values in a data file
I can do this in Python, but I was wondering whether I could do it in Linux.
I have a file like this:
name1 text text 123432re text
name2 text text 12344qp text
name3 text text 134234ts text
I want to find all the distinct values in the third data column (the fourth whitespace-separated field) for a particular username, let's say name1.
grep name1 filename gives me all the lines, but there must be some way to list just the distinct values? (I don't want to display duplicate values for the same username.)
grep name1 filename | cut -d ' ' -f 4 | sort -u
This will find all lines that have name1, then get just the fourth column of data and show only unique values.
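For the sample file above, where name1 appears on a single line, this would simply print:
$ grep name1 filename | cut -d ' ' -f 4 | sort -u
123432re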
I tried using cat. The file contains the following (the file here is foo.sh, but any file name works):
$ cat foo.sh
tar
world
class
zip
zip
zip
python
jin
jin
doo
doo
uniq
will print each word only once:
$ cat foo.sh | sort | uniq
class
doo
jin
python
tar
world
zip
uniq -u
will print only the words that appear exactly once in the file:
$ cat foo.sh | sort | uniq -u
class
python
tar
world
uniq -d
will print only the duplicated words, each printed once:
$ cat foo.sh | sort | uniq -d
doo
jin
zip
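As a related aside (not shown in the original answer), uniq -c prefixes each word with its number of occurrences; for the same foo.sh this gives:
$ sort foo.sh | uniq -c
      1 class
      2 doo
      2 jin
      1 python
      1 tar
      1 world
      3 zip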
You can let sort look only at the 4th field, and then ask only for records with unique keys:
grep name1 filename | sort -k4 -u
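One caveat (my note, not part of the original answer): -k4 tells sort to compare from the 4th field to the end of the line. If uniqueness should be decided by the 4th field alone, restrict the key to that field:
grep name1 filename | sort -k4,4 -u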
As an all-in-one awk solution:
awk '$1 == "name1" && ! seen[$1" "$4]++ {print $4}' filename
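A sketch of the same idea with the username passed in as an awk variable instead of being hard-coded (-v is standard awk; "user" is just an illustrative variable name):
awk -v user=name1 '$1 == user && !seen[$4]++ {print $4}' filename
Since $1 is fixed to one user here, keying the seen array on $4 alone is enough.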
IMHO Michał Šrajer got the best answer (just note that grep needs a filename argument, as shown above). And I've got this fancy solution using an associative array:
user=name1
# split grep's output into one array element per line
IFSOLD=$IFS; IFS=$'\n'; lines=( $(grep "$user" filename) ); IFS=$IFSOLD
declare -A index                 # associative array, keyed by the 4th field (requires bash 4+)
for item in "${lines[@]}"; do
    sub=( $item )                # word-split the line into fields
    name=${sub[3]}               # 4th field (arrays are 0-indexed)
    index[$name]=$item           # later lines with the same key overwrite earlier ones
done
for item in "${index[@]}"; do echo "$item"; done
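Run against the sample file from the question, this keeps one line per distinct 4th field, so for name1 it would print:
name1 text text 123432re text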
In my opinion, you first need to select the field from which you want the unique values. I was trying to retrieve unique source IPs from an iptables log:
grep "May 5" /var/log/iptables.log | awk '{print $11}' | sort -u
Here is the output of the above command:
SRC=192.168.10.225
SRC=192.168.10.29
SRC=192.168.20.125
SRC=192.168.20.147
SRC=192.168.20.155
SRC=192.168.20.183
SRC=192.168.20.194
So, the best approach is to select the field first and then filter for the unique values.
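If you only want the addresses without the SRC= prefix, one possible extra step (my addition, assuming the field always has the form SRC=address) is to split on = as well:
grep "May 5" /var/log/iptables.log | awk '{print $11}' | cut -d= -f2 | sort -u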
The following command worked for me.
sudo cat AirtelFeb.txt | awk '{print $3}' | sort -u
This prints the unique values of the 3rd column.
I think you meant the fourth column. You can try:
cat Filename.txt | awk '{print $4}' | sort | uniq
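For what it's worth, sort | uniq is equivalent to sort -u here, so the same thing can be written more compactly as:
awk '{print $4}' Filename.txt | sort -u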