How to merge two files using AWK? [duplicate]

2023-02-20 15:07 问答作者：

This question already has answers here: Inner join on two text files (5 answers) Closed 6 years ago.

File 1 has 5 fields A B C D E, with field A is an integer-valued

File 2 has 3 fields A F G

The number of rows in File 1 is much bigger than that of File 2 (20^6 to 5000)

All the entries of A in File 1 appeared in field A in File 2

I like to merge the two files by field A and carry F and G

Desired output is A B C D E F G

Example

File 1

 A     B     C    D    E
4050 S00001 31228 3286 0
4050 S00012 31227 4251 0
4049 S00001 28342 3021 1
4048 S00001 46578 4210 0
4048 S00113 31221 4250 0
4047 S00122 31225 4249 0
4046 S00344 31322 4000 1

File 2

A 开发者_如何学Python    F    G   
4050 12.1 23.6
4049 14.4 47.8   
4048 23.2 43.9
4047 45.5 21.6

Desired output

A    B      C      D   E F    G
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6

$ awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6
4046 S00344 31322 4000 1

Explanation: (Partly based on another question. A bit late though.)

FNR refers to the record number (typically the line number) in the current file and NR refers to the total record number. The operator == is a comparison operator, which returns true when the two surrounding operands are equal. So FNR==NR{commands} means that the commands inside the brackets only executed while processing the first file (file2 now).

FS refers to the field separator and $1, $2 etc. are the 1st, 2nd etc. fields in a line. a[$1]=$2 FS $3 means that a dictionary(/array) (named a) is filled with $1 key and $2 FS $3 value.

; separates the commands

next means that any other commands are ignored for the current line. (The processing continues on the next line.)

$0 is the whole line

{print $0, a[$1]} simply prints out the whole line and the value of a[$1] (if $1 is in the dictionary, otherwise only $0 is printed). Now it is only executed for the 2nd file (file1 now), because of FNR==NR{...;next}.

Thankfully, you don't need to write this at all. Unix has a join command to do this for you.

join -1 1 -2 1 File1 File2

Here it is "in action":

will-hartungs-computer:tmp will$ cat f1
4050 S00001 31228 3286 0
4050 S00012 31227 4251 0
4049 S00001 28342 3021 1
4048 S00001 46578 4210 0
4048 S00113 31221 4250 0
4047 S00122 31225 4249 0
4046 S00344 31322 4000 1
will-hartungs-computer:tmp will$ cat f2
4050 12.1 23.6
4049 14.4 47.8   
4048 23.2 43.9
4047 45.5 21.6
will-hartungs-computer:tmp will$ join -1 1 -2 1 f1 f2
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6
will-hartungs-computer:tmp will$

You need to read the entries from File 2 into a pair of associative arrays in the BEGIN block. Assuming GNU Awk:

BEGIN { while (getline < "File 2") { f[$1] = $2; g[$1] = $3 } }

In the main processing block, you read the line from File 1 and print it with the correct data from the arrays created in the BEGIN block:

{ print $0, f[$1], g[$1] }

Supply File 1 as the filename argument to the program.

awk 'BEGIN { while (getline < "File 2") { f[$1] = $2; g[$1] = $3 } }
     print $0, f[$1], g[$1] }' "File 1"

The quotes around the file name argument are needed because of the spaces in the file name. You need the quotes around the getline filename even if it contained no spaces as it would otherwise be a variable name.

awk 'BEGIN{OFS=","}  FNR==NR {F[$1]=$2;G[$1]=$3;next} {print $1,$2,$3,$4,$5,F[$1],G[$1]}' file2.txt file1.txt

继续阅读：bash

How to merge two files using AWK? [duplicate]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？