Cartesian product of two files (as sets of lines) in GNU/Linux

2022-12-09 04:23

How can I use shell one-liners and common GNU tools to combine the lines of two files as a Cartesian product? What is the most succinct, beautiful and "linuxy" way?

For example, if I have two files:

$ cat file1
a
b
$ cat file2
c
d
e

The result should be

a, c
a, d
a, e
b, c
b, d
b, e

Here's a shell script to do it:

while IFS= read -r a; do while IFS= read -r b; do echo "$a, $b"; done < file2; done < file1

Though that will be quite slow. I can't think of any precompiled logic to accomplish this. The next step for speed would be to do the above in awk/perl.

awk 'NR==FNR { a[n++] = $0; next } { for (i = 0; i < n; i++) print $0 ", " a[i] }' file2 file1

Hmm, how about this hacky solution to use precompiled logic?

paste -d, <(sed -n "$(yes 'p;' | head -n $(wc -l < file2))" file1) \
          <(cat $(yes 'file2' | head -n $(wc -l < file1)))

Using only join (the fields are separated by spaces rather than commas):

$ join -j 2 file1 file2
 a c
 a d
 a e
 b c
 b d
 b e
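
The leading blank and the missing comma are easy to patch up afterwards. As a minimal sketch, assuming GNU join and input lines that are single words without blanks, the output can be piped through sed. The helper name join_product and the scratch directory are made up for this illustration:

```shell
# Scratch copies of the sample files from the question.
dir=$(mktemp -d)
printf '%s\n' a b > "$dir/file1"
printf '%s\n' c d e > "$dir/file2"

# join_product is a hypothetical helper name.
# Field 2 is missing on every line, so all lines share the same empty join
# key and every line of $1 is paired with every line of $2.
# Assumption: each line is one word, so the only blanks are join's separators.
join_product() {
    join -j 2 "$1" "$2" | sed -e 's/^ //' -e 's/ /, /'
}

join_product "$dir/file1" "$dir/file2"
```

Lines that contain blanks would need a different separator (join's -t option), so treat this as a starting point rather than a general solution.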

The mechanical way to do it in shell, not using Perl or Python, is:

while read line1
do
    while read line2
    do echo "$line1, $line2"
    done < file2
done < file1

The join command can sometimes be used for these operations; however, I'm not sure whether it can produce a Cartesian product as a degenerate case.

One step up from the double loop would be:

while read line1
do
    sed "s/^/$line1, /" file2
done < file1
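
One caveat with the sed version: $line1 is spliced into the sed program itself, so a line containing /, & or a backslash changes or breaks the substitution. A possible way around that, sketched here under the assumption that a POSIX awk is available, is to hand the prefix to awk through the environment. The helper name product_lines and the sample data are invented for the example:

```shell
# Scratch files whose lines contain characters that are special to sed.
dir=$(mktemp -d)
printf '%s\n' 'a/b' 'x&y' > "$dir/file1"
printf '%s\n' c d > "$dir/file2"

# product_lines is a hypothetical helper name.
# The prefix reaches awk through the environment (ENVIRON), so it is used
# as plain data and is never parsed as part of a sed or awk program.
product_lines() {
    while IFS= read -r line1; do
        prefix=$line1 awk '{ print ENVIRON["prefix"] ", " $0 }' "$2"
    done < "$1"
}

product_lines "$dir/file1" "$dir/file2"
```

This still starts one awk process per line of the first file, so it is no faster than the sed loop; it only removes the quoting problem.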

I'm not going to pretend this is pretty, but...

join -t, -j 9999 -o 1.1,2.1 /tmp/file1 /tmp/file2

(updated thanks to Iwan Aucamp below)

-- join (GNU coreutils) 8.4

Edit:

DVK's attempt inspired me to do this with eval:

script='1{x;d};${H;x;s/\n/\,/g;p;q};H'
eval "echo {$(sed -n $script file1)}\,\ {$(sed -n $script file2)}$'\n'"|sed 's/^ //'

Or a simpler sed script:

script=':a;N;${s/\n/,/g;b};ba'

which you would use without the -n switch.

Either script gives:

a, c
a, d
a, e
b, c
b, d
b, e

Original answer:

In Bash, you can do this. It doesn't read from files, but it's a neat trick:

$ echo {a,b}\,\ {c,d,e}$'\n'
a, c
 a, d
 a, e
 b, c
 b, d
 b, e

More simply:

$ echo {a,b}{c,d,e}
ac ad ae bc bd be
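
A small variation, assuming Bash (brace expansion is not part of POSIX sh): handing the expanded words to printf instead of echo puts each pair on its own line and avoids the leading space seen in the echo output above.

```shell
# Bash only: brace expansion builds the six words "a, c" ... "b, e",
# and printf reuses the '%s\n' format once per word.
printf '%s\n' {a,b}', '{c,d,e}
```

Like the echo version, this works on literal lists only; it does not read from files.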

A generic recursive Bash function could look something like this:

foreachline() {

    _foreachline() {

        if [ $# -lt 2 ]; then
            # No files left: print the accumulated prefix without its trailing ", "
            printf '%s\n' "${1%, }"
            return
        fi

        local prefix=$1
        local file=$2
        local line
        shift 2

        while IFS= read -r line; do
            _foreachline "$prefix$line, " "$@"
        done < "$file"
    }

    _foreachline "" "$@"
}

foreachline file1 file2 file3

Regards.

Solution 1:

perl -e '{use File::Slurp; @f1 = read_file("file1"); @f2 = read_file("file2"); map { chomp; $v1 = $_; map { print "$v1,$_"; } @f2 } @f1;}'

Edit: Oops... Sorry, I thought this was tagged python...

If you have Python 2.6 or later:

from itertools import product
print('\n'.join((', '.join(elt) for elt in (product(*((line.strip() for line in fh) for fh in (open('file1','r'), open('file2','r'))))))))

a, c
a, d
a, e
b, c
b, d
b, e

If you have a Python older than 2.6:

def product(*args, **kwds):
    '''
    Source: http://docs.python.org/library/itertools.html#itertools.product
    '''
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = map(tuple, args) * kwds.get('repeat', 1)
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
print('\n'.join((', '.join(elt) for elt in (product(*((line.strip() for line in fh) for fh in (open('file1','r'), open('file2','r'))))))))

A solution using join, awk and process substitution:

join <(xargs -I_ echo 1 _ < setA) <(xargs -I_ echo 1 _ < setB) |
  awk '{ printf("%s, %s\n", $2, $3) }'

awk 'FNR==NR{ a[++d]=$1; next}
{
  for ( i=1;i<=d;i++){
    print $1","a[i]
  }
}' file2 file1

# ./shell.sh
a,c
a,d
a,e
b,c
b,d
b,e

OK, this is a derivation of Dennis Williamson's solution above, since he noted that his does not read from files. Brace expansion happens before command substitution, so the line has to go through eval:

$ eval "echo {$(paste -sd, a)}\,\ {$(paste -sd, b)}$'\n'"
a, c
 a, d
 a, e
 b, c
 b, d
 b, e

GNU Parallel:

parallel echo "{1}, {2}" :::: file1 :::: file2

Output:

a, c
a, d
a, e
b, c
b, d
b, e

Of course perl has a module for that:

#!/usr/bin/perl

use File::Slurp;
use Math::Cartesian::Product;

use v5.10;
$, = ", ";

@file1 = read_file("file1", chomp => 1);
@file2 = read_file("file2", chomp => 1);

cartesian { say @_ } \@file1, \@file2;

Output:

a, c
a, d
a, e
b, c
b, d
b, e

In fish it's a one-liner

printf '%s\n' (cat file1)", "(cat file2)
