Compare columns of unequal length for matches and differences

2023-02-05 23:32

I will explain this in Excel terms, since that will probably be clearer.

I have an Excel sheet with 2 columns.

Column A has 69,000 rows. Column B has 49,000 rows.

Column A has our complete product list. Column B has the product list from Manufacturer 1.

Only some rows are common to the two columns. Also, column B is not a subset of column A: column A has extra entries, and so does column B.

I need to know which rows from Column B are common with Column A, and which rows from Column B are not in Column A.

How would I achieve this? I am trying Excel, but the VLOOKUP is taking forever and hanging. Are there any other Windows/Office utilities which can help me? If it's a macro, can you please give me scripts and suggestions to execute it?

I have access to a Linux machine as well, and I am familiar with those tools.

I can transfer this info to text files. Can I run some sed or awk script to print the output?

Any help would be great.

Use the MATCH() function. It gives you a number if there is a match, and #N/A if there isn't.

I always work in Tables in Excel 2007 and newer, but will give both syntaxes:

Assuming you have a table with the things to compare in columns "Column1" and "Column2", this checks whether the value in Column2 is present in Column1:

=ISNUMBER(MATCH(Table1[[#This Row],[Column2]],[Column1],0))

Or, if you have an old-school range with data in columns A and B, looking for the value from B in A:

=ISNUMBER(MATCH(Sheet1!$B2,Sheet1!$A$2:$A$11,0))

What's going on: you look for an exact match (the 0 parameter) of the value on the current row of one column in the other column, then check whether you get a numeric value (there is a match) or not (no match). Adjust the lookup range (here $A$2:$A$11) so it covers all of your data.

This is dead simple on Unix or Linux. Start by putting all of your company's products in one file, and all of the other company's products in another. I'll call them FileA and FileB.

Sort them. comm requires sorted input, and -u also drops duplicate lines.

$ sort -u FileA > temp_file
$ mv temp_file FileA

$ sort -u FileB > temp_file
$ mv temp_file FileB

The products that are common to both files . . .

$ comm -12 FileA FileB

The products that are unique to FileB . . .

$ comm -13 FileA FileB
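
Since the question asked about awk specifically, here is a minimal sketch of the same comparison done in one awk pass, with no sorting, so the lines of FileB come out in their original order. The sample product names, the scratch directory, and the output file names common.txt and only_in_B.txt are made up for illustration. It assumes one product per line and that both files use the same line endings (a file saved on Windows may carry a trailing carriage return that stops otherwise identical lines from matching).

```shell
#!/bin/sh
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data; replace with your real exports, one product per line.
printf '%s\n' widget gadget gizmo sprocket > FileA
printf '%s\n' gizmo doohickey widget thingamajig > FileB

# Start with empty output files so both exist even if one category is empty.
: > common.txt
: > only_in_B.txt

# While reading the first file (NR == FNR), remember every line of FileA.
# While reading the second file, route each line of FileB to one of the
# two output files depending on whether it was seen in FileA.
# Caveat: the NR == FNR idiom assumes FileA is not empty.
awk '
    NR == FNR    { seen[$0] = 1; next }
    ($0 in seen) { print > "common.txt"; next }
                 { print > "only_in_B.txt" }
' FileA FileB

echo "Common to both:";  cat common.txt
echo "Only in FileB:";   cat only_in_B.txt
```

Unlike the sort/comm approach, this keeps any duplicates that exist in FileB and holds all of FileA in memory, which should be no problem at 69,000 lines.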

I'm surprised the VLOOKUP is slow/unreliable; 70,000 rows is nothing. Are you sure you've got the formulas correct?

Seeing as you have Excel, you might also have MS Access. Loading the columns into Access tables and resolving with JOINs would be very quick.

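Sort the two lists and use approximate VLOOKUP (last argument TRUE). This will be extremely fast (binary search), but you need to handle the no-match case. Something like this in column C:
=IF(B1=VLOOKUP(B1,$A$1:$A$69000,1,TRUE),"Match","NoMatch")
and copy down. Note that VLOOKUP returns #N/A when B1 sorts before the first entry in column A; wrapping the whole formula in IFERROR(...,"NoMatch") covers that case.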
