
How to select rows from matrix with unique entry in specific column?

I tried to solve this in a functional way, but without much success.

Suppose there is a list of lists, and we want to pick only those sublists that have a unique entry at a specific position.

For example, given a matrix, we want to select only the rows whose elements in the first column are unique.

Here is an example:

INPUT:

list = {{1, 2}, {1, 3}, {4, 5}}

I'd like the output to be

list = {{1, 2}, {4, 5}}

It does not matter which duplicate row is removed; keeping the first one is fine, but any choice works.

I tried Select, DeleteCases, DeleteDuplicates, Union, and a few other things, but can't get it to work. I do not know how to tell Mathematica to look only for a 'unique' element. Union comes close, but it operates on the complete sublists, not on a single column. That is, I do not know what to write for the criterion, as in

DeleteDuplicates[list, <now what?> ]

For reference, this is how I do the above in Matlab:

EDU>> A=[1 2;1 3;4 5]

A =
     1     2
     1     3
     4     5

EDU>> [B,I,J]=unique(A(:,1));
EDU>> A(I,:)

ans =
     1     3
     4     5

Thanks.


Here is one way:

DeleteDuplicates[list, First@#1 === First@#2 &]
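On the sample list from the question, this keeps the first row for each first-column value:

    list = {{1, 2}, {1, 3}, {4, 5}};
    DeleteDuplicates[list, First@#1 === First@#2 &]

    (* {{1, 2}, {4, 5}} *)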

EDIT

Note that the timings and discussion below are based on M7

Upon reflecting a bit, I found a solution which will be at least an order of magnitude faster for large lists, and sometimes two orders of magnitude faster, for this particular case (probably a better way to put it is that the solution below has a different computational complexity):

Clear[delDupBy];
delDupBy[nested_List, n_Integer] :=
  Module[{parts = nested[[All, n]], ord, unpos},
    ord = Ordering[parts];
    unpos = Most@Accumulate@Prepend[Map[Length, Split@parts[[ord]]], 1];
    nested[[Sort@ord[[unpos]]]]];
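A quick check on the small list from the question. To see why it works: Ordering sorts the n-th column, Split groups equal entries in the sorted column, and Accumulate of the group lengths recovers the positions of the first occurrences:

    delDupBy[{{1, 2}, {1, 3}, {4, 5}}, 1]

    (* {{1, 2}, {4, 5}} *)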

Benchmarks:

In[406]:= 
largeList = RandomInteger[{1,15},{50000,2}];

In[407]:= delDupBy[largeList,1]//Timing
Out[407]= {0.016,{{13,4},{12,1},{1,6},{6,13},{10,12},{7,15},{8,14},
            {14,4},{4,1},{11,9},{5,11},{15,4},{2,7},{3,2},{9,12}}}

In[408]:= DeleteDuplicates[largeList,First@#1===First@#2&]//Timing
Out[408]= {1.265,{{13,4},{12,1},{1,6},{6,13},{10,12},{7,15},{8,14},{14,4},
      {4,1},{11,9},{5,11},{15,4},{2,7},{3,2},{9,12}}}

This is particularly remarkable because DeleteDuplicates is a built-in function. My blind guess is that DeleteDuplicates with a user-defined test uses a quadratic-time pairwise comparison algorithm, while delDupBy is O(n log n) in the size of the list.

I think this is an important lesson: one should pay attention to the performance of built-in functions such as Union, Sort, DeleteDuplicates etc. when supplying custom tests. I discussed this more extensively in a MathGroup thread, where there are also other insightful replies.

Finally, let me mention that exactly this question has been asked (with an emphasis on efficiency) before. I will reproduce here a solution I gave for the case when the first (or, generally, n-th) elements are positive integers (generalizing to arbitrary integers is straightforward):

Clear[sparseArrayElements];
sparseArrayElements[HoldPattern[SparseArray[u___]]] := {u}[[4, 3]]

Clear[deleteDuplicatesBy];
Options[deleteDuplicatesBy] = {Ordered -> True, Threshold -> 1000000};
deleteDuplicatesBy[data_List, n_Integer, opts___?OptionQ] := 
  Module[{fdata = data[[All, n]], parr, 
  rlen = Range[Length[data], 1, -1], 
  preserveOrder =  Ordered /. Flatten[{opts}] /. Options[deleteDuplicatesBy], 
  threshold =  Threshold /. Flatten[{opts}] /. Options[deleteDuplicatesBy], dim},
  dim = Max[fdata];
  parr = If[dim < threshold, Table[0, {dim}], SparseArray[{}, dim, 0]];
  parr[[fdata[[rlen]]]] = rlen;
  parr = sparseArrayElements@If[dim < threshold, SparseArray@parr, parr];
  data[[If[preserveOrder, Sort@parr, parr]]]
];

The way this works is to use the first (or, generally, n-th) elements as positions into a large preallocated table, exploiting the fact that they are positive integers. This can give crazy performance in some cases. Observe:
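On the small example from the question, the result matches the other methods. Note how the reversed range rlen makes the earliest duplicate row win the repeated assignment into parr:

    deleteDuplicatesBy[{{1, 2}, {1, 3}, {4, 5}}, 1]

    (* {{1, 2}, {4, 5}} *)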

In[423]:= hugeList = RandomInteger[{1,1000},{500000,2}];

In[424]:= delDupBy[hugeList,1]//Short//Timing
Out[424]= {0.219,{{153,549},{887,328},{731,825},<<994>>,{986,150},{92,581},{988,147}}}

In[430]:= deleteDuplicatesBy[hugeList,1]//Short//Timing
Out[430]= {0.032,{{153,549},{887,328},{731,825},<<994>>,{986,150},{92,581},{988,147}}}


Leonid provided a long and thorough answer, as he often does. However, I believe it is worth pointing out that an efficient and concise solution may be had with:

First /@ GatherBy[hugeList, #[[1]] &]

where 1 is the column index to compare.

On my system this is faster than delDupBy but not as fast as deleteDuplicatesBy.
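For the first column specifically, First itself can serve as the gathering function; on the question's example:

    list = {{1, 2}, {1, 3}, {4, 5}};
    First /@ GatherBy[list, First]

    (* {{1, 2}, {4, 5}} *)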

