开发者

Find missing and overlapping numbers in sequences

Let's say we have a data structure like this:

var sequences = new List<Tuple<int, int>>
                {
                    new Tuple<int, int>(1, 10),
                    new Tuple<int, int>(8, 101),
                    new Tuple<int, int>(102, 103),
                    new Tuple<int, int>(104, 104),
                    new Tuple<int, int>(110, 200)
                };

I would like to get two results from this collection:

开发者_如何学JAVA
  • All the missing numbers (in this example: 105, 106, 107, 108, 109)
  • All the overlapping numbers (in this example: 8, 9, 10)

I could write an algorithm with a couple of loops and helper collections. This off course would work, but I wonder if this could be achieved with the help of LINQ and/or other easier and shorter algorithms?

Edit: My data structure from the example above is representing 5 sequences, the first one containing numbers from 1 to 10, the second one containing the numbers from 8 to 101 and so on... Because in the production the sequences can be much bigger (up to millions), they are not represented with the actual collection (for instance with Lists with all the numbers), but with Tuples, which represents the minimal and the maximal number of each sequence.


You could do it via

var missing = 
      Enumerable.Range(1, 200)
               .Where(i => sequences.All(t => t.Item1 > i || t.Item2 < i));
var overlapping = 
      Enumerable.Range(1, 200)
                .Where(i => sequences.Count(t => t.Item1 <= i && t.Item2 >= i) > 1);


I know algorithm for this problem (it is pseudoCode). ( Complexity classes O(nlog(n)) where n is count of tuple)

So solution is sort Tuple by function:

  int comparer( Tuple a, Tuple b) {
      if ( a.first.compareTo(b.first) == 0 ) {
          return a.second.compareTo(b.second);
      } else 
          return a.first.compareTo(b.first);
  }

so example tuple: (1, 10), (1, 5), (2, 8) will be sort to: (1,5), (1, 10), (2, 8).

Next step is accumulate this result. Iterate on this result and :

 Tuple result = SortedList[0];
 foreach ( Tuple tuple in SortedList ) {

     if ( result.second < tuple.first ) {

        // here you have missing number (result.second, tuple.first)

        result.first = tuple.first; 
        result.second = tuple.second
     } else if ( result.second > tuple.first ) {

        // here you have overlapping number (tuple.first, min( result.second,tuple.second ))

        if ( result.second < tuple.second ) {
              result.second = tuple.second;
        }
     } else {
        result.second = tuple.second;
     }

 }

What we know, that if will be iterate next tuple first number is bigger or equals than result.first. Commentate in code tell you where you have overlapping and missing number


try this

var expandedSequences = sequences.Select(t => Enumerable.Range(t.Item1, t.Item2-t.Item1)).SelectMany(t => t).OrderBy(i => i);
var dupes = expandedSequences.GroupBy(i => i).Where(g => g.Count() > 1).Select(g => g.Key);
var missing = Enumerable.Range(expandedSequences.Min(), expandedSequences.Max()).Except(expandedSequences);


In one pass:

var sequences = new List<Tuple<int, int>>
    {
        new Tuple<int, int>(1, 10),
        new Tuple<int, int>(8, 101),
        new Tuple<int, int>(102, 103),
        new Tuple<int, int>(104, 104),
        new Tuple<int, int>(110, 200)
    };
var missing = new List<int>();
var overlap = new List<int>();

sequences.Aggregate((prev, current) => {
    if (prev.Item2 >= current.Item1) {
        overlap.AddRange(Enumerable.Range(current.Item1, prev.Item2 - current.Item1 + 1));
    }
    if (current.Item1 > prev.Item2 + 1) {
        missing.AddRange(Enumerable.Range(prev.Item2 + 1, current.Item1 - prev.Item2 - 1));
    }
    return current;
});


There's a few edge cases I can only assume how you wish to handle. I've chosen not to handle one of them (commented in the code). Since you've given no indication of how you would want to represent missing/ovrelapping sequences I've chosen your own format using tuples to identify the start and end of the sequence.

//Assumes they are sorted on item1
        Tuple<IEnumerable<Tuple<int,int>>,IEnumerable<Tuple<int,int>>> FindMissingAndOverLapping(IEnumerable<Tuple<int,int>> sequences){
            var previous = Tuple.Create(0, 0);
            var missing = new List<Tuple<int,int>>();
            var overlapping = new List<Tuple<int, int>>();
            var max = 0;
            foreach (var sequence in sequences){
                var end = previous.Item2;
                max = end > max ? end : max;
                if (previous.Item2 < sequence.Item1 + 1){
                    missing.Add(Tuple.Create(previous.Item2 + 1, sequence.Item1 - 1));
                } else if (max < sequence.Item1){
                    overlapping.Add(Tuple.Create(sequence.Item1, max));
                }
            }
            //The sequences in ovrelapping can be ovrelapping them self
            return new Tuple<IEnumerable<Tuple<int,int>>,IEnumerable<Tuple<int,int>>>(missing, overlapping);
        }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜