开发者

Dissecting Number Ranges

I'm trying to figure out the most efficient way to generate a WHERE query. I asked another question earlier, which was similar, but I will get right to the point on this one.

Given a collection of number ranges, ie 1-1000, 1500-1600 it is quite simple to create a mysql where condition to select records which are between these values.

ie, you would just do:

WHERE (lft BETWEEN 1 and 1000) OR (lft BETWEEN 1500-1600). However, what if you wanted to incorporate a NOT BETWEEN as well.

For example, if you define several rules, like...

  • ALLOW BETWEEN 1 - 1000
  • ALLOW BETWEEN 1500 - 1600
  • ALLOW BETWEEN 1250 - 1300
  • DENY BETWEEN 25 - 50

How can I merge these rules in order to efficiently generate a WHERE condition. I would like the WHERE to dissect the ALLOW BETWEEN 1 - 1000 in order to create a gap in it. So that it would become 1-24 and 51-1000. Because the DENY rule is defined after the first rule, it "overwrites" the previous rules.

As another example, Say that you have

  • ALLOW BETWEEN 5 - 15
  • DENY BETWEEN 10 - 50
  • ALLOW BETWEEN 45 - 60

Then I would like to generate a WHERE condition which would allow me to do:

WHERE (lft BETWEEN 5 and 9) OR 开发者_JAVA技巧(lft BETWEEN 45 and 60).

Notes (Edits)

  • Also, the maximum range that would ever allowed is 1 - 5600000. (Which would be 'Earth') ie. Allow everything on Earth.
  • The number ranges are actually the LEFT values in a NESTED SET MODEL. These aren't unique keys. You can read why I want to do this in this question I asked earlier. https://stackoverflow.com/questions/6020553/generating-a-mysql-between-where-condition-based-on-an-access-ruleset
  • Possible important note on my number ranges I maybe shouldn't have used the sample example which I did, but one important note about the nature of the number ranges is that, the ranges should actually always entirely consume or be consumed by a previous rule. For example, I used the example above, 10-50 allow, and deny 45-60. This wouldn't actually ever happen in my data set. It would actually be, allow 10-50, then the DENY would have to either be entirely consumed by that range, ie, 34-38. OR, entirely consume the previous rule. 9-51. This is because the ranges actually represent lft and rgt values in a nested set model and you cannot have overlaps like I presented.

I didn't think to mention that when asking the question, but after seeing the working sample code below, I can see that this note is actually important.

(Edited example mysql to include OR instead of AND as per comment below)


Honestly, why bother? As long as the key you're querying against is indexed, just put the multiple queries in there:

WHERE (foo BETWEEN 1 AND 1000 
        OR foo BETWEEN 1500 AND 1600
        OR foo BETWEEN 1250 AND 1300
    ) AND (
        foo NOT BETWEEN 25 AND 50
    )

You could squeze a slight bit of efficiency by building a dissector, but I would question if it's worth it. All the WHERE clause items would be off of an index, so you're not preventing any hard operation from occurring (meaning you're not stopping a full-table-scan by doing it).

So rather than spending time building a system to do it for you, just implement an easy solution (ORing together the Allows, and ANDing together the Denys) and move on to more important things. Then if it becomes a problem later, revisit it then. But I really don't think this will ever become too big of a problem...

Edit Ok, here's a very simple algorithm for doing this. It uses strings as the data store, so it's reasonably efficient for smaller numbers (below 1 million):

class Dissector {
    protected $range = '';
    public function allow($low, $high) {
        $this->replaceWith($low, $high, '1');
    }
    public function deny($low, $high) {
        $this->replaceWith($low, $high, '0');
    }
    public function findRanges() {
        $matches = array();
        preg_match_all(
            '/(?<!1)1+(?!1)/', 
            $this->range, 
            $matches, 
            PREG_OFFSET_CAPTURE
        );
        return $this->decodeRanges($matches[0]);
    }
    public function generateSql($field) {
        $ranges = $this->findRanges();
        $where = array();
        foreach ($ranges as $range) {
            $where[] = sprintf(
                '%s BETWEEN %d AND %d', 
                $field, 
                $range['from'], 
                $range['to']
            );
        }
        return implode(' OR ', $where);
    }
    protected function decodeRanges(array $matches) {
        $range = array();
        foreach ($matches as $match) {
            $range[] = array(
                'from' => $match[1] + 1, 
                'to' => ($match[1] + strlen($match[0]))
            );
        }
        return $range;
    }
    protected function normalizeLengthTo($size) {
        if (strlen($this->range) < $size) {
            $this->range = str_pad($this->range, $size, '0');
        }
    }
    protected function replaceWith($low, $high, $character) {
        $this->normalizeLengthTo($high);
        $length = $high - $low + 1;
        $stub = str_repeat($character, $length);
        $this->range = substr_replace($this->range, $stub, $low - 1, $length);
    }
}

Usage:

$d = new Dissector();
$d->allow(1, 10);
$d->deny(5, 15);
$d->allow(10, 20);
var_dump($d->findRanges());
var_dump($d->generateSql('foo'));

Generates:

array(2) {
  [0]=>
  array(2) {
    ["from"]=>
    int(1)
    ["to"]=>
    int(4)
  }
  [1]=>
  array(2) {
    ["from"]=>
    int(10)
    ["to"]=>
    int(20)
  }
}
string(44) "foo BETWEEN 1 AND 4 OR foo BETWEEN 10 AND 20"


I spent a little time trying to solve this (it's a neat problem), and came up with this. It's not optimal, nor am I guaranteeing it's perfect, but it might get you started:

<?php

/*$cond = array(
    array('a', 5, 15),
    array('d', 9, 50),
    array('a', 45, 60)
);*/

$cond = array(
    array('a', 1, 1000),
    array('a', 1500, 1600),
    array('a', 1250, 1300),
    array('d', 25, 50)
);

$allow = array();

function merge_and_sort(&$allow)
{
    usort($allow, function($arr1, $arr2)
    {
        if ($arr1[0] > $arr2[0])
        {
            return 1;
        }
        else
        {
            return -1;
        }
    });

    $prev = false;

    for ($i = 0; $i < count($allow); $i++)
    {
        $c = $allow[$i];
        if ($i > 0 && $allow[$i][0] < $allow[$i - 1][1])
        {
            if ($allow[$i][1] <= $allow[$i - 1][1])
            {
                unset($allow[$i]);
            }
            else
            {
                $allow[$i - 1][1] = $allow[$i][1];
                unset($allow[$i]);
            }
        }
    }

    usort($allow, function($arr1, $arr2)
    {
        if ($arr1[0] > $arr2[0])
        {
            return 1;
        }
        else
        {
            return -1;
        }
    });
}

function remove_cond(&$allow, $start, $end)
{
    for ($i = 0; $i < count($allow); $i++)
    {
        if ($start > $allow[$i][0])
        {
            if ($end <= $allow[$i][1])
            {
                $temp = $allow[$i][1];
                $allow[$i][1] = $start;
                $allow []= array($end, $temp);
            }
            else
            {
                $found = false;
                for ($j = $i + 1; $j < count($allow); $j++)
                {
                    if ($end >= $allow[$j][0] && $end < $allow[$j][1])
                    {
                        $found = true;
                        $allow[$j][0] = $end;
                    }
                    else
                    {
                        unset($allow[$j]);
                    }
                }

                if (!$found)
                {
                    $allow[$i][1] = $start;
                }
            }
        }
    }
}

foreach ($cond as $c)
{
    if ($c[0] == "a")
    {
        $allow []= array($c[1], $c[2]);

        merge_and_sort($allow);
    }
    else
    {
        remove_cond($allow, $c[1], $c[2]);
        merge_and_sort($allow);
    }
}

var_dump($allow);

The last var_dump outputs:

array(4) {
  [0]=>
  array(2) {
    [0]=>
    int(1)
    [1]=>
    int(25)
  }
  [1]=>
  array(2) {
    [0]=>
    int(50)
    [1]=>
    int(1000)
  }
  [2]=>
  array(2) {
    [0]=>
    int(1250)
    [1]=>
    int(1300)
  }
  [3]=>
  array(2) {
    [0]=>
    int(1500)
    [1]=>
    int(1600)
  }
}

Edited to use the first example instead of the second.


I would process the instructions one at a time creating a list of numbers that should be included. Then finally translating that list into a set of ranges for the where clause. Here is some pseudo code:

$numbers = array();
foreach (conditions as $condition) {
    if ($condition is include) {
        for ($i = $condition.start; $i <= $condition.end; $i++) {
            $numbers[$i] = true;
        }
    } else {
        for ($i = $condition.start; $i <= $condition.end; $i++) {
            unset($numbers[$i]);
        }
    }
}
ksort($numbers);


I asked on IRC and received two responses. I'm going to post them both so that other people might benefit (and so that I don't lose them since I will take a look in detail at them both shortly).

Example 1 TML

<pre><?php

$cond = array(
    array('a', 5, 15),
    array('a', 5, 15),
    array('d', 9, 50),
    array('a', 45, 60),
    array('a', 2, 70),
    array('d', 1, 150),
);



function buildAcl($set) {
    $allow = array();
    foreach($set as $acl) {
        $range = range($acl[1], $acl[2]);
        switch($acl[0]) {
            case 'a':
                $allow = array_unique(array_merge(array_values($allow), $range));
                break;
            case 'd':
                foreach($range as $entry) {
                    unset($allow[array_search($entry, $allow)]);
                }
        }
    }
    return $allow;
}

var_dump(buildAcl($cond));
var_dump(buildAcl(array(array('a', 5, 15), array('d', 10, 50), array('a', 45, 60))));

Example 2 (matslin)

<?php
$conds = array(
    array('a', 5, 15),
    array('a', 5, 15),
    array('d', 9, 50),
    array('a', 45, 60),
    array('a', 2, 70),
    array('d', 1, 150),
);

$segments = array();

foreach($conds as $cond)
{
    print($cond[0] . ': ' . $cond[1] . ' - ' . $cond[2] . "\n");
    if ($cond[0] == 'a')
    {
        $new_segments = array();
        $inserted = false;
        $prev_segment = false;

        foreach($segments as $segment)
        {
            if ($segment['begin'] > $cond[2])
            {
                $new_segments[] = array('begin' => $cond[1], 'end' => $cond[2]);
                $new_segments[] = $segment;
                $inserted = true;
                print("begun\n");
                continue;
            }

            if ($segment['end'] < $cond[1])
            {
                print("end\n");
                $new_segments[] = $segment;
                continue;
            }

            if ($cond[1] < $segment['begin'])
            {
                $segment['begin'] = $cond[1];
            }

            if ($cond[2] > $segment['end'])
            {
                $segment['end'] = $cond[2];
            }

            $inserted = true;

            if (
                $prev_segment &&
                ($prev_segment['begin'] <= $segment['begin']) &&
                ($prev_segment['end'] >= $segment['end'])
            )
            {
                print("ignore identical\n");
                continue;
            }

            print("default\n");
            $prev_segment = $segment;
            $new_segments[] = $segment;
        }

        if (!$inserted)
        {
            print("inserted at end\n");
            $new_segments[] = array('begin' => $cond[1], 'end' => $cond[2]);
        }

        $segments = $new_segments;
        print("---\n");
    }

    if ($cond[0] == 'd')
    {
        $new_segments = array();

        foreach($segments as $segment)
        {
            # not contained in segment
            if ($segment['begin'] > $cond[2])
            {
                print("delete segment is in front\n");
                $new_segments[] = $segment;
                continue;
            }

            if ($segment['end'] < $cond[1])
            {
                print("delete segment is behind\n");
                $new_segments[] = $segment;
                continue;
            }

            # delete whole segment
            if (
                ($segment['begin'] >= $cond[1]) &&
                ($segment['end'] <= $cond[2])
            )
            {
                print("delete whole segment\n");
                continue;
            }

            # delete starts at boundary
            if ($cond[1] <= $segment['begin'])
            {
                print("delete at boundary start\n");
                $segment['begin'] = $cond[2];
                $new_segments[] = $segment;
                continue;
            }
            # delete ends at boundary
            if ($cond[2] >= $segment['end'])
            {
                print("delete at boundary end\n");
                $segment['end'] = $cond[1];
                $new_segments[] = $segment;
                continue;
            }

            # split into two segments
            print("split into two\n");
            $segment_pre = array('begin' => $segment['begin'], 'end' => $cond[1]);
            $segment_post = array('begin' => $cond[2], 'end' => $segment['end']);

            $new_segments[] = $segment_pre;
            $new_segments[] = $segment_post;
        }

        print("--\n");
        $segments = $new_segments;
    }

    print("----\n");
    var_dump($segments);
    print("----\n");
}

var_dump($segments);
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜