开发者

Bitwise AND in Sql Server

I have a very typical situation. We have a table called Users which has a column called Branches (varchar 1000).

The organization can have 1000 branches. So if a user has access to branch 1, 5, and 10, the branches string would look like:

1000100001000000000......

(i.e. 1 for a position a User has branch access to based on the branch's number). Please do not advise better data storage options, this is coming to me from a legacy application that is deployed across continents.

Now given this background (and considering that there can be > 10000 users), I want to search for all Users who have access to any one of given set of branches, e.g. Find all users who have access to either branch 10, 65, 90 or 125.

One easy solution is to convert the desired set of branches (i.e. 10, 65, 90, 125) to a branch string (00000010100 etc), then use a scalar UDF to iterate over both the branch strings and return true at first matching occurence where 2 branch strings have 1, and false if there is not a 1 at common position.

Other than that, I also have an option of searching in application in C#. Some of these users are privileged (approx 1000 or more) and their data is cached in application as it is accessed very frequently. But for other users that are not privileged, data is only in db.

I have 2 questions here: 1) For a db search, is there a better way other than the UDF approach I mentioned. 2) For privileged users, what would be better in terms of performance,开发者_开发知识库 search in application (which further can be based on a for loop on branch strings like in UDF, or as a Linq Intersect operator on 2 branch arrays, i.e. a Linq Intersect on [1,5,9,50,80,200] and [6,90,256,300] etc.) Would a db search produce faster results or an application based search?

Consider there might be other parameters for search in both cases, e.g. Last name starts with.

My current approach is to filter rows in db for both situations first on other parameters (like Last name starts with). Then use a scalar UDF to filter this result-set based on branches and then return the results.


Do it in SQL, it will be only 100 times faster than doing it in C# or other front end.

Use the built-in numbers table to break the long string into positions (number series goes up to 2047).

Sample tables

create table users (userid int)
insert users select 1 union all select 2

create table permission (userid int, bigstr varchar(1000))
insert permission
select 1, REPLICATE('0', 56) + '1' -- 57th
        + REPLICATE('0', 32) + '1' -- 90th
        + REPLICATE('0', 64) + '1' -- 155th
        + REPLICATE('0', 845)
insert permission
select 2, REPLICATE('0', 66) + '1' -- 67th
        + REPLICATE('0', 98) + '1' -- 166th
        + REPLICATE('0', 657) + '1' -- 824th
        + REPLICATE('0', 176)

Sample showing all the matching permissions against a list

select *
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
  and SUBSTRING(p.bigstr,v.number,1) = '1'
  and v.number between 1 and LEN(p.bigstr)  -- or 1000 if it is always 1000
where v.number in (57,90,824)

To find users who have at access to at least one branch in the list:

select distinct u.userid
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
  and SUBSTRING(p.bigstr,v.number,1) = '1'
  and v.number between 1 and LEN(p.bigstr)  -- or 1000 if it is always 1000
where v.number in (57,90,824)

etc..


You might want to construct strings of the form __1_1____1% for a LIKE query to find all users who have access to branches 3, 5, and 10.

To construct these strings the easiest way is to start with a string of _ characters that's as long as the largest branch number in your set (or larger), and then replace individual _ characters with 1 characters, and then append the % at the end.

As for whether this is faster than doing a loop in the database or a loop in your application, I think your best approach is to just test it.


Bitwise Group Membership:

From the comments, I'm assuming that we are not able to use a link table for group memberships. Here is a bitwise solution that does not use strings. It cannot be an acceptable answer because the number of bits limits the number of groups quite severely. However, by using integers with explicit value comparisons, the database can make use of its indexes efficiently. So I've added it for the case where the number of groups / roles / whatever is limited enough to fit.
PS: Excuse an binary-decimal mess ups, I just plugged stuff in on the fly. Feel free to comment and correct, if I have any errors.

Each group is assigned a bit:

G1: 0001
G2: 0010
G3: 0100
G4: 1000

Users' group memberships are calculated with bitwise &. Here are some examples with the binary and decimal equivalents:

U1: G1:             0001 (01)
U2: G2:             0010 (02)
U3: G3:             0100 (04)
U4: G4:             1000 (08)
U5: G1 & G2:        0011 (03)
U6: G2 & G3:        0110 (06)
U7: G1 & G3:        0101 (05)
U8: G2 & G4:        1010 (10)
U9: G1 & G2 & G4:   1011 (11)

Now, calculate, using iteration from 1-N (N is number of groups) and get a list of all the possible integer values that any particular group can contribute to. For example, G1 will be present in any odd number:

G1' : 0001 (01), 0011 (03), 0101 (05), 0111 (07), 1001 (09), 1011 (11), 1101 (13), 1111 (15)
G2' : 0010 (02), 0011 (03), 0110 (06), 0111 (07), 1010 (10), 1011 (11), 1110 (14), 1111 (15)
G3' : 0100 (04), 0101 (05), 0110 (06), 0111 (07), 1100 (12), 1101 (13), 1110 (14), 1111 (15)
G4' : 1000 (08), 1001 (09), 1010 (10), 1011 (11), 1100 (12), 1101 (13), 1110 (14), 1111 (15)

You can do this with a loop from 1-1000 with a bitwise AND of the group's decimal value 1,2,4,8, etc.
Keep the values in memory, or push them into the table storing your groups' possible memerships e.g. possible_memberships.

Get me users in G1:
   Q: select * from users where group_memberships in (1, 3, 5, 7, 9, 11, 13, 15);
   A: U1, U5, U7, U9

Get me users in G2:
   Q: select * from users where group_memberships in (2, 3, 6, 7, 10, 11, 14, 15);
   A: U2, U5, U6, U8, U9

If you have a groups table with column 'possible_memberships', you can put the values in there, saving you from having to send all the values over wire and allowing the subselect to be cached on the database. p>

Get me users in G3:
   Q: select * from users where group_memberships in (select possible_memberships from groups where name = 'G3');
   A: U3, U7, U6


Use a LIKE query. In Sql Server, an _ used in a LIKE expression matches any single character. To get those users that are in branches 1,5, and 10 , you could to it like this:

SELECT columns FROM Users WHERE BRANCHES LIKE '1___1____1%'

This isn't particularly efficient (it's not very sargable), but it should work and it's likely no worse than your udf option.


UPDATE

This is not a feasible solution with a 1000 bit number. I will leave it up incase someone with a smaller number of options comes across this post.


Can you make any changes to the DB schema at all? If you can add a calculated column that holds an integer representation of the binary number you have in your varchar then you can use bitwise logic to select what you are talking about entirely in the DB and pretty quickly.

Here is an example of what I am talking about:

with temp as
(
   select 1 as BranchNumber -- 1
   union 
   select 2 -- 01
   union
   select 5 -- 101
   union
   select 7 -- 111
   union
   select 15 as number -- 111
)

--Select users that belong to branch 2    
    SELECT * from temp
    where (BranchNumber & 2) = 2

--returns 2,7,15



--Select users that belong to at least branches 1,2 and 3    
    SELECT * from temp
    where (BranchNumber & 7) = 7

--returns 7,15

To convert binary to a number you would probably have to create a UDF that you could call to populate the new column. I did a little poking around and found this which appears to be a good starting point. I have not tested it yet, so be careful:

SET NOCOUNT ON
CREATE TABLE #nums (pos bigint)
DECLARE @cntr int
SET @cntr = 0
WHILE @cntr < 63 BEGIN
   INSERT INTO #nums VALUES (@cntr)
   SET @cntr = @cntr + 1
END

DECLARE @binstring varchar(63)
SET @binstring = '10000010000000001000011000000000'

SELECT
   IntegerVal =
      sum(power(convert(bigint,2),pos)
      * substring(reverse(@binstring),pos+1,1)) -- yeah, implicit conversion
   FROM #nums

DROP TABLE #nums


The bitwise query would probably outperform string functions though the 1000 bit integer doesn't exist. However as it can be derived from the string you can choose to break it into a particular number of integer sets and query across them. You will simply need to understand the significant bits for each column and adjust the input by a particular constant appropriately or just whack it into a set of bit representing strings and convert to int.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜