Simple SQL: How to calculate unique, contiguous numbers for duplicates in a set?

Let's say I create a table with an int Page, int Section, and an int ID identity field, where the page field ranges from 1 to 8 and the section field ranges from 1 to 30 for each page. Now let's say that two records have duplicate page and section. How could I renumber those two records so that the sequence of page and section numbering is contiguous?

select page, section
from #fun
group by page, section having count(*) > 1

shows the duplicates:

page 1 section 3
page 2 section 3

Page 1 section 4 and page 2 section 4 are missing. Is there a way, without using a cursor, to find and renumber these positions in SQL Server 2000, which doesn't support ROW_NUMBER()?

The rownum below of course produces exactly the same number as section:

select page, section,
    (select count(*) + 1 
     from #fun b 
     where b.page = a.page and b.section < a.section) as rownum
from #fun a
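
The count above cannot tell the two duplicates apart because it compares only section. A minimal sketch of one possible way around that, assuming the identity column can act as a tie-breaker: count the rows with a smaller section, plus the rows with the same section and an id no greater than the current one (so each row counts itself and the "+ 1" is no longer needed). It is run here through Python's sqlite3 module only so the example is self-contained; the table name fun, the data, and the variable names are illustrative, and the SELECT has not been tried on SQL Server 2000.

```python
import sqlite3

# Stand-in for the #fun temp table; `id` plays the role of the identity column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fun (id INTEGER PRIMARY KEY, page INT NOT NULL, section INT NOT NULL)"
)
# Illustrative data (an assumption): section 3 is duplicated on both pages.
conn.executemany(
    "INSERT INTO fun (page, section) VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 2), (1, 3), (1, 5),
     (2, 1), (2, 2), (2, 3), (2, 5), (2, 3)],
)

# Correlated count with a tie-breaker on id: rows that share a page and
# section are ordered by id, so the duplicates get distinct numbers.
ROWNUM_SQL = """
SELECT a.id, a.page, a.section,
       (SELECT COUNT(*)
          FROM fun b
         WHERE b.page = a.page
           AND (b.section < a.section
                OR (b.section = a.section AND b.id <= a.id))) AS rownum
  FROM fun a
 ORDER BY a.page, rownum
"""

numbered = conn.execute(ROWNUM_SQL).fetchall()
for row in numbered:
    print(row)  # (id, page, section, rownum)
```

With this data the second copy of section 3 on each page gets rownum 4, and every page is numbered 1 through 5. Note that this renumbers the whole page contiguously, so in other data sets rows after a duplicate could shift as well.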

I could create a pivot table having values 1 through 100, but what would I join against?

What I want to do is something like this:

update p set section = (expression that gets 4)
from #fun p
where (expression that identifies duplicate sections by page)


I don't have a 2000 server to test this on, but I think it should work.

Create test tables/data:

CREATE TABLE #fun
(Id INT IDENTITY(100,1)
,page INT NOT NULL
,section INT NOT NULL
)


INSERT #fun (page, section)
SELECT 1,1
UNION ALL SELECT 1,3    UNION ALL SELECT 1,2
UNION ALL SELECT 1,3    UNION ALL SELECT 1,5
UNION ALL SELECT 2,1    UNION ALL SELECT 2,2
UNION ALL SELECT 2,3    UNION ALL SELECT 2,5
UNION ALL SELECT 2,3

Now the processing:

-- create a worktable
CREATE TABLE #fun2
(Id INT IDENTITY(1,1)
,funId INT
,page INT NOT NULL
,section INT NOT NULL
)

-- insert data into the second temp table ordered by the relevant columns
-- the identity column will form the basis of the revised section number
INSERT  #fun2 (funId, page, section)
SELECT  Id,page,section
FROM    #fun
ORDER BY page,section,Id

-- write the calculated section value back where it is different
UPDATE  p
SET     section = y.calc_section
FROM    #fun AS p 
JOIN
        (
            SELECT  f2.funId, f2.id - x.adjust AS calc_section
            FROM    #fun2 AS f2
            JOIN    (
                        -- this subquery calculates a per-page offset, much like
                        -- PARTITION BY in the 2005+ ROW_NUMBER() function
                        SELECT MIN(Id) - 1 AS adjust, page
                        FROM #fun2
                        GROUP BY page
                    ) AS x
            ON      f2.page = x.page
        ) AS y
ON      p.Id = y.funId
WHERE   p.section <> y.calc_section


SELECT * FROM #fun order by page, section
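
Since I couldn't run the script above, here is a rough trace of the same arithmetic using Python's sqlite3 module as a stand-in. This is a sketch under assumptions, not the T-SQL itself: the tables are called fun and fun2 (no # prefix), ids start at 1 instead of 100, the sorted rows are copied from Python so the worktable ids are certain to follow the sort order, and the write-back uses a correlated subquery instead of UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fun (id INTEGER PRIMARY KEY, page INT NOT NULL, section INT NOT NULL)"
)
conn.execute(
    "CREATE TABLE fun2 (id INTEGER PRIMARY KEY, funId INT, "
    "page INT NOT NULL, section INT NOT NULL)"
)
# Same rows, in the same order, as the test data above.
conn.executemany(
    "INSERT INTO fun (page, section) VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 2), (1, 3), (1, 5),
     (2, 1), (2, 2), (2, 3), (2, 5), (2, 3)],
)

# Copy the rows into the worktable in sorted order, so fun2.id becomes a
# running number over (page, section, id).
ordered = conn.execute(
    "SELECT id, page, section FROM fun ORDER BY page, section, id"
).fetchall()
conn.executemany(
    "INSERT INTO fun2 (funId, page, section) VALUES (?, ?, ?)", ordered
)

# New section = running number minus (first running number on the page - 1).
conn.execute("""
UPDATE fun
   SET section = (SELECT f2.id - (SELECT MIN(m.id) - 1
                                    FROM fun2 m
                                   WHERE m.page = f2.page)
                    FROM fun2 f2
                   WHERE f2.funId = fun.id)
""")

renumbered = conn.execute(
    "SELECT page, section FROM fun ORDER BY page, section"
).fetchall()
print(renumbered)
```

For the test data, each page should come out numbered 1 through 5, with the second copy of section 3 moved to 4.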


Disclaimer: I don't have SQL Server to test.

If I understand you correctly, if you knew the ROW_NUMBER of your #fun records partitioned over (page, section) duplicates, you could use this relative ranking to increment the "section":

    UPDATE p
       SET section = p.section + (d.rownumber - 1)
      FROM #fun AS p
INNER JOIN ( -- SELECT id, ROW_NUMBER() OVER (PARTITION BY page, section ORDER BY id DESC) ...
            -- a.id must be qualified: a bare id is ambiguous in this self-join
            SELECT a.id, COUNT(1) AS rownumber
              FROM #fun a
         LEFT JOIN #fun b
                   ON a.page = b.page AND a.section = b.section AND a.id <= b.id
          GROUP BY a.id, a.page, a.section) d
            ON p.id = d.id
      WHERE d.rownumber > 1

That won't handle the case where the number of duplicates pushes you past your upper limit of 30. It may also create new duplicates if higher-numbered sections already exist on the page (for example, one instance of (pg 1, sec 3) becomes (pg 1, sec 4), which already existed), but you can run the UPDATE repeatedly until no duplicates remain.

And then add a unique index on (page, section).
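
To make the "run it repeatedly, then add the index" idea concrete, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The data, the helper name bump_until_unique, the max_passes guard, and the index name ux_fun_page_section are all made up for illustration. The ranks are read with a SELECT first and applied afterwards, so each pass works from a consistent snapshot. The upper limit of 30 is not enforced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fun (id INTEGER PRIMARY KEY, page INT NOT NULL, section INT NOT NULL)"
)
# Hypothetical data: section 3 is duplicated and section 4 is already taken.
conn.executemany(
    "INSERT INTO fun (page, section) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 3), (1, 4)],
)

# For each row, count the rows with the same (page, section) and an id at
# least as large. The highest id counts only itself and stays put; the
# others move up by (count - 1).
BUMPS_SQL = """
SELECT a.id, COUNT(*) - 1 AS bump
  FROM fun a
  JOIN fun b
    ON a.page = b.page AND a.section = b.section AND a.id <= b.id
 GROUP BY a.id
HAVING COUNT(*) > 1
"""

def bump_until_unique(conn, max_passes=100):
    """Repeat the increment until no (page, section) pair is shared.

    Returns the number of passes that changed something.
    """
    for passes in range(max_passes):
        bumps = conn.execute(BUMPS_SQL).fetchall()
        if not bumps:
            return passes
        conn.executemany(
            "UPDATE fun SET section = section + ? WHERE id = ?",
            [(bump, row_id) for row_id, bump in bumps],
        )
    raise RuntimeError("duplicates remain after max_passes")

passes = bump_until_unique(conn)
sections = [s for (s,) in conn.execute("SELECT section FROM fun ORDER BY section")]

# Once the data is clean, the unique index keeps new duplicates out.
conn.execute("CREATE UNIQUE INDEX ux_fun_page_section ON fun (page, section)")
try:
    conn.execute("INSERT INTO fun (page, section) VALUES (1, 3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(passes, sections, rejected)
```

With this data the first pass moves one copy of section 3 onto the existing section 4, and a second pass moves it on to 5, which is the "new duplicates" case described above.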
