SELECT INTO behavior and the IDENTITY property
I've been working on a project and came across some interesting behavior when using SELECT INTO. If I have a table with a column defined as int identity(1,1) not null
and use SELECT INTO to copy it, the new table will retain the IDENTITY property unless there is a join involved. If there is a join, then the same column on the new table is defined simply as int not null
.
Here is a script that you can run to reproduce the behavior:
CREATE TABLE People (Id INT IDENTITY(1,1) not null, Name VARCHAR(10))
CREATE TABLE ReverseNames (Name varchar(10), ReverseName varchar(10))
INSERT INTO People (Name)
VALUES ('John'), ('Jamie'), ('Joe'), ('Jenna')
INSERT INTO ReverseNames (Name, ReverseName)
VALUES ('John','nhoJ'), ('Jamie','eimaJ'), ('Joe','eoJ'), ('Jenna','anneJ')
--------
SELECT Id, Name
INTO People_ExactCopy
FROM People
SELECT Id, ReverseName as Name
INTO People_WithJoin
FROM People
JOIN ReverseNames
ON People.Name = ReverseNames.Name
SELECT Id, (SELEC开发者_如何学编程T ReverseName FROM ReverseNames WHERE Name = People.Name) as Name
INTO People_WithSubSelect
FROM People
--------
SELECT OBJECT_NAME(c.object_id) as [Table],
c.is_identity as [Id Column Retained Identity]
FROM sys.columns c
where
OBJECT_NAME(c.object_id) IN ('People_ExactCopy','People_WithJoin','People_WithSubSelect')
AND c.name = 'Id'
--------
DROP TABLE People
DROP TABLE People_ExactCopy
DROP TABLE People_WithJoin
DROP TABLE People_WithSubSelect
DROP TABLE ReverseNames
I noticed that the execution plans for both the WithJoin and WithSubSelect queries contained one join operator. I'm not sure if one will be significantly better on performance if we were dealing with a larger set of rows.
Can anyone shed any light on this and tell me if there is a way to utilize SELECT INTO with joins and still preserve the IDENTITY property?
From Microsoft:
When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true:
The SELECT statement contains a join, GROUP BY clause, or aggregate function. Multiple SELECT statements are joined by using UNION. The identity column is listed more than one time in the select list. The identity column is part of an expression. The identity column is from a remote data source.
If any one of these conditions is true, the column is created NOT NULL instead of inheriting the IDENTITY property. If an identity column is required in the new table but such a column is not available, or you want a seed or increment value that is different than the source identity column, define the column in the select list using the IDENTITY function.
You could use the IDENTITY
function as they suggest and omit the IDENTITY
column, but then you would lose the values, as the IDENTITY
function would generate new values and I don't think that those are easily determinable, even with ORDER BY
.
I don't believe there is much you can do, except build your CREATE TABLE statements manually, SET IDENTITY_INSERT ON, insert the existing values, then SET IDENTITY_INSERT OFF. Yes you lose the benefits of SELECT INTO, but unless your tables are huge and you are doing this a lot, [shrug]. This is not fun of course, and it's not as pretty or simple as SELECT INTO, but you can do it somewhat programmatically, assuming two tables, one having a simple identity (1,1), and a simple INNER JOIN:
SET NOCOUNT ON;
DECLARE
@NewTable SYSNAME = N'dbo.People_ExactCopy',
@JoinCondition NVARCHAR(255) = N' ON p.Name = r.Name';
DECLARE
@cols TABLE(t SYSNAME, c SYSNAME, p CHAR(1));
INSERT @cols SELECT N'dbo.People', N'Id', 'p'
UNION ALL SELECT N'dbo.ReverseNames', N'Name', 'r';
DECLARE @sql NVARCHAR(MAX) = N'CREATE TABLE ' + @NewTable + '
(
';
SELECT @sql += c.name + ' ' + t.name
+ CASE WHEN t.name LIKE '%char' THEN
'(' + CASE WHEN c.max_length = -1
THEN 'MAX' ELSE RTRIM(c.max_length/
(CASE WHEN t.name LIKE 'n%' THEN 2 ELSE 1 END)) END
+ ')' ELSE '' END
+ CASE c.is_identity
WHEN 1 THEN ' IDENTITY(1,1)'
ELSE ' ' END + ',
'
FROM sys.columns AS c
INNER JOIN @cols AS cols
ON c.object_id = OBJECT_ID(cols.t)
INNER JOIN sys.types AS t
ON c.system_type_id = t.system_type_id
AND c.name = cols.c;
SET @sql = LEFT(@sql, LEN(@sql)-1) + '
);
SET IDENTITY_INSERT ' + @NewTable + ' ON;
INSERT ' + @NewTable + '(';
SELECT @sql += c + ',' FROM @cols;
SET @sql = LEFT(@sql, LEN(@sql)-1) + ')
SELECT ';
SELECT @sql += p + '.' + c + ',' FROM @cols;
SET @sql = LEFT(@sql, LEN(@sql)-1) + '
FROM ';
SELECT @sql += t + ' AS ' + p + '
INNER JOIN ' FROM (SELECT DISTINCT
t,p FROM @cols) AS x;
SET @sql = LEFT(@sql, LEN(@sql)-10)
+ @JoinCondition + ';
SET IDENTITY_INSERT ' + @NewTable + ' OFF;';
PRINT @sql;
With the tables given above, this produces the following, which you could pass to EXEC sp_executeSQL instead of PRINT:
CREATE TABLE dbo.People_ExactCopy
(
Id int IDENTITY(1,1),
Name varchar(10)
);
SET IDENTITY_INSERT dbo.People_ExactCopy ON;
INSERT dbo.People_ExactCopy(Id,Name)
SELECT p.Id,r.Name
FROM dbo.People AS p
INNER JOIN dbo.ReverseNames AS r
ON p.Name = r.Name;
SET IDENTITY_INSERT dbo.People_ExactCopy OFF;
I did not deal with other complexities such as DECIMAL columns or other columns that have parameters such as max_length, nor did I deal with nullability, but these things wouldn't be hard to add it if you need greater flexibility.
In the next version of SQL Server (code-named "Denali") you should be able to construct a CREATE TABLE statement much easier using the new metadata discovery functions - which do much of the grunt work for you in terms of specifying precision/scale/length, dealing with MAX, etc. You still have to manually create indexes and constraints; but you don't get those with SELECT INTO either.
What we really need is DDL that allows you to say something like "CREATE TABLE a IDENTICAL TO b;" or "CREATE TABLE a BASED ON b;"... it's been asked for here, but has been rejected (this is about copying a table to another schema, but the same concept could apply to a new table in the same schema with a different table name). http://connect.microsoft.com/SQLServer/feedback/details/632689
I realize this is a really late response but whoever is still looking for this solution, like I was until I found this solution:
You can't use the JOIN operator for the IDENTITY column property to be inherited. What you can do is use a WHERE clause like this:
SELECT a.* INTO NewTable FROM MyTable a WHERE EXISTS (SELECT 1 FROM SecondTable b WHERE b.ID = a.ID)
This works.
精彩评论