SQL hidden techniques? [duplicate]
Possible Duplicate:
Hidden Features of SQL Server
What are those pro/subtle techniques that SQL provides and not many know about which also cut code and improve performance?
eg: I have just learned how to use CASE statements inside aggregate functions and it totally changed my approach on 开发者_运维技巧things.
Are there others?
UPDATE: Basically any vendor. But PostgreSQL if you want to focus only on one :D
OVER Clause (SQL Server) a.k.a. Window functions (PostgreSQL) or analytic functions (Oracle)
This has been very nice to know for me. You can do all sorts of handy things like counting, partitioning, ranking, etc.
SQL GROUP BY - CUBE, ROLLUP clauses
Analytic (AKA ranking, AKA windowing) functions IE:
- ROW_NUMBER
- RANK
- DENSE_RANK
- NTILE
- OVER
Views: Normal and Materialized
It's difficult to say much without referencing vendor specific syntax
EXISTS. I'm amazed how many people still use COUNT(*) when checking existence or IN (SELECT...) clauses when EXISTS can do the job much quicker.
Most frequently you might see :
SELECT @MyVar = Count(*) FROM Table1 WHERE....
If @MyVar <> 0
BEGIN
--do something
END
when
IF EXISTS(SELECT 1 FROM Table1 WHERE...)
BEGIN
--don something
END
is always better.
SELECT... EXCEPT SELECT...
and
SELECT... INTERSECT SELECT...
can be useful (and disturbingly efficient) at pickout out differing or common rows--and that's for all columns in the row--between sets. This is extremely useful when you have lots of columns.
People don't use built in functions enough and like to reinvent the wheel, here are Ten SQL Server Functions That You Have Ignored Until Now
Using NEWSEQUENTIALID() instead of NEWID() on a clustered uniqueidentifier column will perform much better since it won't cause page splits and thus fragmentation
Using an auxilarry table of numbers so that you can quickly do some set based logic
for example
select DATEADD(m,number,'20010101')
from master..spt_values
where type = 'P'
order by 1
ANY, ALL and SOME
Two from Postgresql: DISTINCT ON (see example) and the new WITH.
Lately I have been using CROSS APPLY a lot.
Pivot
It's new in 2005 (which i know was a long time ago, but there's loads of people still using 2000). Saves doing a bunch of "case when name = 'tim' then value else 0 end" to build your aggregates this weekend.
Common Table Expressions (SQL Server 2005+)
WITH x AS (
SELECT 1 as A, 2 as B, 3 as C
),
WITH y AS (
SELECT 4 as A, 5 as B, 6 as C
UNION
SELECT 7 as A, 8 as B, 9 as C
)
SELECT A, B, C FROM x
UNION
SELECT A, B, C FROM y
They are really nice for breaking your queries into steps
under MySQL, using the keyword "STRAIGHT_JOIN". If you know your data, and the relationships of lookup tables that you are joining to, sometimes the optimizer looks at the smaller tables as a basis of a join and tries to query the "less record" count to your "bigger" table thus taking significantly more time. If your primary table is first in the "from", and its "criteria" up front, the straight join will hit that first, join to the rest of the tables and be done in no time.
I've had to do this dealing with gov't data of 10+ million records joined to about 15+ lookup tables. Without straight-join, the system choked after 20+ hours. Adding Straight-join, it was done in about 2 hrs.
In Sql Server, the HAVING clause. Particularly, HAVING(COUNT DISTINCT FOO)> @SomeNumber to quickly find rows with more than one distinct value for a given grouping.
From MSDN:
USE AdventureWorks2008R2 ;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID ;
From PostgreSQL Docs:
Table partitioning
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
Query performance can be improved dramatically for certain kinds of queries.
Update performance can be improved too, since each piece of the table has indexes smaller than an index on the entire data set would be. When an index no longer fits easily in memory, both read and write operations on the index take progressively more disk accesses.
Bulk deletes may be accomplished by simply removing one of the partitions, if that requirement is planned into the partitioning design. DROP TABLE is far faster than a bulk DELETE, to say nothing of the ensuing VACUUM overhead.
Seldom-used data can be migrated to cheaper and slower storage media.
Derived tables to create "variables" and reduce repeated code.
Something like this but can be expanded upon. Obviously "Average Value" can be a much more complex calculation, and if you have several it helps clean up code.
Select *, case when AverageValue > 50 then 'Pass' Else 'Fail' end
From
(
Select ColA, ColB, AverageValue = (ColA+ColB)/2
From InnerMostTable
) AverageValues
Order By AverageValue Desc
In SQL Server using the Convert() function to get dates in the format mm/dd/yyyy instead of Cast() function
SELECT convert(datetime, '1/1/2010', 101)
I use this all the time
精彩评论