开发者

SQL separate string being passed

I have a product table with a tag column, each product has multiple tags stored in this format: "|technology|mobile|acer|laptop|" ...second product's tags could look like this "|computer|laptop|toshiba|"

I am using MS SQL Server 2008 and stored procedure, I would like to know how I could pass a string like "|computer|laptop|" and get both records returned as they both have the tag laptop in them and if I passed "|computer|" only the second record would return as it is the only one comtainning t开发者_StackOverflowhat tag.

What is the best way of doing this without performance penalties using stored procedure?

I have so far had no luck with different codes i have found on the internet, I really hope you guys can maybe help me with this, thank you.


I agree with the other posters that storing data in a column like that is going to cause headaches. You really want to store those tags in a child table so you can easily and efficiently join them. If it's an inherited system or something you can't refactor right away you can write a split function.

The typical sql split implementation uses a while loop and a table variable in a multi-statement TVF. Every iteration incurs more I/O and CPU overhead. Performance testing on SQL 2005 SP1 showed that this overhead is hidden from the I/O Stats and query plan. Profiling the code will reveal the true cost.

Rewriting that function into a inline TVF is much more efficient. The primary difference between an inline and multi-statement TVF is the Query Optimizer will merge the inline function into the query before processing; this eliminates the overhead from the function call. Also, since there is no table variable required, the additional I/O cost is eliminated. Finally, you avoid the costly iterative processing.

Here is the fastest, most scalable split function I could come up with including unit tests and summary.

This function requires a numbers table:

CREATE TABLE dbo.Numbers 
(
    NUM INT PRIMARY KEY CLUSTERED
)

;WITH Nbrs ( n ) AS 
(
    SELECT 1 UNION ALL
    SELECT 1 + n FROM Nbrs WHERE n < 10000 
)
INSERT INTO dbo.Numbers
SELECT n FROM Nbrs
OPTION ( MAXRECURSION 10000 )

The source of the function is here:

IF EXISTS (
    SELECT 1
    FROM dbo.sysobjects
    WHERE id = object_id(N'[dbo].[ParseString]')
        AND xtype in (N'FN', N'IF', N'TF'))
BEGIN
    DROP FUNCTION [dbo].[ParseString]
END
GO

CREATE FUNCTION dbo.ParseString (@String VARCHAR(8000), @Delimiter VARCHAR(10))
RETURNS TABLE
AS
/*******************************************************************************************************
*    dbo.ParseString
*
*    Creator:        MagicMike
*    Date:           9/12/2006
*
*
*    Outline:        A set-based string tokenizer
*                    Takes a string that is delimited by another string (of one or more characters),
*                    parses it out into tokens and returns the tokens in table format.  Leading
*                    and trailing spaces in each token are removed, and empty tokens are thrown
*                    away.
*
*
*    Usage examples/test cases:
                Single-byte delimiter:
                     select * from dbo.ParseString2('|HDI|TR|YUM|||', '|')
                     select * from dbo.ParseString('HDI| || TR    |YUM', '|')
                     select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', '|')
                     select * from dbo.ParseString2('HDI|||TR|YUM', '|')
                     select * from dbo.ParseString('', '|')
                     select * from dbo.ParseString('YUM', '|')
                     select * from dbo.ParseString('||||', '|')
                     select * from dbo.ParseString('HDI TR YUM', ' ')
                     select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', ' ') order by Ident
                     select * from dbo.ParseString(' HDI| || S P A C E S |YUM | ', ' ') order by StringValue

                Multi-byte delimiter:
                     select * from dbo.ParseString('HDI and TR', 'and')
                     select * from dbo.ParseString('Pebbles and Bamm Bamm', 'and')
                     select * from dbo.ParseString('Pebbles and sandbars', 'and')
                     select * from dbo.ParseString('Pebbles and sandbars', ' and ')
                     select * from dbo.ParseString('Pebbles and sand', 'and')
                     select * from dbo.ParseString('Pebbles and sand', ' and ')
*
*
*    Notes:
                     1. A delimiter is optional.  If a blank delimiter is given, each byte is returned in it's own row (including spaces).
                        select * from dbo.ParseString3('|HDI|TR|YUM|||', '')
                     2. In order to maintain compatibility with SQL 2000, ident is not sequential but can still be used in an order clause
                     If you are running on SQL2005 or later
                        SELECT Ident, StringValue FROM
                     with
                        SELECT Ident = ROW_NUMBER() OVER (ORDER BY ident), StringValue FROM
*
*
*    Modifications
*
*
********************************************************************************************************/
RETURN (
SELECT Ident, StringValue FROM
    (
        SELECT Num as Ident,
            CASE
                WHEN DATALENGTH(@delimiter) = 0 or @delimiter IS NULL
                    THEN LTRIM(SUBSTRING(@string, num, 1)) --replace this line with '' if you prefer it to return nothing when no delimiter is supplied. Remove LTRIM if you want to return spaces when no delimiter is supplied
            ELSE
                LTRIM(RTRIM(SUBSTRING(@String,
                    CASE
                        WHEN (Num = 1 AND SUBSTRING(@String,num ,DATALENGTH(@delimiter)) <> @delimiter) THEN 1
                        ELSE Num + DATALENGTH(@delimiter)
                    END,
                    CASE CHARINDEX(@Delimiter, @String, Num + DATALENGTH(@delimiter))
                        WHEN 0 THEN LEN(@String) - Num + DATALENGTH(@delimiter)
                        ELSE CHARINDEX(@Delimiter, @String, Num + DATALENGTH(@delimiter)) - Num -
                            CASE
                                WHEN Num > 1 OR (Num = 1 AND SUBSTRING(@String,num ,DATALENGTH(@delimiter)) = @delimiter)
                                       THEN DATALENGTH(@delimiter)
                                ELSE 0
                            END
                       END
                    )))
              End  AS StringValue
        FROM dbo.Numbers
        WHERE Num <= LEN(@String)
            AND (
                    SUBSTRING(@String, Num, DATALENGTH(ISNULL(@delimiter,''))) = @Delimiter
                    OR Num = 1
                    OR DATALENGTH(ISNULL(@delimiter,'')) = 0
                )
    ) R WHERE StringValue <> ''
)

For your case, you could use it like this:

--SAMPLE DATA 
CREATE TABLE #products 
(
    productid INT IDENTITY PRIMARY KEY CLUSTERED ,
    prodname VARCHAR(200),
    tags VARCHAR(200)
)

INSERT INTO #products (prodname, tags)
SELECT 'toshiba laptop', '|laptop|toshiba|notebook|'
UNION ALL 
SELECT 'toshiba netbook', '|netbook|toshiba|'
UNION ALL 
SELECT 'Apple macbook', '|laptop|apple|notebook|'
UNION ALL 
SELECT 'Apple mouse', '|apple|mouse'


--Actual solution 

DECLARE @searchTags VARCHAR(200)
SET @searchTags = '|apple|laptop|' --This would the string that would get passed in if it were a stored procedure 

--First we convert the supplied tags into a table for use later
--My (2005) dev box raised a severe error attempting to do the search in 1 step 
--hence the temp table
CREATE TABLE #tags 
(
    tag VARCHAR(200) PRIMARY KEY CLUSTERED
)

INSERT INTO #tags --The function splits the string up into one record for each value
SELECT stringValue 
FROM dbo.parsestring(@searchTags,'|') --SQL 2005 has a real problem joining to a TVF twice, apparently


SELECT DISTINCT p.* 
FROM #products P --we join the products table with the function to get a row for each tag so we can compare with the temp table
    CROSS APPLY (SELECT stringValue FROM dbo.parsestring(P.tags,'|')) T 
WHERE EXISTS(SELECT * FROM #tags WHERE tag = T.stringValue) --we compare the rows with our temp table and if we get matches, the products are returned
/*This will return the Apple Macbook and the Toshiba Laptop because they both contain
 the 'laptop' tag and the Apple mouse because it contains the 'apple' tag. The 
toshiba netbook contains neither tag so it won't be returned.*/

But, with your tags in a separate table as suggested (1-many for a simplified example) It would look like this:

SELECT * FROM Products P
WHERE EXISTS (SELECT * 
                  FROM tags T 
                      INNER JOIN dbo.parsestring(@tags,'|') Q
                          ON T.tag = Q.StringValue
                  WHERE T.productid = P.productiId )  


You have a many-to-many relationship between products and tags. The best way of doing this is to redesign your database. Create a table of tags and a junction table that links products with tags.


That's not a very good design. Combining like terms into one field and separating them with a delimiter such as a vertical bar does not scale well and it is very limiting.

I recommend you read up on how to design databases. The best book I ever bought regarding database design was Database Design for Mere Mortals by Michael Hernandez ISBN: 0-201-69471-9. Amazon Listing I noticed he has a second edition.

He walks you through the entire process of (from start to finish) of designing a database. I recommend you start with this book.

You have to learn to look at things in groups or chunks. Database design has simple building blocks just like programming does. If you gain a thorough understanding of these simple building blocks you can tackle any database design.

In programming you have:

  • If Constructs
  • If Else Constructs
  • Do While Loops
  • Do Until Loops
  • Case Constructs

With databases you have:

  • Data Tables
  • Lookup Tables
  • One to One relationships
  • One to Many Relationships
  • Many to Many relationships
  • Primary keys
  • Foreign keys

The simpler you make things the better. A database is nothing more than a place where you put data into cubbie holes. Start by identifying what these cubbie holes are and what kind of stuff you want in them.

You are never going to create the perfect database design the first time you try. This is a fact. Your design will go through several refinements during the process. Sometimes things won't seem apparent until you start entering data, and then you have an ahh ha moment.

The web brings it's own sets of challenges. Bandwith issues. Statelessness. Erroneous data from processes that start but never get finished.


make a split with CLR function return a table with the value or pass as xml and load it into a table varible an make a join

create procedure  search 
(
@data xml
)
AS
BEGIN

  --declare @data xml
  declare @LoadData table
  (
    dataToFind varchar(max)
  )
  --set @data= cast(
  --'<data>
  --    <item>computer</item>
  --    <item>television</item>
  --</data>' as xml)

  insert into @LoadData
  SELECT T2.Loc.value('.','varchar(max)')
  FROM   (select @data as data )T
  CROSS APPLY data.nodes('/data/item') as T2(Loc) 

  select * from @LoadData--use for join 

END


I would suggest you write an extra couplle of tables that with "proper design, Populate those tables from the existing not well designed bit - this way y our search will work properly buy others using the old | pipe approach won't notice till you have time to refactor

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜