Executing a query on csv data stored in an ntext column
Say that the raw text of CSV exports and an associated timestamps are stored in a database, where one record is equivalent to one export.
Does anyone have a way to execute a query on the CSV file stored in that field without creating a second connection开发者_Python百科 to the database or exporting the data to a file and then reopening it using the csv text driver?
Assume that:
1) you can't write out a physical file onto the server in the solution
2) you can't a second connection to the server w/ OPENROWSET (servers, usernames & passwords change)
3) that it must be a 100% SQL solution - must be able to be run as an SP
4) that you only need to work with one record at time - the solution doesn't need to account for selecting from multiple csv files stored in the DB.
My solution would be to create a UDF that will parse the CSV data into a table variable. Then, in the SP, retrieve the CSV, pass it to the UDF, then run the query against the table variable.
First, create a UDF to return a table from the CSV value (uses CHAR(13) to determine new lines, may need to be altered to work with your data):
CREATE FUNCTION [dbo].[fnParseCSV] (@InputString NVARCHAR(MAX), @Delimiter NCHAR(1) = ',')
RETURNS @tbl TABLE (ID int, Val NVARCHAR(64)) AS
BEGIN
declare @singleLine nvarchar(max)
declare @id int
declare @val varchar(64)
WHILE LEN(@InputString) > 0 BEGIN
IF CHARINDEX(char(13), @InputString) > 0 BEGIN
SELECT @singleLine = SUBSTRING(@InputString, 1, CHARINDEX(char(13), @InputString) - 1)
IF CHARINDEX(@Delimiter, @singleline) > 0 BEGIN
SELECT @id = convert(int, SUBSTRING(@singleline, 1, CHARINDEX(@Delimiter, @singleline) - 1))
SELECT @val = RIGHT(@singleline, LEN(@singleline) - CHARINDEX(@Delimiter, @singleline) )
INSERT INTO @tbl (id, val) values (@id, @val)
END
SELECT @InputString = RIGHT(@InputString, LEN(@InputString) - CHARINDEX(char(13), @InputString) )
END
ELSE
BEGIN
IF CHARINDEX(@Delimiter, @inputString) > 0
BEGIN
SELECT @id = convert(int, SUBSTRING(@inputString, 1, CHARINDEX(@Delimiter, @inputString) - 1))
SELECT @val = RIGHT(@inputString, LEN(@inputString) - CHARINDEX(@Delimiter, @inputString) )
INSERT INTO @tbl (id, val) values (@id, @val)
END
set @inputString = ''
END
END
RETURN
END
Then run the query against that output:
select * from dbo.fnParseCsv('123,val1' + char(13) + '456,val2' + CHAR(13) + '789,val3', ',')
You could set up a series of user-defined functions which could parse through the column. It would likely be slow and wouldn't be robust at all.
As an example though (with no real error checking, etc. and only minimally tested):
IF OBJECT_ID('dbo.Test_CSV_Search') IS NOT NULL
DROP TABLE dbo.Test_CSV_Search
GO
CREATE TABLE dbo.Test_CSV_Search
(
my_id INT IDENTITY NOT NULL,
txt VARCHAR(MAX) NOT NULL,
CONSTRAINT PK_Test_CSV_Search PRIMARY KEY CLUSTERED (my_id)
)
GO
INSERT INTO dbo.Test_CSV_Search (txt) VALUES ('11, 12, 13, 14,15,16
21,22, 23,24, 25,26
31,22,33,34,35,36')
GO
IF OBJECT_ID('dbo.Get_CSV_Row') IS NOT NULL
DROP FUNCTION dbo.Get_CSV_Row
GO
CREATE FUNCTION dbo.Get_CSV_Row
(@my_id INT, @col_num SMALLINT, @search_value VARCHAR(100))
RETURNS @results TABLE (row_num INT, row_txt VARCHAR(MAX))
AS
BEGIN
DECLARE
@csv_txt VARCHAR(MAX),
@full_row VARCHAR(MAX),
@start_pos INT,
@end_pos INT,
@col_txt VARCHAR(100),
@cur_col SMALLINT,
@line_start INT,
@line_end INT,
@row_num INT
SELECT @csv_txt = txt + CHAR(10) FROM dbo.Test_CSV_Search WHERE my_id = @my_id
SELECT
@line_start = 1,
@cur_col = 1,
@start_pos = 1,
@row_num = 1
WHILE (CHARINDEX(CHAR(10), @csv_txt, @line_start) > 0)
BEGIN
SELECT
@line_end = CHARINDEX(CHAR(10), @csv_txt, @line_start),
@end_pos = CHARINDEX(',', @csv_txt, @start_pos)
WHILE (@cur_col < @col_num)
BEGIN
SET @start_pos = @end_pos + 1
SET @end_pos = CHARINDEX(',', @csv_txt, @start_pos)
SET @cur_col = @cur_col + 1
END
IF (RTRIM(LTRIM(SUBSTRING(@csv_txt, @start_pos, @end_pos - @start_pos))) = @search_value)
BEGIN
INSERT INTO @results (row_num, row_txt) VALUES (@row_num, RTRIM(LTRIM(SUBSTRING(@csv_txt, @line_start, @line_end - @line_start))))
END
SELECT
@line_start = @line_end + 1,
@start_pos = @line_end + 1,
@cur_col = 1,
@row_num = @row_num + 1
END
RETURN
END
GO
SELECT * FROM dbo.Get_CSV_Row(1, 1, '11')
精彩评论