TSQL function to parse text based on keywords?
I need a hand building a text relevance function. A list of keywords is passed into a SQL Server 2008 sproc, which performs a free-text search and returns a list of table rows.
For each row, I want a function, say "ParseForKeywords(result, listOfKeywords) AS Parsed Result", to build a new string, based on the result field:
listOfKeywords will be a comma- or space-delimited list of words.
If the result is longer than, say, 100 words, do the following: find the first occurrence of any of the keywords, back up 5 or 6 words, and begin a new string from there for the length of the result string.
If the result is longer than 200 words, do the same as above for the next 50 words, then find the next occurrence of any of the keywords, back up 5 or 6 words, and append with "...".
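The two rules above can be sketched in a few lines. This is only an illustration in Python (chosen for brevity; the same logic ports directly to C# or a CLR function), and it rests on several assumptions that the description leaves open: text is split on whitespace, matching is case-insensitive on whole words, a result with no keyword hit is returned unchanged, "for the length of the result string" means "through the end of the text", and the segments are joined with " ... ". The names build_excerpt, next_hit, lead and window are hypothetical.

```python
import re

def parse_keywords(list_of_keywords):
    """Split a comma- or space-delimited keyword list into a lowercase set."""
    return {w.lower() for w in re.split(r"[,\s]+", list_of_keywords) if w}

def next_hit(words, keywords, start=0):
    """Index of the first word at or after `start` that is a keyword, else -1."""
    for i in range(start, len(words)):
        # Ignore surrounding punctuation so "apple," still matches "apple".
        if words[i].strip(".,;:!?\"'()").lower() in keywords:
            return i
    return -1

def build_excerpt(result, list_of_keywords, lead=5, window=50):
    """Build a keyword-centred excerpt following the 100/200-word rules."""
    words = result.split()
    keywords = parse_keywords(list_of_keywords)
    if len(words) <= 100:
        return result                      # short text: leave as is
    first = next_hit(words, keywords)
    if first < 0:
        return result                      # assumption: no hit, no trimming
    start = max(0, first - lead)           # back up `lead` words
    if len(words) <= 200:
        return " ".join(words[start:])     # assumption: run to the end
    end = start + window
    parts = [" ".join(words[start:end])]   # first segment of `window` words
    second = next_hit(words, keywords, end)
    if second >= 0:
        start2 = max(end, second - lead)   # never overlap the first segment
        parts.append(" ".join(words[start2:start2 + window]))
    return " ... ".join(parts) + " ..."
```

Calling build_excerpt(row_text, "word1, word2") once per displayed row would fill the "Parsed Result" column; the thresholds of 100 and 200 words are hard-coded here only to mirror the description.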
What I'm looking for is a starting point, and a bit of advice on where this logic would be best-placed: on the SQL Server, or let the .Net code do this when populating a DataTable cell?
If doing this in a TSQL function: I would begin by creating a cursor or CTE to loop through the comma-delimited list of words. To find the first occurrence of any of the words, I'd have to check every keyword on each pass and keep the lowest CHARINDEX() value.
Is there a way to do a WHERE IN ('word1', 'word2', 'word3')?
Once this is found, I would step back x characters from that CHARINDEX value until I have counted, say, 4 spaces. I would also need to see if any of those words occur later on in the text, at which point the whole process would repeat.
Looking at this now, it would require at least two functions.
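As a rough illustration of those two functions, here is a character-based sketch in Python (again only to show the logic; it is not TSQL and would need porting). CHARINDEX() accepts a single search expression, so the minimum has to be taken across one lookup per keyword, which is what first_keyword_pos does. Both function names are hypothetical, positions are 0-based (CHARINDEX is 1-based), and case-insensitive substring matching is an assumption.

```python
def first_keyword_pos(text, keywords):
    """Lowest 0-based index at which any keyword occurs, or -1 if none does.

    Mirrors taking the smallest non-zero CHARINDEX() over the keyword list.
    """
    haystack = text.lower()
    hits = [p for p in (haystack.find(k.lower()) for k in keywords if k) if p >= 0]
    return min(hits) if hits else -1

def back_up_spaces(text, pos, spaces=4):
    """Walk left from `pos` until `spaces` spaces have been counted.

    Returns the index just after the last space counted, or 0 if the start
    of the text is reached first. Because the first space counted is the one
    directly before the keyword, N spaces yield N - 1 whole words of context.
    """
    count = 0
    i = pos
    while i > 0:
        i -= 1
        if text[i] == " ":
            count += 1
            if count == spaces:
                return i + 1
    return 0
```

To repeat the process for later occurrences, the same search can be run again on the remainder of the text after the end of the first excerpt.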
Thanks.
Option 1: Put this logic in the code to invoke after you run your query. Add a new column to the results that contains the ParseForKeywords values.
This is a simpler implementation but will perform badly if you are paging results, because ParseForKeywords will be run for every result.
Option 2: Create a CLR function and run ParseForKeywords in the query.
This may make your architecture a little more complex but this will perform much better when paging your results.
Best suited for such a task would be writing a CLR stored procedure. There are a lot of examples and guides on the internet.
SQL is actually a very bad place to parse text for keywords unless you are using Full-Text Indexing. Here is a good overview:
http://www.simple-talk.com/sql/learn-sql-server/understanding-full-text-indexing-in-sql-server/
Without building a full-text index, your queries have to parse every string you search over linearly, making this a potentially very slow operation, especially if you have a lot of rows to search.
Another option is to use a package like Lucene and do your full-text searching outside of the database.