开发者

Character level profiling in SSIS or SQL Server

I need to profile reference fields in a database to understand the patterns they are composed of. This needs to be done at a character level as there will be no spaces or punctuation in the reference fields.

As an example I'm looking for a solution that will take input like:

ABA1235DV6778 ABA1235DV6788 ABA2335DV6778

And suggest patterns like:

ABA\d\d35DV67\d\d

This will be used to later validate those reference fields once I can understand the permissable values in those columns.

I have looked at the profili开发者_如何学编程ng functionality in SSIS but it seems to lack granularity. Does anybody know how I can tune the profiling in SSIS 2008 or have an efficient function for SQL Server 2008 that can be used to achieve this?

Any help would be greatly appreciated,

Niall


It's not really clear from your post exactly what logic you want to apply to the strings. I'm guessing you want to use some form of edit distance calculation to identify similar strings, then generate a regular expression that matches them all. Those are typically tasks that would be implemented in an external program written in an appropriate language, not in SSIS or SQL Server. It is certainly not something you can do with pre-existing SSIS functionality.

So I would forget SSIS for now and work out the best way to implement your algorithm in .NET (or whatever other language you're comfortable with). Once you've done that you can decide whether to:

  • Write a self-contained executable and call it from an Execute Process task
  • Write a .NET DLL and use it in a Script Task, Script Component or CLR stored procedure
  • Write your own custom SSIS component
  • Write a complete program instead of using SSIS
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜