开发者

SSIS: Way to handle hot folder items in parallel?

We have eight Xeon (i7) cores and 16 gig of RAM on our SSIS box. We have about 200 image files we want to convert using a command line utility every day. Currently the process is using Adobe Photoshop and droplets (very manual, taking upwards of two hou开发者_如何学运维rs a day)

Using SSIS hot folders, is there a way to execute up to eight conversions at once?

Is there any way to tell a process completed or execute code upon it's completion?


Based upon your description in the comment of what you are trying to do, could you have your loop move the files from your "hot folder" into one of eight (or as many as you want to run in parallel) processing folders. Then you could have eight loops that use these processing "hot folders" to actually do the processing. That way you could have the eight loops processing in parallel.

Here is how I would do this:

On your control surface:

ForEach Loop Container (FELC)

  • Configured to look at your "Hot Folder"

  • Inside that container a File System Task to move the files to
    processing folders.

  • A Variable set up from the FELC to get the filename to act on. (I used USER::HotFolderFilePath)

  • A second variable to contain the destination for the move. (I used USER::DestPath)

  • A Script task to change the Destination Folder name after each iteration

    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Runtime

    Public Class ScriptMain

    Public Sub Main()
    
        Select Case Dts.Variables("User::DestPath").Value.ToString
            Case "E:\Processing\ProcFolder7"
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder8"
            Case "E:\Processing\ProcFolder6"
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder7"
            Case "E:\Processing\ProcFolder5"
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder6"
            Case "E:\Processing\ProcFolder4"
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder5"
            Case "E:\Processing\ProcFolder3"
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder4"
            Case "E:\Processing\ProcFolder2"
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder3"
            Case "E:\Processing\ProcFolder1"
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder2"
            Case Else
                Dts.Variables("User::DestPath").Value = "E:\Processing\ProcFolder1"
        End Select
        Dts.TaskResult = Dts.Results.Success
    End Sub
    

    End Class

  • After the ForEach Loop Container is finished, you have 8 folders, each with some of the files in it.

  • At this point, create one new ForEach Loop Container for each of the processing folders and process it as you did in your original "Hot Folder" Since these are all linked on your control surface to the original FELC, you will get some degree of parllelization from this. You may need to play with the package property "max concurrent executables". This is usually set to -1 which lets Sql Server determine max based upon server resources.


If by Hot Folders you mean folders that SSIS will monitor, you can try using FileWatcher from sqlis.com . This utility will sit and wait for a file to be placed into a watched folder. It can use wildcards, just not very complex ones. I see you want to process upwards of 200 images, but I don't see how many 'folders' these images will be placed into. If they are being placed into a single folder, then you can use the technique I use to process .pdfs in parallel. I have a Script Task read in the full filename (directory included) into a flat file, so at the end I have something that looks like this:

  1. c:\rootfolder\images\image1.png
  2. c:\rootfolder\images\image2.png
  3. c:\rootfolder\images\image3.png

I then load this into a sql 'Task' table as a varchar(x) field. I have, in my case 8, parallel data flow tasks each extracting a single row from the table. I process invoice/statement pdfs like this doing 1,000s in a matter of minutes. The process I developed prevents bottlenecks by having all N threads pull from the same source. If you choose to have N tasks pull from N files, there is a change one or more of those data flow tasks will fall behind holding up your whole process. There is no place for me to post the code I use, but I'd be more than happy to send it to you along with a presentation I have on the parallel task structure. Just send me an email.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜