
How to FTP constantly incoming files

Ok, here's the situation... I have an application that generates about 8 files per second. Each file is 19-24 KB. This generates about 10 to 11 MB per minute. This question is not about how to FTP, because I have that solution already... The question is more about how to keep up with the flow of data (only a 2 Mb upload bandwidth in most cases, unless I am travelling to a client site that has a large pipe). I don't care if the FTP transfer takes longer than the rate of flow, but I want to know if anyone has an idea on how to batch the files so that when the FTP process is finished it deletes just those files it transferred and then moves on to the next batch. Here is what I was thinking:

Multi-thread the app: the first thread runs the app, the second thread is a timer that creates a text file every 'N' minutes listing all the files created in that time span. Read that text file with a StreamReader and move the files it lists to another location (maybe create a temp folder), then FTP those files, then delete the files, the folder and the text file... in the meantime, more text files are being written and temp folders being created. Does this sound feasible? I will take any suggestions that anyone has under advisement, just looking for the fastest and most reliable path.

Please don't ask to see the code; there is no reason to see it considering we are working with hypotheticals.


I would create a service and add the incoming files into a concurrent collection using FileSystemWatcher, System.Threading.Timer, or both (FileSystemWatcher may miss files if its buffer is overrun, so it is a good idea to have a timer going to pick up any files that are missed). When files come in I would move them into a separate folder and process them using .NET 4.0 tasks. I would then do any necessary post-processing in continuations of the original tasks. You can have continuation steps that handle any faults and different continuation steps that run on success. Each of these tasks will run on a thread-pool thread and be managed for you.
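
A minimal sketch of that intake stage, assuming a single incoming folder; the class, field names and folder path here are illustrative, not from the question:

// Sketch: a FileSystemWatcher plus a fallback timer both feed new files into a
// thread-safe collection, with each file queued only once.
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

class IncomingFileCollector
{
    private const string IncomingDir = @"C:\incoming";   // placeholder path
    private readonly ConcurrentQueue<string> _pending = new ConcurrentQueue<string>();
    private readonly ConcurrentDictionary<string, bool> _seen = new ConcurrentDictionary<string, bool>();
    private FileSystemWatcher _watcher;
    private Timer _sweepTimer;

    public void Start()
    {
        // Primary source of notifications.
        _watcher = new FileSystemWatcher(IncomingDir);
        _watcher.Created += (s, e) => Enqueue(e.FullPath);
        _watcher.EnableRaisingEvents = true;

        // Fallback sweep to catch anything the watcher missed (e.g. buffer overrun).
        _sweepTimer = new Timer(_ =>
        {
            foreach (var file in Directory.GetFiles(IncomingDir))
                Enqueue(file);
        }, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
    }

    private void Enqueue(string path)
    {
        // TryAdd succeeds only the first time a given path is seen.
        if (_seen.TryAdd(path, true))
            _pending.Enqueue(path);
    }
}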

Here is an example from http://msdn.microsoft.com/en-us/library/dd997415.aspx of an OnlyOnFaulted continuation task. You could have a second continuation task that will only run when successful.

var task1 = Task.Factory.StartNew(() =>
{
    throw new MyCustomException("Task1 faulted.");
})
.ContinueWith((t) =>
    {
        Console.WriteLine("I have observed a {0}",
            t.Exception.InnerException.GetType().Name);
    },
    TaskContinuationOptions.OnlyOnFaulted);
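
For the success path, a second continuation restricted to TaskContinuationOptions.OnlyOnRanToCompletion could, for example, delete the file once the upload task finishes. A rough sketch, where the UploadFile call and the file path are placeholders rather than anything from the original sample:

var uploadTask = Task.Factory.StartNew(() =>
{
    UploadFile(@"C:\staging\file0001.dat");   // hypothetical upload call
})
.ContinueWith((t) =>
    {
        // Runs only if the upload neither faulted nor was cancelled.
        File.Delete(@"C:\staging\file0001.dat");
    },
    TaskContinuationOptions.OnlyOnRanToCompletion);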


Without really knowing any more details on why you need to keep all the work in a single application and deal with threading complexity, one could argue for keeping the part that generates the files and the part that FTPs the files in separate applications.

Separation of Responsibility. Ensure each application does only one job and does it right and fast.

One service or app (desktop/web, whichever) generating the files.

Another service which watches a folder, moves any incoming files into a temp folder, does what it needs to do, FTPs them and deletes them.

Since I don't know your setup or where the content for your files comes from, writing it as a single app might be the best choice, exactly as you suggested.

Basically, to answer your question: yes, what you want to do does sound feasible. How you implement it, and what you are happy implementing, is up to you.

If you get stuck somewhere during implementation, feel free to post any issues in a new thread with some code samples showing how you have a specific feature implemented and what issue you are experiencing.

Until then, hypothetically, any approach you feel is able to manage what you need to achieve is perfectly valid.

EDIT

Seeing that you stated the application which generates the files is already done, and that you already have a solution which FTPs, using two separate applications sounds even more plausible.

All you need then is to wrap a service around the FTP solution and happy days. No need to interfere with the original application which generates the files if it is already working.

Why risk breaking it, unless you must add the FTP feature into it and have no choice?


I worked on something similar in my old job. I had an external process dumping files into a certain folder. This is the algorithm that I followed:

  1. Have a FileSystemWatcher running on the source directory where the files get dumped
  2. When a new file is found, process ALL files in the directory in ascending order of date (in your case, FTP the file)
  3. Once a file is processed, move it to a Processed directory (in your case, you can delete it)

Things to consider:

  1. How many open FTP connections / processing threads can I have
  2. FileSystemWatcher can and will raise events while you are processing another file. How to handle them / send them to an appropriate thread
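
A rough sketch of that flow, assuming single-threaded processing; the folder paths and the ProcessFile helper are placeholders standing in for the FTP step:

// Sketch: process every pending file in ascending date order, then move it aside.
using System;
using System.IO;
using System.Linq;

class DumpFolderProcessor
{
    const string SourceDir = @"C:\dump";          // placeholder paths
    const string ProcessedDir = @"C:\processed";

    public void Run()
    {
        var watcher = new FileSystemWatcher(SourceDir);
        watcher.Created += (s, e) => ProcessPending();
        watcher.EnableRaisingEvents = true;
    }

    void ProcessPending()
    {
        // Oldest files first, so nothing gets starved.
        var files = new DirectoryInfo(SourceDir)
            .GetFiles()
            .OrderBy(f => f.CreationTimeUtc);

        foreach (var file in files)
        {
            ProcessFile(file.FullName);                           // e.g. FTP the file
            file.MoveTo(Path.Combine(ProcessedDir, file.Name));   // or delete it instead
        }
    }

    void ProcessFile(string path) { /* FTP upload would go here */ }
}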


You need to insert a queue between the producer of the files and the consumer (the FTP host) to be able to buffer files if the producer is too fast. This requires some form of multithreading or even multiple processes.

You propose a solution where the queue is the file system and that is quite possible but in many cases not ideal. You have to get locking right to avoid transferring half filled or empty files etc. If you decide to use the file system it is my experience that FileSystemWatcher can't be used for that purpose. Using a timer to run a task say every second to pick up new files is much more reliable.

Other queue technologies could be an in-memory queue (but then you have to think about how to handle crashes), a private Microsoft Message Queue or a SQL Server Broker queue. The best solution very much depends on your requirements.

FTP is not really transactional and you may decide to use a queue that is not transactional (both MSMQ and SQL Server Broker are transactional), but you should still try to build your applications around the concept of a transaction where the file is created, queued and delivered. If it cannot be delivered it is left in the queue and delivery is retried later. If it cannot be queued the producer should retry to queue it etc. You don't want a situation where a file is never delivered or is delivered twice.

It is not clear from your question how you are going to use FTP, but I would advise you to use an open source or commercial library so that you can use FTP directly from your application instead of shelling out to ftp.exe. This will allow your application to behave intelligently about keeping the FTP connection open to avoid excessive reconnects, etc.
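
As one illustration (not necessarily the library this answer has in mind), even the built-in System.Net.FtpWebRequest can upload a file from within your process and keep the control connection alive between transfers. The server URI and credentials below are placeholders:

// Sketch: single-file upload with the built-in FtpWebRequest.
using System.IO;
using System.Net;

static void Upload(string localPath)
{
    var request = (FtpWebRequest)WebRequest.Create(
        "ftp://ftp.example.com/incoming/" + Path.GetFileName(localPath));
    request.Method = WebRequestMethods.Ftp.UploadFile;
    request.Credentials = new NetworkCredential("user", "password");
    request.KeepAlive = true;   // reuse the connection for the next file

    using (var source = File.OpenRead(localPath))
    using (var target = request.GetRequestStream())
    {
        source.CopyTo(target);
    }

    using (var response = (FtpWebResponse)request.GetResponse())
    {
        // response.StatusDescription holds the server reply, e.g. "226 Transfer complete."
    }
}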

You should also consider how to handle the situation where the queue grows too large. One option could be to stop the producer until the queue size has been reduced below a threshold.


  1. Start a timer that fires off once a second.
  2. In the timer's elapsed event handler, stop the timer.
  3. Get a list of all files in the incoming directory.
  4. Try to open each file exclusively. This prevents you from reading a file that is still being written to.
  5. Copy each file to a staging directory and delete it from the incoming directory.
  6. Once you've moved all of the files in your list, send the files in the staging directory via FTP.
  7. Once you've FTP'd the files, delete them from the staging directory.
  8. Start the timer.
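
A sketch of those steps, assuming a System.Timers.Timer and placeholder directory names; the SendViaFtp helper is hypothetical:

// Sketch of the timer loop described above.
using System;
using System.IO;
using System.Timers;

class StagedUploader
{
    const string IncomingDir = @"C:\incoming";   // placeholder paths
    const string StagingDir  = @"C:\staging";
    readonly Timer _timer = new Timer(1000);     // step 1: fires once a second

    public StagedUploader()
    {
        _timer.Elapsed += OnElapsed;
        _timer.Start();
    }

    void OnElapsed(object sender, ElapsedEventArgs e)
    {
        _timer.Stop();                                               // step 2
        try
        {
            foreach (var file in Directory.GetFiles(IncomingDir))    // step 3
            {
                try
                {
                    // Step 4: an exclusive open fails if the producer is still writing.
                    using (File.Open(file, FileMode.Open, FileAccess.Read, FileShare.None)) { }
                }
                catch (IOException) { continue; }                    // still being written; skip

                var staged = Path.Combine(StagingDir, Path.GetFileName(file));
                File.Copy(file, staged, true);                       // step 5
                File.Delete(file);
            }

            foreach (var staged in Directory.GetFiles(StagingDir))
            {
                SendViaFtp(staged);                                  // step 6
                File.Delete(staged);                                 // step 7
            }
        }
        finally
        {
            _timer.Start();                                          // step 8
        }
    }

    void SendViaFtp(string path) { /* upload goes here */ }
}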

The timer's elapsed handler is run for you on the thread pool, so you shouldn't need any fancier thread management. Since your primary constraint is your FTP bandwidth, there's little advantage to doing anything else with other threads until the files are uploaded.

This approach gives you protection in case of a system crash. Files that are in the staging directory that aren't sent are picked up during the next cycle. Same goes for files in the incoming directory.

If your FTP receiving side can handle zipped files, you'll improve your throughput by zipping the contents of the staging directory and sending it as one file.


I would set up a chain of threads using BlockingCollections.

One producer thread reads the available files, using a timer or FileSystemWatcher etc., and stores them in a BlockingCollection. It also records the files in a set to ensure they are only added once.

var availableFiles = new BlockingCollection<string>();
var processedFiles = new BlockingCollection<string>();
var newFiles = new HashSet<string>();

...
// Producer: queue each file exactly once.
lock (newFiles) {
    foreach (var file in Directory.GetFiles(incomingFolder))   // incomingFolder = path of the source directory
        if (!newFiles.Contains(file)) {
            availableFiles.Add(file);
            newFiles.Add(file);
        }
}

One or more FTP threads send the files and then put them into the processed collection:

foreach (var file in availableFiles.GetConsumingEnumerable()) {
   SendFileOverFtp(file);
   processedFiles.Add(file);
}

One thread that cleans up the processed files

foreach (var file in processedFiles.GetConsumingEnumerable()) {
    lock (newFiles) {
       File.Delete(file);
       newFiles.Remove(file);
    }
}

Another alternative is to have the producing thread also read the files into memory and delete them immediately. In that case you can skip the last stage and the newFiles collection.
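
A sketch of that variant, assuming the file contents fit comfortably in memory; the folder path and the SendBytesOverFtp helper are hypothetical:

// Producer reads each file into memory, deletes it, and queues the name/contents
// pair; the FTP consumer then uploads straight from memory.
var pendingUploads = new BlockingCollection<Tuple<string, byte[]>>();

// Producer
foreach (var file in Directory.GetFiles(@"C:\incoming"))
{
    var contents = File.ReadAllBytes(file);
    File.Delete(file);   // no newFiles set needed: the file is already gone
    pendingUploads.Add(Tuple.Create(Path.GetFileName(file), contents));
}

// Consumer
foreach (var upload in pendingUploads.GetConsumingEnumerable())
    SendBytesOverFtp(upload.Item1, upload.Item2);   // hypothetical upload helper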


As an FTP server owner in this situation, I'd also ask that you find a way to stay signed on as much as possible.

Sign on/offs are often more "expensive" (in terms of computation, config blocking, etc.) than individual file transfers.
