Importing data through an API for thousands of users using threads

In our application we need to import transaction data from PayPal through an API for our users and store it in the database. We have thousands of users (approx. 5k now) and the number is increasing day by day.

This application is a .NET Windows service.

It imports data on an hourly basis for all users. At present we import data one user after the other, but sometimes one user's data is so large that it takes around 5 hours to fetch it all, so every other user is blocked until that import finishes. The hourly import for all the other users is thrown completely off schedule.

To avoid this we thought of creating a thread for each user's import and running them every hour from the Windows service. Since all the threads would start at the same time, we need to think about bandwidth at any point in time. Is this an issue at all?

Now, I want to know whether our new implementation is the right way to go. I'd also like to know how this is usually done. If anyone has come across this kind of functionality, please let us know how you handled it.

If my question is not clear enough, please let me know and I'll provide more info.

Edit: If I send so many requests to PayPal from a single IP, how does it handle them? Any idea whether it limits requests per IP?

Update: Thanks for all the suggestions and feedback.

I thought of using jgauffin's solution as it mimics the ThreadPool perfectly. But I need some more features, like changing the thread limit dynamically and calling the callback method recursively.

After a lot of research and analysing the thread pool, I've decided to use SmartThreadPool, which is based on the thread pool logic but has more features. It is quite good and serves my purpose perfectly.


Do not use a thread per user. Queue a work item in a thread pool for every user. This way you get the best of both worlds: no memory overhead of 5000 threads, and more load control, because you decide how many threads the ThreadPool uses to work off the work items.


What I'd do is start with a pool of threads (say 10) and let each thread do an import. When it is done, it takes the next item from the queue. You can leverage the existing ThreadPool class and queue all your import requests to that thread pool. You can control the maximum number of threads for this ThreadPool.
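As a rough sketch of the "small pool working through a queue" idea, the snippet below uses Parallel.ForEach (available from .NET 4.0, and running on ThreadPool threads) instead of managing the pool by hand. The class name BoundedImport, the integer user IDs, and the importUser delegate are all placeholders for illustration; nothing here calls a real PayPal API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedImport
{
    // Runs importUser once per user ID, with at most maxParallel imports
    // running at the same time. Returns the number of imports that succeeded.
    // importUser is a hypothetical callback that performs one user's import.
    public static int RunAll(IEnumerable<int> userIds, int maxParallel, Action<int> importUser)
    {
        int completed = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(userIds, options, id =>
        {
            try
            {
                importUser(id);
                Interlocked.Increment(ref completed);
            }
            catch (Exception ex)
            {
                // One failing user should not abort the imports of the others.
                Console.Error.WriteLine("Import failed for user " + id + ": " + ex.Message);
            }
        });

        return completed;
    }
}
```

Note that Parallel.ForEach blocks the caller until every item has been processed, so one very large user still delays the end of the batch, but it no longer holds up the other users.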

Creating thousands of threads is a bad idea for several reasons: it used to be too much for the Windows OS and, as you indicate yourself, you might flood the network (or perhaps the PayPal service).

For extreme scalability, you can do asynchronous IO that does not block a thread while a request is in progress, but that API has a steep learning curve, and is probably not needed for your scenario.
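For comparison, here is a minimal sketch of what throttled asynchronous I/O could look like. It assumes .NET 4.5 or later (async/await and SemaphoreSlim.WaitAsync), which is newer than what the question may be running on. AsyncImport and the importUserAsync delegate are hypothetical names; the delegate stands in for whatever non-blocking call actually fetches one user's transactions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncImport
{
    // Starts one import per user ID but allows at most maxInFlight of them
    // to be in progress at once. No thread is blocked while a request is
    // pending. Returns the number of imports that succeeded.
    public static async Task<int> RunAllAsync(
        IEnumerable<int> userIds, int maxInFlight, Func<int, Task> importUserAsync)
    {
        using (var throttle = new SemaphoreSlim(maxInFlight))
        {
            var tasks = new List<Task<bool>>();
            foreach (int id in userIds)
            {
                // Wait (without blocking a thread) until a slot is free.
                await throttle.WaitAsync();
                tasks.Add(ImportOneAsync(id, importUserAsync, throttle));
            }

            bool[] results = await Task.WhenAll(tasks);
            int completed = 0;
            foreach (bool ok in results)
            {
                if (ok) completed++;
            }
            return completed;
        }
    }

    private static async Task<bool> ImportOneAsync(
        int id, Func<int, Task> importUserAsync, SemaphoreSlim throttle)
    {
        try
        {
            await importUserAsync(id);
            return true;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine("Import failed for user " + id + ": " + ex.Message);
            return false;
        }
        finally
        {
            // Free the slot whether the import succeeded or failed.
            throttle.Release();
        }
    }
}
```

The semaphore plays the role of the thread limit in the other answers: it caps how many requests hit the remote service at once, independent of how many threads exist.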


I would use a queue and, let's say, five threads for this. Each time a thread finishes a user, it gets a new one from the queue.

Example code:

using System;
using System.Collections.Generic;
using System.Threading;

public class Example
{
    public static void Main(string[] argv)
    {
        // Setup
        DownloadQueue personQueue = new DownloadQueue();
        personQueue.JobTriggered += OnHandlePerson;
        personQueue.ThreadLimit = 10; // can be changed at any time and will be adjusted when a job completes (or a new one is enqueued)

        // Enqueue as many persons as you like (the names are placeholders)
        personQueue.Enqueue(new Person("Jane", "Doe"));

        Console.ReadLine();
    }

    public static void OnHandlePerson(object source, PersonEventArgs e)
    {
        // Download the data for e.Person here.
    }
}

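public class DownloadQueue
{
    readonly Queue<Person> _queue = new Queue<Person>();
    int _runningThreads = 0;

    public int ThreadLimit { get; set; }

    /// <summary>
    /// Enqueue a new user and start another worker if the limit allows it.
    /// </summary>
    /// <param name="person">The user whose data should be downloaded.</param>
    public void Enqueue(Person person)
    {
        lock (_queue)
        {
            _queue.Enqueue(person);
            if (_runningThreads < ThreadLimit)
            {
                // Count the worker here, under the lock. If the worker counted
                // itself after starting, a burst of Enqueue calls could start
                // far more than ThreadLimit workers.
                ++_runningThreads;
                ThreadPool.QueueUserWorkItem(DownloadUser);
            }
        }
    }

    /// <summary>
    /// Worker loop, running on a ThreadPool thread.
    /// </summary>
    /// <param name="state">Unused.</param>
    private void DownloadUser(object state)
    {
        while (true)
        {
            Person person;
            lock (_queue)
            {
                // Exit when the queue is empty, or when ThreadLimit has been
                // lowered below the number of running workers.
                if (_queue.Count == 0 || _runningThreads > ThreadLimit)
                {
                    --_runningThreads;
                    return;
                }
                person = _queue.Dequeue();
            }

            try
            {
                JobTriggered(this, new PersonEventArgs(person));
            }
            catch (Exception ex)
            {
                // An unhandled exception on a pool thread would take down the
                // process, so log it and move on to the next user.
                Console.Error.WriteLine("Download failed: " + ex.Message);
            }
        }
    }

    public event EventHandler<PersonEventArgs> JobTriggered = delegate { };
}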


public class PersonEventArgs : EventArgs
{
    Person _person;

    public PersonEventArgs(Person person)
    {
        _person = person;
    }

    public Person Person { get { return _person; } }
}
public class Person
{
    public Person(string fName, string lName)
    {
        this.firstName = fName;
        this.lastName = lName;
    }

    public string firstName;
    public string lastName;
}


Creating 5000 threads in code is not a good idea; it may slow the server down enormously or even crash it.

What you need here is load balancing.

If you are on the .NET platform, consider an MSMQ-based solution: queue the user requests and have a dispatcher distribute them between servers.


I would avoid creating a thread for each user. This approach is not very scalable. And I am assuming the API does not have a mechanism for doing the downloads asynchronously. If it does then that is probably the way to go.

The producer-consumer pattern might work well here. The idea is to create a fixed-size pool of threads that consume work items from a shared queue. It is probably best to avoid the ThreadPool in your case because it is designed mainly for short-lived tasks. You do not want your long-lived tasks to exhaust it because it is used for a lot of different things in the .NET BCL.

If you are using .NET 4.0 you can take advantage of the BlockingCollection. There is also a backport available as part of the Reactive Extensions download. Here is what your code might look like.

Note: You will have to harden the code yourself to make it more robust, shut down gracefully, etc.

public class Importer
{
  private BlockingCollection<Person> m_Queue = new BlockingCollection<Person>();

  public Importer(int poolSize)
  {
    for (int i = 0; i < poolSize; i++)
    {
      var thread = new Thread(Download);
      thread.IsBackground = true;
      thread.Start();
    }
  }

  public void Add(Person person)
  {
    m_Queue.Add(person);
  }

  private void Download()
  {
    while (true)
    {
      Person person = m_Queue.Take();
      // Add your code for downloading this person's data here.
    }
  }
}
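One possible way to add the graceful shutdown mentioned in the note is sketched below. It is a variant of the class above, not a drop-in replacement: StoppableImporter, the string user IDs, and the download delegate are made up for illustration so the example stands on its own. It relies on BlockingCollection.CompleteAdding and GetConsumingEnumerable, which let the workers drain the queue and then exit on their own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public class StoppableImporter
{
    private readonly BlockingCollection<string> m_Queue = new BlockingCollection<string>();
    private readonly List<Thread> m_Workers = new List<Thread>();
    private readonly Action<string> m_Download;

    // download is a hypothetical callback that imports one user's data.
    public StoppableImporter(int poolSize, Action<string> download)
    {
        m_Download = download;
        for (int i = 0; i < poolSize; i++)
        {
            var thread = new Thread(Work);
            thread.IsBackground = true;
            thread.Start();
            m_Workers.Add(thread);
        }
    }

    public void Add(string userId)
    {
        m_Queue.Add(userId);
    }

    // Stops accepting new work, lets the workers finish what is queued,
    // and waits for them to exit.
    public void Shutdown()
    {
        m_Queue.CompleteAdding();
        foreach (Thread worker in m_Workers)
        {
            worker.Join();
        }
    }

    private void Work()
    {
        // The enumeration blocks while the queue is empty and ends once
        // CompleteAdding has been called and the queue has been drained.
        foreach (string userId in m_Queue.GetConsumingEnumerable())
        {
            try
            {
                m_Download(userId);
            }
            catch (Exception ex)
            {
                // Keep the worker alive if a single import fails.
                Console.Error.WriteLine("Import failed for " + userId + ": " + ex.Message);
            }
        }
    }
}
```

In a Windows service, Shutdown would typically be called from OnStop, keeping in mind that a long-running import may outlast the time the service control manager allows for stopping.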