Metrics from multiple threads

This seems like a pretty common use case, and maybe I'm overthinking it, but I'm having an issue with keeping centralized metrics from multiple threads. Say I have multiple worker threads all processing records, and every 1000 records I want to spit out some metric. I could have each thread log individual metrics, but then to get throughput numbers I'd have to add them up manually (and of course the time boundaries won't be exact). Here's a simple example:

public class Worker implements Runnable {

   private static int count = 0;
   private static long processingTime = 0;

   public void run() {
      while (true) {
         // ... get record
         count++;
         long start = System.currentTimeMillis();
         // ... do work
         long end = System.currentTimeMillis();
         processingTime += (end - start);
         if (count % 1000 == 0) {
            // ... log some metrics
            processingTime = 0;
            count = 0;
         }
      }
   }
}

Hope that makes some sense. I know the two static variables will probably end up as AtomicInteger and AtomicLong . . . but maybe not. I'm interested in what ideas people have. I had thought about using atomic variables and a ReentrantReadWriteLock, but I really don't want the metrics to stop the processing flow (i.e. the metrics should have very minimal impact on the processing). Thanks.


Offloading the metrics processing to another thread can be a good idea. The idea is to encapsulate your data and hand it off to a processing thread quickly, so you minimize the impact on the threads that are doing meaningful work.

There is some contention at the handoff, but that cost is usually much smaller than other kinds of synchronization, so it should be a good candidate in many situations. I think M. Jessup's solution is pretty close to mine, but hopefully the following code illustrates the point clearly.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Worker implements Runnable {

   private static final Metrics metrics = new Metrics();

   public void run() {
      while (true) {
         // ... get record
         long start = System.currentTimeMillis();
         // ... do work
         long end = System.currentTimeMillis();
         // process the metric asynchronously
         metrics.addMetric(end - start);
      }
   }

   private static final class Metrics {
      // a single "background" thread that actually handles
      // processing
      private final ExecutorService metricThread =
            Executors.newSingleThreadExecutor();
      // data (no synchronization needed: only the metric thread touches it)
      private int count = 0;
      private long processingTime = 0;

      public void addMetric(final long time) {
         metricThread.execute(new Runnable() {
            public void run() {
               count++;
               processingTime += time;
               if (count % 1000 == 0) {
                  // ... log some metrics
                  processingTime = 0;
                  count = 0;
               }
            }
         });
      }
   }
}
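
To check that the handoff really keeps count and processingTime consistent without any locking in the workers, here is a rough, self-contained sketch (Java 8+ assumed). The class name ExecutorHandoffDemo, the batchesLogged counter, and the runDemo and finishAndGetBatches methods are all made up for illustration, and the counter only stands in for real logging.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class: the names are illustrative, not from the answer above.
final class ExecutorHandoffDemo {

    private final ExecutorService metricThread = Executors.newSingleThreadExecutor();

    // Touched only by the metric thread, so plain fields are enough.
    private int count = 0;
    private long processingTime = 0;
    private int batchesLogged = 0;

    void addMetric(final long time) {
        metricThread.execute(() -> {
            count++;
            processingTime += time;
            if (count % 1000 == 0) {
                batchesLogged++; // stand-in for "log some metrics"
                processingTime = 0;
                count = 0;
            }
        });
    }

    // Reads the counter on the metric thread itself, then stops that thread.
    int finishAndGetBatches() {
        try {
            return metricThread.submit(() -> batchesLogged).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            metricThread.shutdown();
        }
    }

    // Starts the given number of worker threads and returns the batches logged.
    static int runDemo(int workers, int recordsPerWorker) {
        ExecutorHandoffDemo metrics = new ExecutorHandoffDemo();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int r = 0; r < recordsPerWorker; r++) {
                    long start = System.nanoTime();
                    // real work would go here
                    metrics.addMetric(System.nanoTime() - start);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return metrics.finishAndGetBatches();
    }

    public static void main(String[] args) {
        // 4 workers x 2500 records = 10000 records, i.e. 10 batches of 1000
        System.out.println(runDemo(4, 2500));
    }
}
```

One thing to keep in mind with this approach: the executor's internal queue is unbounded, so if the workers produce metrics faster than the single thread can consume them, the backlog grows. It also creates one task object per record, so batching on the worker side may be worth considering at very high record rates.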


If you don't want the logging to interfere with the processing, I would suggest a separate log worker thread, with your processing threads simply handing off some type of value object. In the example I chose a LinkedBlockingQueue because offer() never waits for space: it returns false immediately if the queue is full, so the only delay is brief contention on the queue's internal lock, and the real blocking is deferred to the thread that pulls values from the queue with take(). You might need more logic in the MetricProcessor to order data, etc., depending on your requirements, but even if that is a long-running operation it won't keep the VM thread scheduler from resuming the real processing threads in the meantime.

public class Worker implements Runnable {

  //per-thread totals (assumes each Worker instance is run by one thread)
  private int count = 0;
  private long processingTime = 0;

  public void run() {
    while (true) {
      ... do some stuff
      if (count % 1000 == 0) {
        ... log some metrics
        if (MetricProcessor.getInstance().addMetrics(
            new WorkerMetrics(processingTime, count, ...))) {
          processingTime = 0;
          count = 0;
        } else {
          //the queue was full, so offer() returned false instead of
          //blocking; here the results could be abandoned or just
          //held and attempted again as a larger data set later
        }
      }
    }
  }
}

public class WorkerMetrics {
  ...some interesting data
  public WorkerMetrics(... data){
    ...
  }
  ...getter setters etc
}

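public class MetricProcessor implements Runnable {
  private static final MetricProcessor INSTANCE = new MetricProcessor();

  //unbounded by default; pass a capacity if offer() should ever return false
  private final LinkedBlockingQueue<WorkerMetrics> metrics =
      new LinkedBlockingQueue<WorkerMetrics>();

  public static MetricProcessor getInstance() {
    return INSTANCE;
  }

  public boolean addMetrics(WorkerMetrics m) {
    return metrics.offer(m); //never waits for space; returns false if the queue is full
  }

  public void run() {
    try {
      while(true) {
        WorkerMetrics m = metrics.take(); //wait here for something to come in
        //the above call does all the significant blocking without
        //interrupting the real processing
        ...do some actual logging, aggregation, etc of the metrics
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); //restore the flag and let this thread exit
    }
  }
}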
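
To make the offer()/take() handoff concrete, here is a rough, self-contained sketch (Java 8+ assumed). QueueHandoffDemo, Sample, STOP and runDemo are hypothetical names for illustration only. It uses a deliberately small bounded queue so that offer() sometimes returns false, in which case the worker just keeps accumulating and sends a larger batch later. The final put() and the STOP marker are my own additions, there so that the demo loses nothing and terminates cleanly.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class: the names are illustrative, not from the answer above.
final class QueueHandoffDemo {

    // Immutable value object handed from a worker to the metrics thread.
    static final class Sample {
        final int count;
        final long processingNanos;

        Sample(int count, long processingNanos) {
            this.count = count;
            this.processingNanos = processingNanos;
        }
    }

    // Marker object telling the consumer to stop (compared by identity).
    private static final Sample STOP = new Sample(-1, 0L);

    // Returns the total number of records the metrics thread saw.
    static long runDemo(int workers, int recordsPerWorker, int capacity) {
        final BlockingQueue<Sample> queue = new LinkedBlockingQueue<>(capacity);
        final long[] totalRecords = {0L}; // written only by the consumer thread

        Thread consumer = new Thread(() -> {
            try {
                // take() does the real blocking, away from the workers
                for (Sample s = queue.take(); s != STOP; s = queue.take()) {
                    totalRecords[0] += s.count;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        Thread[] producers = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            producers[i] = new Thread(() -> {
                int count = 0;
                long processingNanos = 0L;
                try {
                    for (int r = 0; r < recordsPerWorker; r++) {
                        long start = System.nanoTime();
                        // real work would go here
                        processingNanos += System.nanoTime() - start;
                        count++;
                        // offer() returns false at once if the queue is full;
                        // in that case keep the totals and retry 1000 records later
                        if (count % 1000 == 0
                                && queue.offer(new Sample(count, processingNanos))) {
                            count = 0;
                            processingNanos = 0L;
                        }
                    }
                    if (count > 0) {
                        queue.put(new Sample(count, processingNanos)); // final flush may block
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producers[i].start();
        }

        try {
            for (Thread t : producers) {
                t.join();
            }
            queue.put(STOP);
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return totalRecords[0];
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000, 2)); // every record is accounted for
    }
}
```

Whether to drop a rejected sample or carry it forward, as done here, depends on how exact the numbers need to be.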


If you depend on the state of count and the state of processingTime being in sync, then you have to use a lock. For example, when ++count % 1000 == 0 is true, you may want to evaluate processingTime as of THAT moment.

For that case it would make sense to use a ReentrantLock. I wouldn't use a ReentrantReadWriteLock, because there is never a pure read: every access is a read/write pair. But you would need to lock around all of:

  count++;
  processingTime += (end-start);
  if (count % 1000 == 0) {
      ... log some metrics
      processingTime = 0;
      count = 0;
  }

Wherever count++ ends up, it needs to be inside the lock as well. Finally, if you are using a Lock, you do not need AtomicLong and AtomicInteger; they just add overhead and don't make the code any more thread-safe.
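
As a minimal sketch of what locking around that whole block could look like (Java 8+ assumed; LockedMetricsDemo, record and runDemo are hypothetical names, and the two extra counters only stand in for real logging):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class: the names are illustrative, not from the answer above.
final class LockedMetricsDemo {

    private final ReentrantLock lock = new ReentrantLock();

    // All four fields are guarded by lock, so plain int/long is enough.
    private int count = 0;
    private long processingTime = 0;
    private int batchesLogged = 0;
    private int inconsistentBatches = 0;

    // expectedPerBatch lets the demo verify that count and processingTime
    // were observed together, at the same moment.
    void record(long elapsed, long expectedPerBatch) {
        lock.lock();
        try {
            count++;
            processingTime += elapsed;
            if (count % 1000 == 0) {
                batchesLogged++; // stand-in for "log some metrics"
                if (processingTime != expectedPerBatch) {
                    inconsistentBatches++;
                }
                processingTime = 0;
                count = 0;
            }
        } finally {
            lock.unlock(); // always release, even if logging throws
        }
    }

    // Returns {batches logged, batches whose time did not match the count}.
    static int[] runDemo(int workers, int recordsPerWorker) {
        LockedMetricsDemo metrics = new LockedMetricsDemo();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int r = 0; r < recordsPerWorker; r++) {
                    // pretend every record took exactly 1 time unit,
                    // so each batch of 1000 must total exactly 1000
                    metrics.record(1L, 1000L);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        metrics.lock.lock();
        try {
            return new int[] {metrics.batchesLogged, metrics.inconsistentBatches};
        } finally {
            metrics.lock.unlock();
        }
    }

    public static void main(String[] args) {
        int[] result = runDemo(4, 2500);
        System.out.println(result[0] + " batches, " + result[1] + " inconsistent");
    }
}
```

The trade-off is that every worker now contends on the same lock once per record, which is exactly the kind of impact on the processing flow the question wanted to avoid, so this fits best when the critical section stays tiny.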
