Windows Azure - Cleaning Up The WADLogsTable
I've read conflicting information as to whether or not the WADLogsTable table used by the DiagnosticMonitor in Windows Azure will automatically prune old log entries.
I'm guessing it doesn't, and will instead grow forever - costing me money. :)
If that's the case, does anybody have a good code sample showing how to clear out old log entries from this table manually, perhaps based on timestamp? I'd run this code periodically from a worker role.
The data in tables created by Windows Azure Diagnostics isn't deleted automatically.
However, the Windows Azure PowerShell Cmdlets include a cmdlet specifically for this case.
PS D:\> help Clear-WindowsAzureLog
NAME Clear-WindowsAzureLog
SYNOPSIS Removes Windows Azure trace log data from a storage account.
SYNTAX Clear-WindowsAzureLog [-DeploymentId <String>] [-From <DateTime>] [-To <DateTime>] [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage] [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]
Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>] [-ToUtc <DateTime>] [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage] [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]
You need to specify the -ToUtc parameter; all logs before that date will be deleted.
If the cleanup task needs to run on Azure inside the worker role, the cmdlets' C# code can be reused: the PowerShell Cmdlets are published under the permissive Microsoft Public License (MS-PL).
Only three files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs and WadTableServiceEntity.cs.
This is an updated version of Chriseyre2000's function. It performs much better when many thousands of records need to be deleted, because it queries by PartitionKey and works through the table step by step in chunks. Remember that it is best to run it close to the storage account (in a cloud service).
// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using Microsoft.WindowsAzure.Storage; using Microsoft.WindowsAzure.Storage.Table;
public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime, DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");
    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        // Advance the cutoff one step at a time so each query returns a manageable slice.
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;

        // WAD partition keys are "0" + ticks, so this is a range query on PartitionKey.
        string partitionKey = "0" + dt.Ticks;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
        // Only the keys and ETag are needed to delete an entity.
        query.Select(new string[] {});
        var items = cloudTable.ExecuteQuery(query).ToList();

        // A batch may hold at most 100 operations; with chunks of 100,
        // no per-partition batch built below can exceed that limit.
        const int chunkSize = 100;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = Math.Min(chunkSize, items.Count - index);
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }

        foreach (var chunk in chunkedList)
        {
            // A batch can only target one partition, so group the deletes by PartitionKey.
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation { tableOperation });
            }
            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}
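Two limits drive the grouping in this code: a TableBatchOperation may only contain entities from a single partition, and it may hold at most 100 operations. Below is a minimal, SDK-independent sketch of just that grouping step. `BatchPlanner.GroupIntoBatches` is a hypothetical helper (not part of the Azure Storage client library), and the default of 100 is a parameter you can adjust.

```csharp
using System;
using System.Collections.Generic;

public static class BatchPlanner
{
    // Hypothetical helper: splits items into groups that each share one
    // partition key and hold at most maxBatchSize items.
    public static List<List<T>> GroupIntoBatches<T>(
        IEnumerable<T> items, Func<T, string> partitionKeyOf, int maxBatchSize = 100)
    {
        if (maxBatchSize < 1)
            throw new ArgumentOutOfRangeException(nameof(maxBatchSize));

        var result = new List<List<T>>();
        // The batch currently being filled for each partition key.
        var open = new Dictionary<string, List<T>>();

        foreach (var item in items)
        {
            string key = partitionKeyOf(item);
            if (!open.TryGetValue(key, out var current) || current.Count == maxBatchSize)
            {
                // Start a new batch when this partition has none yet or its last one is full.
                current = new List<T>();
                open[key] = current;
                result.Add(current);
            }
            current.Add(item);
        }
        return result;
    }
}
```

With the storage SDK, each returned group (for example from `GroupIntoBatches(items, e => e.PartitionKey)`) would become one TableBatchOperation, which removes the need for the separate chunking pass.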
You could just filter on the timestamp, but that is very inefficient because the whole table has to be scanned. This code sample, which generates the partition key to avoid a "full" table scan, might help: http://blogs.msdn.com/b/avkashchauhan/archive/2011/06/24/linq-code-to-query-windows-azure-wadlogstable-to-get-rows-which-are-stored-after-a-specific-datetime.aspx
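The answers in this thread build that partition key as "0" followed by the tick count of the cutoff time, and then filter with `PartitionKey lt '...'`. A minimal sketch of that key construction, assuming (as those answers do) that WAD writes UTC ticks with a "0" prefix; `WadPartitionKeys` is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

public static class WadPartitionKeys
{
    // Assumption taken from the answers in this thread, not from an official spec:
    // WAD PartitionKey = "0" + UTC ticks.
    public static string For(DateTime time)
    {
        // Normalise to UTC; ToUniversalTime treats Local and Unspecified values as local time.
        DateTime utc = time.Kind == DateTimeKind.Utc ? time : time.ToUniversalTime();
        return "0" + utc.Ticks.ToString(CultureInfo.InvariantCulture);
    }

    // OData filter for entries older than the cutoff, using a PartitionKey range
    // instead of a Timestamp scan.
    public static string OlderThanFilter(DateTime cutoff)
    {
        return "PartitionKey lt '" + For(cutoff) + "'";
    }
}
```

For any realistic date the keys have the same number of digits, so the service's string comparison on PartitionKey orders them chronologically, which is what makes the `lt` range query work.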
Here is a solution that truncates based on a timestamp (tested against SDK 2.0).
It does use a table scan to get the data, but if run, say, once per day it would not be too painful:
/// <summary>
/// Deletes WADLogsTable entries older than keepThreshold, e.g.
/// TruncateDiagnostics(storageAccount, DateTime.UtcNow.AddHours(-1));
/// </summary>
/// <param name="storageAccount">Storage account that holds the diagnostics tables.</param>
/// <param name="keepThreshold">Entries with an earlier Timestamp are deleted.</param>
public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
{
    try
    {
        CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
        CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");
        TableQuery query = new TableQuery();
        // Let the SDK build the datetime literal; Timestamp is stored in UTC.
        query.FilterString = TableQuery.GenerateFilterConditionForDate(
            "Timestamp", QueryComparisons.LessThan, keepThreshold.ToUniversalTime());
        var items = cloudTable.ExecuteQuery(query).ToList();

        // A batch must target one partition and may hold at most 100 operations.
        var batches = new List<TableBatchOperation>();
        var openBatches = new Dictionary<string, TableBatchOperation>();
        foreach (var entity in items)
        {
            TableBatchOperation current;
            if (!openBatches.TryGetValue(entity.PartitionKey, out current) || current.Count == 100)
            {
                current = new TableBatchOperation();
                openBatches[entity.PartitionKey] = current;
                batches.Add(current);
            }
            current.Add(TableOperation.Delete(entity));
        }
        foreach (var batch in batches)
        {
            cloudTable.ExecuteBatch(batch);
        }
    }
    catch (Exception ex)
    {
        // Pass the exception as a format argument so braces in its text cannot break formatting.
        Trace.TraceError("Truncate WADLogsTable exception {0}", ex);
    }
}
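If you build the Timestamp filter string by hand instead of using `TableQuery.GenerateFilterConditionForDate`, two details matter: in a .NET custom format string `:` is replaced by the current culture's time separator, and the service's Timestamp is in UTC. A minimal sketch that handles both; `TimestampFilter.OlderThan` is a hypothetical helper, and the trailing `Z` in the literal is an assumption about the accepted datetime format.

```csharp
using System;
using System.Globalization;

public static class TimestampFilter
{
    // Hypothetical helper: OData filter for rows whose Timestamp is earlier than the cutoff.
    public static string OlderThan(DateTime cutoff)
    {
        // Convert to UTC first (Local/Unspecified values are treated as local time).
        DateTime utc = cutoff.Kind == DateTimeKind.Utc ? cutoff : cutoff.ToUniversalTime();
        // Invariant culture guarantees ':' is emitted literally.
        return "Timestamp lt datetime'" +
               utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture) + "'";
    }
}
```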
Here's my slightly different version of @Chriseyre2000's solution, using asynchronous operations and PartitionKey querying. In my case it runs continuously within a worker role. It may be a bit easier on memory if you have a lot of entries to clean up.
// Requires: using System; using System.Collections.Generic; using System.Diagnostics;
//           using System.Linq; using System.Threading.Tasks;
//           using Microsoft.WindowsAzure.Storage; using Microsoft.WindowsAzure.Storage.Table;
static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously
    /// </summary>
    public static async Task TruncateDiagnosticsAsync()
    {
        while (true)
        {
            try
            {
                // Retrieve storage account from connection string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting("CloudStorageConnectionString"));
                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
                CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

                // keep a week's worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays(-7);

                // do this until we run out of items
                while (true)
                {
                    TableQuery query = new TableQuery();
                    query.FilterString = string.Format("PartitionKey lt '0{0}'", keepThreshold.Ticks);

                    // Materialize once; otherwise Count() and foreach would each re-run the query.
                    var items = cloudTable.ExecuteQuery(query).Take(1000).ToList();
                    if (items.Count == 0)
                        break;

                    var batches = new Dictionary<string, TableBatchOperation>();
                    foreach (var entity in items)
                    {
                        TableOperation tableOperation = TableOperation.Delete(entity);

                        // need a new batch?
                        if (!batches.ContainsKey(entity.PartitionKey))
                            batches.Add(entity.PartitionKey, new TableBatchOperation());

                        // a batch can hold only 100 operations; the rest are picked up on the next pass
                        if (batches[entity.PartitionKey].Count < 100)
                            batches[entity.PartitionKey].Add(tableOperation);
                    }

                    // execute!
                    foreach (var batch in batches.Values)
                        await cloudTable.ExecuteBatchAsync(batch);

                    Trace.TraceInformation("WADLogsTable truncated: " + query.FilterString);
                }
            }
            catch (Exception ex)
            {
                Trace.TraceError("Truncate WADLogsTable exception {0}", ex.Message);
            }

            // run this once per day
            await Task.Delay(TimeSpan.FromDays(1));
        }
    }
}
To start the process, just call this from the OnStart method in your worker role.
// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();
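The loop above runs for the lifetime of the process. If you also want to stop it cleanly from the role's OnStop, one option is a variant that takes a CancellationToken. This is a sketch under that assumption; `PeriodicRunner.RunAsync` is a hypothetical helper, and the cleanup work itself is passed in as a delegate.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicRunner
{
    // Hypothetical helper: runs `work`, waits `interval`, and repeats until `token` is cancelled.
    // A failure in one run is traced and does not stop the loop.
    public static async Task RunAsync(Func<CancellationToken, Task> work, TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                await work(token);
            }
            catch (Exception ex) when (!(ex is OperationCanceledException))
            {
                Trace.TraceError("Periodic cleanup failed: {0}", ex.Message);
            }

            try
            {
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                // Cancellation was requested during the wait: exit the loop.
                break;
            }
        }
    }
}
```

In a worker role you would keep a CancellationTokenSource as a field, start `RunAsync` from OnStart with `TimeSpan.FromDays(1)`, and call `Cancel()` on the source in OnStop.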
If you don't need any of the contents, simply delete the table; Azure Diagnostics will recreate it.
Here is a slightly updated version of Chriseyre2000's code. It uses ExecuteQuerySegmented instead of ExecuteQuery, observes the TableBatchOperation limit of 100 operations, and purges all of the Windows Azure Diagnostics tables rather than just WADLogsTable:
// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using Microsoft.WindowsAzure.Storage; using Microsoft.WindowsAzure.Storage.Table;
public static void TruncateAllAzureTables(CloudStorageAccount storageAccount, DateTime keepThreshold)
{
    TruncateAzureTable(storageAccount, "WADLogsTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADCrashDump", keepThreshold);
    TruncateAzureTable(storageAccount, "WADDiagnosticInfrastructureLogsTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADPerformanceCountersTable", keepThreshold);
    TruncateAzureTable(storageAccount, "WADWindowsEventLogsTable", keepThreshold);
}

public static void TruncateAzureTable(CloudStorageAccount storageAccount, string aTableName, DateTime keepThreshold)
{
    const int maxOperationsInBatch = 100;
    var tableClient = storageAccount.CreateCloudTableClient();
    var cloudTable = tableClient.GetTableReference(aTableName);

    // Skip tables that diagnostics has not created (yet); querying them would fail.
    if (!cloudTable.Exists())
        return;

    var query = new TableQuery
    {
        // Let the SDK format the UTC datetime literal instead of a culture-dependent format string.
        FilterString = TableQuery.GenerateFilterConditionForDate(
            "Timestamp", QueryComparisons.LessThan, keepThreshold.ToUniversalTime())
    };

    TableContinuationToken continuationToken = null;
    do
    {
        // Fetch one segment at a time instead of the whole result set.
        var queryResult = cloudTable.ExecuteQuerySegmented(query, continuationToken);
        continuationToken = queryResult.ContinuationToken;
        var items = queryResult.ToList();

        // Group deletes per partition, starting a new batch whenever one reaches the limit.
        var batches = new Dictionary<string, List<TableBatchOperation>>();
        foreach (var entity in items)
        {
            var tableOperation = TableOperation.Delete(entity);
            if (!batches.TryGetValue(entity.PartitionKey, out var batchOperationList))
            {
                batchOperationList = new List<TableBatchOperation>();
                batches.Add(entity.PartitionKey, batchOperationList);
            }
            var batchOperation = batchOperationList.FirstOrDefault(bo => bo.Count < maxOperationsInBatch);
            if (batchOperation == null)
            {
                batchOperation = new TableBatchOperation();
                batchOperationList.Add(batchOperation);
            }
            batchOperation.Add(tableOperation);
        }

        foreach (var batch in batches.Values.SelectMany(l => l))
        {
            cloudTable.ExecuteBatch(batch);
        }
    } while (continuationToken != null);
}