what data structure to use for
We have a messaging system where one module sends messages to another remote module at a high rate. The receiving module decodes each message into a specific format and forwards it to two threads: a logger thread and a forwarder thread.
Before we send a message to these threads, we need to group the messages.
Please note that the messages arrive at a high rate, approximately 800 per second.
The alert structure is as follows:
- INT type
- INT Sending System ID
- INT Recpt System ID
- INT timestamp
- INT codes
- INT Source Port
- INT Destination Port
- Source IP Address (ipv4 or ipv6)
- Destination IP Address (ipv4 or ipv6)
At the end of the match we need to maintain a structure with the following details:
struct{
INT COUNT
INT First Alert Timestamp
INT Last Alert Timestamp
INT First Alert ID
INT Last Alert ID
}
For each alert that matches the 8 criteria, a group is created or picked, and its count is incremented along with the other details.
The IP address fields can either be a structure of 5 fields (INT Address Type, INT Address1, INT Address2, INT Address3 and INT Address4) or be converted to a string and then stored in the structure.
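To make the two address representations concrete, here is a minimal sketch in Python (the language is my choice, since none is named above). The helper names `addr_as_fields` and `addr_as_string` are hypothetical, and the layout of the five integers (address type followed by four 32-bit words, with IPv4 padded on the left) is an assumption, not something fixed by the description.

```python
import ipaddress
from typing import Tuple


def addr_as_fields(text: str) -> Tuple[int, int, int, int, int]:
    """Five-integer form: (address type, Address1..Address4).

    Assumption: the address type is the IP version (4 or 6) and each
    AddressN is one 32-bit word of the 128-bit value.
    """
    ip = ipaddress.ip_address(text)
    packed = ip.packed.rjust(16, b"\x00")  # pad IPv4 (4 bytes) to 16 bytes
    words = tuple(
        int.from_bytes(packed[i:i + 4], "big") for i in range(0, 16, 4)
    )
    return (ip.version,) + words


def addr_as_string(text: str) -> str:
    """String form, normalised so that equal addresses compare equal."""
    return str(ipaddress.ip_address(text))
```

Either form is hashable and can be part of a grouping key. The integer form has a fixed size; the string form is easier to log but must be normalised first, because one IPv6 address can be written in several ways.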
We have been racking our brains for quite some time but have been unable to find a structure or algorithm efficient enough to address both memory and speed.
Hence we thought of coming to you experts for help.
A doubly linked list to store the matched alerts makes it easy to retrieve the first and last alert ID. You might need to extend the doubly linked list with a count field.
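A rough sketch of that idea in Python, where `collections.deque` stands in for the doubly linked list. The class name `AlertGroup` and its methods are made up for illustration, and the sketch assumes alerts are appended in arrival order.

```python
from collections import deque


class AlertGroup:
    """One group of matched alerts, kept in arrival order."""

    def __init__(self):
        # Each entry is (alert_id, timestamp); both ends are O(1) to reach.
        self.alerts = deque()

    def add(self, alert_id, timestamp):
        self.alerts.append((alert_id, timestamp))

    @property
    def count(self):
        # len() on a deque is O(1), so no separate counter is needed here.
        return len(self.alerts)

    def first(self):
        return self.alerts[0]

    def last(self):
        return self.alerts[-1]
```

Note that this keeps every matched alert in memory. If only the count and the first/last values are needed, storing just those five numbers per group is cheaper.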
Depending on your performance requirements, you could group the alerts with a hash on the identifying fields. If that isn't fast enough, implement a more complex tree structure that groups by the identifying fields.
The best thing I can suggest is to get it working in the simplest way possible; 800 per second is nothing. If you then have performance issues, optimize. It's so much fun writing stuff like that using test-driven development; it beats the hell out of your average CRUD code!
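The hash-based grouping could look roughly like this in Python. `GroupStats` mirrors the five-field structure from the question; the function name `record` is hypothetical, and the sketch assumes the key is a tuple of the matching fields and that alerts arrive in timestamp order.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    count: int
    first_ts: int
    last_ts: int
    first_id: int
    last_id: int


def record(groups, key, alert_id, timestamp):
    """Create the group for `key` or update it; average O(1) per alert.

    Assumption: `key` is a hashable tuple of the fields that define a match.
    """
    stats = groups.get(key)
    if stats is None:
        # First alert of this group: first and last are the same alert.
        stats = GroupStats(1, timestamp, timestamp, alert_id, alert_id)
        groups[key] = stats
    else:
        stats.count += 1
        stats.last_ts = timestamp
        stats.last_id = alert_id
    return stats
```

Memory grows with the number of distinct groups rather than the number of alerts, since each group holds only five integers plus its key.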
What do you plan on writing this in? Any suggestion is going to depend heavily on the language.
Your best bet is to start off with something like a Dictionary<string, ContainerObject>
where the key consists of the needed parameters concatenated for quick lookups. Keep working with this dictionary in memory while another process logs the values appropriately to, say, a DB or flat file.
Keep it simple and 800 a second shouldn't be a problem. However, the means of communication is going to be a major factor. Is this local or remote? If it's remote and coming from a single source, your nemesis is going to be latency building up if it's done in individual requests.
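As a sketch of that arrangement in Python: a plain dict with a concatenated string key, plus a worker that drains a queue and writes updates out. All names here (`make_key`, `log_worker`, `run_demo`) are invented for the example, a thread stands in for the separate logging process, and a list stands in for the DB or flat file.

```python
import queue
import threading


def make_key(fields):
    # A delimiter avoids collisions such as (1, 23) vs (12, 3).
    return "|".join(str(f) for f in fields)


def log_worker(q, sink):
    """Drain the queue until the None sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        sink.append(item)  # stand-in for a DB insert or file write


def run_demo():
    counts = {}
    q = queue.Queue()
    sink = []
    worker = threading.Thread(target=log_worker, args=(q, sink))
    worker.start()
    for fields in [(1, 10, 20, 80), (1, 10, 20, 80), (2, 10, 20, 443)]:
        key = make_key(fields)
        counts[key] = counts.get(key, 0) + 1
        q.put((key, counts[key]))  # hand off without blocking on I/O
    q.put(None)  # tell the worker to stop
    worker.join()
    return counts, sink
```

The point of the queue is that the grouping path only does a dictionary update and an enqueue, so slow storage does not hold up incoming alerts.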