Searching in an unordered log file

Where I work we have a log file which contains lines like this:

31201007061308000000161639030001

Which is to be read like this:

31|year(4)|month(2)|day(2)|hour(2)|min(2)|000000|facility(3)|badge(5)|0001

So there's supposed to be one line per record, but things like this happen:

31201007192000000000161206930004
31201007192001000000161353900004
31201031201007192004000000161204690004
31201007192004000000090140470004
31201007192005000000090148140004
3120100719200500031201007191515000000161597180001
31201007191700000000161203490001
31201007191700000000161203490001
31201007191700000000161202830001
31201007191700000000

That's because the software that's supposed to read the file sometimes misses some of the newest records, and the person in charge then copies the older records to the end of the file. So basically the file looks like that because of human mistakes.

When a record isn't saved in the DB I have to search the file. At first I just used a loop that went through every record in the file, but it's really slow, and the problems mentioned above made it slower. The approach I have right now uses a regular expression, like this:

//Starts Reader
StreamReader reader = new StreamReader(path);
string fileLine;
//ReadLine returns null at the end of the file, so the last line is processed too
while ((fileLine = reader.ReadLine()) != null)
{
  //Regex Matcher
  Regex rx = new Regex(@"31\d\d\d\d\d\d\d\d\d\d\d\d000000161\d\d\d\d\d0001");

  //Looks for all valid records in the line
  MatchCollection matches = rx.Matches(fileLine);

  //Compares each match against what we are looking for
  foreach (Match m in matches)
  {
    string s = m.Value;
    compareLine(date, badge, s);
  }
}
reader.Close(); //Closes reader

My question is this: What's a good way to search through the file? Should I order/clean it first?


You'd probably be best off following these steps:

  • Parse each line into an object. A struct should be appropriate for these lines. Include a DateTime field as well as any other related fields. This is easy to do with Regex if you clean the pattern up a bit: use capture groups and quantifiers. For the year, you can use (\d{4}) to match four digits in a row, instead of \d\d\d\d.
  • Create a List<MyStruct> that holds each line as an object.
  • Use LINQ to search through the list, for example:

    var searchResults = from eachEntry in MyList
                        where eachEntry.Date > DateTime.Now
                        && eachEntry.facility.Contains("003")
                        select eachEntry;
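The three steps above could be sketched roughly as follows. This is only an illustration under some assumptions: `LogRecord`, `LogParser`, `ParseLine` and `Find` are made-up names, the field layout is taken from the question, and the last four digits are assumed to be any digits (the samples show both 0001 and 0004). Because the pattern is matched anywhere in the line, a record glued behind a truncated one is still picked up.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical record type; the fields follow the layout given in the question.
public struct LogRecord
{
    public DateTime Timestamp;
    public string Facility;
    public string Badge;
}

public static class LogParser
{
    // 31 | yyyyMMddHHmm | 000000 | facility(3) | badge(5) | four trailing digits
    private static readonly Regex RecordPattern = new Regex(
        @"31(\d{12})000000(\d{3})(\d{5})\d{4}", RegexOptions.Compiled);

    // Returns every well-formed record found anywhere in the line.
    public static List<LogRecord> ParseLine(string line)
    {
        var records = new List<LogRecord>();
        foreach (Match m in RecordPattern.Matches(line))
        {
            DateTime timestamp;
            bool validDate = DateTime.TryParseExact(
                m.Groups[1].Value, "yyyyMMddHHmm",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp);
            if (!validDate)
                continue; // twelve digits that do not form a real date

            records.Add(new LogRecord
            {
                Timestamp = timestamp,
                Facility = m.Groups[2].Value,
                Badge = m.Groups[3].Value
            });
        }
        return records;
    }

    // Parses all lines and filters the records with LINQ.
    public static List<LogRecord> Find(
        IEnumerable<string> lines, DateTime timestamp, string badge)
    {
        return (from line in lines
                from record in ParseLine(line)
                where record.Timestamp == timestamp && record.Badge == badge
                select record).ToList();
    }
}
```

A call might look like `LogParser.Find(File.ReadLines(path), new DateTime(2010, 7, 19, 20, 4, 0), "20469")`, which would also find the record hidden in the malformed line `31201031201007192004000000161204690004`.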

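Also, pass RegexOptions.Compiled when you construct your Regex; it will speed matching up, if only by a few milliseconds. The option goes in the constructor (the instance Matches method does not accept it), and the Regex should be created once, outside the loop:

Regex rx = new Regex(@"31\d\d\d\d\d\d\d\d\d\d\d\d000000161\d\d\d\d\d0001", RegexOptions.Compiled);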


If you know in advance which entry you are looking for, i.e. you know the exact date, facility and badge, you do not need to parse the data at all. It might be faster to generate the expected string and do a simple string search instead of using regular expressions:

string expectedValue = getExpectedValue(date, badge);
// expectedValue = "31201007192000000000161206930004"
foreach (string line in lines)
{
    if (line.IndexOf(expectedValue) >= 0)
    {
        // record found
    }
}
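One way `getExpectedValue` might be filled in is sketched below. The names `RecordSearch`, `BuildExpectedPrefix` and `ContainsRecord` are hypothetical, and the extra `facility` parameter is an assumption (the call above passes only date and badge). Since the samples end in either 0001 or 0004, this sketch builds the record only up to the badge and leaves the last four digits out of the search string. The comparison is ordinal, which avoids culture-sensitive matching rules that are not needed for digit strings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RecordSearch
{
    // Builds "31" + yyyyMMddHHmm + "000000" + facility(3) + badge(5).
    // The four trailing digits are deliberately omitted (assumption: they vary).
    public static string BuildExpectedPrefix(
        DateTime timestamp, string facility, string badge)
    {
        return "31"
            + timestamp.ToString("yyyyMMddHHmm", CultureInfo.InvariantCulture)
            + "000000"
            + facility.PadLeft(3, '0')
            + badge.PadLeft(5, '0');
    }

    // Returns true as soon as any line contains the expected text.
    public static bool ContainsRecord(IEnumerable<string> lines, string expected)
    {
        foreach (string line in lines)
        {
            if (line.IndexOf(expected, StringComparison.Ordinal) >= 0)
                return true;
        }
        return false;
    }
}
```

Because this is a substring search, a record that was appended directly behind a truncated one on the same line is still found.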

If you are only interested in whether the file contains your record or not, you can read the complete file into a single string and search it:

string completeFile = GetFileContents(file);
if (completeFile.IndexOf(expectedValue) >= 0)
{
    // record found
}
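`GetFileContents` is not a framework method; a minimal stand-in, assuming the file fits comfortably in memory, could use `File.ReadAllText`. The class and method names below are made up for the example.

```csharp
using System;
using System.IO;

public static class WholeFileSearch
{
    // Loads the entire file into one string and does a single ordinal search.
    public static bool FileContains(string path, string expectedValue)
    {
        string completeFile = File.ReadAllText(path);
        return completeFile.IndexOf(expectedValue, StringComparison.Ordinal) >= 0;
    }
}
```

For a very large log, reading line by line with `File.ReadLines` would keep memory usage flat, at the cost of one search per line.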