Log processing with LINQ
I know it may not be the most performant approach, but I want to process some logs with a LINQ statement. Here is what the log looks like:
RECORD DEVON 1 6748
bla bla bla bla bla bla
bla bla bla bla bla bla
RECORD JASON 1 7436
bla bla bla bla bla bla
bla bla bla bla bla bla
RECORD DEVON 2 9123
RECORD DEVON 3 3723
RECORD SHERRIE 1 6434
RECORD DEVON 4 3732
bla bla bla bla bla bla
bla bla bla bla bla bla
bla bla bla bla bla bla
RECORD SHERRIE 2 6434
bla bla bla bla bla bla
bla bla bla bla bla bla
bla bla bla bla bla bla
bla bla bla bla bla bla
RECORD SHERRIE 3 9123
RECORD DEVON 5 3723
bla bla bla bla bla bla
RECORD JASON 2 9123
RECORD DEVON 6 3723
bla bla bla bla bla bla
bla bla bla bla bla bla
RECORD JASON 3 9123
Now I want to filter out anything that doesn't start with RECORD, group the rest by the name column (JASON, DEVON, SHERRIE), and then lay the groups out side by side, one column per name, so it looks like this:
DEVON JASON SHERRIE
1/6748 1/7436 1/6434
2/9123 2/9123 2/6434
3/3723 3/9123 3/9123
4/3732
5/3723
6/3723
Is this possible to do in a single LINQ statement?
You can get the results as rows in one go with LINQ (here I'm using method syntax):
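string[] lines = File.ReadAllLines("input.txt");
var result =
    // StartsWith is also safe for lines shorter than six characters,
    // where Substring(0, 6) would throw ArgumentOutOfRangeException.
    lines.Where(line => line.StartsWith("RECORD"))
         .Select(line => line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
         // Key: the name column; element: "number/value".
         .GroupBy(columns => columns[1],
                  columns => columns[2] + "/" + columns[3])
         .Select(group => group.Key + " " + string.Join(", ", group.ToArray()));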
Output:
DEVON 1/6748, 2/9123, 3/3723, 4/3732, 5/3723, 6/3723
JASON 1/7436, 2/9123, 3/9123
SHERRIE 1/6434, 2/6434, 3/9123
I think it's difficult to transpose the rows into columns without a standard Zip function, though. Maybe this is good enough for you? If not, you will probably have to do the last bit with a helper method that iterates over the separate IEnumerables in step.
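Such a helper might look roughly like the sketch below. The class name LogColumns and the method name Transpose are made up for illustration, and the filler value for exhausted columns is an assumption; the idea is simply to advance one enumerator per group and emit a row per step until every group has run out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LogColumns
{
    // Hypothetical helper: yields one row per step, taking the next item from
    // each column and using `filler` for columns that are already exhausted.
    public static IEnumerable<string[]> Transpose(
        IEnumerable<IEnumerable<string>> columns, string filler = "")
    {
        var enumerators = columns.Select(c => c.GetEnumerator()).ToList();
        try
        {
            while (true)
            {
                var row = new string[enumerators.Count];
                bool any = false;
                for (int i = 0; i < enumerators.Count; i++)
                {
                    if (enumerators[i].MoveNext())
                    {
                        row[i] = enumerators[i].Current;
                        any = true;
                    }
                    else
                    {
                        row[i] = filler;
                    }
                }
                if (!any) yield break;   // every column is exhausted
                yield return row;
            }
        }
        finally
        {
            // Runs when iteration completes or the caller stops early.
            foreach (var e in enumerators) e.Dispose();
        }
    }
}
```

Assuming the final Select in the query above is replaced by ToList() so that the groups themselves are kept, the header would be string.Join(" ", groups.Select(g => g.Key)) and each row from LogColumns.Transpose(groups) can be joined the same way.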
Here is what I came up with:
public static string TransformLog(string fileName)
{
    // Note: this version assumes the log columns are separated by tabs.
    const string tab = "\t";
    var fileLines = File.ReadAllLines(fileName);

    // One entry per name, each holding an enumerator over that name's records.
    var testAreas = fileLines
        .Where(l => l.StartsWith("RECORD" + tab))
        .Select(l => l.Split(tab.ToCharArray()).Skip(1).Take(3).ToArray())
        .GroupBy(l => l[0])
        .Select(g => new { g.Key, Enumerator = g.GetEnumerator() })
        .ToList();

    // Header row: each name spans two tab-separated cells.
    var sb = new StringBuilder();
    testAreas.ForEach(ta => sb.Append(ta.Key + tab + tab));
    sb.AppendLine();

    // Advance every enumerator once per output row until all are exhausted.
    bool cont;
    do
    {
        cont = false;
        testAreas.ForEach(ta =>
        {
            var hasNext = ta.Enumerator.MoveNext();
            sb.Append((hasNext ? ta.Enumerator.Current[1] + tab + ta.Enumerator.Current[2] + tab : tab + tab));
            cont |= hasNext;
        });
        sb.AppendLine();
    } while (cont);

    return sb.ToString();
}
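For comparison, here is a rough index-based variant that avoids the manual enumerator loop. It is only a sketch under a few assumptions: the names LogPivot and Pivot are invented, columns are split on spaces or tabs, names are sorted alphabetically to match the layout in the question, and cells are written as number/value. It is not strictly a single statement, because the groups are materialized first so the longest column can be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LogPivot
{
    // Hypothetical method: turns RECORD lines into one tab-separated column per name.
    public static string Pivot(IEnumerable<string> lines)
    {
        var columns = lines
            .Where(l => l.StartsWith("RECORD", StringComparison.Ordinal))
            .Select(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .Where(p => p.Length >= 4)                    // skip malformed RECORD lines
            .GroupBy(p => p[1], p => p[2] + "/" + p[3])   // name -> "number/value"
            .OrderBy(g => g.Key)
            .Select(g => new { Name = g.Key, Cells = g.ToList() })
            .ToList();

        // The longest column decides how many output rows there are.
        int height = columns.Count == 0 ? 0 : columns.Max(c => c.Cells.Count);

        var header = string.Join("\t", columns.Select(c => c.Name));
        var rows = Enumerable.Range(0, height)
            .Select(i => string.Join("\t",
                columns.Select(c => i < c.Cells.Count ? c.Cells[i] : "")));

        return string.Join(Environment.NewLine, new[] { header }.Concat(rows));
    }
}
```

Calling LogPivot.Pivot(File.ReadAllLines("input.txt")) would then return the whole table as one string; shorter columns are padded with empty cells.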