Parsing large delimited files with dynamic number of columns
What would be the best approach to parse a delimited file when the columns are unknown before parsing the file?
The file format is Rightmove v3 (.blm), and the structure looks like this:
#HEADER#
Version : 3
EOF : '^'
EOR : '~'
#DEFINITION#
AGENT_REF^ADDRESS_1^POSTCODE1^MEDIA_IMAGE_00~ // can be any number of columns
#DATA#
agent1^the address^the postcode^an image~
agent2^the address^the postcode^^~ // the records have to have the same number of columns as specified in the definition, however they can be empty
etc
#END#
The files can potentially be very large: the example file I have is 40 MB, but they could be several hundred megabytes. Below is the code I had started on before I realised the columns were dynamic. I'm opening a FileStream, as I read that was the best way to handle large files. I'm not sure my idea of putting every record in a list and then processing it is any good, though; I don't know if that will work with such large files.
List<string> recordList = new List<string>();
try
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        StreamReader file = new StreamReader(fs);
        string line;
        while ((line = file.ReadLine()) != null)
        {
            string[] records = line.Split('~');
            foreach (string item in records)
            {
                if (item != String.Empty)
                {
                    recordList.Add(item);
                }
            }
        }
    }
}
catch (FileNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}
foreach (string r in recordList)
{
    Property property = new Property();
    string[] fields = r.Split('^');
    // can't do this as I don't know which field is the post code
    property.PostCode = fields[2];
    // etc
    propertyList.Add(property);
}
Any ideas on how to do this better? It's C# 3.0 and .NET 3.5, if that helps.
Thanks,
Annelie
If you can strip out some of the lines at the start (the header content and the #xxx# lines), then it's just a CSV file with ^ as the delimiter, so any CSV reader class will do the trick.
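If stripping the header lines by hand is awkward, the same idea can be done in one streaming pass. The following is a minimal sketch, not a definitive implementation: `BlmReader` and its members are hypothetical names, it assumes the section markers sit on their own lines as in the sample, and it sticks to features available in C# 3.0 / .NET 3.5. It reads the delimiters from the header, takes the column names from the definition record, and yields one name-to-value dictionary per data record, so the whole file is never held in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class BlmReader
{
    // Lazily yields one dictionary (column name -> value) per data record.
    public static IEnumerable<Dictionary<string, string>> ReadRecords(TextReader reader)
    {
        char fieldSep = '^', recordSep = '~'; // defaults, overridden by the header
        string line;

        // Header section: pick up EOF/EOR, stop once #DEFINITION# is consumed.
        while ((line = reader.ReadLine()) != null && line.Trim() != "#DEFINITION#")
        {
            string[] parts = line.Split(new char[] { ':' }, 2);
            if (parts.Length < 2) continue;
            string key = parts[0].Trim();
            string value = parts[1].Trim().Trim('\'');
            if (value.Length != 1) continue;
            if (key == "EOF") fieldSep = value[0];
            else if (key == "EOR") recordSep = value[0];
        }

        string[] columns = null;
        foreach (string raw in ReadUntil(reader, recordSep))
        {
            string record = raw.Trim();
            // The first data record is preceded by the #DATA# marker.
            if (record.StartsWith("#DATA#", StringComparison.Ordinal))
                record = record.Substring(6).TrimStart();
            if (record.Length == 0) continue;

            string[] fields = SplitRecord(record, fieldSep);
            if (columns == null) { columns = fields; continue; } // definition record

            Dictionary<string, string> row = new Dictionary<string, string>();
            for (int i = 0; i < columns.Length; i++)
                row[columns[i]] = i < fields.Length ? fields[i] : "";
            yield return row;
        }
    }

    // Yields each chunk of text that ends with the terminator character.
    private static IEnumerable<string> ReadUntil(TextReader reader, char terminator)
    {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            if ((char)c == terminator) { yield return sb.ToString(); sb.Length = 0; }
            else sb.Append((char)c);
        }
        // Anything left over (normally "#END#") has no terminator and is ignored.
    }

    private static string[] SplitRecord(string record, char fieldSep)
    {
        // If every field is terminated (as in the second sample record),
        // drop the single trailing separator before splitting.
        if (record.Length > 0 && record[record.Length - 1] == fieldSep)
            record = record.Substring(0, record.Length - 1);
        return record.Split(fieldSep);
    }
}
```

With this, a loop such as `foreach (Dictionary<string, string> row in BlmReader.ReadRecords(new StreamReader(path)))` can use `row["POSTCODE1"]` regardless of where that column sits in the definition.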
You could do this a few ways.
- If the properties on your objects have the same name as the columns in your data file, you could use reflection to determine which columns should be matched to which properties.
- If the properties on your objects have different names, then you could write a custom mapping schema that would say "for column X, assign to property Y".
- You could create custom attributes for your object properties that indicate which column name they map to, and use reflection to read those attributes.
All of these approaches presuppose that the column names in your data files will always be the same for the data they represent (i.e., ADDRESS_1 will always be the column name for "address line one" data).
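The attribute option could look roughly like the sketch below. Everything named here is an assumption for illustration: `BlmColumnAttribute` and `BlmMapper` are made-up names, the `Property` class is a stand-in for your own (only `PostCode` appears in the question), and only string properties are handled. It also assumes the definition row and each record have already been split into string arrays. The column-to-property lookup is built once from the definition row, so reflection over the attributes is not repeated for every record.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute naming the file column a property is filled from.
[AttributeUsage(AttributeTargets.Property)]
public sealed class BlmColumnAttribute : Attribute
{
    private readonly string name;
    public BlmColumnAttribute(string name) { this.name = name; }
    public string Name { get { return name; } }
}

// Stand-in for the real Property class.
public class Property
{
    [BlmColumn("AGENT_REF")] public string AgentRef { get; set; }
    [BlmColumn("POSTCODE1")] public string PostCode { get; set; }
}

public static class BlmMapper
{
    // Maps column index -> property, using the column names from the definition row.
    public static Dictionary<int, PropertyInfo> BuildMap(Type type, string[] columns)
    {
        Dictionary<int, PropertyInfo> map = new Dictionary<int, PropertyInfo>();
        foreach (PropertyInfo p in type.GetProperties())
        {
            object[] attrs = p.GetCustomAttributes(typeof(BlmColumnAttribute), false);
            if (attrs.Length == 0) continue; // property is not mapped to a column
            int index = Array.IndexOf(columns, ((BlmColumnAttribute)attrs[0]).Name);
            if (index >= 0) map[index] = p;  // column is present in this file
        }
        return map;
    }

    // Creates one object from the already-split fields of a single record.
    public static T Create<T>(Dictionary<int, PropertyInfo> map, string[] fields)
        where T : class, new()
    {
        T item = new T();
        foreach (KeyValuePair<int, PropertyInfo> entry in map)
            if (entry.Key < fields.Length)
                entry.Value.SetValue(item, fields[entry.Key], null);
        return item;
    }
}
```

Columns that have no matching property are simply skipped, and properties whose column is missing from a given file are left at their default value.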