Parsing large delimited files with dynamic number of columns

2022-12-29 04:12

What would be the best approach to parse a delimited file when the columns are unknown before parsing the file?

The file format is Rightmove v3 (.blm); the structure looks like this:

#HEADER#
Version : 3
EOF : '^'
EOR : '~'
#DEFINITION#
AGENT_REF^ADDRESS_1^POSTCODE1^MEDIA_IMAGE_00~ // can be any number of columns
#DATA#
agent1^the address^the postcode^an image~
agent2^the address^the postcode^^~      // the records have to have the same number of columns as specified in the definition, however they can be empty
etc
#END#

The files can potentially be very large: the example file I have is 40 MB, but they could be several hundred megabytes. Below is the code I had started on before I realised the columns were dynamic. I'm opening a FileStream, as I read that was the best way to handle large files. I'm not sure my idea of putting every record in a list and then processing it is any good, though; I don't know if that will work with such large files.

List<string> recordList = new List<string>();
List<Property> propertyList = new List<Property>();

try
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    using (StreamReader file = new StreamReader(fs)) // dispose the reader as well as the stream
    {
        string line;
        while ((line = file.ReadLine()) != null)
        {
            string[] records = line.Split('~');

            foreach (string item in records)
            {
                if (item != String.Empty)
                {
                    recordList.Add(item);
                }
            }

        }
    }
}
catch (FileNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}

foreach (string r in recordList)
{
    Property property = new Property();

    string[] fields = r.Split('^');

    // can't do this as I don't know which field is the post code
    property.PostCode = fields[2];
    // etc

    propertyList.Add(property);
}

Any ideas on how to do this better? It's C# 3.0 and .NET 3.5, if that helps.

Thanks,

Annelie

If you can strip out the lines at the start (the header content and the #xxx# marker lines), then what remains is just a CSV file with ^ as the field delimiter (each record also ends in ~), so any CSV reader class that accepts a custom delimiter will do the trick.
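If pulling in a CSV library is not an option, the same idea can be sketched by hand. The following is only a rough illustration, assuming the layout shown in the question: the #HEADER# section supplies the EOF/EOR characters, the #DEFINITION# record supplies the column names, and every #DATA# record is yielded lazily as a column-name lookup, so the whole file never has to sit in memory. BlmReader and ReadRecords are made-up names, not part of any library, and the code sticks to C# 3.0 / .NET 3.5 features.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not a library class): streams a .blm file record by record.
public static class BlmReader
{
    // Yields one dictionary (column name -> field value) per record in #DATA#.
    public static IEnumerable<Dictionary<string, string>> ReadRecords(TextReader reader)
    {
        char fieldSep = '^';   // defaults match the sample header; overridden by EOF/EOR
        char recordSep = '~';
        string section = null;
        string[] columns = null;
        string carry = "";     // unfinished record text carried over to the next line
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (trimmed.Length > 2 && trimmed[0] == '#' && trimmed[trimmed.Length - 1] == '#')
            {
                section = trimmed;          // e.g. #HEADER#, #DEFINITION#, #DATA#
                carry = "";
                if (section == "#END#") yield break;
                continue;
            }

            if (section == "#HEADER#")
            {
                // Lines look like:  EOF : '^'
                int colon = line.IndexOf(':');
                if (colon < 0) continue;
                string name = line.Substring(0, colon).Trim();
                string value = line.Substring(colon + 1).Trim().Trim('\'');
                if (value.Length == 1 && name == "EOF") fieldSep = value[0];
                if (value.Length == 1 && name == "EOR") recordSep = value[0];
                continue;
            }

            if (section != "#DEFINITION#" && section != "#DATA#") continue;

            // A line may hold several records, or only part of one.
            // Assumption: line breaks inside a record are simply dropped.
            string buffer = carry + line;
            int start = 0, end;
            while ((end = buffer.IndexOf(recordSep, start)) >= 0)
            {
                string record = buffer.Substring(start, end - start);
                start = end + 1;
                if (record.Length == 0) continue;

                // Tolerate a field separator placed directly before the record terminator.
                if (record[record.Length - 1] == fieldSep)
                    record = record.Substring(0, record.Length - 1);

                string[] fields = record.Split(fieldSep);
                if (section == "#DEFINITION#")
                {
                    columns = fields;       // column names, in file order
                    continue;
                }
                if (columns == null)
                    throw new InvalidDataException("#DATA# found before #DEFINITION#.");

                Dictionary<string, string> row = new Dictionary<string, string>(columns.Length);
                for (int i = 0; i < columns.Length; i++)
                    row[columns[i]] = i < fields.Length ? fields[i] : string.Empty;
                yield return row;
            }
            carry = buffer.Substring(start);
        }
    }
}
```

With something like this, the loop in the question could wrap a StreamReader in a using block, iterate over BlmReader.ReadRecords(reader) with foreach, and read row["POSTCODE1"] instead of fields[2], so the position of the post code column no longer matters.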

You could do this a few ways.

If the properties on your objects have the same name as the columns in your data file, you could use reflection to determine which columns should be matched to which properties.
If the properties on your objects have different names, then you could write a custom mapping schema that would say "for column X, assign to property Y".
You could create custom attributes for your object properties that indicate which column name they map to, and use reflection to read those attributes.

All of these approaches presuppose that a given column name always represents the same data across your files (i.e., ADDRESS_1 will always be the column name for "address line one" data).
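As a rough sketch of the reflection options above: a custom attribute names the column a property maps to, and properties without the attribute fall back to matching on their own name. BlmColumnAttribute, BlmMapper and the AgentRef and ADDRESS_1 properties are invented for illustration; only Property and PostCode come from the question. The sketch only handles string properties and assumes the column names have already been read from the #DEFINITION# record.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: names the file column a property is filled from.
[AttributeUsage(AttributeTargets.Property)]
public sealed class BlmColumnAttribute : Attribute
{
    private readonly string name;
    public BlmColumnAttribute(string name) { this.name = name; }
    public string Name { get { return name; } }
}

// Illustrative shape of the Property class from the question.
public class Property
{
    [BlmColumn("AGENT_REF")] public string AgentRef { get; set; }
    [BlmColumn("POSTCODE1")] public string PostCode { get; set; }
    public string ADDRESS_1 { get; set; }   // no attribute: matched by property name
}

public static class BlmMapper
{
    // Build the column name -> property lookup once, not once per record.
    public static Dictionary<string, PropertyInfo> BuildMap(Type type)
    {
        Dictionary<string, PropertyInfo> map =
            new Dictionary<string, PropertyInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (PropertyInfo p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!p.CanWrite || p.PropertyType != typeof(string)) continue;
            object[] attrs = p.GetCustomAttributes(typeof(BlmColumnAttribute), false);
            string column = attrs.Length > 0 ? ((BlmColumnAttribute)attrs[0]).Name : p.Name;
            map[column] = p;
        }
        return map;
    }

    // Copy each field into the property mapped to its column; unmapped columns are skipped.
    public static T Map<T>(string[] columns, string[] fields, Dictionary<string, PropertyInfo> map)
        where T : class, new()
    {
        T item = new T();
        for (int i = 0; i < columns.Length && i < fields.Length; i++)
        {
            PropertyInfo p;
            if (map.TryGetValue(columns[i], out p))
                p.SetValue(item, fields[i], null);
        }
        return item;
    }
}
```

Building the lookup once per type keeps the reflection cost out of the per-record loop, which matters for files with several hundred megabytes of records. A hand-written mapping schema ("for column X, assign to property Y") would amount to filling the same dictionary by hand instead of from attributes.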
