
How to parse the CSV downloaded from Google Insights using C#

I'm downloading a CSV from Google Insights, and I need to parse out certain information and use that data to populate a heat map.

Google doesn't have an open API for Insights, so you can only download the CSV then parse it out.

A lot of data gets downloaded, but the part I need starts around row 61 and runs for about 40 rows. It looks like this:

...
...  above data
....
Top subregions for test 
Subregion   test
New York    100
Ohio    79
Kentucky    72
Maine   66
New Jersey  64
District of Columbia    58
Pennsylvania    58
Delaware    58
Maryland    57
Massachusetts   52

I'm able to load the CSV; I'm just not sure how to parse out that particular data properly. I loop through the CSV until I find the "subregion" text, but after that I'm not sure how to pull the state and count out into a dictionary of some kind.

Any help would be greatly appreciated.

Thanks!


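using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class Program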
{
    static void Main()
    {
        foreach (var item in GetRegions("google_insights.txt"))
        {
            Console.WriteLine("Count = {0}, Name = {1}", item.Value, item.Key);
        }
    }

    private static Regex _regionRegex = new Regex(
        // Lazy name plus \s+ keeps trailing separator whitespace out of the name
        @"^(?<name>.+?)\s+(?<count>[0-9]+)\s*$",
        RegexOptions.Compiled
    );

    static IEnumerable<KeyValuePair<string, int>> GetRegions(string filename)
    {
        using (var file = File.OpenRead(filename))
        using (var reader = new StreamReader(file))
        {
            string line;
            bool yielding = false;
            while ((line = reader.ReadLine()) != null)
            {
                if (yielding && string.IsNullOrWhiteSpace(line)) //IsNullOrEmpty works as well
                {
                    yield break;
                }

                if (yielding)
                {
                    var match = _regionRegex.Match(line);
                    if (match.Success)
                    {
                        var count = int.Parse(match.Groups["count"].Value);
                        var name = match.Groups["name"].Value;
                        yield return new KeyValuePair<string, int>(name, count);
                    }
                }

                if (line.Contains("subregions"))
                {
                    yielding = true;
                }
            }
        }

    }
}
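
Since the question asks for a dictionary rather than console output, the pairs yielded by GetRegions could be collected along these lines. This is only a sketch: RegionLookup and ToRegionDictionary are made-up names, and the choices to trim names, ignore case, and let the last duplicate win are assumptions.

```csharp
using System;
using System.Collections.Generic;

static class RegionLookup
{
    // Collects (name, count) pairs into a case-insensitive dictionary.
    // Names are trimmed; if a name appears more than once, the last count wins.
    public static Dictionary<string, int> ToRegionDictionary(
        IEnumerable<KeyValuePair<string, int>> regions)
    {
        var result = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in regions)
        {
            result[pair.Key.Trim()] = pair.Value;
        }
        return result;
    }
}
```

It would be called as RegionLookup.ToRegionDictionary(GetRegions("google_insights.txt")), after which a lookup such as counts["New York"] returns the count for that state.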


I strongly suggest that you look into TextFieldParser. Also, see the "Related" questions to the right.
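
A rough sketch of what the TextFieldParser approach could look like (the class lives in the Microsoft.VisualBasic.FileIO namespace). It assumes the export is tab-delimited and that the block starts with a "Top subregions" heading, as in the sample in the question; SubregionParser and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // on .NET Framework, reference Microsoft.VisualBasic.dll

static class SubregionParser
{
    // Reads "name<TAB>count" rows that follow the "Top subregions" heading.
    // The tab delimiter and the heading text are assumptions taken from the sample data.
    public static Dictionary<string, int> Parse(TextReader input)
    {
        var result = new Dictionary<string, int>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");

            bool inSection = false;
            while (!parser.EndOfData)
            {
                // ReadFields skips blank lines, so a blank line cannot mark the end of the block.
                string[] fields = parser.ReadFields();
                if (!inSection)
                {
                    inSection = fields.Length > 0 &&
                        fields[0].StartsWith("Top subregions", StringComparison.OrdinalIgnoreCase);
                    continue;
                }

                // A row with a single field is treated as the next section heading.
                if (fields.Length < 2) break;

                int count;
                if (int.TryParse(fields[1], out count))
                {
                    result[fields[0].Trim()] = count;
                }
                // Rows whose second field is not a number (the column header) are skipped.
            }
        }
        return result;
    }
}
```

For a file on disk this might be called as SubregionParser.Parse(new StreamReader("google_insights.txt")). If the real export turns out to be comma- or space-separated, the delimiter passed to SetDelimiters would need to change.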


What you pasted above doesn't look like CSV: where are the commas? For CSV parsing, search Stack Overflow for "CSV regex"; there are a few really good suggestions. But if your data really looks like what you pasted (separated by spaces and/or tabs, not commas) and all you want is to iterate over it and populate a dictionary, you can do something like this:


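Dictionary<string, int> data = new Dictionary<string, int>();
string line = null;
while ((line = ReadLine()) != null) /* ReadLine() is whatever you currently use to read the next line from your input */
{
    string[] items = line.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    if (items.Length < 2) continue; // skip blank or single-word lines

    // The count is the last token; lines without a numeric count (e.g. the header) are skipped.
    int count;
    if (!int.TryParse(items[items.Length - 1], out count)) continue;

    // Everything before the count is the name, so "New York" and "District of Columbia" stay intact.
    string state = string.Join(" ", items, 0, items.Length - 1);
    data[state] = count;
}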