How to parse the CSV downloaded from Google Insights using C#
I'm downloading a CSV from Google Insights, and I need to parse out certain information and use that data to populate a heat map.
Google doesn't have an open API for Insights, so you can only download the CSV then parse it out.
A lot of data gets downloaded, but the part I need starts around row 61 and runs for about 40 rows. It looks like this:
...
... above data
....
Top subregions for test
Subregion test
New York 100
Ohio 79
Kentucky 72
Maine 66
New Jersey 64
District of Columbia 58
Pennsylvania 58
Delaware 58
Maryland 57
Massachusetts 52
I'm able to load the CSV; I'm just not sure how to parse out that particular data properly. I looped through the CSV until I found the "subregion" text, but after that I'm not sure how to pull the state and count out into a dictionary of some kind.
Any help would be greatly appreciated.
Thanks!
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        foreach (var item in GetRegions("google_insights.txt"))
        {
            Console.WriteLine("Count = {0}, Name = {1}", item.Value, item.Key);
        }
    }

    // A name (which may contain spaces), one whitespace character, then a trailing integer.
    private static Regex _regionRegex = new Regex(
        @"^(?<name>.+)\s(?<count>[0-9]+)$",
        RegexOptions.Compiled
    );

    static IEnumerable<KeyValuePair<string, int>> GetRegions(string filename)
    {
        using (var file = File.OpenRead(filename))
        using (var reader = new StreamReader(file))
        {
            string line;
            bool yielding = false;
            while ((line = reader.ReadLine()) != null)
            {
                // A blank line after the section has started marks its end.
                if (yielding && string.IsNullOrWhiteSpace(line)) // IsNullOrEmpty works as well
                {
                    yield break;
                }
                if (yielding)
                {
                    // Lines that don't end in a number (e.g. the "Subregion test" header) are skipped.
                    var match = _regionRegex.Match(line);
                    if (match.Success)
                    {
                        var count = int.Parse(match.Groups["count"].Value);
                        var name = match.Groups["name"].Value;
                        yield return new KeyValuePair<string, int>(name, count);
                    }
                }
                // Start yielding from the line after "Top subregions for ...".
                if (line.Contains("subregions"))
                {
                    yielding = true;
                }
            }
        }
    }
}
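Since the question asks for a dictionary, here is a rough sketch of collecting the matched rows into one. It reuses the same regex idea as the code above, but works on lines already in memory so it can be tried without the real file. RegionTable, ToRegionDictionary and RegionTableDemo are names made up for this example, and the sample lines are copied from the question rather than from a real download.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegionTable
{
    // Same pattern as above: a name, one whitespace character, then a trailing integer.
    private static readonly Regex RegionRegex =
        new Regex(@"^(?<name>.+)\s(?<count>[0-9]+)$");

    // Hypothetical helper: turns "name count" lines into a dictionary.
    // Lines that do not end in an integer (headers, section titles) are ignored.
    public static Dictionary<string, int> ToRegionDictionary(IEnumerable<string> lines)
    {
        // Assumption: case-insensitive keys are convenient for lookups by state name.
        var result = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in lines)
        {
            var match = RegionRegex.Match(line.Trim());
            if (!match.Success)
            {
                continue;
            }
            // The indexer overwrites duplicates instead of throwing like Add would.
            result[match.Groups["name"].Value.Trim()] = int.Parse(match.Groups["count"].Value);
        }
        return result;
    }
}

static class RegionTableDemo
{
    static void Main()
    {
        var table = RegionTable.ToRegionDictionary(new[]
        {
            "Top subregions for test",
            "Subregion test",
            "New York 100",
            "Ohio 79",
            "District of Columbia 58"
        });
        Console.WriteLine(table["District of Columbia"]); // prints 58
    }
}
```

With the GetRegions method above, the same result should be reachable by looping over its output and assigning each pair into a dictionary.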
I strongly suggest that you look into TextFieldParser. Also, see the "Related" questions to the right.
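For what it's worth, a minimal sketch of the TextFieldParser route might look like the following. It assumes the real file is comma-delimited, with the subregion rows as "name,count" pairs under a line starting with "Top subregions"; the question's paste doesn't confirm that, so adjust the delimiter if the file turns out to be tab-separated. InsightsCsv, ReadSubregions and InsightsCsvDemo are made-up names, and the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives here

static class InsightsCsv
{
    // Hypothetical helper: reads the rows under "Top subregions ..." into a dictionary.
    public static Dictionary<string, int> ReadSubregions(TextReader input)
    {
        var result = new Dictionary<string, int>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(","); // assumption: comma-delimited file

            bool inSection = false;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (!inSection)
                {
                    // Look for the section title; everything before it is ignored.
                    inSection = fields.Length > 0 &&
                        fields[0].StartsWith("Top subregions", StringComparison.OrdinalIgnoreCase);
                    continue;
                }

                int count;
                if (fields.Length == 2 && int.TryParse(fields[1], out count))
                {
                    result[fields[0]] = count;
                }
                else if (result.Count > 0)
                {
                    // ReadFields skips blank lines, so the section is treated as finished
                    // at the first non-data row that follows at least one data row.
                    break;
                }
            }
        }
        return result;
    }
}

static class InsightsCsvDemo
{
    static void Main()
    {
        // Made-up sample in the assumed layout, not a real download.
        string sample =
            "Some other section\nfoo,1\n\n" +
            "Top subregions for test\nSubregion,test\nNew York,100\nOhio,79\n\n" +
            "Top searches for test\n";
        var table = InsightsCsv.ReadSubregions(new StringReader(sample));
        Console.WriteLine(table["Ohio"]); // prints 79
    }
}
```

For the real file, passing new StreamReader("google_insights.txt") instead of the StringReader should be enough.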
What you pasted above doesn't look like CSV: where are the commas? For real CSV parsing, search Stack Overflow for CSV regex; there are a few really good suggestions. But if your data looks like what you pasted (separated by spaces and/or tabs, not commas) and all you want is to iterate over it and populate a dictionary, you can do something like this:
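// Requires: using System.Collections.Generic;
var data = new Dictionary<string, int>();
string line;
while ((line = ReadLine()) != null) // ReadLine() is whatever you currently use to read the next line of input
{
    line = line.Trim();
    // Split at the last space or tab so multi-word names such as "New York" stay intact.
    int split = line.LastIndexOfAny(new[] { ' ', '\t' });
    if (split < 0) continue; // no separator on this line
    string state = line.Substring(0, split).Trim();
    int count;
    if (int.TryParse(line.Substring(split + 1), out count)) // skips headers such as "Subregion test"
    {
        data[state] = count;
    }
}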