Convert CSV to XML when CSV contains both character and number data
From this thread, I got the basic info on how to parse CSV to create XML. Unfortunately, the text fields (all enclosed in quotes) sometimes contain commas, so line.split(',') gives me too many columns. I can't figure out how to parse the CSV so that it distinguishes between commas within a text field and commas separating fields. Any thoughts on how to do that?
Thanks!
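To make the problem concrete, here is a small sketch using a made-up sample line (hypothetical data, not from the original thread) in which the first quoted text field contains a comma. A plain Split(',') cuts that field in two:

```csharp
using System;

// Hypothetical sample record: the first field is quoted and contains a comma.
string line = "\"john,jane\",\"doe\",42";

// Splitting on every comma ignores the quotes, so "john,jane" is cut in two.
string[] parts = line.Split(',');

Console.WriteLine(parts.Length); // 4, although the record has only 3 fields
foreach (string part in parts)
    Console.WriteLine(part);
```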
Go grab this code: http://geekswithblogs.net/mwatson/archive/2004/09/04/10658.aspx
Then replace line.Split(",") with SplitCSV(line), like:
// Requires: using System.IO; using System.Linq; using System.Xml.Linq;
var lines = File.ReadAllLines(@"C:\text.csv");
var xml = new XElement("TopElement",
    lines.Select(line => new XElement("Item",
        SplitCSV(line)
            .Select((column, index) => new XElement("Column" + index, column)))));
xml.Save(@"C:\xmlout.xml");
Note that the code at the link above is rather old, and probably could be cleaned up a bit using Linq, but it should do the trick.
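The linked SplitCSV code is not reproduced here. As a rough stand-in, the following is a minimal quote-aware splitter written for illustration (it is not the code from the link). It assumes fields are separated by commas, may be wrapped in double quotes, and use "" for a literal quote inside a quoted field. The sample lines are hypothetical, and the XML construction is the same as above but fed from memory instead of C:\text.csv:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical replacement for SplitCSV: walks the line one character at a time
// and only treats a comma as a separator when it is outside a quoted field.
static List<string> SplitCSV(string line)
{
    var fields = new List<string>();
    var current = new StringBuilder();
    bool inQuotes = false;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (inQuotes)
        {
            if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
            {
                current.Append('"'); // "" inside quotes is an escaped quote
                i++;
            }
            else if (c == '"')
                inQuotes = false;    // closing quote
            else
                current.Append(c);   // commas in here belong to the text
        }
        else if (c == '"')
            inQuotes = true;         // opening quote
        else if (c == ',')
        {
            fields.Add(current.ToString()); // field separator
            current.Clear();
        }
        else
            current.Append(c);
    }
    fields.Add(current.ToString()); // last field on the line
    return fields;
}

// Hypothetical sample lines mixing quoted text and an unquoted number.
string[] lines = { "\"first name\",\"last name\",\"age\"", "\"john,jane\",\"doe\",42" };
var xml = new XElement("TopElement",
    lines.Select(line => new XElement("Item",
        SplitCSV(line).Select((column, index) => new XElement("Column" + index, column)))));
Console.WriteLine(xml);
```

Because the input is processed line by line, this sketch does not handle line breaks inside quoted fields.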
Try FileHelpers.
The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.
What about using the pipe character "|" as the delimiter instead? Commas inside fields happen often with CSV files, and a better approach is to separate fields with pipes.
If your CSV files are too complex to make writing your own parser practical, use another parser. The Office ACE OLEDB provider may already be available on your system, but may be overkill for your purposes. I haven't used any of the lightweight alternatives, so I can't speak to their suitability.
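As one example of a lightweight alternative (swapped in here; the answer above does not name it), .NET ships Microsoft.VisualBasic.FileIO.TextFieldParser, which keeps commas inside quoted fields when HasFieldsEnclosedInQuotes is set. It is part of .NET Framework and, as far as I know, is also available in .NET Core 3.0 and later. A minimal sketch, assuming hypothetical in-memory sample data and illustrative element names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;
using Microsoft.VisualBasic.FileIO;

// Hypothetical sample data; in practice you would pass a file path to TextFieldParser.
string csv = "\"first name\",\"last name\",\"age\"\n\"john,jane\",\"doe\",42\n";

var items = new List<XElement>();
using (var parser = new TextFieldParser(new StringReader(csv)))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." stay in the field

    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // surrounding quotes are removed
        var item = new XElement("Item");
        for (int i = 0; i < fields.Length; i++)
            item.Add(new XElement("Column" + i, fields[i]));
        items.Add(item);
    }
}

var xml = new XElement("TopElement", items);
Console.WriteLine(xml);
```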
Here's a little trick if you don't want to use Regex. Instead of splitting on a comma, you can split on a comma and quotes together: ","
This assumes there are no spaces before or after the commas:
line.Split(new[] { "\",\"" }, StringSplitOptions.None)
You will need to remove the quote before the first field and the quote after the last field, however.
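Put together, the trick might look like this minimal sketch, assuming a hypothetical line in which every field is quoted and no spaces surround the commas:

```csharp
using System;

// Hypothetical line: every field quoted, no spaces around the commas.
string line = "\"john,jane\",\"doe\",\"555-5555\"";

// Split on the three-character sequence quote-comma-quote...
string[] fields = line.Split(new[] { "\",\"" }, StringSplitOptions.None);

// ...then strip the leading quote of the first field and the trailing quote of the last.
fields[0] = fields[0].TrimStart('"');
fields[fields.Length - 1] = fields[fields.Length - 1].TrimEnd('"');

foreach (string field in fields)
    Console.WriteLine(field);
```

This only works when every field is quoted. An unquoted number such as 42 stays glued to its neighbour, because there is no quote-comma-quote sequence between them.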
While I'm almost always against regular expressions, here's a solution using one.
Assume you have data as such:
"first name","last name","phone number"
"john,jane","doe","555-5555"
Then, the following code:
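// Requires: using System; using System.Text.RegularExpressions;
string csv = GetCSV(); // will load your CSV, or the above data
foreach (string line in csv.Split('\n'))
{
    Console.WriteLine("--- Begin record ---");
    // "[^"]*" matches one quoted field; unlike ".+?" it also handles empty "" fields
    foreach (Match m in Regex.Matches(line, "\"[^\"]*\""))
        Console.WriteLine(m.Value);
}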
will output this:
--- Begin record ---
"first name"
"last name"
"phone number"
--- Begin record ---
"john,jane"
"doe"
"555-5555"
But I would not recommend the regex approach if you have something like a 2 GB CSV file.
You can use that as the baseline for building your XML records.
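One caveat: the quoted-field pattern above only matches quoted fields, so an unquoted number column would be skipped. A sketch of one way to extend it and build the XML, assuming a hypothetical pattern that accepts either a quoted field (with "" as an escaped quote) or a run of non-comma characters, plus hypothetical sample data and element names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical pattern: a field starts at the beginning of the line or right after a comma,
// and is either a quoted string (with "" as an escaped quote) or any run of non-comma characters.
var field = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

// Remove the surrounding quotes and unescape "" inside quoted fields.
static string Unquote(string s) =>
    s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"'
        ? s.Substring(1, s.Length - 2).Replace("\"\"", "\"")
        : s;

// Hypothetical sample data with an unquoted number in the last column.
string csv = "\"first name\",\"last name\",\"age\"\n\"john,jane\",\"doe\",42";

var xml = new XElement("TopElement",
    csv.Split('\n').Select(line => new XElement("Item",
        field.Matches(line.TrimEnd('\r'))
             .Cast<Match>()
             .Select((m, i) => new XElement("Column" + i, Unquote(m.Value))))));

Console.WriteLine(xml);
```

Like the original, this reads the whole CSV into one string, so the same concern about very large files applies.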