Splitting on a Unique Character
I want to build a comma-separated list so that I can split on the comma later to get an array of the values. However, the values may have commas in them. In fact, they may have any normal keyboard character in them (they are supplied by a user). What is a good strategy for determining a character you are sure will not collide with the values?
In case this matters in a language-dependent way, I am building the "some character"-separated list in C# and sending it to a browser to be split in JavaScript.
If JavaScript is consuming the list, why not send it in the form of a JavaScript array? It already has an established and reliable method for representing a list and escaping characters.
["Value 1", "Value 2", "Escaped \"Quotes\"", "Escaped \\ Backslash"]
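On the browser side, a list sent in this form can be read back with the built-in `JSON.parse`. The sketch below assumes the server's output arrives as a string; the names `payload` and `values` are illustrative only.

```javascript
// Hypothetical payload: the array literal above, received as a string.
// String.raw keeps the backslashes exactly as they appear on the wire.
const payload = String.raw`["Value 1", "Value 2", "Escaped \"Quotes\"", "Escaped \\ Backslash"]`;

// JSON.parse undoes the escaping and returns a real array.
const values = JSON.parse(payload);

console.log(values.length); // 4
console.log(values[2]);     // Escaped "Quotes"
console.log(values[3]);     // Escaped \ Backslash
```

No separator character has to be chosen at all, because the format itself defines how quotes and backslashes are escaped.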
You could split it by a null character, and terminate your list with a double null character.
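A minimal sketch of that idea, assuming no value is empty and no value contains a NUL character (either would make the double-NUL terminator ambiguous). The helper names `packNul` and `unpackNul` are made up for illustration.

```javascript
const NUL = "\u0000";

// Hypothetical helper: join with NUL and end the list with a double NUL.
function packNul(items) {
  for (const item of items) {
    // An empty value or a value containing NUL would be ambiguous here.
    if (item === "" || item.includes(NUL)) {
      throw new Error("value cannot be empty or contain NUL");
    }
  }
  return items.join(NUL) + NUL + NUL;
}

// Hypothetical helper: read values up to the double-NUL terminator.
function unpackNul(text) {
  const end = text.indexOf(NUL + NUL);
  if (end === -1) throw new Error("missing double-NUL terminator");
  const body = text.slice(0, end);
  return body === "" ? [] : body.split(NUL);
}

const packedList = packNul(["a,b", "c|d", "plain"]);
console.log(unpackNul(packedList)); // logs the three original values
```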
I always use |, but if you think the values might contain it, you can use a combination such as @|@. For example:
"string one@|@string two@|@...@|@last string"
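Splitting such a string back apart is a single call to `split`. The sketch below uses made-up sample values; note that a multi-character separator only lowers the odds of a collision, as the last two lines show.

```javascript
const SEP = "@|@";

// Values containing commas or single pipes survive the round trip.
const joinedList = ["string one", "a,b", "c|d"].join(SEP);
const partsList = joinedList.split(SEP);
console.log(partsList.length); // 3

// A value that happens to contain the separator is still split in two.
const brokenList = ["x@|@y"].join(SEP).split(SEP);
console.log(brokenList.length); // 2, not 1
```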
Eric S. Raymond wrote a book chapter on this that you might find useful. It is directed toward Unix users but should still apply.
As for your question, if you will have commas within cells, then you will need some form of escaping. Using \, is a standard way, but you will also have to escape backslashes, which are also common.
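One way the backslash escaping could look in JavaScript is sketched below. The function names are hypothetical, and the splitter simply treats whatever follows a backslash as a literal character.

```javascript
// Escape backslashes first, then commas, so the two steps do not interfere.
function escapeField(value) {
  return value.replace(/\\/g, "\\\\").replace(/,/g, "\\,");
}

function joinEscaped(fields) {
  return fields.map(escapeField).join(",");
}

function splitEscaped(line) {
  const fields = [];
  let current = "";
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === "\\" && i + 1 < line.length) {
      current += line[i + 1]; // take the escaped character literally
      i++;
    } else if (ch === ",") {
      fields.push(current);   // an unescaped comma ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const escapedLine = joinEscaped(["a,b", "c\\d", "plain"]);
console.log(escapedLine);               // a\,b,c\\d,plain
console.log(splitEscaped(escapedLine)); // logs the three original values
```

A plain `split(",")` cannot be used on the receiving side, because it would also split on the escaped commas.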
Alternatively, use another character such as the pipe (|), tab, or something else of your choice. If users need to work with the data using a spreadsheet program, you can usually add filter rules to split cells on the delimiter of your choice. If this is a concern, it's probably best to choose a delimiter that users can easily type, which excludes the nul char, among others.
You could also use quoting:
"value1", "value2", "etc"
In that case, you will only need to escape quotes (and backslashes). This should also be accepted by spreadsheets given the correct filter options.
There are several ways to do this. The first is to select a separator character that would not normally be input from the keyboard. NULL or TAB are normally good. The second is to use a character sequence as a separator; Excel CSV files are a good example, where the cell values are enclosed in quotes and commas separate the cells.
The answer is dependent on whether you want to reinvent the wheel or not.
If there is potential for any splitting character to appear in your strings, then I would suggest that you write a script element to your output with a JavaScript array definition in it. For example:
<script>
// One array element per value; no separator character is needed.
var myVars = [];
myVars[0] = "abc|@123$";
myVars[1] = "123*456";
myVars[2] = "blah|blah";
</script>
Your JavaScript can then reference that array.
Doing this also avoids the need to create a comma-separated string from your C# string array.
The only gotcha I can think of is strings that contain quotes; in this case you would have to escape them in C# when writing them out to the myVars output.
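The escaping itself can be sketched as follows. This is written in JavaScript purely for illustration (in the setup described, the C# side would do the equivalent), and `toScriptLiteral` is a hypothetical name. Besides quotes and backslashes, it also guards against a value containing `</script>`, which would otherwise end the script element early.

```javascript
// Hypothetical helper: render values as a JavaScript array literal that is
// safe to place inside a <script> element.
function toScriptLiteral(items) {
  // JSON.stringify escapes quotes and backslashes.
  // Replacing "<" with \u003c keeps a value such as "</script>" from
  // closing the script element.
  return JSON.stringify(items).replace(/</g, "\\u003c");
}

const scriptLiteral = toScriptLiteral(['say "hi"', "back\\slash", "</script>"]);
console.log(scriptLiteral);
// ["say \"hi\"","back\\slash","\u003c/script>"]
```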
There is an RFC which documents the CSV format. Follow the standards and you will avoid reinventing the wheel and creating a mess for the next guy to come along and maintain your code. The nice thing is that there are libraries available to import/export CSV for just about any platform you can imagine.
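As a rough illustration of the quoting rules in that RFC (RFC 4180), a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded quotes are doubled. The sketch below handles a single record only and uses made-up function names; a maintained CSV library is the better choice in real code.

```javascript
function toCsvField(value) {
  // Quote only when needed; double any embedded quotes.
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

function toCsvRecord(fields) {
  return fields.map(toCsvField).join(",");
}

function parseCsvRecord(record) {
  const out = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < record.length; i++) {
    const ch = record[i];
    if (inQuotes) {
      if (ch === '"' && record[i + 1] === '"') {
        field += '"'; // a doubled quote is one literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      out.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  out.push(field);
  return out;
}

const csvRecord = toCsvRecord(["a,b", 'say "hi"', "plain"]);
console.log(csvRecord); // "a,b","say ""hi""",plain
console.log(parseCsvRecord(csvRecord)); // logs the three original values
```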
That said, if you are serialising data to send to a browser, JSON is really the way to go. It too is documented in an RFC, and libraries such as JSON.NET are available for just about any platform.