Can someone provide a regex for validating and parsing a CSV of integers and reals?

I am new to regex and am struggling to create an expression to parse a CSV line containing 1 to n values. The values can be integers or real numbers. Sample inputs would be:

1  
1,2,3,4,5    
1,2.456, 3.08, 0.5, 7

This would be used in C#.

Thanks,

Jerry


Use a CSV parser instead of RegEx.

There are several options, both built into the BCL and in third-party libraries; see this SO question and its answers, and this one.


The BCL provides the TextFieldParser (in the Microsoft.VisualBasic.FileIO namespace, but don't let that put you off).

A third-party library that many people like is FileHelpers.
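
As a rough illustration of the TextFieldParser route, here is a minimal sketch. The class and method names (CsvNumberReader, ReadNumbers) are made up for this example, and it assumes "." is the decimal separator and that any non-numeric field should raise a FormatException.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of the BCL.
public static class CsvNumberReader
{
    // Reads every comma-separated field in the text and parses it as a double.
    // double.Parse throws FormatException if a field is empty or not a number.
    public static List<double> ReadNumbers(string text)
    {
        var results = new List<double>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.TrimWhiteSpace = true; // drop space padding around each field

            while (!parser.EndOfData)
            {
                foreach (string field in parser.ReadFields())
                {
                    // Invariant culture: "." is the decimal separator regardless of OS locale.
                    results.Add(double.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture));
                }
            }
        }
        return results;
    }
}
```

Calling CsvNumberReader.ReadNumbers("1,2.456, 3.08, 0.5, 7") should give back the five values as doubles. On .NET Framework this needs a reference to the Microsoft.VisualBasic assembly.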


Using regex for CSV parsing has been a 10-year jihad for me. I have found it remarkably frustrating because of the boundary cases:

Numbers come in a variety of forms (here in the US, Canada):

1
1.
1.0
1000
1000.
1,000
1e3
1.0e3
1.0e+3
1.0e+003
-1
-1.0 (etc)

But of course, Europe has traditionally been different with regard to commas and decimal points:

1
1,0
1000
1.000e3
1e3
1,0e3
1,0e+3
1,0e+003

That just ruins everything. So we ignore the German, French, and other Continental conventions, because with a decimal comma it is impossible to work out whether a comma is separating values or is part of a value. (The Continent likes TAB instead of COMMA as the separator.)

I'll assume that you're "just" looking for numerical values separated from each other by commas and possible space-padding. The expression:

\s*(\-?\d+(?:\.\d*)?(?:[eE][\-+]?\d+)?)\s*

is a pretty fair parser of A NUMBER. It catches just about every reasonable case, but it doesn't deal with embedded commas (thousands separators such as 1,000). It also trims off spaces on either side of the number.

From there, you can either build an iterative CSV string decomposer (walking each field, absorbing commas, assigning to an array, say), or use a scanf-type function to do the same thing. I prefer the iterative decomposition method, as it also allows you to parse out strings, hexadecimal, and virtually any other pattern you find in the data.
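
The iterative approach could be sketched in C# roughly as follows. This is only an illustration: the names NumberFieldParser and TryParseLine are invented here, and the number expression is anchored with ^ and $ and requires at least one digit after the exponent marker so that a whole field must be a number.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating field-by-field decomposition.
public static class NumberFieldParser
{
    // The number expression, anchored to the whole field.
    private static readonly Regex NumberPattern =
        new Regex(@"^\s*(-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*$");

    // Walks the fields one by one; returns false at the first field that is not a number.
    // Assumes US-style input: "," separates values and "." is the decimal point.
    public static bool TryParseLine(string line, out List<double> values)
    {
        values = new List<double>();
        foreach (string field in line.Split(','))
        {
            Match m = NumberPattern.Match(field);
            if (!m.Success)
            {
                return false;
            }

            double value;
            if (!double.TryParse(m.Groups[1].Value, NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out value))
            {
                return false;
            }
            values.Add(value);
        }
        return true;
    }
}
```

Because each field is matched separately, other per-field patterns (quoted strings, hexadecimal) could be tried in the same loop before giving up on a field.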


The regex you want is

@"([+-]?\d+(?:\.\d+)?)(?:$|,\s*)"

...from which you'll want capture group 1. However, don't use regex for something like this. String manipulation is much better when the input is in a very static, predictable format:

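// Needs: using System; using System.Collections.Generic; using System.Globalization; using System.Linq;
string[] nums = strInput.Split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
// Parse with the invariant culture so "." is always treated as the decimal point.
List<float> results = (from n in nums
                       select float.Parse(n, CultureInfo.InvariantCulture)).ToList();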

If you do use regex, make sure you do a global capture, i.e. collect every match rather than just the first one (Regex.Matches in C#).
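
If you go the regex route anyway, a sketch along these lines covers both validation and extraction. The names RegexCsvNumbers and TryParse are hypothetical, and the whole-line pattern is an addition of mine built from the same number expression; the per-number pattern is the one given above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: validate the whole line first, then capture each number.
public static class RegexCsvNumbers
{
    private const string Number = @"[+-]?\d+(?:\.\d+)?";

    // Whole-line check (an assumption added here): one number, then zero or more ", number".
    private static readonly Regex LinePattern =
        new Regex("^" + Number + @"(?:,\s*" + Number + ")*$");

    // The per-number expression from the answer; capture group 1 holds the number.
    private static readonly Regex ItemPattern =
        new Regex("(" + Number + @")(?:$|,\s*)");

    public static bool TryParse(string input, out List<double> values)
    {
        values = new List<double>();
        string trimmed = input.Trim(); // tolerate padding around the whole line
        if (!LinePattern.IsMatch(trimmed))
        {
            return false;
        }

        // Regex.Matches returns every match, which is the "global capture".
        foreach (Match m in ItemPattern.Matches(trimmed))
        {
            double value;
            if (!double.TryParse(m.Groups[1].Value, NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out value))
            {
                return false;
            }
            values.Add(value);
        }
        return true;
    }
}
```

Validating first matters because the per-number pattern on its own would happily pull 1 and 2 out of something like "abc1,2".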


I think you would have to loop it to check for an unknown number of ints... or else something like this:

/ *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) */

and you could keep that going, adding " *,? *([0-9.]*)" as many times as you wanted, to account for a lot of numbers. The result would be an array of numbers.

http://jsfiddle.net/8URvL/1/
