Validating items in CSV with regex

I have a CSV string that I am trying to validate via regex to ensure it only has N items. I've tried the following pattern (which looks for 2 items):

/([^,]+){2}/

But it doesn't seem to work; I am guessing the inner pattern isn't greedy enough.

Any ideas? Ideally it should work with both the PHP and JavaScript regex engines.

Update:

For technical reasons I really want to do this via regex rather than another solution. The CSV is not quoted and the values will not contain commas, so that isn't a problem.

/([^,]*[,]{1}[^,]*){1}/

Is where I am at now, which sort of works but is still a bit ugly, and has issues matching one item.

CSV looks like:

apples,bananas,pears,oranges,grapefruit


In PHP, you'll be much better off using this function:

http://www.php.net/manual/en/function.str-getcsv.php

It will deal with the likes of:

a,"b,c"

... which contains two items rather than three.

I'm not aware of an equivalent function for JavaScript.


Untested, because I don't know what your input looks like:

/^([^,]+,){1}([^,]+$)/

This requires two fields (one comma, so no comma after the last field).
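A rough sketch of how this pattern could be generalised to N fields, assuming (as the question states) unquoted values that never contain commas. The helper name fieldCountRegex is made up for illustration; the pattern it builds uses only syntax that PCRE (PHP) also accepts.

```javascript
// Hypothetical helper: builds a pattern that matches exactly n non-empty,
// comma-separated fields. Assumes unquoted values without embedded commas.
function fieldCountRegex(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  // (n - 1) fields that are each followed by a comma, then one final field.
  return new RegExp('^(?:[^,]+,){' + (n - 1) + '}[^,]+$');
}

var csv = 'apples,bananas,pears,oranges,grapefruit';
console.log(fieldCountRegex(5).test(csv));      // true
console.log(fieldCountRegex(2).test(csv));      // false
console.log(fieldCountRegex(1).test('apples')); // true
```

Anchoring with ^ and $ is what makes the count exact; without the anchors the pattern would also match inside a longer list.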


How about using the g (global) modifier? It doesn't make the RegExp greedier, but it makes match() return every item, so you can check the count:

var foobar = 'foo,bar',
    foobarbar = 'foo,bar,"bar"',
    foo = 'foo,',
    bar = 'bar';
foo.match(/([^,]+)/g).length === 2; //=> false
bar.match(/([^,]+)/g).length === 2; //=> false
foobar.match(/([^,]+)/g).length === 2; //=> true
foobarbar.match(/([^,]+)/g).length === 2; //=> false
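One thing to watch with this approach: match() returns null rather than an empty array when nothing matches, so calling .length directly throws on an empty string. A minimal sketch with a guard, where countItems is a hypothetical name:

```javascript
// Hypothetical helper: counts non-empty items by collecting every run of
// non-comma characters. match() returns null when nothing matches.
function countItems(csv) {
  var items = csv.match(/[^,]+/g);
  return items === null ? 0 : items.length;
}

console.log(countItems('foo,bar'));  // 2
console.log(countItems(''));         // 0
console.log(countItems('foo,,bar')); // 2, the empty middle field is skipped
```

Because empty fields are skipped, this counts non-empty items rather than positions between commas.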


var vals       = "something,sthelse,anotherone,woohoo".split(','),
    maxlength = 4;

return vals.length <= maxlength;

This should work in JS when placed inside a function, since it uses return.
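If an exact count is wanted rather than an upper bound, the same split idea could be sketched as below. The function name hasItemCount is an assumption, and the check that every field is non-empty is an extra strictness choice not stated in the answer above.

```javascript
// Hypothetical helper: true when csv holds exactly n non-empty,
// comma-separated items. Assumes unquoted values without embedded commas.
function hasItemCount(csv, n) {
  var vals = csv.split(',');
  // split() yields empty strings for leading, trailing or doubled commas,
  // so reject those explicitly.
  return vals.length === n && vals.every(function (v) {
    return v.length > 0;
  });
}

console.log(hasItemCount('something,sthelse,anotherone,woohoo', 4)); // true
console.log(hasItemCount('something,sthelse', 4));                   // false
console.log(hasItemCount('foo,', 2));                                // false
```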


Depending on how the CSV is formatted, you may be able to split on /\",\"/ (i.e. double_quote comma double_quote) and get the length of the resulting array.

Regular expressions aren't very good for parsing, so if the string is complex you may need to parse it some other way.


Got it.

/^([^,]+([,]{1}|$)){1}$/

Set the last {N} to the quantity of results or range {1,3} to check.
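A quick check of that pattern against a few inputs, as a sketch only. The variable names and sample strings are made up for illustration. One behaviour worth knowing: because each item may end with either a comma or the end of the string, a trailing comma is accepted.

```javascript
// The pattern above with the final quantifier set to {2} (exactly two items).
var exactlyTwo = /^([^,]+([,]{1}|$)){2}$/;
// The same pattern with a range, accepting one to three items.
var oneToThree = /^([^,]+([,]{1}|$)){1,3}$/;

console.log(exactlyTwo.test('foo'));         // false
console.log(exactlyTwo.test('foo,bar'));     // true
console.log(exactlyTwo.test('foo,bar,baz')); // false
console.log(exactlyTwo.test('foo,bar,'));    // true, trailing comma is allowed
console.log(oneToThree.test('foo'));         // true
console.log(oneToThree.test('foo,bar,baz')); // true
```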


Take a look at this answer.

To quote:

re_valid = r"""
# Validate a CSV string having single, double or un-quoted values.
^                                   # Anchor to start of string.
\s*                                 # Allow whitespace before value.
(?:                                 # Group for value alternatives.
  '[^'\\]*(?:\\[\S\s][^'\\]*)*'     # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*"     # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*    # or Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Allow whitespace after value.
(?:                                 # Zero or more additional values
  ,                                 # Values separated by a comma.
  \s*                               # Allow whitespace before value.
  (?:                               # Group for value alternatives.
    '[^'\\]*(?:\\[\S\s][^'\\]*)*'   # Either Single quoted string,
  | "[^"\\]*(?:\\[\S\s][^"\\]*)*"   # or Double quoted string,
  | [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*  # or Non-comma, non-quote stuff.
  )                                 # End group of value alternatives.
  \s*                               # Allow whitespace after value.
)*                                  # Zero or more additional values
$                                   # Anchor to end of string.
"""

Or the usable form (since JS can't handle multi-line regex strings):

var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

It can be called using RegExp.prototype.test()

if (!re_valid.test(text)) return null;

The first alternative matches valid single-quoted strings, the second matches valid double-quoted strings, and the third matches unquoted strings.
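A small sketch of the validator in use, with made-up sample strings. Note that it only checks that the string is well formed; it does not count items, so a count check would still be needed on top of it.

```javascript
// The single-line validator quoted above, copied unchanged.
var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

console.log(re_valid.test('apples,bananas,pears')); // true
console.log(re_valid.test('a,"b,c"'));              // true, quoted comma is fine
console.log(re_valid.test('a,"b'));                 // false, unterminated quote
```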

If you remove the single-quote matches it is an almost 100% implementation of a working IETF RFC 4180 spec CSV validator.

Note: It might be 100% but I can't remember whether it can handle newline chars in values (I think the [\S\s] is a JavaScript idiom for matching any character, newlines included).

Note: This is a JavaScript-only implementation; there are no guarantees that the RegEx source string will work in PHP.

If you're planning on doing anything non-trivial with CSV data, I suggest you adopt an existing library. It gets pretty ugly if you're looking for an RFC-compliant implementation.
