Number grouping using regexps
Is it possible to do number grouping (e.g., converting the number 1000 to the string "1 000") using one pass with only regular expressions? (I know the boundary between regexp and language facilities is a bit blurred in some systems - listen to your conscience before replying.)
Reason why I'm asking: Another dev recently asked me how to do number grouping in JavaScript, and showed me a slightly incorrect JavaScript function using regexps. I gave him a better alternative, but his regexp nagged at me because this kind of rewrite is definitely something a regular grammar should be able to do, but I really can't figure out how to write a regexp for it.
This is my first naïve attempt, which I knew would be incorrect:
function group(n) { return n.toString().replace(/(\d{3})/g, "$1 "); }
This approach has two flaws: group(1000) yields "100 0", and group(100) yields "100 " (trailing space). You can fix it this way:
String.prototype.reverse = function () {
var a = [];
for (var i = this.length - 1; i >= 0; --i) a.push(this[i]);
return a.join("");
};
function group(n) {
return n.toString().reverse().replace(/(\d{3})/g, "$1 ").
trimRight().reverse();
}
But this requires not one, not two, not even three but FOUR passes (two reverses, one replace, and trimRight)! I then ventured out into look-behind land, and came up with:
function group(n) { return n.toString().replace(/(\d{3}(?!\d))/g, " $1"); }
... which doesn't work at all (edit - probably because I confused look-behind and negative look-ahead...) - it only matches the last three digits (group(1000000000) becomes "1000000 000"). Look-ahead works a little better:
function group(n) { return n.toString().replace(/(\d{3})(?=\d)/g, "$1 "); }
Which more or less brings me back where I started - I'm rid of the trailing space, but group(1000) still yields "100 0".
So - can this be done with a single regexp replace pass? I'm language agnostic, as this should only need to use regexp facilities.
Note: This is not a question about how to do localization, and I'm not engaging in premature optimization. I'm just curious as to whether this is possible, and if it isn't, why not.
Here's a version that will work in JavaScript:
return n.toString().replace(/(\d)(?=(\d{3})+(?!\d))/g, "$1 ");
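As a quick check, here is that replace wrapped in a function and run against the inputs from the question. This is only a sketch: the function name group is borrowed from the question, and the sample values are chosen for illustration.

```javascript
// Hypothetical wrapper around the one-pass replace shown above.
function group(n) {
  // Match a digit only when the digits after it form one or more complete
  // groups of three, with no further digit after the last group.
  return n.toString().replace(/(\d)(?=(\d{3})+(?!\d))/g, "$1 ");
}

console.log(group(100));        // "100"
console.log(group(1000));       // "1 000"
console.log(group(12345));      // "12 345"
console.log(group(1000000000)); // "1 000 000 000"
```

The replacement puts the captured digit back and adds a space after it, so nothing is inserted at the start or end of the string.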
This does it in Perl:
$num =~ s/(?<=\d)(\d{3})(?=(\d{3})*(\D|$))/ $1/g;
To break it down:
(?<=\d) - we're checking our match is preceded by a digit using a lookbehind
(\d{3}) - we're looking for a group of three digits
(?= - we're using a lookahead, so the three digits must be followed by something
(\d{3})* - this will match 0 or more groups of 3 digits, i.e. 0, 3, 6... digits
(\D|$) - this will match a non-digit or the end of the string
So we want to find a digit, followed by 3 digits, followed by 0, 3, 6... digits and then no more digits.
Unfortunately, JavaScript did not have lookbehind in its regular expressions until ES2018, so this pattern won't work in older JavaScript engines. If you drop the lookbehind you get a leading space put in front of numbers with 3, 6, 9... digits.
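Both behaviours can be checked directly. The sketch below ports the Perl pattern to JavaScript twice, once without the lookbehind and once with it. The second function assumes an engine that implements ES2018 lookbehind (a recent Node.js or browser), and both function names are made up for illustration.

```javascript
// Perl pattern with the lookbehind dropped: runs in any engine, but a
// number whose length is a multiple of three gets a leading space.
function groupNoLookbehind(n) {
  return n.toString().replace(/(\d{3})(?=(\d{3})*(\D|$))/g, " $1");
}

// Perl pattern ported unchanged. Assumption: the engine supports
// ES2018 lookbehind, otherwise this regex literal is a syntax error.
function groupLookbehind(n) {
  return n.toString().replace(/(?<=\d)(\d{3})(?=(\d{3})*(\D|$))/g, " $1");
}

console.log(JSON.stringify(groupNoLookbehind(100)));  // " 100"
console.log(JSON.stringify(groupNoLookbehind(1000))); // "1 000"
console.log(JSON.stringify(groupLookbehind(100)));    // "100"
console.log(JSON.stringify(groupLookbehind(1234567))); // "1 234 567"
```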
n.toString().replace(/(\d)(?=(\d{3})+\b)/g,"$1 ")
Add a space after every digit that is followed by a multiple of three digits (3, 6, 9, ...) and then a word boundary. For example, in 123456789, these digits will be matched: 3, 6.
Working demo: http://jsbin.com/iruzu
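To see which digits the lookahead selects, and where ending on \b differs from ending on (?!\d), a small comparison can help. This is a sketch only: the function names and the "1000px" input are invented for the example.

```javascript
// Variant that ends the lookahead on a word boundary.
function groupWordBoundary(s) {
  return String(s).replace(/(\d)(?=(\d{3})+\b)/g, "$1 ");
}

// Variant that ends the lookahead on "no further digit".
function groupNotDigit(s) {
  return String(s).replace(/(\d)(?=(\d{3})+(?!\d))/g, "$1 ");
}

// The digits that receive a space after them.
console.log("123456789".match(/(\d)(?=(\d{3})+\b)/g)); // [ '3', '6' ]

console.log(groupWordBoundary("123456789")); // "123 456 789"
console.log(groupNotDigit("123456789"));     // "123 456 789"

// A letter directly after the number is a word character, so there is
// no \b between "0" and "p" and the \b variant leaves the string alone.
console.log(groupWordBoundary("1000px")); // "1000px"
console.log(groupNotDigit("1000px"));     // "1 000px"
```

For plain integers the two variants agree; they only diverge when the digits are immediately followed by a letter or underscore.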