Using regex to add leading zeroes
I would like to add a certain number of leading zeroes (say up to 3) to all numbers of a string. For example:
Input: /2009/5/song 01 of 12
Output: /2009/0005/song 0001 of 0012
What's the best way to do this with regular expressions?
Edit:
I picked the first correct answer. However, all answers are worth giving a read.
In Perl:
s/([0-9]+)/sprintf('%04d',$1)/ge;
Use something that supports a callback so you can process the match:
>>> import re
>>> r=re.compile(r'(?:^|(?<=[^0-9]))([0-9]{1,3})(?=$|[^0-9])')
>>> r.sub(lambda x: '%04d' % (int(x.group(1)),), 'dfbg345gf345')
'dfbg0345gf0345'
>>> r.sub(lambda x: '%04d' % (int(x.group(1)),), '1x11x111x')
'0001x0011x0111x'
>>> r.sub(lambda x: '%04d' % (int(x.group(1)),), 'x1x11x111x')
'x0001x0011x0111x'
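As a minimal sketch of the same callback idea wrapped in a reusable function: the names SHORT_NUMBER and pad_numbers are made up for illustration, and the pattern uses look-behind and look-ahead rather than the alternation shown above.

```python
import re

# Digit runs of one to three characters that are not part of a longer number.
# SHORT_NUMBER and pad_numbers are illustrative names, not from the answer above.
SHORT_NUMBER = re.compile(r'(?<!\d)\d{1,3}(?!\d)')

def pad_numbers(text, width=4):
    """Left-pad every 1-3 digit number in text with zeroes to `width` digits."""
    # The callback receives each match and returns its replacement.
    return SHORT_NUMBER.sub(lambda m: m.group(0).zfill(width), text)

print(pad_numbers('/2009/5/song 01 of 12'))  # /2009/0005/song 0001 of 0012
```

Numbers that already have four or more digits, such as 2009, are left untouched because the pattern cannot match part of a longer digit run.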
A sample:
>>> re.sub(r"(?<!\d)0*(\d{1,3})(?!\d)", r"000\1", "/2009/5/song 01 of 3")
'/2009/0005/song 0001 of 0003'
Note:
- It only works for single-digit numbers (1 - 9) for now, because it always prepends exactly three zeroes
- It is not well tested yet
I can't think of a single regex that does this without callbacks for now (there might be a way to do it).
Here are two regular expressions that handle it:
>>> x = "1/2009/5/song 01 of 3 10 100 010 120 1200 abcd"
>>>
>>> x = re.sub(r"(?<!\d)0*(\d{1,3})(?!\d)", r"000\1", x)
#'0001/2009/0005/song 0001 of 0003 00010 000100 00010 000120 1200 abcd'
>>>
>>> re.sub(r"0+(\d{4})(?!\d)", r"\1", x) #strip extra leading zeroes
'0001/2009/0005/song 0001 of 0003 0010 0100 0010 0120 1200 abcd'
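The two passes above can be combined into one helper, sketched here under a small assumption: the second pattern gets an extra look-behind, (?<!\d), which is not in the original and is added so that zeroes in the middle of a long number such as 100005 are not stripped. The function name pad_without_callback is hypothetical.

```python
import re

def pad_without_callback(text):
    # Pass 1: drop existing leading zeroes from 1-3 digit numbers
    # and prepend exactly three zeroes.
    padded = re.sub(r'(?<!\d)0*(\d{1,3})(?!\d)', r'000\1', text)
    # Pass 2: trim over-padded numbers to their last four digits.
    # The look-behind is an addition to the original pattern (assumption):
    # it keeps the match anchored to the start of a digit run.
    return re.sub(r'(?<!\d)0+(\d{4})(?!\d)', r'\1', padded)

x = "1/2009/5/song 01 of 3 10 100 010 120 1200 abcd"
print(pad_without_callback(x))
```

Neither pass needs a callback, so the same two substitutions should carry over to tools that only support plain replacement strings, provided they support look-around.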
Using C#:
string result = Regex.Replace(input, @"\d+", me =>
{
return int.Parse(me.Value).ToString("0000");
});
Another approach:
>>> x
'/2009/5/song 01 of 12'
>>> ''.join([i.zfill(4) if i.isdigit() else i for i in re.split(r"(?<!\d)(\d+)(?!\d)",x)])
'/2009/0005/song 0001 of 0012'
>>>
Or:
>>> x
'/2009/5/song 01 of 12'
>>> r=re.split(r"(?<!\d)(\d+)(?!\d)",x)
>>> ''.join(a+b.zfill(4) for a,b in zip(r[::2],r[1::2]))
'/2009/0005/song 0001 of 0012'
If your regular expression implementation does not support look-behind and/or look-ahead assertions, you can also use this regular expression:
(^|\D)(\d{1,3})(\D|$)
And replace the match with $1 + padLeft($2, 4, "0") + $3, where $1 is the match of the first group and padLeft(str, length, padding) is a function that prefixes str with padding until the length length is reached.
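A rough sketch of this look-around-free approach, written in Python purely for illustration (Python does support look-around, so this only imitates an engine that lacks it). The names pad_left and pad_short_numbers are assumptions. One caveat it has to work around: the trailing group consumes the separator, so in input like 1/2 the second number is skipped on the first pass. The sketch simply repeats the substitution until nothing changes.

```python
import re

# No look-behind or look-ahead: the neighbouring characters are captured instead.
PATTERN = re.compile(r'(^|\D)(\d{1,3})(\D|$)')

def pad_left(s, length, padding):
    """Prefix s with padding until it is at least `length` characters long."""
    while len(s) < length:
        s = padding + s
    return s

def pad_short_numbers(text):
    # Group 3 consumes the separator, so a number directly after it is
    # skipped on that pass; repeat until a pass makes no replacement.
    while True:
        text, count = PATTERN.subn(
            lambda m: m.group(1) + pad_left(m.group(2), 4, '0') + m.group(3),
            text)
        if count == 0:
            return text

print(pad_short_numbers('/2009/5/song 01 of 12'))  # /2009/0005/song 0001 of 0012
```

The loop terminates because a padded number has four digits and can no longer match the pattern.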
<warning>
This assumes academic interest; of course, you should use callbacks to do it clearly and correctly. </warning>
I'm able to abuse regular expressions to have two leading zeros (.NET flavor):
s = Regex.Replace(s, @".(?=\b\d\b)|(?=\b\d{1,2}\b)", "$&0");
It doesn't work if there's a number at the beginning of the string. This works by matching either the zero-width position before a number or the character before a number, and appending a 0 to each match.
I had no luck expanding it to three leading zeros, and certainly not more.
The principle: use two replacements. In the first, you add zeroes in front of the number; in the second, you keep only the last x places. This worked for me when solving this problem in SQL.
The example: REGEXP_REPLACE(REGEXP_REPLACE(version,'.([0-9][.][0-9][.][0-9])..','\1.00000\2'),'([0-9][.][0-9][.][0-9][.]).*(.....$)','\1\2'),'.','')
This code turns the value 1.1.1.1 into 1.1.1.00001.
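The two-replacement principle can be sketched in Python as follows. This is not a translation of the SQL above, only an illustration of the idea on the same example value; the function name pad_last_component and the width parameter are assumptions.

```python
import re

def pad_last_component(version, width=5):
    # Step 1: put `width` zeroes in front of the last dot-separated component.
    widened = re.sub(r'(\d+)$', '0' * width + r'\1', version)
    # Step 2: keep only the last `width` digits of that component.
    return re.sub(r'\.\d*(\d{%d})$' % width, r'.\1', widened)

print(pad_last_component('1.1.1.1'))  # 1.1.1.00001
```

Note that a last component longer than width digits would be truncated by the second step, so this sketch only suits values known to fit.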
Here is a Perl solution without callbacks or recursion. It does use the Perl regex extension of executing code in lieu of a straight substitution (the e switch), but this is very easily extended to other languages that lack that construct.
#!/usr/bin/perl
while (<DATA>) {
chomp;
print "string:\t\t\t$_\n";
# uncomment if you care about 0000000 case:
# s/(^|[^\d])0+([\d])/\1\2/g;
# print "now no leading zeros:\t$_\n";
s/(^|[^\d]{1,3})([\d]{1,3})($|[^\d]{1,3})/sprintf "%s%04i%s",$1,$i=$2,$3/ge;
print "up to 3 leading zeros:\t$_\n";
}
print "\n";
__DATA__
/2009/5/song 01 of 12
/2010/10/song 50 of 99
/99/0/song 1 of 1000
1
01
001
0001
/001/
"02"
0000000000
Output:
string: /2009/5/song 01 of 12
up to 3 leading zeros: /2009/0005/song 0001 of 0012
string: /2010/10/song 50 of 99
up to 3 leading zeros: /2010/0010/song 0050 of 0099
string: /99/0/song 1 of 1000
up to 3 leading zeros: /0099/0/song 0001 of 1000
string: 1
up to 3 leading zeros: 0001
string: 01
up to 3 leading zeros: 0001
string: 001
up to 3 leading zeros: 0001
string: 0001
up to 3 leading zeros: 0001
string: /001/
up to 3 leading zeros: /0001/
string: "02"
up to 3 leading zeros: "0002"
string: 0000000000
up to 3 leading zeros: 0000000000
Combined in Xcode:
targetName=[NSString stringWithFormat:@"%05d",number];
Gives 00123 for number 123
A valid Scala program that pads all groups of n digits (n from 1 to 3) to 4 digits. $$ escapes the end-of-line character $, because we are using StringContext (a string prefixed by s).
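// f is the input string; fold over the group sizes 1, 2 and 3.
(1 to 3).foldLeft(f){case (res,i) =>
// \\d{$i} matches exactly i digits; "0"*(4-i) supplies the missing zeroes.
res.replaceAll(s"""(?<=[^\\d]|^)(\\d{$i})(?=[^\\d]|$$)""", "0"*(4-i)+"$1")
}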
C# version
string input = "/2009/5/song 01 of 12";
string regExPattern = @"(\/\d{4}\/)(\d+)(\/song\s+)(\d+)(\s+of\s+)(\d+)";
string output = Regex.Replace(input, regExPattern, callback =>
{
string yearPrefix = callback.Groups[1].Value;
string digit1 = int.Parse(callback.Groups[2].Value).ToString("0000");
string songText = callback.Groups[3].Value;
string digit2 = int.Parse(callback.Groups[4].Value).ToString("0000");
string ofText = callback.Groups[5].Value;
string digit3 = int.Parse(callback.Groups[6].Value).ToString("0000");
return $"{yearPrefix}{digit1}{songText}{digit2}{ofText}{digit3}";
});
In case anyone is interested in how to do this in R, the package stringr is helpful:
library(stringr)
input<-"/2009/5/song 01 of 12"
str_replace_all(string = input,
pattern="((?<![0-9])[0-9]*([0-9]{1,3}))",
replacement=function(x){str_pad(x,width=4,side="left",pad="0")})
"/2009/0005/song 0001 of 0012"
See: https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf