Is this the most efficient way to parse the string?
I have a string of the form AU 12345T or AU 12345T1; basically it is one or more alphabetic characters followed by a number, then ending in a one- or two-character alphanumeric string.
I am using the following regular expression to get me the result:
^[a-z|A-Z]+|[0-9]+|[a-z|A-Z][0-9]?
Would this be the most efficient way to parse such a string?
So for the example AU 12345T, I want the result to be separated into three tokens: AU, 12345, T; for AU 12345T1 it should be AU, 12345, T1 (since the ending characters can be alphanumeric and the maximum length is 2).
This should do it:
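^[A-Za-z]+\s?[0-9]+[A-Za-z0-9]{1,2}$
If you want to separate the strings as you said, put parentheses around the blocks, like so:
^([A-Za-z]+)\s?([0-9]+)([A-Za-z0-9]{1,2})$
This will have the regex return each group individually. The anchors and the greedy {1,2} matter: with a lazy {1,2}? and no trailing $, the last group stops after one character, so AU 12345T1 would yield T rather than T1.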
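As a rough illustration of the grouped pattern, here is a minimal Python sketch. The answer does not name a language, so Python's re module is an assumption, and the helper name split_code is hypothetical. The sketch also assumes the pattern is anchored with ^ and $ and uses a greedy {1,2} for the last group, so that a two-character suffix is captured whole.

```python
import re

# Assumed variant of the grouped pattern: anchored at both ends,
# with a greedy {1,2} so a two-character suffix is kept together.
TOKEN_RE = re.compile(r"^([A-Za-z]+)\s?([0-9]+)([A-Za-z0-9]{1,2})$")


def split_code(text):
    """Return (letters, digits, suffix), or None if text does not match."""
    match = TOKEN_RE.match(text)
    return match.groups() if match else None


print(split_code("AU 12345T"))   # ('AU', '12345', 'T')
print(split_code("AU 12345T1"))  # ('AU', '12345', 'T1')
```

Compiling the pattern once and reusing it avoids re-parsing the regex on every call, which is usually the only efficiency concern for a pattern this small.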
All this being said, you'll probably want to ensure that the final one/two character alphanumeric string always begins with a letter, or else you'll have no way of separating the second token from the third token.
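To make the ambiguity concrete, the sketch below compares a suffix that may start with a digit against one that must start with a letter. It is written in Python as an assumption (the thread names no language); the names LOOSE, STRICT, and groups_or_none are hypothetical, and both patterns are anchored with ^ and $ for the sake of the example.

```python
import re

# Suffix may be any one or two alphanumeric characters.
LOOSE = re.compile(r"^([A-Za-z]+)\s?([0-9]+)([A-Za-z0-9]{1,2})$")
# Suffix must begin with a letter, optionally followed by one alphanumeric.
STRICT = re.compile(r"^([A-Za-z]+)\s?([0-9]+)([A-Za-z][A-Za-z0-9]?)$")


def groups_or_none(pattern, text):
    """Return the captured groups as a tuple, or None when there is no match."""
    match = pattern.match(text)
    return match.groups() if match else None


# With an all-digit tail, the loose pattern has to guess where the number ends.
print(groups_or_none(LOOSE, "AU 1234567"))   # ('AU', '123456', '7')
# The strict pattern rejects the input instead of guessing.
print(groups_or_none(STRICT, "AU 1234567"))  # None
# Well-formed input still splits as expected.
print(groups_or_none(STRICT, "AU 12345T1"))  # ('AU', '12345', 'T1')
```

The loose pattern still matches, but the split between the second and third token is decided by backtracking rather than by the data, which is why requiring a leading letter is the safer choice.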