Parsing a complex version string
I have a string that's of the following scheme:
VersionNumber.VersionString-VersionNumber.VersionString
Such that the following example strings can be converted into arrays of information:
1. 1.x-2.x => (1, 'x', 2, 'x')
2. 1.2-3.4 => (1, 2, 3, 4)
3. 1.2-3.4-beta5 => (1, 2, 3, '4-beta5')
4. 1.2-beta3-3.4 => (1, '2-beta3', 3, 4)
5. 1.2-beta3-4.5-beta6 => (1, '2-beta3', 4, '5-beta6')
The logic for the parse is:
- First element is everything before the first period.
- Second element is everything up to a hyphen immediately before a number.
- Third element always starts with a number and is everything up to the next period.
- Fourth element is everything after the period.
Notes:
- Second element is an arbitrary string, but will never have a hyphen that immediately precedes a number (e.g. 2-3 is not valid, but 2-beta4 is).
- Third element always starts with a number, and begins right after a hyphen.
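As a cross-check that does not involve a regex, the rules and notes above can be sketched directly in Python. This is an illustration only, not part of the original question: parse_by_rules is a hypothetical name, and it returns every field as a string.

```python
def parse_by_rules(s):
    """Hypothetical helper: split s into four fields using the stated rules."""
    # Rule 1: the first element is everything before the first period.
    first, sep, rest = s.partition(".")
    if not sep:
        raise ValueError(f"no period in {s!r}")
    # Rule 2: the second element runs up to the first hyphen that is
    # immediately followed by a digit (the notes say it never contains one).
    for i in range(len(rest) - 1):
        if rest[i] == "-" and rest[i + 1].isdigit():
            second, tail = rest[:i], rest[i + 1:]
            break
    else:
        raise ValueError(f"no hyphen followed by a digit in {s!r}")
    # Rules 3 and 4: the third element runs to the next period,
    # and the fourth element is everything after it.
    third, sep, fourth = tail.partition(".")
    if not sep:
        raise ValueError(f"no period after the third element in {s!r}")
    return (first, second, third, fourth)


for case in ["1.x-2.x", "1.2-3.4", "1.2-3.4-beta5",
             "1.2-beta3-3.4", "1.2-beta3-4.5-beta6"]:
    print(case, "=>", parse_by_rules(case))
```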
I've been able to parse the first three cases using the following expression:
(.+?).(.+?)-(.+?).(.*)
But I'm not sure how to modify it to handle cases 4 and 5 (when the second element contains a hyphen). The two approaches I thought of were:
- Modify the second group to match everything before a hyphen immediately preceding a digit.
- Modify the second group to match everything until it hits a second hyphen only if the first hyphen immediately precedes a non-digit character.
Presumably the first approach is the correct/simplest way to do it, but I'm struggling with coming up with the correct regexp to express it.
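A quick check with Python's standard re module shows what the original expression actually captures. The constant name ORIGINAL is just a label for this sketch.

```python
import re

# The question's original expression. The "." is unescaped, so it matches
# ANY character, not just a literal period.
ORIGINAL = re.compile(r"(.+?).(.+?)-(.+?).(.*)")

cases = ["1.x-2.x", "1.2-3.4", "1.2-3.4-beta5",
         "1.2-beta3-3.4", "1.2-beta3-4.5-beta6"]

for case in cases:
    print(case, "=>", ORIGINAL.match(case).groups())

# Cases 4 and 5 go wrong: the lazy group 2 stops at the first hyphen, and the
# unescaped "." then matches the "e" in "beta":
#   1.2-beta3-3.4       => ('1', '2', 'b', 'ta3-3.4')
#   1.2-beta3-4.5-beta6 => ('1', '2', 'b', 'ta3-4.5-beta6')
```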
Try this:
(.+?)\.(.*)-(.+?)\.(.*)
Actually, even this will work:
(.*)\.(.*)-(.*)\.(.*)
Your problem was that you weren't escaping the period, so the regex treated it as "match any character" rather than "match a literal period".
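A minimal sketch with Python's re module, running the escaped version of the first suggestion over the five example strings from the question (ESCAPED is just a label for this sketch):

```python
import re

# "\." now matches only a literal period. The greedy (.*) in group 2
# backtracks from the end of the string to the last hyphen after which
# the rest of the pattern ("<something>.<rest>") can still match.
ESCAPED = re.compile(r"(.+?)\.(.*)-(.+?)\.(.*)")

cases = ["1.x-2.x", "1.2-3.4", "1.2-3.4-beta5",
         "1.2-beta3-3.4", "1.2-beta3-4.5-beta6"]

for case in cases:
    # Each case now yields the four fields the question asks for (as strings).
    print(case, "=>", ESCAPED.match(case).groups())
```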
UPDATE:
So if a VersionString can contain periods/hyphens, following your parse logic, this should work:
(\d*)\.(.*)-(\d*)\.(.*)
It says:
- match the leading digits (your first VersionNumber)
- match everything between the first period and the last hyphen that is followed by digits and a period (thanks to the greedy match)
- match the digits after that hyphen, up to the period
- match the rest
The string:
1.2-b.e.t.a.3-4.5-b.e.t.a.6 => '1' '2-b.e.t.a.3' '4' '5-b.e.t.a.6'
It also works if you go crazy with hyphens in the VersionString:
1.2-b-e.t-a.3-4.5-b-e.t-a.6 => '1' '2-b-e.t-a.3' '4' '5-b-e.t-a.6'
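The two strings above can be checked with Python's re.fullmatch. The last line illustrates a side effect of using \d* rather than \d+: zero digits are also accepted, so an empty third element slips through. Switching to \d+ would reject that input.

```python
import re

UPDATE = re.compile(r"(\d*)\.(.*)-(\d*)\.(.*)")

for case in ["1.2-b.e.t.a.3-4.5-b.e.t.a.6", "1.2-b-e.t-a.3-4.5-b-e.t-a.6"]:
    print(case, "=>", UPDATE.fullmatch(case).groups())

# Caveat: \d* also matches zero digits, so the third element can be empty.
print(UPDATE.fullmatch("1.x-.y").groups())  # ('1', 'x', '', 'y')
```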
Can VersionString ever contain a dot? If not, this should work:
(\d+)\.([^.]+)-(\d+)\.(\S+)
The [^.]+ initially matches everything up to the next dot, then backtracks to the last hyphen that is followed by digits and a dot. If VersionString can contain a dot, you can use this:
(\d+)\.(\S+?)-(\d+)\.(\S+)
Matching digits explicitly in the VersionNumber part serves to enforce your "digit preceded by a hyphen" rule.
(Actually, (.+?) works just as well; I used (\S+?) because I was testing the regex by plucking the version strings out of the full text of your message.)
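A sketch comparing the two expressions with Python's re module. NO_DOTS and WITH_DOTS are labels for this sketch, and the sample sentence passed to findall is made up to mimic pulling version strings out of running text.

```python
import re

NO_DOTS = re.compile(r"(\d+)\.([^.]+)-(\d+)\.(\S+)")    # VersionString has no dots
WITH_DOTS = re.compile(r"(\d+)\.(\S+?)-(\d+)\.(\S+)")   # VersionString may contain dots

cases = ["1.x-2.x", "1.2-3.4", "1.2-3.4-beta5",
         "1.2-beta3-3.4", "1.2-beta3-4.5-beta6"]

# Both give the same four fields on the question's examples.
for case in cases:
    print(case, "=>", NO_DOTS.fullmatch(case).groups(), WITH_DOTS.fullmatch(case).groups())

# Only the second one accepts dots inside VersionString.
dotted = "1.2-b.e.t.a.3-4.5-b.e.t.a.6"
print(NO_DOTS.fullmatch(dotted))            # None
print(WITH_DOTS.fullmatch(dotted).groups())

# Because \S cannot cross whitespace, WITH_DOTS can also pluck ranges out of text.
text = "Ranges such as 1.2-beta3-3.4 and 1.x-2.x appear in the question."
print(WITH_DOTS.findall(text))
```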
EDIT: Per the comments below, here's the final version:
(\d+[^.]*)\.(\S+?)-(\d+[^.]*)\.(\S+)
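The final expression can be wrapped in a small helper that returns tuples like the ones in the question. This is a sketch, not code from the answer: parse_version_range is a hypothetical name, and converting all-digit fields to int is an assumption based on the question's example output. Note that the [^.]* after \d+ also lets a VersionNumber carry a non-digit suffix such as 1rc.

```python
import re

# Final regex: each VersionNumber is digits optionally followed by
# non-period characters.
FINAL = re.compile(r"(\d+[^.]*)\.(\S+?)-(\d+[^.]*)\.(\S+)")


def parse_version_range(s):
    """Hypothetical helper: return the four fields, converting all-digit ones to int."""
    m = FINAL.fullmatch(s)
    if m is None:
        raise ValueError(f"not a version range: {s!r}")
    return tuple(int(g) if g.isdecimal() else g for g in m.groups())


for case in ["1.x-2.x", "1.2-3.4", "1.2-3.4-beta5",
             "1.2-beta3-3.4", "1.2-beta3-4.5-beta6", "1rc.x-2rc.y"]:
    print(case, "=>", parse_version_range(case))
```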