JavaScript RegExp syntax question
I'll try to better explain myself ;-).
I'm using RegexBuddy to try to find the solution. The target is JavaScript in a Konfabulator widget.
The string I need to parse is:
+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++
1A1093/11 VALID: 1107140300 - 1108301500
DAILY 0300-1500
WIP 90M S OF RWY 08/26 AT E, W1, W2.
NO RESTRICTION DRG TKOF/LDG OR TAX.
1A994/11 VALID: 1106201300 - 1112312059
PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN
USE.
1A987/11 VALID: 1106190615 - UFN
ILS DME RWY 08 BC 110.90MHZ CH46X OPR.
+++
The end result should be the following 3 sub-strings:
Substring 1)
1A1093/11 VALID: 1107140300 - 1108301500
DAILY 0300-1500
WIP 90M S OF RWY 08/26 AT E, W1, W2.
NO RESTRICTION DRG TKOF/LDG OR TAX.
Substring 2)
1A994/11 VALID: 1106201300 - 1112312059
PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN
USE.
Substring 3)
1A987/11 VALID: 1106190615 - UFN
ILS DME RWY 08 BC 110.90MHZ CH46X OPR.
As you can see, each section starts with something similar to "1A987/11 VALID:", which I am finding with this regex:
[0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:
Each section ends with the "1A987/11 VALID:" of the next section or with "+++", which I am finding with this regex:
([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:)|(\+{3})
The characters in between are matched with [\s\S]+?; the "." does not work for some reason.
So the complete regex is:
[0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:[\s\S]+?(([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:)|(\+{3}))
Because the end of substring 1 is the beginning of substring 2, RegexBuddy does not find substring 2; only substrings 1 and 3 are found.
I'm looking for a way to find all 3 substrings, that is, a way to match the end of each substring without including it in the match itself.
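Here is a minimal sketch that reproduces the problem in plain JavaScript (assuming a Node.js or browser console rather than the widget itself). It runs the regex above, written as a JavaScript literal with the / characters escaped, over the sample text. The first match consumes the "1A994/11 VALID:" header as its terminator and the next search resumes after it, so only two matches come back. The findAll helper is a hypothetical name used for illustration.

```javascript
// The sample text from the question, one array entry per line.
const notam = [
  "+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++",
  "1A1093/11 VALID: 1107140300 - 1108301500",
  "DAILY 0300-1500",
  "WIP 90M S OF RWY 08/26 AT E, W1, W2.",
  "NO RESTRICTION DRG TKOF/LDG OR TAX.",
  "1A994/11 VALID: 1106201300 - 1112312059",
  "PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN",
  "USE.",
  "1A987/11 VALID: 1106190615 - UFN",
  "ILS DME RWY 08 BC 110.90MHZ CH46X OPR.",
  "+++",
].join("\n");

// The question's regex: the terminator (next header or "+++") is part of the match.
const consuming =
  /[0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?(([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:)|(\+{3}))/g;

// Hypothetical helper: collect every full match of a global regex.
function findAll(regex, text) {
  const results = [];
  regex.lastIndex = 0;
  let m;
  while ((m = regex.exec(text)) !== null) {
    results.push(m[0]);
  }
  return results;
}

const found = findAll(consuming, notam);
console.log(found.length); // 2: the "1A994/11" header was swallowed by the first match
```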
The way I read your question, the significant facts are:
- each match comprises two or more lines;
- the beginning of the first line matches the pattern you gave; and
- each subsequent line starts with whitespace.
Here's how I would express that as a regex:
/^[A-Z0-9]{3,6}\/[0-9]{2}[ \t]+VALID:.*(\r?\n[ \t]+.*)+/mg
Notice how I used [ \t]+ instead of \s+ before the VALID: and at the beginning of the subsequent lines, to match only the horizontal whitespace characters (spaces and/or tabs). Then I used \r?\n to match the line separators (DOS-style \r\n or Unix-style \n). This way, I never match more than I need to, which makes the regex more efficient as well as easier to write and debug.
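Here is a sketch of that regex run over the sample in JavaScript (written as a literal, with the / after {3,6} escaped as \/ so it does not end the literal). It assumes, as the answer does, that each line after a header is indented; the sample as quoted above shows no indentation, so the four-space indent is an assumption. Joining the same lines with \r\n checks that \r?\n also handles DOS-style separators.

```javascript
// Sample lines; the four-space indentation of continuation lines is assumed.
const lines = [
  "+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++",
  "1A1093/11 VALID: 1107140300 - 1108301500",
  "    DAILY 0300-1500",
  "    WIP 90M S OF RWY 08/26 AT E, W1, W2.",
  "    NO RESTRICTION DRG TKOF/LDG OR TAX.",
  "1A994/11 VALID: 1106201300 - 1112312059",
  "    PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN",
  "    USE.",
  "1A987/11 VALID: 1106190615 - UFN",
  "    ILS DME RWY 08 BC 110.90MHZ CH46X OPR.",
  "+++",
];

// A header line, then one or more indented continuation lines.
const section = /^[A-Z0-9]{3,6}\/[0-9]{2}[ \t]+VALID:.*(\r?\n[ \t]+.*)+/mg;

// With the g flag, String.prototype.match returns every full match.
const unixSections = lines.join("\n").match(section);
const dosSections = lines.join("\r\n").match(section);

console.log(unixSections.length, dosSections.length); // 3 3
```

If the real data turns out not to be indented, this pattern matches nothing, and an approach that ends each section at the next header is needed instead.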
The m at the end turns on multiline mode, which allows the ^ anchor to match at the beginning of any line, not just at the start of the string. The g turns on global mode, allowing you to find all matches, not just the first one.
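The effect of the two flags can be checked directly. In this sketch (a shortened, hypothetical version of the sample, with the same assumed indentation), dropping m leaves ^ anchored to the start of the whole string, which here is the +++ banner, so nothing matches; dropping g returns only the first section.

```javascript
// Shortened sample; the banner text and continuation-line indentation are assumed.
const text = [
  "+++ RUNWAY +++",
  "1A994/11 VALID: 1106201300 - 1112312059",
  "    PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN",
  "    USE.",
  "1A987/11 VALID: 1106190615 - UFN",
  "    ILS DME RWY 08 BC 110.90MHZ CH46X OPR.",
  "+++",
].join("\n");

// Same pattern as a string, so it can be compiled with different flags.
const source = "^[A-Z0-9]{3,6}/[0-9]{2}[ \\t]+VALID:.*(\\r?\\n[ \\t]+.*)+";

const withBoth = text.match(new RegExp(source, "mg")); // every section
const withoutM = text.match(new RegExp(source, "g"));  // ^ only at the string start
const withoutG = text.match(new RegExp(source, "m"));  // first match only

console.log(withBoth.length); // 2
console.log(withoutM); // null
console.log(withoutG[0].startsWith("1A994/11")); // true
```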
By the way, the reason you had to use [\s\S] instead of . is that JavaScript, at the time of writing, had no "single-line" or "DOTALL" mode, which most other regex flavors have, so there was no way to make the . match a carriage return (\r) or linefeed (\n). ECMAScript 2018 later added the s ("dotAll") flag for this, though an older engine may not support it. Either way, that's another thing you don't have to deal with if you match line separators explicitly, as I did.
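A quick check of the three options on a two-line string. The last regex relies on the ECMAScript 2018 s flag, so it assumes a modern engine: . refuses to cross the line break, while [\s\S] and the s flag both do.

```javascript
const input = "VALID: line one\nline two";

// "." does not match a line terminator, so this cannot reach "two".
const dot = /VALID:.+two/;
// [\s\S] is "any whitespace or any non-whitespace", i.e. any character at all.
const anyChar = /VALID:[\s\S]+two/;
// ES2018 dotAll flag: "." also matches line terminators.
const dotAll = /VALID:.+two/s;

console.log(dot.test(input), anyChar.test(input), dotAll.test(input)); // false true true
```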
I'm not 100% sure what your second VALID: is doing there, but I think the second part of your regular expression, after the "|" (or), where it looks like you're trying to capture the "UFN" case, is missing something to match UFN. I don't know the full range of possibilities for that sequence, or which regex implementation you're using, but if you match capital letters with [A-Z], you'd need that last group to be ([A-Z]{3}), or to use a generic alphanumeric class after the backslash there instead of the plus.
It depends on which language we are talking about, but the following regular expression worked for me in Perl with the s modifier, which treats line endings as ordinary characters (so . matches them).
([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:.+?)([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:.+?)([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:.+?)(\+{3})
If you need to find an arbitrary number of VALID sections, you'd have to loop over the matches, and how you do that depends on the language.
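In JavaScript, one way to handle an arbitrary number of sections is a loop that looks only for the headers and then slices the text between consecutive header positions, so no regex ever has to consume the next header. This is a different technique from the single-regex answers here, sketched under assumptions: splitSections is a hypothetical helper, and treating the last "+++" in the text as the end of the final section is inferred from the sample.

```javascript
// The sample text from the question, one array entry per line.
const notam = [
  "+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++",
  "1A1093/11 VALID: 1107140300 - 1108301500",
  "DAILY 0300-1500",
  "WIP 90M S OF RWY 08/26 AT E, W1, W2.",
  "NO RESTRICTION DRG TKOF/LDG OR TAX.",
  "1A994/11 VALID: 1106201300 - 1112312059",
  "PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN",
  "USE.",
  "1A987/11 VALID: 1106190615 - UFN",
  "ILS DME RWY 08 BC 110.90MHZ CH46X OPR.",
  "+++",
].join("\n");

// Hypothetical helper: locate each header with a global regex, then slice
// from one header to the next (or to the closing "+++" marker).
function splitSections(text) {
  const header = /[0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:/g;
  const starts = [];
  let m;
  while ((m = header.exec(text)) !== null) {
    starts.push(m.index);
  }
  if (starts.length === 0) return [];

  // Assumption: a "+++" after the final header closes the block;
  // otherwise the final section runs to the end of the text.
  const marker = text.lastIndexOf("+++");
  const end = marker > starts[starts.length - 1] ? marker : text.length;

  return starts.map((start, i) =>
    text.slice(start, i + 1 < starts.length ? starts[i + 1] : end).trim()
  );
}

const sections = splitSections(notam);
console.log(sections.length); // 3
```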
Notice that I collapsed the [0-9]|[A-Z] into [0-9A-Z] and basically copied the first (...) pattern 3 times.
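For comparison, here is a sketch of the Perl pattern ported to JavaScript, where each .+? is written as [\s\S]+? so it can cross line breaks without relying on a modifier. The buildNotam helper is hypothetical. It also illustrates the limitation of copying the group: with only two sections in the input, the three-group pattern finds nothing at all.

```javascript
// The three sections from the question, one array of lines per section.
const sectionLines = [
  [
    "1A1093/11 VALID: 1107140300 - 1108301500",
    "DAILY 0300-1500",
    "WIP 90M S OF RWY 08/26 AT E, W1, W2.",
    "NO RESTRICTION DRG TKOF/LDG OR TAX.",
  ],
  [
    "1A994/11 VALID: 1106201300 - 1112312059",
    "PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN",
    "USE.",
  ],
  ["1A987/11 VALID: 1106190615 - UFN", "ILS DME RWY 08 BC 110.90MHZ CH46X OPR."],
];

// Hypothetical helper: wrap some sections in the banner and the closing "+++".
function buildNotam(sections) {
  const banner = "+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++";
  return [banner, ...sections.map((lines) => lines.join("\n")), "+++"].join("\n");
}

// Perl's /s is emulated with [\s\S]+?; the section group is copied three times.
const threeSections =
  /([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?)([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?)([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?)(\+{3})/;

const full = threeSections.exec(buildNotam(sectionLines));
const partial = threeSections.exec(buildNotam(sectionLines.slice(0, 2)));

console.log(full[3].trim()); // the third section
console.log(partial); // null: there is no third header to match
```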
I'm not entirely sure what regex parser you are using, but give this beast a shot:
((?:(?:[0-9]|[A-Z]){3,6}/\d{2}\s{1,3}VALID:.+?)(?=(?: \+\+\+$|(?:[0-9]|[A-Z]){3,6}/\d{2})))
It uses positive lookaheads, which not every regex engine supports, so it may or may not work for you.
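The key property of the lookahead is that it checks the terminator without consuming it. This sketch, on a short hypothetical input, compares lastIndex (where a global regex resumes searching) after one exec call with a consuming terminator and with a lookahead terminator.

```javascript
// Hypothetical two-section input, kept short so positions are easy to follow.
const text = "AB1/11 VALID: one\nCD2/22 VALID: two\n+++";

// Terminator consumed: the next header becomes part of this match.
const consuming =
  /[0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?(?:[0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:|\+{3})/g;
// Terminator only checked: the match stops just before the next header.
const lookahead =
  /[0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?(?=[0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:|\+{3})/g;

const a = consuming.exec(text);
const b = lookahead.exec(text);
const nextHeader = text.indexOf("CD2/22");

console.log(consuming.lastIndex > nextHeader); // true: the next search starts past "CD2/22"
console.log(lookahead.lastIndex === nextHeader); // true: the next search starts at "CD2/22"
```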
Edit: Here is a multi-line test in JavaScript:
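// Each match runs from a "XXXXXX/NN VALID:" header up to, but not including,
// the next header or the final " +++" (the lookahead does not consume them).
var match, regex = /([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?)(?=(?: \+{3}$|(?:[0-9A-Z]{3,6}\/\d{2})))/g;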
var s='+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++\n\
1A1093/11 VALID: 1107140300 - 1108301500 \n\
DAILY 0300-1500 \n\
WIP 90M S OF RWY 08/26 AT E, W1, W2. \n\
NO RESTRICTION DRG TKOF/LDG OR TAX. \n\
1A994/11 VALID: 1106201300 - 1112312059 \n\
PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN \n\
USE. \n\
1A987/11 VALID: 1106190615 - UFN\n\
ILS DME RWY 08 BC 110.90MHZ CH46X OPR. +++';
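// With the g flag, exec() resumes at regex.lastIndex, so the loop visits every section.
while ((match = regex.exec(s)) !== null) {
  alert(match[0]);
}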
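On an engine with ES2020 support (an assumption; an older widget engine may lack it), String.prototype.matchAll can replace the exec loop and collect the sections into an array, which is easier to post-process than a series of alerts. This sketch reuses the regex above and builds the string with join instead of backslash continuations; the last line keeps the space before +++ that the lookahead's first branch expects.

```javascript
const regex = /([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?)(?=(?: \+{3}$|(?:[0-9A-Z]{3,6}\/\d{2})))/g;

const s = [
  "+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++",
  "1A1093/11 VALID: 1107140300 - 1108301500",
  "DAILY 0300-1500",
  "WIP 90M S OF RWY 08/26 AT E, W1, W2.",
  "NO RESTRICTION DRG TKOF/LDG OR TAX.",
  "1A994/11 VALID: 1106201300 - 1112312059",
  "PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN",
  "USE.",
  "1A987/11 VALID: 1106190615 - UFN",
  "ILS DME RWY 08 BC 110.90MHZ CH46X OPR. +++",
].join("\n");

// matchAll requires the g flag; each item is a match array whose [1] is the section.
const sections = Array.from(s.matchAll(regex), (m) => m[1].trim());
console.log(sections.length); // 3
```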