JavaScript RegExp syntax question

I'll try to better explain myself ;-).

I'm using RegexBuddy to try to find the solution. The target is JavaScript in a Konfabulator widget.

The string I need to parse is:

+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++
1A1093/11  VALID: 1107140300 - 1108301500
  DAILY 0300-1500
  WIP 90M S OF RWY 08/26 AT E, W1, W2.
    NO RESTRICTION DRG TKOF/LDG OR TAX.
1A994/11  VALID: 1106201300 - 1112312059
  PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN
  USE.
1A987/11  VALID: 1106190615 - UFN
  ILS DME RWY 08 BC 110.90MHZ CH46X OPR.
+++

The end result should be the following 3 sub-strings:

Substring 1)

1A1093/11  VALID: 1107140300 - 1108301500
  DAILY 0300-1500
  WIP 90M S OF RWY 08/26 AT E, W1, W2.
    NO RESTRICTION DRG TKOF/LDG OR TAX.

Substring 2)

1A994/11  VALID: 1106201300 - 1112312059
  PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN
  USE.

Substring 3)

1A987/11  VALID: 1106190615 - UFN
  ILS DME RWY 08 BC 110.90MHZ CH46X OPR.

As you can see each section starts with something similar to "1A987/11 VALID:" which I am finding using this regex:

[0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:

Each section ends with the "1A987/11 VALID:" of the next section or with "+++", which I am finding using this regex:

([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:)|(\+{3})

The characters in between are matched with [\s\S]+? because the "." does not work for some reason.

So the complete regex is:

[0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:[\s\S]+?(([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:)|(\+{3}))

Now, since the end of substring 1 is the beginning of substring 2, RegexBuddy does not find substring 2; only substrings 1 and 3 are found.

I'm looking for a way to find all 3 substrings, hence a way to find the end of each substring but to exclude it from the string itself.


The way I read your question, the significant facts are:

  1. each match comprises two or more lines;
  2. the beginning of the first line matches the pattern you gave; and
  3. each subsequent line starts with whitespace.

Here's how I would express that as a regex:

/^[A-Z0-9]{3,6}\/[0-9]{2}[ \t]+VALID:.*(\r?\n[ \t]+.*)+/mg

Notice how I used [ \t]+ instead of \s+ before the VALID: and at the beginning of the subsequent lines, to match only the horizontal whitespace characters (spaces and/or tabs). Then I used \r?\n to match the line separators (DOS-style \r\n or Unix-style \n). This way, I never match more than I need to, making the regex more efficient as well as easier to write and debug.

The m at the end turns on multiline mode, which allows the ^ anchor to match at the beginning of a line. The g turns on global mode, allowing you to find all matches, not just the first one.
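
As a rough sketch of how that pattern could be applied, the snippet below pastes the question's sample into a string and collects every match. The names notam, sectionRe and sections are made up for illustration, and the slash is escaped because the pattern is written as a regex literal.

```javascript
// Sample input copied from the question, joined with "\n" line separators.
var notam = [
  '+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++',
  '1A1093/11  VALID: 1107140300 - 1108301500',
  '  DAILY 0300-1500',
  '  WIP 90M S OF RWY 08/26 AT E, W1, W2.',
  '    NO RESTRICTION DRG TKOF/LDG OR TAX.',
  '1A994/11  VALID: 1106201300 - 1112312059',
  '  PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN',
  '  USE.',
  '1A987/11  VALID: 1106190615 - UFN',
  '  ILS DME RWY 08 BC 110.90MHZ CH46X OPR.',
  '+++'
].join('\n');

// m: ^ matches at the start of every line; g: find all matches.
var sectionRe = /^[A-Z0-9]{3,6}\/[0-9]{2}[ \t]+VALID:.*(\r?\n[ \t]+.*)+/mg;

// With the g flag, String.prototype.match returns an array of whole matches
// (or null when nothing matches, hence the fallback to an empty array).
var sections = notam.match(sectionRe) || [];

// For the sample above this yields three entries, one per VALID: section.
```

The closing "+++" line is left out of the last section because it does not begin with a space or tab.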

By the way, the reason you had to use [\s\S] instead of . is that JavaScript, at least in engines that predate ES2018 and its s (dotAll) flag, has no "single-line" or "DOTALL" mode as most other regex flavors do. In such an engine there is no way to make the . match a carriage-return (\r) or linefeed (\n). But that's another thing you don't have to deal with if you match line separators explicitly, as I did.
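
A tiny illustration of the difference, using a made-up two-line string (twoLines, dotMatch and anyMatch are names invented for this sketch):

```javascript
var twoLines = 'VALID: 1106190615 - UFN\n  ILS DME RWY 08';

// "." stops at the line break, so the match ends with the first line.
var dotMatch = twoLines.match(/VALID:.+/)[0];

// [\s\S] matches any character at all, including "\n",
// so the match runs on to the end of the string.
var anyMatch = twoLines.match(/VALID:[\s\S]+/)[0];
```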


I'm not 100% sure what your second VALID: is doing there, but the second part of your regular expression, after the "|" (or), looks like an attempt to capture the "UFN" case, and it seems to be missing something that would actually match UFN. I don't know the full range of possibilities for that sequence, or which regex implementation you're using, but if you capture capital letters with [A-Z], you'd need that last group to be ([A-Z]{3}), or use the generic alphanumeric symbol after the slash there instead of a plus.


It depends on what language we are talking about here, but the following regular expression worked for me in Perl with the s modifier, which lets . match end-of-line characters like any other character.

([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:.+?)([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:.+?)([0-9A-Z]{3,6}/\d{2}\s{1,3}VALID:.+?)(\+{3})

If you are trying to find some number of the VALID sections then you'd have to do a loop which depends on the language.
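
In JavaScript, one way such a loop might look is sketched below. It assumes the asker's original pattern with the terminator kept in capture group 1, and it rewinds lastIndex after each hit so that the terminator can serve as the start of the next section. The names text, boundedRe and found are made up for illustration.

```javascript
// Sample input copied from the question.
var text = [
  '+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++',
  '1A1093/11  VALID: 1107140300 - 1108301500',
  '  DAILY 0300-1500',
  '  WIP 90M S OF RWY 08/26 AT E, W1, W2.',
  '    NO RESTRICTION DRG TKOF/LDG OR TAX.',
  '1A994/11  VALID: 1106201300 - 1112312059',
  '  PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN',
  '  USE.',
  '1A987/11  VALID: 1106190615 - UFN',
  '  ILS DME RWY 08 BC 110.90MHZ CH46X OPR.',
  '+++'
].join('\n');

// Header, lazy body, then the terminator (next header or "+++") in group 1.
var boundedRe = /[0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:|\+{3})/g;

var found = [];
var m;
while ((m = boundedRe.exec(text)) !== null) {
  // Cut the terminator off the end of the match, then trim trailing whitespace.
  var body = m[0].slice(0, m[0].length - m[1].length);
  found.push(body.replace(/\s+$/, ''));

  // Step back over the terminator so the next exec() can start on it.
  boundedRe.lastIndex -= m[1].length;
}
```

The loop ends once the search resumes at the final "+++", where no further header can be found.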

Notice that I collapsed the [0-9]|[A-Z] into [0-9A-Z] and basically copied the first (...) pattern 3 times.


I'm not entirely sure what regex parser you are using, but give this beast a shot:

((?:(?:[0-9]|[A-Z]){3,6}/\d{2}\s{1,3}VALID:.+?)(?=(?: \+\+\+$|(?:[0-9]|[A-Z]){3,6}/\d{2})))

It uses positive lookaheads, so it may or may not work for you.

Edit: Here is a multi-line test in JavaScript:

var match, regex = /([0-9A-Z]{3,6}\/\d{2}\s{1,3}VALID:[\s\S]+?)(?=(?: \+{3}$|(?:[0-9A-Z]{3,6}\/\d{2})))/g;
var s='+++++++++++++++++++++ RUNWAY ++++++++++++++++++++++++++++++\n\
1A1093/11  VALID: 1107140300 - 1108301500 \n\
  DAILY 0300-1500 \n\
  WIP 90M S OF RWY 08/26 AT E, W1, W2. \n\
    NO RESTRICTION DRG TKOF/LDG OR TAX. \n\
1A994/11  VALID: 1106201300 - 1112312059 \n\
  PAPI RWY 08 NOT OPR WHEN ILS APCH IN USE. OPR WHEN VIS APCH IN \n\
  USE. \n\
1A987/11  VALID: 1106190615 - UFN\n\
  ILS DME RWY 08 BC 110.90MHZ CH46X OPR. +++';

while (match=regex.exec(s)){
    alert(match[0]);
}