java.util.regex matching anything before expression

I am trying to tokenize the following snippets by types of numbers:

"(0-22) 222-33-44, 222-555-666, tel./.fax (111-222-333) 22-33-44 UK, TEL/faks: 000-333-444, fax: 333-444-555, tel: 555-666-888"

and

"tel: 555-666-888, tel./fax (111-222-333) 22-33-44 UK"

and

"fax (111-222-333) 22-33-44 UK, TEL/faks: 000-333-444, fax: 333-444-555"

and so on.

The idea is that the string can contain any combination of designations like "tel/faks" followed by tel/fax numbers, or just a tel/fax number at the beginning of the string.

I wrote this:

"(?:.(?!((tel|fax|faks)[ /:.]+)+))++"

for example 1, but successive find() calls return the following (I added the '_' characters to mark the match boundaries):

    _(0-22) 222-33-44, 222-555-666,_

    _TEL./_

    _FAX (111-222-333) 22-33-44 UK,_

    _TEL_

    _FAKS: 000-333-444,_

    _FAX: 333-444-555_

It seems that I am losing one character in every group, and combined types like "TEL/faks" are split. I also need to capture the type (if it exists; if not, the number defaults to tel) for future processing.

How can I get rid of this?

P.S. I compile the pattern with the case-insensitive flag.


Your regular expression means (roughly):

    (?:                             Match a group consisting of:
       .                              any character
       (?!                            that is not followed by
          ((tel|fax|faks)[ /:.]+)+    one or more of "tel", "fax" or "faks", each followed
                                      by at least one character from [ /:.]
       )
    )++                             one or more times (possessively)

That's why a character goes missing before "tel", "fax", etc.: your regular expression refuses to match any character that is immediately followed by "tel", "fax", etc.

That's also why "tel./.fax" gets split: the last "." comes directly before "fax", so it is never matched.
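To see the effect in isolation, here is a minimal sketch that runs the pattern from the question against two short strings. The class name `LookaheadDemo`, the helper `findAll`, and the two inputs are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // The pattern from the question, compiled case-insensitively.
    static final Pattern ORIGINAL = Pattern.compile(
            "(?:.(?!((tel|fax|faks)[ /:.]+)+))++", Pattern.CASE_INSENSITIVE);

    // Collects every match produced by successive find() calls.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Two matches, "12," and "tel: 34": the space before "tel" is dropped
        // because it is directly followed by a designation.
        System.out.println(findAll("12, tel: 34"));
        // Two matches, "tel" and "fax: 1": the "/" before "fax" is dropped,
        // which splits the combined designation.
        System.out.println(findAll("tel/fax: 1"));
    }
}
```

In both cases the dropped character is exactly the one that sits directly in front of a designation.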

I would suggest constructing two regular expressions that match:

A - a telephone number (parens, digits, hyphens, commas, spaces), containing at least one digit
B - a telephone/fax designation ("fax", "faks", "tel", punctuation)

Then search for strings matching zero or more designations followed by one or more numbers:

B*A+
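One possible way to put B*A+ into code is sketched below. The concrete sub-patterns for A and B, the `PhoneTokenizer` and `Token` names, and the normalization of the designation are my own assumptions, not something the question prescribes. In this sketch a number must start with an optional "(" plus a digit and end with a digit, so trailing text such as "UK" is simply skipped, and a word that happens to contain "tel" (for example "hotel") would be treated as a designation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneTokenizer {
    // B: one designation ("tel", "fax" or "faks") plus any trailing punctuation.
    static final String B = "(?:tel|faks|fax)[ /:.]*";
    // A: one number (optional "(", then digits, parens, hyphens and spaces,
    // ending in a digit), plus an optional comma/space separator.
    static final String A = "\\(?\\d[\\d() -]*\\d[, ]*";
    // B*A+ with group 1 = designations (may be empty), group 2 = the numbers.
    static final Pattern TOKEN = Pattern.compile(
            "((?:" + B + ")*)((?:" + A + ")+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical holder for one designation and the numbers that follow it.
    static final class Token {
        final String type;
        final List<String> numbers;

        Token(String type, List<String> numbers) {
            this.type = type;
            this.numbers = numbers;
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Normalize e.g. "tel./.fax " to "tel/fax".
            String type = m.group(1).toLowerCase(Locale.ROOT)
                    .replaceAll("[ /:.]+", "/")
                    .replaceAll("/$", "");
            // No designation in front of the numbers: default to "tel".
            if (type.isEmpty()) {
                type = "tel";
            }
            // Drop trailing separators, then split the run of numbers on commas.
            String[] numbers = m.group(2).replaceAll("[, ]+$", "").split("\\s*,\\s*");
            tokens.add(new Token(type, Arrays.asList(numbers)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "(0-22) 222-33-44, 222-555-666, tel./.fax (111-222-333) 22-33-44 UK, "
                + "TEL/faks: 000-333-444, fax: 333-444-555, tel: 555-666-888";
        for (Token t : tokenize(input)) {
            System.out.println(t.type + " -> " + t.numbers);
        }
    }
}
```

Because the designation and the numbers land in separate capturing groups, the type is available for later processing without a second pass over the string.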