
Java: use Scanner delimiter as token

I'm trying to find a good way to get a Scanner to use a given delimiter as a token. For example, I'd like to split up a piece of text into digit and non-digit chunks, so ideally I'd just set the delimiter to \D and set some flag like useDelimiterAsToken, but after briefly looking through the API I'm not coming up with anything. Right now I've had to resort to using combined lookaheads/lookbehinds for the delimiter, which is somewhat painful:

scanner.useDelimiter("((?<=\\d)(?=\\D)|(?<=\\D)(?=\\d))");

This looks for any transition from a digit to a non-digit or vice-versa. Is there a more sane way to do this?


EDIT: The edited question is so different, my original answer doesn't apply at all. For the record, what you're doing is the ideal way to solve your problem, in my opinion. Your delimiter is the zero-width boundary between a digit and a non-digit, and there's no more succinct way to express that than what you posted.
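
For reference, here is a minimal sketch of that delimiter in use, assuming the input is an in-memory String. The class name DigitChunks, the constant BOUNDARY and the helper chunks are made up for illustration; the pattern is the one from the question with the redundant outer parentheses dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DigitChunks {
    // Zero-width delimiter: matches only at a digit/non-digit transition,
    // so no characters are consumed and every chunk comes back as a token.
    static final String BOUNDARY = "(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)";

    // Hypothetical helper: collects the alternating digit and non-digit runs.
    static List<String> chunks(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(BOUNDARY);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chunks("abc123def45")); // prints [abc, 123, def, 45]
    }
}
```

Because the delimiter is zero-width, nothing is thrown away: concatenating the tokens gives back the original text.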

EDIT2: (In response to the question asked in the comment.) You originally asked for an alternative to this regex:

"((?<=\\w)(?=[^\\w])|(?<=[^\\w])(?=\\w))"

That's almost exactly how \b, the word-boundary construct, works:

"(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w)"

That is, a position that's either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. The difference is that \b can match at the beginning and end of the input. You obviously didn't want that, so I added lookarounds to exclude those conditions:

"(?!^)\\b(?!$)"

It's just a more concise way to do what your regex did. But then you changed the requirement to matching digit/non-digit boundaries, and there's no shorthand for that like \b for word/non-word boundaries.
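
To see that the two word-boundary delimiters behave the same, here is a small sketch that feeds both to a Scanner. The names WordBoundaryDemo, LOOKAROUNDS, SHORTHAND and tokens are invented for the example, and the sample input is plain ASCII, where \b and \w agree on what a word character is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordBoundaryDemo {
    // Long form from the original question: word/non-word transitions only.
    static final String LOOKAROUNDS = "(?<=\\w)(?=[^\\w])|(?<=[^\\w])(?=\\w)";
    // Shorthand: \b, minus its matches at the very start and end of the input.
    static final String SHORTHAND = "(?!^)\\b(?!$)";

    // Hypothetical helper: tokenizes text with the given delimiter regex.
    static List<String> tokens(String text, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foo+bar-baz";
        System.out.println(tokens(text, LOOKAROUNDS)); // prints [foo, +, bar, -, baz]
        System.out.println(tokens(text, SHORTHAND));   // prints [foo, +, bar, -, baz]
    }
}
```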
