Regex to split a string (in Java) so that spaces are preserved?
I need to split a string (in Java) into individual words ... but I need to preserve spaces.
An example of the text I need to split is something like this:
ABC . . . . DEF . . . . GHI
I need to see "ABC", " . . . .", "DEF", ". . . .", and "GHI".
Obviously, splitting on whitespace (\s) isn't going to work, as all the spaces get swallowed up by the split.
Any suggestions?
Thanks
Looks like you can just split on \b in this case ("\\b" as a string literal).
Generally, you want to split on zero-width matching constructs; \b is one, but lookarounds can also be used.
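A minimal sketch of the \b approach applied to the example from the question (the class and method names are illustrative, not from the original answer). Because \b is zero-width, no characters are consumed: on Java 8 and later this yields the five pieces "ABC", " . . . . ", "DEF", " . . . . " and "GHI", and rejoining them reproduces the input exactly. The \b match at the very start of the string does not produce an empty leading element on Java 8+; older JDKs may return one.

```java
import java.util.Arrays;

class WordBoundarySplitDemo {
    // Splits at every \b (word boundary). \b matches a position, not a
    // character, so the runs of spaces and dots survive intact.
    static String[] splitOnWordBoundary(String text) {
        return text.split("\\b");
    }

    public static void main(String[] args) {
        String text = "ABC . . . . DEF . . . . GHI";

        // Splitting on \s consumes every space, so the separators are lost.
        System.out.println(Arrays.toString(text.split("\\s")));

        // Splitting on \b keeps every character in some piece.
        String[] parts = splitOnWordBoundary(text);
        System.out.println(Arrays.toString(parts));

        // Concatenating the pieces gives back the original string.
        System.out.println(String.join("", parts).equals(text));
    }
}
```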
Related questions
- Java split is eating my characters.
Splitting based on a custom word boundary
If \b doesn't fit your definition, you can always define your own boundaries using lookaround assertions.
For example, the following regex splits on the boundary between a placeholder character class X and its complement:
(?=[X])(?<=[^X])|(?=[^X])(?<=[X])
In the following example, we define X to be \d:
System.out.println(java.util.Arrays.toString(
"007james123bond".split(
"(?=[X])(?<=[^X])|(?=[^X])(?<=[X])".replace("X", "\\d")
)
)); // prints "[007, james, 123, bond]"
Here's another example, where X is a-z$:
System.out.println(java.util.Arrays.toString(
"$dollar . . blah-blah $more gimme".split(
"(?=[X])(?<=[^X])|(?=[^X])(?<=[X])".replace("X", "a-z$")
)
)); // prints "[$dollar,  . . , blah, -, blah,  , $more,  , gimme]"
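The template-plus-replace trick in the two examples above can be wrapped in a small reusable helper. This is a sketch, and the name splitOnClassBoundary is hypothetical. Since the class contents are spliced into the regex verbatim, characters that are special inside a character class (such as ], \ or a leading ^) must already be escaped by the caller.

```java
import java.util.Arrays;

class ClassBoundarySplit {
    // Hypothetical helper: splits text wherever a character inside [charClass]
    // meets a character outside it, in either order. Both alternatives are
    // zero-width (lookahead + lookbehind), so nothing is consumed.
    static String[] splitOnClassBoundary(String text, String charClass) {
        String regex = "(?=[X])(?<=[^X])|(?=[^X])(?<=[X])".replace("X", charClass);
        return text.split(regex);
    }

    public static void main(String[] args) {
        // Digit / non-digit boundaries.
        System.out.println(Arrays.toString(
                splitOnClassBoundary("007james123bond", "\\d")));

        // Boundaries between [a-z$] and everything else.
        System.out.println(Arrays.toString(
                splitOnClassBoundary("$dollar . . blah-blah $more gimme", "a-z$")));
    }
}
```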
Thanks guys, that gave me the lead I needed... I'm using (?<=[\\s]) and it works exactly the way I want!
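For comparison, a sketch of what the lookbehind (?<=[\\s]) does on the example string from the question (class and method names are illustrative). It splits immediately after every whitespace character, so each space stays attached to the end of the preceding piece rather than forming part of a separate run as with \b: the input becomes "ABC ", four ". " pieces, "DEF ", four more ". " pieces, and "GHI".

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Splits at every position that directly follows a whitespace character.
    // The lookbehind is zero-width, so no character is dropped.
    static String[] splitAfterWhitespace(String text) {
        return text.split("(?<=[\\s])");
    }

    public static void main(String[] args) {
        String[] parts = splitAfterWhitespace("ABC . . . . DEF . . . . GHI");
        System.out.println(Arrays.toString(parts));

        // Rejoining the pieces reproduces the original text.
        System.out.println(String.join("", parts));
    }
}
```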