Java - Reg. Ex. File Question
I'm grabbing lines from a text file and sifting line by line using regular expressions. I'm trying to search for blank lines, meaning nothing or just whitespace.
However, what exactly is empty space? I know that whitespace is \s but what is a line that is nothing at all? null (\0)? newline (\n)?
I tried the test harness in the Java tutorial to see what an empty space actually is, but no luck so far.
An empty string "" is a string. It's not null. It doesn't have any character, not even \0 (which is just a character in Java, i.e. it's not a string terminator (JLS 10.9)).
The following are all true:
"" != null
"" instanceof String
"".contains("")
The following are true exclusively for an empty string:
"".matches("")
"".matches("^$")
"".length() == 0
"".isEmpty()
The following is true for the empty string as well as for every other string containing only whitespace:
"".matches("\\s*");
This is because * means zero or more repetitions of a pattern, and zero repetitions of whitespace is the empty string.
The following is also true for all strings containing only whitespace:
s.trim().isEmpty()
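For the line-by-line case in the question, the checks above could be combined roughly as follows. This is only a sketch: the class and method names (BlankLineCounter, isBlankLine, countBlankLines) are made up for illustration, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class; all names here are illustrative only.
final class BlankLineCounter {

    // True if the line is empty or consists solely of whitespace.
    static boolean isBlankLine(String line) {
        return line.matches("\\s*");
    }

    // readLine() strips the line terminator, so an empty line arrives
    // as "" (not "\n", and not null). null only signals end of input.
    static int countBlankLines(BufferedReader reader) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (isBlankLine(line)) {
                count++;
            }
        }
        return count;
    }

    // Convenience overload for in-memory text (stands in for a file here).
    static int countBlankLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return countBlankLines(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Lines read: "first", "", "   \t", "last"
        System.out.println(countBlankLines("first\n\n   \t\nlast\n")); // prints 2
    }
}
```

With a real file, the BufferedReader would wrap a FileReader instead; the per-line test stays the same.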
Further discussions
I noticed that \s* detects one or more whitespaces. How do I make it so that it detects only whitespace? For example, "test test" would be invalid?
\s* matches zero or more whitespace characters, and "test test".matches("\\s*") is false.
However, you can find \s* in "test test", just as you can find it in any string, because \s* can match the empty string, and every string contains("").
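The difference between matching the whole string and finding a match somewhere in it can be seen in a small sketch like this one (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting Matcher.matches() with Matcher.find().
final class MatchesVersusFind {

    static final Pattern WHITESPACE_RUN = Pattern.compile("\\s*");

    // Whole-string test: true only if every character is whitespace
    // (or there are no characters at all).
    static boolean wholeStringMatches(String s) {
        return WHITESPACE_RUN.matcher(s).matches();
    }

    // Substring search: \s* can match the empty string, so this
    // succeeds for any input, including ones with no whitespace.
    static boolean foundSomewhere(String s) {
        return WHITESPACE_RUN.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("test test")); // false
        System.out.println(foundSomewhere("test test"));     // true (empty match at index 0)
        System.out.println(wholeStringMatches(" \t "));      // true
    }
}
```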
Figured it out...
^\s*[^a-zA-Z0-9\W]|^$
[^a-zA-Z0-9\W] doesn't really make any sense: it matches a single word character that is neither a letter nor a digit, i.e. an underscore. In fact, "_".matches("^\\s*[^a-zA-Z0-9\\W]|^$") is true.
Perhaps the confusion is because matches in Java needs to match the whole string (i.e. as if you've surrounded the entire pattern with ^ and $), so you can drop the anchors for matches, but you'd need them for, say, find. The proper regex for such methods would then be "^\\s*$", with the anchors explicitly included.
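To see how the two patterns behave, here is a minimal sketch. The class name and the isBlank helper are hypothetical; the first pattern is copied as-is from the comment above, only for comparison.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the pattern from the comment
// with the explicitly anchored "^\\s*$".
final class AnchoredBlankCheck {

    // Pattern from the comment above, reproduced unchanged.
    static final Pattern FROM_COMMENT =
            Pattern.compile("^\\s*[^a-zA-Z0-9\\W]|^$");

    // Explicit anchors, so that find() behaves like a whole-string test.
    static final Pattern BLANK = Pattern.compile("^\\s*$");

    static boolean matchesCommentPattern(String s) {
        return FROM_COMMENT.matcher(s).matches();
    }

    static boolean isBlank(String s) {
        return BLANK.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesCommentPattern("_"));   // true: underscore is accepted
        System.out.println(matchesCommentPattern("   ")); // false: whitespace alone is rejected
        System.out.println(isBlank("   "));               // true
        System.out.println(isBlank(""));                  // true
        System.out.println(isBlank("test test"));         // false
    }
}
```

So the pattern from the comment accepts "_" but, as far as I can tell, not a line of plain spaces, which is the opposite of what was intended.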
The following is an excerpt from cletus's original answer (which is now deleted):
Pattern p = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher m = p.matcher(fileString);
while (m.find()) {
...
}
The Pattern.MULTILINE flag allows ^ and $ to also match just after and just before line terminators within fileString, not only at the start and end of the whole input.
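A runnable variant of that loop might look like the sketch below. Note one deliberate change, which is my assumption and not part of the excerpt: it uses [ \t]* instead of \s*. Since \s also matches line terminators, a run of consecutive blank lines may be swallowed as a single match, which makes counting unreliable. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: scans a whole string for blank lines
// using Pattern.MULTILINE and Matcher.find().
final class MultilineBlankScan {

    // Assumption: [ \t]* (spaces and tabs only) rather than \s*,
    // so that a match never extends across a line terminator.
    static final Pattern BLANK_LINE =
            Pattern.compile("^[ \\t]*$", Pattern.MULTILINE);

    static int countBlankLines(String fileString) {
        Matcher m = BLANK_LINE.matcher(fileString);
        int count = 0;
        while (m.find()) {
            count++; // one match per blank line
        }
        return count;
    }

    public static void main(String[] args) {
        // Lines: "first", "", "   ", "last"
        System.out.println(countBlankLines("first\n\n   \nlast")); // prints 2
    }
}
```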
I usually use the StringUtils class from Apache Commons Lang. It has nice isEmpty() and isBlank() methods that also handle nulls gracefully:
Checks if a String is empty ("") or null.
StringUtils.isEmpty(null) = true
StringUtils.isEmpty("") = true
StringUtils.isEmpty(" ") = false
StringUtils.isEmpty("bob") = false
StringUtils.isEmpty(" bob ") = false
Checks if a String is whitespace, empty ("") or null.
StringUtils.isBlank(null) = true
StringUtils.isBlank("") = true
StringUtils.isBlank(" ") = true
StringUtils.isBlank("bob") = false
StringUtils.isBlank(" bob ") = false
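If pulling in the library isn't an option, null-safe checks with the same results as the tables above can be sketched in plain Java. To be clear, this is a hypothetical helper (NullSafeStrings is a made-up name), not the Commons Lang implementation, and it has only been checked against the inputs listed above.

```java
// Hypothetical plain-Java stand-in; NOT the Apache Commons Lang code.
final class NullSafeStrings {

    // True for null and for the empty string "".
    static boolean isEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // True for null, "", and strings made up only of whitespace
    // as defined by Character.isWhitespace.
    static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(null));    // true
        System.out.println(isEmpty(" "));     // false
        System.out.println(isBlank(" "));     // true
        System.out.println(isBlank(" bob ")); // false
    }
}
```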