How can I change a regex search in Java to ignore case?
How can I change the following code so it will not care about case?
public static String tagValue(String inHTML, String tag)
        throws DataNotFoundException {
    String value = null;
    Matcher m = null;
    int count = 0;
    try {
        String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
        Pattern pattern = Pattern.compile(searchFor);
        m = pattern.matcher(inHTML);
        while (m.find()) {
            count++;
            return inHTML.substring(m.start(), m.end());
            // System.out.println(inHTML.substring(m.start(), m.end()));
        }
    } catch (Exception e) {
        throw new DataNotFoundException("Can't Find " + tag + "Tag.");
    }
    if (count == 0) {
        throw new DataNotFoundException("Can't Find " + tag + "Tag.");
    }
    return inHTML.substring(m.start(), m.end());
}
Pass the Pattern.CASE_INSENSITIVE flag to Pattern.compile:
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
m = pattern.matcher(inHTML);
(Oh, and consider parsing XML/HTML instead of using a regular expression to match a nonregular language.)
You can also compile the pattern with the case-insensitive flag:
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
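To see the flag in action outside the original method, here is a minimal, self-contained sketch. The class name TagValueDemo and the helper firstTag are made up for illustration, and the helper returns null instead of throwing the asker's DataNotFoundException, which isn't shown in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagValueDemo {
    // Hypothetical helper: returns the first <tag>...</tag> element found in
    // inHTML, ignoring the case of the tag name, or null if there is no match.
    public static String firstTag(String inHTML, String tag) {
        String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
        Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(inHTML);
        // m.group() is the whole match, same as inHTML.substring(m.start(), m.end())
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The tag is given in lower case, the markup uses mixed case.
        System.out.println(firstTag("<html><TITLE>Hello</Title></html>", "title"));
    }
}
```

Without the flag, the same call finds nothing, because "<title>" does not match "<TITLE>".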
First, read Using regular expressions to parse HTML: why not?
To answer your question though: in general, you can just put (?i) at the beginning of the regular expression:
String searchFor = "(?i)" + "<" + tag + ">(.*?)</" + tag + ">";
The Pattern Javadoc explains:
Case-insensitive matching can also be enabled via the embedded flag expression (?i).
Since you're using Pattern.compile, you can also just pass the CASE_INSENSITIVE flag:
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
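As a quick check that the two spellings behave the same, here is a small sketch comparing the embedded (?i) flag with the CASE_INSENSITIVE compile flag. The class and method names are only illustrative, and the pattern is a fixed "<title>" rather than the asker's full expression.

```java
import java.util.regex.Pattern;

public class InlineFlagDemo {
    // Embedded flag expression at the start of the regex.
    public static boolean withInlineFlag(String input) {
        return Pattern.compile("(?i)<title>").matcher(input).find();
    }

    // Same regex, flag passed to Pattern.compile instead.
    public static boolean withCompileFlag(String input) {
        return Pattern.compile("<title>", Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    // No flag at all: matching is case-sensitive.
    public static boolean withoutFlag(String input) {
        return Pattern.compile("<title>").matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "<TITLE>";
        System.out.println(withInlineFlag(input));  // true
        System.out.println(withCompileFlag(input)); // true
        System.out.println(withoutFlag(input));     // false
    }
}
```

The embedded form is handy when you only control the regex string (for example with String.matches), while the compile flag keeps the expression itself unchanged.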
You should also know what case-insensitive means in Java regular expressions. From the Pattern Javadoc:
By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.
It looks like you're matching tags, so you only want US-ASCII.
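If you ever do need to match non-ASCII letters, the difference looks roughly like this. This is only a sketch: the class and method names are invented for the example, and the letters are written as Unicode escapes (\u00e9 is "é", \u00c9 is "É") to avoid source-encoding issues.

```java
import java.util.regex.Pattern;

public class UnicodeCaseDemo {
    // CASE_INSENSITIVE alone: only US-ASCII letters are folded.
    public static boolean asciiOnly(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    // CASE_INSENSITIVE combined with UNICODE_CASE: Unicode-aware folding.
    public static boolean unicodeAware(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiOnly("title", "TITLE"));       // true
        System.out.println(asciiOnly("\u00e9", "\u00c9"));     // false
        System.out.println(unicodeAware("\u00e9", "\u00c9"));  // true
    }
}
```

For plain HTML tag names like the ones in the question, the ASCII-only default is enough.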