
How can I change a regex search in Java to ignore case?

How can I change the following code so it will not care about case?

public static String tagValue(String inHTML, String tag)
        throws DataNotFoundException {
    String value = null;
    Matcher m = null;

    int count = 0;
    try {
        String searchFor = "<" + tag + ">(.*?)</" + tag + ">";

        Pattern pattern = Pattern.compile(searchFor);

        m = pattern.matcher(inHTML);

        while (m.find()) {
            count++;

            return inHTML.substring(m.start(), m.end());
            // System.out.println(inHTML.substring(m.start(), m.end()));
        }
    } catch (Exception e) {
        throw new DataNotFoundException("Can't Find " + tag + "Tag.");
    }

    if (count == 0) {
        throw new DataNotFoundException("Can't Find " + tag + "Tag.");
    }

    return inHTML.substring(m.start(), m.end());
}


Pass the Pattern.CASE_INSENSITIVE flag to Pattern.compile:

String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
m = pattern.matcher(inHTML);

(Oh, and consider parsing XML/HTML instead of using a regular expression to match a nonregular language.)
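As a rough illustration of the flag in use, here is a minimal, self-contained sketch. The class name CaseInsensitiveTagDemo and the helper firstTag are made up for this example, and it returns an Optional instead of throwing the question's DataNotFoundException, whose definition is not shown. It also assumes you want the tag name treated as literal text, so it wraps it in Pattern.quote.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveTagDemo {
    // Hypothetical helper: returns the first <tag>...</tag> span,
    // ignoring the case of the tag name in both the opening and closing tag.
    static Optional<String> firstTag(String inHTML, String tag) {
        // Pattern.quote keeps regex metacharacters in the tag name literal.
        String quoted = Pattern.quote(tag);
        Pattern pattern = Pattern.compile(
                "<" + quoted + ">(.*?)</" + quoted + ">",
                Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(inHTML);
        // m.group() is the whole match, the same text as
        // inHTML.substring(m.start(), m.end()) in the question.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Mixed-case tags still match a lowercase tag name.
        System.out.println(firstTag("<TITLE>Hello</Title>", "title").orElse("not found"));
    }
}
```

Note that . does not match line terminators by default, so a tag whose content spans several lines would also need Pattern.DOTALL.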


You can also compile the pattern with the case-insensitive flag:

Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);


First, read "Using regular expressions to parse HTML: why not?"

To answer your question though, in general, you can just put (?i) at the beginning of the regular expression:

String searchFor = "(?i)" + "<" + tag + ">(.*?)</" + tag + ">";

The Pattern Javadoc explains:

"Case-insensitive matching can also be enabled via the embedded flag expression (?i)."
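The embedded flag is mostly useful where you cannot pass flags at all, such as String.matches and String.replaceAll, which take only the regex as a string. A small sketch of that, with the class EmbeddedFlagDemo and both helper names invented for illustration, and assuming the tag name contains no regex metacharacters:

```java
class EmbeddedFlagDemo {
    // Hypothetical helper: true if the whole text is a single <tag>...</tag>
    // element, regardless of the case of the tag name.
    static boolean isWholeTag(String text, String tag) {
        return text.matches("(?i)<" + tag + ">(.*?)</" + tag + ">");
    }

    // Hypothetical helper: removes opening and closing tags of any case.
    static String stripTag(String text, String tag) {
        return text.replaceAll("(?i)</?" + tag + ">", "");
    }

    public static void main(String[] args) {
        System.out.println(isWholeTag("<B>bold</B>", "b"));      // true
        System.out.println(stripTag("<B>bold</b> text", "b"));   // bold text
    }
}
```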

Since you're using Pattern.compile you can also just pass the CASE_INSENSITIVE flag:

String searchFor = "<" + tag + ">(.*?)</" + tag + ">";

Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);

You should also know what case-insensitive means in Java regular expressions. From the Javadoc for CASE_INSENSITIVE:

By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.

It looks like you're matching tags, so US-ASCII matching is all you need.
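If you ever do need to match non-ASCII text, the difference between the two flags can be seen in a short sketch like the one below. The class UnicodeCaseDemo and its method names are made up for this example, and the accented letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // CASE_INSENSITIVE alone folds case only for US-ASCII letters.
    static boolean asciiOnly(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    // Adding UNICODE_CASE extends case folding to non-ASCII letters.
    static boolean unicodeAware(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9"; // e with acute accent, lowercase
        String upper = "\u00c9"; // E with acute accent, uppercase
        System.out.println(asciiOnly("title", "TITLE")); // true
        System.out.println(asciiOnly(lower, upper));     // false
        System.out.println(unicodeAware(lower, upper));  // true
    }
}
```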
