Matching two or three words after different Arabic regex patterns in Java

Greetings all,

I am a beginner with regex. I want to extract the 2 or 3 Arabic words that follow a certain pattern.

For example:

If I have an Arabic string

inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية "

I need to extract the names after

الدكتور

and

والدكتورة

so the output should be:

احمد زويل
سميرة موسي

What I have done so far is the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
// Lookbehind for the title, then .* which is greedy and runs to the end of the line
Pattern pattern = Pattern.compile("(?<=الدكتور).*");
Matcher matcher = pattern.matcher(inputtext);
boolean found = false;
while (matcher.find()) {
    // Get the matching string
    String match = matcher.group();
    System.out.println("the match is: " + match);
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}

But it returns:

احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية

I don't know how to add another pattern or how to stop after two words.

Would you please help me with any ideas?


To match only the two words that follow, try this one (double each backslash when you write it as a Java string literal):

(?<=الدكتور)\s[^\s]+\s[^\s]+

.* is greedy and matches everything up to the end of the line, which is not what you want.

\s is a whitespace character.

[^\s] is a negated character class that matches anything but whitespace.

So my solution matches a whitespace character, then at least one non-whitespace character (the first word), then another whitespace character, and once more at least one non-whitespace character (the second word). Note that the match includes the leading whitespace, so trim the result if you only want the two words.

To match your second pattern, I would just use a second regex (swap the word inside the lookbehind) and run it as a separate step. The regular expressions are easier to read that way.
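As a rough sketch of this two-step approach in Java, the snippet below builds one pattern per title and runs each over the input. The class name `TwoWordsAfterTitle` and the helper `twoWordsAfter` are made up for illustration, and `Pattern.quote` plus `trim()` are my additions, not part of the regex above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoWordsAfterTitle {

    // Hypothetical helper: returns the two words that follow each occurrence of `title`.
    static List<String> twoWordsAfter(String title, String text) {
        // Pattern.quote keeps the title literal inside the lookbehind;
        // every regex backslash is doubled because this is a Java string literal.
        Pattern pattern = Pattern.compile(
                "(?<=" + Pattern.quote(title) + ")\\s[^\\s]+\\s[^\\s]+");
        Matcher matcher = pattern.matcher(text);
        List<String> names = new ArrayList<>();
        while (matcher.find()) {
            // The match starts with the whitespace after the title, so trim it.
            names.add(matcher.group().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        // First step: names after the masculine title.
        System.out.println(twoWordsAfter("الدكتور", inputtext));
        // Second step: names after the feminine title with the leading conjunction.
        System.out.println(twoWordsAfter("والدكتورة", inputtext));
    }
}
```

The first call does not pick up anything from inside والدكتورة, because there the lookbehind text is followed by ة rather than by whitespace.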

Or you can try this, which combines both lookbehinds with an alternation:

(?<=الدكتور)\s[^\s]+\s[^\s]+|(?<=والدكتورة)\s[^\s]+\s[^\s]+
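A minimal sketch of how this combined pattern might be used from Java, assuming the same input string as in the question. The class name `NamesAfterTitles` and the method `extractNames` are hypothetical, and the `trim()` call is added here to drop the leading whitespace that the pattern captures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamesAfterTitles {

    // The alternation above as a Java string literal (each backslash doubled).
    private static final Pattern NAMES = Pattern.compile(
            "(?<=الدكتور)\\s[^\\s]+\\s[^\\s]+|(?<=والدكتورة)\\s[^\\s]+\\s[^\\s]+");

    // Hypothetical helper: collects every two-word match in the order it appears.
    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAMES.matcher(text);
        while (matcher.find()) {
            // Each match begins with the whitespace after the title, so trim it.
            names.add(matcher.group().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        for (String name : extractNames(inputtext)) {
            System.out.println("the match is: " + name);
        }
    }
}
```

A single pass keeps the names in the order they occur in the text, at the cost of a longer expression than the two separate patterns.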