Matching two or three words after different Arabic regex patterns in Java
Greetings all,
I am a beginner with regex. I want to extract the two or three Arabic words that follow a certain pattern.
For example, if I have an Arabic string:
inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية "
I need to extract the names after
الدكتور
and
والدكتورة
so the output should be:
احمد زويل
سميرة موسي
What I have done so far is the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
// Lookbehind for the title, then everything up to the end of the string
Pattern pattern = Pattern.compile("(?<=الدكتور).*");
Matcher matcher = pattern.matcher(inputtext);
boolean found = false;
while (matcher.find()) {
    // Get the matching string
    String match = matcher.group();
    System.out.println("the match is: " + match);
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}
But it returns:
احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية
I don't know how to add another pattern, or how to stop after two words.
Would you please help me with any ideas?
To match only the two words that follow, try this:
(?<=الدكتور)\s[^\s]+\s[^\s]+
.* matches everything up to the end of the string, so that is not what you want.
\s is a whitespace character.
[^\s] is a negated character class that matches anything but whitespace.
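So my solution matches one whitespace character, then at least one non-whitespace character (the first word), then another whitespace character and again at least one non-whitespace character (the second word). Note that the match includes the leading whitespace, so trim the result if you only want the words.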
To match your second pattern I would just do a second regex (just exchange the part inside the lookbehind) and match this pattern in a second step. The regular expression is easier to read that way.
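As a rough illustration of the two-step approach described above, the sketch below runs one regex per title and trims the leading whitespace from each match. The class name TwoStepNameExtractor and the helpers findAll and extractNames are hypothetical names chosen for this example, and it assumes the source file is saved and compiled as UTF-8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepNameExtractor {

    // Two words after "الدكتور" and two words after "والدكتورة".
    static final Pattern AFTER_FIRST_TITLE =
            Pattern.compile("(?<=الدكتور)\\s[^\\s]+\\s[^\\s]+");
    static final Pattern AFTER_SECOND_TITLE =
            Pattern.compile("(?<=والدكتورة)\\s[^\\s]+\\s[^\\s]+");

    // Hypothetical helper: collects every match of the pattern, trimmed
    // because each match starts with the whitespace after the title.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group().trim());
        }
        return results;
    }

    // Runs the two patterns one after the other (the "second step").
    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        names.addAll(findAll(AFTER_FIRST_TITLE, text));
        names.addAll(findAll(AFTER_SECOND_TITLE, text));
        return names;
    }

    public static void main(String[] args) {
        String inputtext =
                "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        for (String name : extractNames(inputtext)) {
            System.out.println("the match is: " + name);
        }
    }
}
```

The first pattern does not fire inside والدكتورة, because there the letters الدكتور are followed by ة rather than by whitespace. With this approach the results are grouped by title, not by their position in the text.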
Or you can try this:
(?<=الدكتور)\s[^\s]+\s[^\s]+|(?<=والدكتورة)\s[^\s]+\s[^\s]+
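For comparison, here is a minimal sketch of how the combined pattern above might be used in a single pass. The class name CombinedNameExtractor and the method extractNames are made up for this example, and it again assumes a UTF-8 source file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedNameExtractor {

    // Either alternative may match: two words after "الدكتور",
    // or two words after "والدكتورة".
    static final Pattern NAME_AFTER_TITLE = Pattern.compile(
            "(?<=الدكتور)\\s[^\\s]+\\s[^\\s]+|(?<=والدكتورة)\\s[^\\s]+\\s[^\\s]+");

    // Hypothetical helper: scans the text once and returns the trimmed matches.
    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAME_AFTER_TITLE.matcher(text);
        while (matcher.find()) {
            // The match begins with the whitespace after the title.
            names.add(matcher.group().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        String inputtext =
                "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        for (String name : extractNames(inputtext)) {
            System.out.println("the match is: " + name);
        }
    }
}
```

Because the text is scanned once, the names come back in the order they appear in the input. For the sample string that should be احمد زويل followed by سميرة موسي.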