Java RegEx - Regular expression to split a paragraph by start and end markers
I am new to Java regex. Please help me. Consider the paragraph below:
Paragraph :
Name abc
sadghsagh
hsajdjah Name
ggggggggg
!!!
Name ggg
dfdfddfdf Name
!!!
Name hhhh
sahdgashdg Name
asjdhjasdh
sadasldkalskd
asdjhakjsdhja
!!!
I need to split the above paragraph into blocks of text starting with Name and ending with !!!. I don't want to use !!! as the only delimiter to split the paragraph; I need to include the starting sequence (Name) in my regex as well.
i.e., my resulting API should look like SplitAsBlocks("Paragraph","startswith Name","endswith !!!")
How can I achieve this? Please help.
Now I want the same output as Brito gave, but here I have added Name after "hsajdjah". It now splits the text as below:
Name
ggggggggg
!!!
but I need:
Name abc
sadghsagh
hsajdjah Name
ggggggggg
!!!
That is, I need to match Name only when it is at the start of a line, not in the middle.
Please advise.
Bart, see the input below for your code.
I need to split the following using your API with the parameters start => Name and end => !, but the output differs from what I expect. The input has only 3 blocks that start with Name and end with !, yet I get 5. I have attached the output as well.
String myInput = "Name hhhhh class0"+ "\n"+
"HHHHHHHHHHHHHHHHHH"+ "\n"+
"!"+ "\n"+
"Name TTTTT TTTT"+ "\n"+
"GGGGGG UUUUU IIII"+ "\n"+
"!"+ "\n"+
"Name JJJJJ WWWW"+ "\n"+
"IIIIIIIIIIIIIIIIIIIII"+ "\n"+
"!"+ "\n"+
"RRRRRRRRRRR TTTTTTTT"+ "\n"+
"HHHHHH"+ "\n"+
"JJJJJ 1 Name class1"+ "\n"+
"LLLLL 5 Name class5"+ "\n"+
"!"+ "\n"+
"OOOOOO HHHH FFFFFF"+ "\n"+
"service 0 Name class12"+ "\n"+
"!"+ "\n"+
"JJJJJ YYYYYY 3/0"+ "\n"+
"KKKKKKK"+ "\n"+
"UUU UUU UUUUU"+ "\n"+
"QQQQQQQ"+ "\n"+
"!";
String[] tokens = tokenize(myInput, "Name", "!");
int n = 0;
for(String t : tokens) {
System.out.println("---------------------------\n"+(++n)+"\n"+t);
}
Output:
---------------------------
1
Name hhhhh class0
HHHHHHHHHHHHHHHHHH
!
---------------------------
2
Name TTTTT TTTT
GGGGGG UUUUU IIII
!
---------------------------
3
Name JJJJJ WWWW
IIIIIIIIIIIIIIIIIIIII
!
---------------------------
4
Name class1
LLLLL 5 Name class5
!
---------------------------
5
Name class12
!
Here I need to match Name only at the start of a line, not in the middle. How do I add that to the regex?
Try:
import java.util.*;
import java.util.regex.*;
public class Main {
public static String[] tokenize(String text, String start, String end) {
// old line:
//Pattern p = Pattern.compile("(?s)"+Pattern.quote(start)+".*?"+Pattern.quote(end));
// new line:
Pattern p = Pattern.compile("(?sm)^"+Pattern.quote(start)+".*?"+Pattern.quote(end)+"$");
Matcher m = p.matcher(text);
List<String> tokens = new ArrayList<String>();
while(m.find()) {
tokens.add(m.group());
}
return tokens.toArray(new String[]{});
}
public static void main(String[] args) {
String text = "Name abc" + "\n" +
"sadghsagh" + "\n" +
"hsajdjah Name" + "\n" +
"ggggggggg" + "\n" +
"!!!" + "\n" +
"Name ggg" + "\n" +
"dfdfddfdf Name" + "\n" +
"!!!" + "\n" +
"Name hhhh" + "\n" +
"sahdgashdg Name" + "\n" +
"asjdhjasdh" + "\n" +
"sadasldkalskd" + "\n" +
"asdjhakjsdhja" + "\n" +
"!!!";
String[] tokens = tokenize(text, "Name", "!!!");
int n = 0;
for(String t : tokens) {
System.out.println("---------------------------\n"+(++n)+"\n"+t);
}
}
}
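The "new line" in tokenize above relies on two embedded flags: `s` (DOTALL) lets `.` cross line breaks, and `m` (MULTILINE) makes `^` and `$` match at the start and end of every line rather than only at the start and end of the whole input. The sketch below spells the same idea out with explicit flag constants on a shortened input; the class name LineAnchoredBlocks and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineAnchoredBlocks {
    // Collects every block that begins with `start` at the start of a line
    // and ends with `end` at the end of a line.
    public static List<String> blocks(String text, String start, String end) {
        Pattern p = Pattern.compile(
                "^" + Pattern.quote(start) + ".*?" + Pattern.quote(end) + "$",
                Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample: "Name" also appears mid-line and must be ignored there.
        String text = "Name one\n"
                + "body Name\n"
                + "!\n"
                + "junk 1 Name class1\n"
                + "!\n"
                + "Name two\n"
                + "!";
        for (String b : blocks(text, "Name", "!")) {
            System.out.println("[" + b + "]");
        }
    }
}
```

With this sample, the "junk 1 Name class1" line does not open a block because Name is not at the start of that line, so only the two blocks that begin with Name at a line start are returned.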
String s = "Name abc sadghsagh hsajdjah !!! Name ggg dfdfddfdf !!! Name hhhh sahdgashdg asjdhjasdh sadasldkalskd asdjhakjsdhja !!!!! ";
String startsWith = "Name";
String endsWith = "!!!";
// non-greedily get all groups starting with Name and ending with !!!
String pattern = String.format("(%s).*?(%s)", Pattern.quote(startsWith), Pattern.quote(endsWith));
System.out.println(pattern);
Matcher m = Pattern.compile(pattern, Pattern.DOTALL).matcher(s);
while (m.find())
System.out.println(m.group());
output:
(\QName\E).*?(\Q!!!\E)
Name abc sadghsagh hsajdjah !!!
Name ggg dfdfddfdf !!!
Name hhhh sahdgashdg asjdhjasdh sadasldkalskd asdjhakjsdhja !!!
The following should also work if you want to keep both Name and !!! in the results.
String[] parts = string.split("(?=(Name|!!!))");
Edit: here's the corrected version:
String[] parts = string.split("(?<=!!!)\\s*(?=Name)");
This will split on any whitespace between !!! and Name and nothing else, thereby keeping both parts. If you don't want to split on !!!Name, then replace \\s* with \\s+ to require a one-or-more match instead of a zero-or-more match.
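To make the difference between the two quantifiers concrete, here is a small sketch that runs both variants of the lookbehind/lookahead split on the same string. The class name LookaroundSplit, the method names, and the sample string are hypothetical; only the two regular expressions come from the answer.

```java
import java.util.Arrays;

public class LookaroundSplit {
    // Zero-or-more whitespace: also splits where "!!!" is directly followed by "Name".
    public static String[] splitLoose(String s) {
        return s.split("(?<=!!!)\\s*(?=Name)");
    }

    // One-or-more whitespace: "!!!Name" with nothing in between stays together.
    public static String[] splitStrict(String s) {
        return s.split("(?<=!!!)\\s+(?=Name)");
    }

    public static void main(String[] args) {
        // Hypothetical sample with one "!!!Name" that has no whitespace in between.
        String s = "Name a\n!!!\nName b !!!Name c\n!!!";
        System.out.println(Arrays.toString(splitLoose(s)));
        System.out.println(Arrays.toString(splitStrict(s)));
    }
}
```

Because the lookbehind and lookahead consume no characters, both markers stay in the resulting parts; only the whitespace between them is removed.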
Edit2: attached an example of the input/output. The input is copied from the question:
String string = "Name hhhhh class0" + "\n" + "HHHHHHHHHHHHHHHHHH" + "\n" + "!" + "\n"
+ "Name TTTTT TTTT" + "\n" + "GGGGGG UUUUU IIII" + "\n" + "!" + "\n"
+ "Name JJJJJ WWWW" + "\n" + "IIIIIIIIIIIIIIIIIIIII" + "\n" + "!" + "\n"
+ "RRRRRRRRRRR TTTTTTTT" + "\n" + "HHHHHH" + "\n" + "JJJJJ 1 Name class1" + "\n"
+ "LLLLL 5 Name class5" + "\n" + "!" + "\n" + "OOOOOO HHHH FFFFFF" + "\n"
+ "service 0 Name class12" + "\n" + "!" + "\n" + "JJJJJ YYYYYY 3/0" + "\n" + "KKKKKKK"
+ "\n" + "UUU UUU UUUUU" + "\n" + "QQQQQQQ" + "\n" + "!";
String[] parts = string.split("(?<=!)\\s*(?=Name)");
for (String part : parts) {
System.out.println(part);
System.out.println("---------------------------------");
}
Output:
Name hhhhh class0
HHHHHHHHHHHHHHHHHH
!
---------------------------------
Name TTTTT TTTT
GGGGGG UUUUU IIII
!
---------------------------------
Name JJJJJ WWWW
IIIIIIIIIIIIIIIIIIIII
!
RRRRRRRRRRR TTTTTTTT
HHHHHH
JJJJJ 1 Name class1
LLLLL 5 Name class5
!
OOOOOO HHHH FFFFFF
service 0 Name class12
!
JJJJJ YYYYYY 3/0
KKKKKKK
UUU UUU UUUUU
QQQQQQQ
!
---------------------------------
Looks fine?
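Note that in the output above the third part still carries every line after "Name JJJJJ WWWW", including the ones that do not belong to a block starting with Name. If those lines should be dropped, one alternative is to scan line by line instead of splitting. This is only a sketch of a different technique (a plain loop, no regex), not the approach used in the answers above, and the class name LineScanner is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LineScanner {
    // Opens a block at a line that starts with `start` and closes it at the
    // first following line that ends with `end`; other lines are skipped.
    public static List<String> blocks(String text, String start, String end) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null; // null means "not inside a block"
        for (String line : text.split("\n")) {
            if (current == null && line.startsWith(start)) {
                current = new StringBuilder();
            }
            if (current != null) {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
                if (line.endsWith(end)) {
                    out.add(current.toString());
                    current = null;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the middle "junk ... !" lines belong to no block.
        String text = "Name x\nbody\n!\njunk Name y\n!\nName z\n!";
        for (String b : blocks(text, "Name", "!")) {
            System.out.println(b);
            System.out.println("---------------------------------");
        }
    }
}
```

This sketch assumes "\n" line breaks and treats a line ending in the end marker as the end of the block.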