Java RegEx Fun - Playing with Sentences
Input string:
Lorem ipsum tip. Lorem ipsum loprem ipsum septum #match this#, lorem ipsum #match this too#. #Do not match this because it is already after a period#.
Desired output:
Lorem ipsum tip. #match this# #match this too# Lorem ipsum loprem ipsum septum, lorem ipsum. #Do not match this because it is already after a period#.
Note that #match this# and #match this too# have both been moved next to the most recent period (.). Put simply, every #...# block should be moved to the nearest period on its left.
Can RegEx and Java String handling accomplish this?
The most basic RegEx to match #anything# is this:
\#(.*?)\#
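As a quick check of that pattern in Java, here is a minimal sketch that collects every #...# block from a piece of text. The class name HashBlockFinder and the method findBlocks are made up for illustration; only Pattern and Matcher come from the standard library. Note that # is not a regex metacharacter by default, so the backslashes are optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class HashBlockFinder {
    // Lazy quantifier so each match stops at the first closing '#'.
    private static final Pattern BLOCK = Pattern.compile("#(.*?)#");

    // Returns every #...# block, delimiters included, in order of appearance.
    static List<String> findBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "septum #match this#, lorem ipsum #match this too#.";
        System.out.println(findBlocks(sample));
    }
}
```

Use m.group(1) instead of m.group() if only the text between the # characters is wanted.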
I am having difficulties beyond that.
Edit: You do not have to tell me how to write a complete program. I just need a sufficient RegEx solution and then I'll try the string manipulation on my own.
Here is my solution derived from glowcoder's answer:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public static String computeForSlashline(String input) {
        String[] sentences = input.split("\\.");
        StringBuilder paragraph = new StringBuilder();
        StringBuilder blocks = new StringBuilder();
        Matcher m;
        try {
            // Loop through sentences, split by periods.
            for (int i = 0; i < sentences.length; i++) {
                // Find all the #____# blocks in this sentence
                m = Pattern.compile("(\\#(.*?)\\#)").matcher(sentences[i]);
                // Store all the #____# blocks in a single StringBuilder
                while (m.find()) {
                    blocks.append(m.group(0));
                }
                // Place all the #____# blocks at the beginning of the sentence.
                // Strip the old (redundant) #____# blocks from the sentence.
                paragraph.append(blocks.toString() + " " + m.replaceAll("").trim() + ". ");
                // Clear the #____# collection to make room for the next sentence.
                blocks.setLength(0);
            }
        } catch(Exception e) { System.out.println(e); return null; }
        // Make the paragraph look neat by adding line breaks after
        // periods, question marks and #_____#.
        m = Pattern.compile("(\\. |\\. |\\?|\\])").matcher(paragraph.toString());
        return m.replaceAll("$1<br /><br />");
    }
This gives me the desired output. There is one problem, however: if there is a period inside a #__# block (example: #Mrs. Smith kicks Ms. Smith in the sensitive spot#), the input.split("\\."); line will break up the #__#. So I will replace the input.split() line with a RegEx.
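One possible way to do that, sketched here under the assumption that every # in the input is part of a balanced #...# pair: split only on periods that are followed by an even number of # characters up to the end of the input. A period inside a block is followed by an odd number, so it is skipped. The class name SentenceSplitter and the method splitSentences are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class SentenceSplitter {
    // A period counts as a sentence end only if the rest of the input
    // contains an even number of '#' characters (assumes balanced blocks).
    private static final Pattern PERIOD_OUTSIDE_BLOCK =
            Pattern.compile("\\.(?=(?:[^#]*#[^#]*#)*[^#]*$)");

    static String[] splitSentences(String input) {
        // Like String.split, trailing empty strings are dropped.
        return PERIOD_OUTSIDE_BLOCK.split(input);
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum. She said #Mrs. Smith left#. Done.";
        for (String sentence : splitSentences(input)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

For the sample in main this yields three pieces, and the period in "Mrs." stays inside its block. The lookahead rescans the remainder of the input for every period, so this is fine for paragraphs but may be slow on very large inputs; an unbalanced # would also shift which periods are treated as sentence ends.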
The skeleton I would use is as follows:
    String computeForSlashline(String input) {
        // The backslash must be doubled in a Java string literal.
        String[] sentences = input.split("\\.");
        for (int i = 0; i < sentences.length; i++) {
            // perform a search on each sentence, moving the #__# to the front
        }
        StringBuilder sb = new StringBuilder();
        for (String sentence : sentences) {
            sb.append(sentence).append(". ");
        }
        return sb.toString().trim();
    }
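The loop body in the skeleton is left open. A minimal sketch of one way to fill it in is below; the class BlockMover and the method moveBlocksToFront are made-up names, and the choice to also remove the whitespace in front of each block (so that "septum #match this#," becomes "septum,") is an assumption based on the desired output in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BlockMover {
    private static final Pattern BLOCK = Pattern.compile("#.*?#");
    // Same block, plus any whitespace directly in front of it.
    private static final Pattern BLOCK_WITH_LEADING_SPACE = Pattern.compile("\\s*#.*?#");

    // Moves every #...# block in one sentence to the front of that sentence.
    static String moveBlocksToFront(String sentence) {
        StringBuilder result = new StringBuilder();
        Matcher m = BLOCK.matcher(sentence);
        while (m.find()) {
            result.append(m.group()).append(' ');
        }
        // Remove the blocks from their old positions, then tidy the edges.
        String rest = BLOCK_WITH_LEADING_SPACE.matcher(sentence).replaceAll("").trim();
        return result.append(rest).toString().trim();
    }

    public static void main(String[] args) {
        String sentence = " Lorem ipsum loprem ipsum septum #match this#, lorem ipsum #match this too#";
        System.out.println(moveBlocksToFront(sentence));
    }
}
```

Inside the skeleton's first loop this could be called as sentences[i] = moveBlocksToFront(sentences[i]); a sentence without any blocks comes back trimmed but otherwise unchanged.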