
Java RegEx Fun - Playing with Sentences

Input string:

Lorem ipsum tip. Lorem ipsum loprem ipsum septum #match this#, lorem ipsum #match this too#. #Do not match this because it is already after a period#.

Desired output:

Lorem ipsum tip. #match this# #match this too# Lorem ipsum loprem ipsum septum, lorem ipsum. #Do not match this because it is already after a period#.

Note that #match this# and #match this too# have both been moved to just after the most recent period (.). In short, every #...# block should be moved to right after the nearest period on its left.

Can RegEx and Java String handling accomplish this?

The most basic RegEx to match #anything# is this:

\#(.*?)\#

I am having difficulties beyond that.
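As a minimal sketch of how that pattern is used from Java (the class name FindBlocks and the method findBlocks are hypothetical, chosen only for illustration): inside a Java regex, # is a literal character unless the Pattern.COMMENTS flag is set, so "#(.*?)#" matches the same text as \#(.*?)\#. The lazy .*? stops at the first closing #, so each block is found as a separate match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindBlocks {

    // '#' is not special without Pattern.COMMENTS, so no escaping is needed.
    // The lazy (.*?) ends each match at the first closing '#'.
    private static final Pattern BLOCK = Pattern.compile("#(.*?)#");

    // Returns every #...# block in the input, hashes included.
    static List<String> findBlocks(String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            blocks.add(m.group()); // group() is the whole match; group(1) would drop the hashes
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum septum #match this#, lorem ipsum #match this too#.";
        System.out.println(findBlocks(input)); // prints [#match this#, #match this too#]
    }
}
```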

Edit: You do not have to tell me how to write a complete program. I just need a sufficient RegEx solution and then I'll try the string manipulation on my own.

Here is my solution derived from glowcoder's answer:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String computeForSlashline(String input) {

   String[] sentences = input.split("\\.");

   StringBuilder paragraph = new StringBuilder();
   StringBuilder blocks = new StringBuilder();

   // Compile the #____# pattern once instead of once per sentence.
   Pattern blockPattern = Pattern.compile("(\\#(.*?)\\#)");
   Matcher m;

   // Loop through sentences, split by periods.
   for (String sentence : sentences) {

      // Find all the #____# blocks in this sentence
      m = blockPattern.matcher(sentence);

      // Store all the #____# blocks in a single StringBuilder
      while (m.find()) {
         blocks.append(m.group(0));
      }

      // Place all the #____# blocks at the beginning of the sentence.
      // Strip the old (redundant) #____# blocks from the sentence.
      // (replaceAll resets the matcher, so it sees every block again.)
      paragraph.append(blocks.toString() + " " + m.replaceAll("").trim() + ". ");

      // Clear the #____# collection to make room for the next sentence.
      blocks.setLength(0);
   }

   // Make the paragraph look neat by adding line breaks after periods
   // followed by a space or &nbsp;, after question marks, and after "]".
   m = Pattern.compile("(\\. |\\.&nbsp;|\\?|\\])").matcher(paragraph.toString());

   return m.replaceAll("$1<br /><br />");

}

This gives me the desired output. There is one problem, however: if a #__# block contains a period (example: #Mrs. Smith kicks Ms. Smith in the sensitive spot#), the input.split("\\."); line breaks the block apart. So I will replace the input.split() call with a RegEx.
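One possible regex for that, sketched under the assumption that every # has a matching closing #: instead of splitting on periods, match whole sentences and treat each #...# block as a single unit, so periods inside a block are never seen as sentence ends. The class SentenceSplitter and the method splitSentences are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSplitter {

    // A sentence is one or more of: a whole #...# block, or any character that
    // is neither '.' nor '#'; it may end with a single period. Because a block
    // is consumed as a unit, periods inside #...# do not end the sentence.
    // Assumption: '#' always comes in pairs; an unpaired '#' is skipped.
    private static final Pattern SENTENCE =
            Pattern.compile("(?:#[^#]*#|[^.#])+\\.?");

    static List<String> splitSentences(String input) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(input);
        while (m.find()) {
            String s = m.group().trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum tip. #Mrs. Smith kicks Ms. Smith# dolor sit. Amet.";
        for (String s : splitSentences(input)) {
            System.out.println(s);
        }
        // Lorem ipsum tip.
        // #Mrs. Smith kicks Ms. Smith# dolor sit.
        // Amet.
    }
}
```

Periods outside blocks (an unbracketed "Mrs. Smith", for instance) still end a sentence, so this is not a general sentence tokenizer; it only protects the #...# blocks.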


The skeleton I would use is as follows:

String computeForSlashline(String input) {

    // "\\." is the regex \. (a literal period); "\." alone is not a valid Java string escape.
    String[] sentences = input.split("\\.");
    for(int i = 0; i < sentences.length; i++) {
        // perform a search on each sentence, moving the #__# to the front
    }
    StringBuilder sb = new StringBuilder();
    for(String sentence : sentences) {
        sb.append(sentence).append(". ");
    }
    return sb.toString().trim();

}
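Filling in the loop body of that skeleton might look like the sketch below. The class MoveBlocks and the helper moveBlocksToFront are hypothetical names. The helper removes each block together with the whitespace before it, so "septum #match this#," becomes "septum,", and joins multiple blocks with a single space. Because it still splits on every period, it keeps the problem with periods inside #...# blocks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MoveBlocks {

    // One #...# block plus any whitespace before it; group(1) is the block itself.
    private static final Pattern BLOCK = Pattern.compile("\\s*(#[^#]*#)");

    // Moves every #...# block in a period-free sentence fragment to its front.
    static String moveBlocksToFront(String sentence) {
        Matcher m = BLOCK.matcher(sentence);
        StringBuilder blocks = new StringBuilder();
        while (m.find()) {
            if (blocks.length() > 0) {
                blocks.append(' ');
            }
            blocks.append(m.group(1));
        }
        // replaceAll resets the matcher, then strips every block from the text.
        String rest = m.replaceAll("").trim();
        if (blocks.length() == 0) {
            return rest;
        }
        return rest.isEmpty() ? blocks.toString() : blocks + " " + rest;
    }

    static String computeForSlashline(String input) {
        String[] sentences = input.split("\\.");
        for (int i = 0; i < sentences.length; i++) {
            sentences[i] = moveBlocksToFront(sentences[i]);
        }
        StringBuilder sb = new StringBuilder();
        for (String sentence : sentences) {
            sb.append(sentence).append(". ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum tip. Lorem ipsum loprem ipsum septum #match this#, "
                + "lorem ipsum #match this too#. "
                + "#Do not match this because it is already after a period#.";
        System.out.println(computeForSlashline(input));
    }
}
```

Run on the question's input string, main prints exactly the desired output shown in the question.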