
Converting tag contents to lipsum with regex [duplicate]

This question already has answers here: RegEx match open tags except XHTML self-contained tags (35 answers) Closed 9 years ago.

I'm debranding a micro-site to use as a portfolio piece. It's built with static HTML, and I need to replace the contents of every non-script tag with lipsum or even scrambled text, but the replacement has to have the same number of characters as the current text so the formatting stays intact. I'd also much rather do this with a GUI grep editor than write a script, because there may be a few tags whose contents I need to keep.

I used the regex \>([^$]+?)\< to find them (all the scripts start with $, so it skips the script tags), but I can't find any way to count the number of characters matched and replace them with the same number of lipsum or random characters.

Thanks for any help!
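A regex replacement string has no way to say "as many characters as were matched", so the count has to come from a little code. Below is a minimal sketch, assuming Java 11 or later (`Matcher.replaceAll(Function)` needs Java 9, `String.isBlank` needs Java 11) and a simple `>text<` pattern rather than a real HTML parser. Each text run is replaced by a slice of a repeating lorem ipsum string of exactly the same length. The class name, the `FILLER` text and the helper names are illustrative, and this version does not skip script or style contents.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LengthPreservingLipsum {

    // filler text that is cycled through; any string without '<' works
    private static final String FILLER =
            "lorem ipsum dolor sit amet consectetur adipiscing elit ";

    // text between '>' and the next '<' (a rough stand-in for "tag contents")
    private static final Pattern TEXT = Pattern.compile(">([^<]+)<");

    // returns exactly n characters of filler, wrapping around as needed
    static String filler(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(FILLER.charAt(i % FILLER.length()));
        }
        return sb.toString();
    }

    static String debrand(String html) {
        return TEXT.matcher(html).replaceAll(m -> {
            String content = m.group(1);
            // leave whitespace-only runs (e.g. newlines between tags) untouched
            String replaced = content.isBlank() ? content : filler(content.length());
            // quoteReplacement so '$' or '\' are not treated as group references
            return Matcher.quoteReplacement(">" + replaced + "<");
        });
    }

    public static void main(String[] args) {
        String html = "<h1>Acme Corp</h1>\n<p>Call us today!</p>";
        System.out.println(debrand(html));
    }
}
```

Running `main` prints `<h1>lorem ips</h1>` and then `<p>lorem ipsum do</p>` on the next line: "Acme Corp" (9 characters) becomes "lorem ips" (9 characters), so line lengths and wrapping stay the same.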


I was able to do this, though I ended up having to use a Java program. It turns out regex is fine here because I'm not parsing the whole document, just a few parts. There are a few quirks, but this got the job done.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// LoremIpsum4J is a third-party library; add the import for the version you use

public class Debrander {

    public static void main(String[] args) throws IOException {

        // read the whole HTML page from standard input
        String htmlPage = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);

        // match the text between '>' and the next '<', unless that '<' starts </script or </style;
        // [^<]* (rather than .*?) keeps a match from running across a closing tag
        Pattern tagContentRegex = Pattern.compile(">([^<]*)<(?!/script)(?!/style)");
        Matcher myMatcher = tagContentRegex.matcher(htmlPage);

        // finds any non-whitespace character
        Pattern nonWhiteRegex = Pattern.compile("\\S");

        StringBuffer sb = new StringBuffer();

        LoremIpsum4J loremIpsum = new LoremIpsum4J();
        loremIpsum.setStartWithLoremIpsum(false);

        // loop through all matches
        while (myMatcher.find()) {
            String tagContent = myMatcher.group(1);
            // skip whitespace-only content such as indentation between tags
            if (nonWhiteRegex.matcher(tagContent).find()) {
                int charCount = tagContent.length();

                // request as many lipsum characters as the original text had
                String[] lipsum = loremIpsum.getBytes(charCount);
                String replaceString = ">" + String.join("", lipsum) + "<";

                // quoteReplacement keeps any '$' or '\' in the text from being read as a group reference
                myMatcher.appendReplacement(sb, Matcher.quoteReplacement(replaceString));
            }
        }
        // copy everything after the last match
        myMatcher.appendTail(sb);
        System.out.println(sb);
    }
}
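If you would rather not depend on LoremIpsum4J, the same find / appendReplacement / appendTail loop can scramble the text instead, which the question also allows. Below is a minimal sketch, assuming Java 9 or later (for the `StringBuilder` overloads of `appendReplacement` and `appendTail`). Every ASCII letter becomes a random letter of the same case, every digit a random digit, and whitespace and punctuation are kept, so each text run keeps its exact length and its word breaks. The class name `Scrambler` and the fixed seed are illustrative choices.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Scrambler {

    // text between '>' and the next '<', skipped when that '<' starts </script or </style
    private static final Pattern TAG_CONTENT =
            Pattern.compile(">([^<]*)<(?!/script)(?!/style)");

    // replaces letters and digits with random ones of the same kind; everything else is kept
    static String scramble(String text, Random rnd) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + rnd.nextInt(26)));
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + rnd.nextInt(26)));
            } else if (c >= '0' && c <= '9') {
                out.append((char) ('0' + rnd.nextInt(10)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String debrand(String html, Random rnd) {
        Matcher m = TAG_CONTENT.matcher(html);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String scrambled = scramble(m.group(1), rnd);
            m.appendReplacement(sb, Matcher.quoteReplacement(">" + scrambled + "<"));
        }
        // copy everything after the last match (here, the untouched script element)
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Acme Corp, est. 1999</p><script>init();</script>";
        // fixed seed so repeated runs give the same output
        System.out.println(debrand(html, new Random(42)));
    }
}
```

The `init();` body is left alone because the `<` after it starts `</script`, which fails the lookahead. Like the regex in the answer, this is a heuristic: a script body that contains a literal `<` can still be partly matched.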