Converting tag contents to lipsum with regex [duplicate]

2023-03-20 04:13 问答作者：

This question already has answers here: RegEx match open tags except XHTML self-contained tags (35 answers) Closed 9 years ago.

I'm debranding a micro-site to use as a portfolio piece. It's built with static html, I need to replace the contents of every non-script tag with lipsum or even scrambled text - but it has to be the same number of characters as the current text to keep the formatting nice. Furthermore, I really would rather do this with GUI grep editor rather than writing a script because there may be a few tags I need to keep the contents of.

I used the regex \>([^$]+?)\< to find them (all the scripts start with $ so it skips the script tag) but I can't find any way to count the number of characters 开发者_StackOverflowmatched and replace with a corresponding number of lipsum or random characters.

Thanks for any help!

I was able to successfully do this, though I had to end up using a Java program. Turns out regex is fine cause I'm not parsing the whole thing, just a few parts. There are a few quirks but this got the job done.

public class Debrander {

public static void main(String[] args) {

       // reads in html from StdIn
       String htmlPage = StdIn.readAll();

       //regex matches all content within non-script non-style tags
       Pattern tagContentRegex = Pattern.compile("\\>(.*?)\\<(?!/script)(?!/style)");
       Matcher myMatcher = tagContentRegex.matcher(htmlPage);

       //different regex to check for whitespace
       Pattern whiteRegex = Pattern.compile("[^\\s]");

       StringBuffer sb = new StringBuffer();

       LoremIpsum4J loremIpsum = new LoremIpsum4J();
       loremIpsum.setStartWithLoremIpsum(false);

       //loop through all matches
       while(myMatcher.find()){
           String tagContent = htmlPage.substring(myMatcher.start(1), myMatcher.end(1));
           Matcher whiteMatcher = whiteRegex.matcher(tagContent);
           //whiteMatcher makes sure there is a NON-WHITESPACE character in the string
           if (whiteMatcher.find()){
               Integer charCount = (myMatcher.end(1) - myMatcher.start(1));

               String[] lipsum = loremIpsum.getBytes(charCount);
               String replaceString = ">";

               for (int i=0; i<lipsum.length; i++){
                   replaceString += lipsum[i];
               }
               replaceString += "<";
               myMatcher.appendReplacement(sb, replaceString);
           }
       }
       myMatcher.appendTail(sb);
       StdOut.println(sb.toString());
   }

}

继续阅读：grep regex

Converting tag contents to lipsum with regex [duplicate]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？