Converting tag contents to lipsum with regex [duplicate]
I'm debranding a micro-site to use as a portfolio piece. It's built with static HTML, and I need to replace the contents of every non-script tag with lipsum or even scrambled text. The replacement has to be the same number of characters as the current text to keep the formatting nice. Furthermore, I would really rather do this with a GUI grep editor than write a script, because there may be a few tags whose contents I need to keep.
I used the regex \>([^$]+?)\<
to find them (all the scripts start with $, so it skips the script tags), but I can't find any way to count the number of characters matched and replace them with a corresponding number of lipsum or random characters.
Thanks for any help!
I was able to do this, though I ended up having to write a Java program. It turns out regex is fine because I'm not parsing the whole document, just a few parts. There are a few quirks, but this got the job done.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// StdIn, StdOut and LoremIpsum4J come from external libraries (imports not shown here)

public class Debrander {
    public static void main(String[] args) {
        // Read the whole HTML page from standard input
        String htmlPage = StdIn.readAll();

        // Matches the text between '>' and the next '<' that does not open </script or </style.
        // Without Pattern.DOTALL, '.' does not cross line breaks, so only single-line content matches.
        Pattern tagContentRegex = Pattern.compile(">(.*?)<(?!/script)(?!/style)");
        Matcher myMatcher = tagContentRegex.matcher(htmlPage);

        // Separate regex to check for at least one non-whitespace character
        Pattern whiteRegex = Pattern.compile("\\S");

        StringBuffer sb = new StringBuffer();
        LoremIpsum4J loremIpsum = new LoremIpsum4J();
        loremIpsum.setStartWithLoremIpsum(false);

        // Loop through all matches
        while (myMatcher.find()) {
            String tagContent = myMatcher.group(1);
            // Only replace content that is not purely whitespace
            if (whiteRegex.matcher(tagContent).find()) {
                int charCount = tagContent.length();
                String[] lipsum = loremIpsum.getBytes(charCount);

                // Rebuild the match: the '>' and '<' delimiters around the filler text
                StringBuilder replaceString = new StringBuilder(">");
                for (String piece : lipsum) {
                    replaceString.append(piece);
                }
                replaceString.append("<");

                // quoteReplacement keeps any '$' or '\' in the filler from being read as a group reference
                myMatcher.appendReplacement(sb, Matcher.quoteReplacement(replaceString.toString()));
            }
        }
        // Copy everything after the last replaced match
        myMatcher.appendTail(sb);
        StdOut.println(sb.toString());
    }
}
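For anyone who wants to try the same idea without the StdIn/StdOut helpers or the lorem ipsum library, here is a minimal sketch that uses only the JDK (Java 9+ for readAllBytes). It is not the program above: the class name LengthPreservingFiller, the fill and debrand helpers, and the LETTERS filler string are all made up for illustration. It also swaps the negative lookahead for a pattern that matches whole script/style blocks and copies them through untouched, and it keeps whitespace in place so each replacement has exactly the same length as the original text. It is still regex on HTML, so comments and attribute values containing '>' will likely trip it up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LengthPreservingFiller {
    // Hypothetical filler source: letters only, cycled for as long as needed
    private static final String LETTERS =
            "loremipsumdolorsitametconsecteturadipiscingelit";

    // Alternative 1 (group 1): a whole <script>/<style> block, copied through unchanged.
    // Alternative 2 (group 2): text sitting between a '>' and the next '<'.
    private static final Pattern TOKEN = Pattern.compile(
            "<(script|style)\\b.*?</\\1\\s*>|(?<=>)([^<]+)(?=<)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Replaces each non-whitespace character with a filler letter; whitespace stays put. */
    public static String fill(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int next = 0; // position in LETTERS, restarted for every text run
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                out.append(c);
            } else {
                out.append(LETTERS.charAt(next % LETTERS.length()));
                next++;
            }
        }
        return out.toString();
    }

    /** Returns the HTML with tag text replaced by same-length filler. */
    public static String debrand(String html) {
        Matcher m = TOKEN.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String text = m.group(2);
            // group(2) is null when a script/style block matched: keep that block verbatim
            String replacement = (text == null) ? m.group() : fill(text);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Read HTML from standard input and write the debranded page to standard output
        String html = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.print(debrand(html));
    }
}
```

With this sketch, debrand("<p>Hello world</p><script>$(x<y)</script>") gives "<p>lorem ipsum</p><script>$(x<y)</script>". To keep the contents of particular tags, one option would be to add another alternative to the pattern that matches those elements whole, the same way the script/style alternative does.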