Censoring selected words (replacing them with ****) using a single replaceAll?
I'd like to censor some words in a string by replacing each character in the word with a "*". Basically I would want to do
String s = "lorem ipsum dolor sit";
s = s.replaceAll("ipsum|sit", $0.length() number of *);  // pseudocode
so that the resulting s equals "lorem ***** dolor ***".
I know how to do this with repeated replaceAll invocations, but I'm wondering: is it possible to do this with a single replaceAll?
Update: It's part of a research case study, and the reason is basically that I would like to get away with a one-liner, as it simplifies the generated bytecode a bit. It's not for a serious webpage or anything.
Here's a modification of aioobe's answer, using nested assertions (a lookahead inside a lookbehind) instead of a nested loop to generate the assertions:
// Requires: import java.util.regex.Pattern;
public static void main(String... args) {
    String s = "lorem ipsum dolor sit blah $10 bleh";
    System.out.println(s.replaceAll(censorWords("ipsum", "sit", "$10"), "*"));
    // prints "lorem ***** dolor *** blah *** bleh"
}

public static String censorWords(String... words) {
    StringBuilder sb = new StringBuilder();
    for (String w : words) {
        if (sb.length() > 0) sb.append("|");
        // Match any single character that lies at most w.length()-1
        // characters after a position where the (quoted) word starts.
        sb.append(
            String.format("(?<=(?=%s).{0,%d}).",
                Pattern.quote(w),
                w.length() - 1
            )
        );
    }
    return sb.toString();
}
Some key points:
- StringBuilder.append in a loop instead of String +=
- Pattern.quote to escape any $ or \ in censored words
That said, this is not the best solution to the problem. It's just a fun regex game to play, really.
Related questions
- codingBat plusOut using regex
How it works
We want to replace with "*", so we have to match one character at a time. The question is which character.
It's the character where, if you go back far enough and then look forward, you see a censored word.
Here's the regex in more abstract form:
(?<=(?=something).{0,N})
This matches positions where, if you are allowed to go back up to N characters, you can look ahead and see something.
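To make the abstract form concrete, here is a minimal sketch for a single word. The class name LookaroundDemo and the helper censorOne are made up for illustration; the pattern is the one described above with a trailing "." added so that it consumes the character at the matched position.

```java
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: builds "(?<=(?=WORD).{0,N})." with N = word.length() - 1.
    static String censorOne(String word) {
        return "(?<=(?=" + Pattern.quote(word) + ").{0," + (word.length() - 1) + "}).";
    }

    public static void main(String[] args) {
        String regex = censorOne("sit");
        // Only the three characters of "sit" can look back 0..2 positions
        // and then see "sit" ahead, so only they are replaced.
        System.out.println("lorem sit amet".replaceAll(regex, "*"));
        // prints "lorem *** amet"
    }
}
```

With N = 2, the "s" is found by going back 0 characters, the "i" by going back 1, and the "t" by going back 2.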
It's possible using zero-width lookarounds:
public class Test {
    public static void main(String... args) {
        String s = "lorem ipsum dolor sit";
        System.out.println(s.replaceAll(censorWords("ipsum", "sit"), "*"));
    }

    public static String censorWords(String... words) {
        String re = "";
        // One alternative per character of each word:
        // (?<=prefix)char(?=suffix)
        for (String w : words)
            for (int i = 0; i < w.length(); i++)
                re += String.format("|((?<=%s)%s(?=%s))",
                        w.substring(0, i), w.charAt(i), w.substring(i + 1));
        return re.substring(1);  // drop the leading "|"
    }
}
Prints
lorem ***** dolor ***
The generated regular expression isn't pretty but it does the trick :-)
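To see what "isn't pretty" means, here is a small sketch that prints the expression generated for one word. The class GeneratedRegexDemo and the method perCharAlternation are illustrative names; the construction is meant to mirror the censorWords loop above, restricted to a single word.

```java
class GeneratedRegexDemo {
    // Illustrative helper: one "(?<=prefix)char(?=suffix)" alternative per character.
    static String perCharAlternation(String w) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < w.length(); i++) {
            if (i > 0) re.append('|');
            re.append(String.format("((?<=%s)%s(?=%s))",
                    w.substring(0, i), w.charAt(i), w.substring(i + 1)));
        }
        return re.toString();
    }

    public static void main(String[] args) {
        System.out.println(perCharAlternation("sit"));
        // prints "((?<=)s(?=it))|((?<=s)i(?=t))|((?<=si)t(?=))"
    }
}
```

The expression grows with the square of the word length, since every character carries the rest of the word in its lookarounds.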
This is not a good way to censor text. Jeff Atwood has a great post about censoring in this way.
http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html
Unless you are going to spend lots and lots of time on this censoring feature, it will probably end up censoring things that shouldn't be censored.
Another note:
Making the Java code into a 1-liner will not necessarily simplify the bytecode. Using that logic, you could throw your censoring code into a single method and then just use that.
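As a rough sketch of the "single method" idea, something like the following would keep the call site a one-liner. The class CensorHelper and the method censor are hypothetical names, and this uses the standard Matcher.appendReplacement / appendTail loop rather than a single replaceAll.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CensorHelper {
    // Hypothetical helper: replaces each occurrence of a listed word
    // with as many '*' characters as the match is long.
    static String censor(String text, String... words) {
        StringBuilder alternation = new StringBuilder();
        for (String w : words) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(w));  // treat each word literally
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char[] stars = new char[m.end() - m.start()];
            Arrays.fill(stars, '*');
            // '*' has no special meaning in a replacement string, so no escaping is needed.
            m.appendReplacement(out, new String(stars));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(censor("lorem ipsum dolor sit blah $10 bleh", "ipsum", "sit", "$10"));
        // prints "lorem ***** dolor *** blah *** bleh"
    }
}
```

The caller then writes s = CensorHelper.censor(s, "ipsum", "sit"), which is a single invocation at the call site.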
Java's String.replaceAll method doesn't take a callback as an argument, so it isn't easy. But since profanity filters are mostly used on the web, I assume you can use JavaScript for that.
var s = "this is some sample text to play with";
var r = s.replace(/\b(some|sample|to)\b/g, function() {
    var star = "*";
    var len = arguments[1].length;
    while (--len)
        star += "*";
    return star;
});
console.log(r); // this is **** ****** text ** play with
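For comparison, newer Java versions do offer a callback-style replacement on Matcher (not on String): Matcher.replaceAll accepting a function was added in Java 9, and String.repeat in Java 11. A minimal sketch of the same filter, assuming Java 11 or later; the class name CallbackCensor is made up for illustration:

```java
import java.util.regex.Pattern;

class CallbackCensor {
    // Same word list and word boundaries as the JavaScript example above.
    static String censor(String text) {
        return Pattern.compile("\\b(some|sample|to)\\b")
                .matcher(text)
                // The function receives each MatchResult and returns its replacement.
                .replaceAll(mr -> "*".repeat(mr.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(censor("this is some sample text to play with"));
        // prints "this is **** ****** text ** play with"
    }
}
```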