Censoring selected words (replacing them with ****) using a single replaceAll?

2023-01-01 21:17 Q&A author:

I'd like to censor some words in a string by replacing each character in the word with a "*". Basically I would want to do

String s = "lorem ipsum dolor sit";
s = s.replaceAll("ipsum|sit", <$0.length() number of *>);  // pseudocode

so that the resulting s equals "lorem ***** dolor ***".

I know how to do this with repeated replaceAll invocations, but I'm wondering: is it possible to do this with a single replaceAll?

Update: It's part of a research case study, and the reason is basically that I would like to get away with a one-liner, as it simplifies the generated bytecode a bit. It's not for a serious webpage or anything.

Here's a modification to aioobe's answer, using nested assertions instead of a nested loop to generate the assertions:

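import java.util.regex.Pattern;

public static void main(String... args) {
    String s = "lorem ipsum dolor sit blah $10 bleh";
    System.out.println(s.replaceAll(censorWords("ipsum", "sit", "$10"), "*"));
    // prints "lorem ***** dolor *** blah *** bleh"
}

// Builds one alternative per word; each alternative matches a single
// character that lies inside an occurrence of that word.
public static String censorWords(String... words) {
    StringBuilder sb = new StringBuilder();
    for (String w : words) {
        if (sb.length() > 0) sb.append("|");
        sb.append(
           // Step back 0..length-1 characters, then look ahead for the whole word.
           String.format("(?<=(?=%s).{0,%d}).",
              Pattern.quote(w),   // treat the word as a literal
              w.length()-1
           )
        );
    }
    return sb.toString();
}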

Some key points:

StringBuilder.append in a loop instead of String +=
Pattern.quote to escape any regex metacharacters (such as $ or \) in censored words

That said, this is not the best solution to the problem. It's just a fun regex game to play, really.

How it works

We want to replace with "*", so we have to match one character at a time. The question is which character.

It's any character from which, if you go back far enough and then look forward, you see a censored word.

Here's the regex in more abstract form:

(?<=(?=something).{0,N})

This matches positions from which, going back up to N characters, you can look ahead and see something. With N set to the word's length minus one, that covers every character position inside the word.
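To make the "go back, then look forward" idea concrete, here is a minimal sketch that lists which character indices the pattern matches for a single word. The class name LookaroundDemo and the helper matchedIndices are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Hypothetical helper: returns the index of every character matched by
    // (?<=(?=word).{0,N}). with N = word.length() - 1.
    static List<Integer> matchedIndices(String text, String word) {
        String regex = String.format("(?<=(?=%s).{0,%d}).",
                Pattern.quote(word), word.length() - 1);
        Matcher m = Pattern.compile(regex).matcher(text);
        List<Integer> indices = new ArrayList<>();
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        // "sit" occupies indices 6..8 of "lorem sit"
        System.out.println(matchedIndices("lorem sit", "sit")); // prints [6, 7, 8]
    }
}
```

Each match is exactly one character long, which is why replacing every match with a single "*" preserves the word's length.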

It's possible using zero-width lookarounds:

public class Test {
    public static void main(String... args) {
        String s = "lorem ipsum dolor sit";
        System.out.println(s.replaceAll(censorWords("ipsum", "sit"), "*"));
    }

    public static String censorWords(String... words) {
        String re = "";
        for (String w : words)
            for (int i = 0; i < w.length(); i++)
                re += String.format("|((?<=%s)%s(?=%s))",
                        w.substring(0, i), w.charAt(i), w.substring(i + 1));
        return re.substring(1);
    }
}

Prints

lorem ***** dolor ***

The generated regular expression isn't pretty but it does the trick :-)
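To see what "isn't pretty" means, the sketch below prints the pattern generated for one short word. It uses the same construction as censorWords above, reduced to a single word; the names GeneratedRegexDemo and censorWord are invented for this illustration.

```java
public class GeneratedRegexDemo {
    // Same per-character construction as censorWords, for one word only.
    // Like the original, it does not quote regex metacharacters in the word.
    static String censorWord(String w) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < w.length(); i++) {
            if (i > 0) re.append('|');
            // Character i, preceded by the word's prefix and followed by its suffix.
            re.append(String.format("((?<=%s)%s(?=%s))",
                    w.substring(0, i), w.charAt(i), w.substring(i + 1)));
        }
        return re.toString();
    }

    public static void main(String[] args) {
        System.out.println(censorWord("sit"));
        // prints ((?<=)s(?=it))|((?<=s)i(?=t))|((?<=si)t(?=))
        System.out.println("lorem ipsum dolor sit".replaceAll(censorWord("sit"), "*"));
        // prints lorem ipsum dolor ***
    }
}
```

The pattern grows by one alternative per character of every censored word, so long word lists produce very long expressions.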

This is not a good way to censor text. Jeff Atwood has a great post about why censoring this way goes wrong:

http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html

Unless you are going to spend lots and lots of time on this censoring feature, it will probably end up censoring things that shouldn't be censored.

Another note:
Making the Java code into a one-liner will not necessarily simplify the bytecode. By that logic, you could put your censoring code into a single method and then just call that.
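As a rough sketch of the "single method" suggestion, the helper below does the replacement with an explicit Matcher loop, so the call site stays a one-liner. The class name Censor and the method censor are assumptions for illustration, not code from the thread.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Censor {
    // Hypothetical helper: replaces every occurrence of any given word
    // with the same number of '*' characters.
    static String censor(String text, String... words) {
        StringBuilder alternation = new StringBuilder();
        for (String w : words) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(w)); // treat each word literally
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char[] stars = new char[m.group().length()];
            Arrays.fill(stars, '*');
            // '*' has no special meaning in a replacement string.
            m.appendReplacement(out, new String(stars));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(censor("lorem ipsum dolor sit", "ipsum", "sit"));
        // prints lorem ***** dolor ***
    }
}
```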

Java's String.replaceAll doesn't take a callback as an argument, so it isn't easy. But since profanity filters are mostly used on the web, I assume you can use JavaScript for that:

var s = "this is some sample text to play with";
var r = s.replace(/\b(some|sample|to)\b/g, function() {
  var star = "*";
  var len = arguments[1].length;
  while(--len)
    star += "*";
  return star;
});
console.log(r); // this is **** ****** text ** play with
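For comparison, newer Java versions offer a similar callback style. The sketch below assumes Java 9 or later for Matcher.replaceAll with a function argument, and Java 11 or later for String.repeat; the class name CallbackCensor is invented for illustration.

```java
import java.util.regex.Pattern;

public class CallbackCensor {
    // Mirrors the JavaScript example: each whole-word match is replaced
    // by as many '*' characters as the match is long.
    static String censor(String text) {
        return Pattern.compile("\\b(some|sample|to)\\b")
                .matcher(text)
                .replaceAll(match -> "*".repeat(match.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(censor("this is some sample text to play with"));
        // prints this is **** ****** text ** play with
    }
}
```

Note that this is Matcher.replaceAll, not String.replaceAll; the String method still only accepts a replacement string.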
