JAVA使用前缀树(Tire树)实现敏感词过滤、词典搜索

2023-01-04 10:25 开发作者：萌萌哒二狗子

简介

有时候需要对用户输入的内容进行敏感词过滤，或者实现查找文本中出现的词典中的词，用遍历的方式进行替换或者查找效率非常低，这里提供一个基于Trie树的方式，进行关键词的查找与过滤，在词典比较大的情况下效率非常高。

Trie树

Trie树，又叫前缀树，多说无益，直接看图就明白了

词典：[“猪狗”, “小狗”, “小猫”, “小猪”, “垃圾”, “狗东西”]

Tire数据结构：

JAVA使用前缀树(Tire树)实现敏感词过滤、词典搜索

code

树节点Node.class

/**
 * trie tree
 *
 * @author lovely dog
 * @date 2020/10/20
 */
public class Node {
    /**
     * 子节点
     */
    private Map<Character, Node> nextNodes = new HashMap<>();

    public void addNext(Character key, Node node){
        nextNodes.put(key, node);
    }

    public Node getNext(Character key){
        return nextNodes.get(key);
    }

    public boolean isLastCharacter(){
        return nextNodes.isEmpty();
    }
}

搜索类TrieSea编程客栈rcher.class

/**
 * trie tree searcher
 *
 * @author lovely dog
 * @date 2020/10/20
 */
public class TrieSearcher {

    private Node开发者_JS教程 root = new Node();

    /**
     * 添加词
     *
     * @param word 词
     */
    public void addword(String word) {
        Node tmpNode = root;
        for (char c : word.toCharArray()) {
            Node node = tmpNode.getNext(c);
            if (null == node) {
                node = new Node();
                tmpNode.addNext(c, node);
            }
            tmpNode = node;
        }
    }

    /**
     * 替换词
     *
     * @param text         待处理文本
     * @param afterReplace 替换后的词
     * @return 处理后的文本
     */
    public String replace(String text, String afterReplace) {
        StringBuilder result = new StringBuilder(text.length());
        Node tmpNode = root;
        int begin = 0, pos = 0;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            tmpNode = tmpNode.getNext(c);
            if (null == tmpNode) {
                result.append(text.charAt(begin));
                begin++;
                pos = begin;
                tmpNode = root;
            } else if (tmpNode.isLastCharacter()) {
                // 匹配完成, 进行替换
                result.append(a编程客栈fterReplace);
                pos+php+;
                begin = pos;
                tmpNode = root;
            } else {
                // 匹配上向后移
                pos++;
            }
        }
        result.append(text.substring(begin));
        return result.toString();
    }

    /**
     * 查找
     *
     * @param text 待处理文本
     * @return 统计数据 key: word value: count
     */
    public Map<String, Integer> find(String text) {
        Map<String, Integer> resultMap = new HashMap<>(16);
        Nophpde tmpNode = root;
        StringBuilder word = new StringBuilder();
        int begin = 0, pos = 0;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            tmpNode = tmpNode.getNext(c);
            if (null == tmpNode) {
                begin++;
                pos = begin;
                tmpNode = root;
            } else if (tmpNode.isLastCharacter()) {
                // 匹配完成
                String w = word.append(c).toString();
                resultMap.put(w, resultMap.getOrDefault(w, 0) + 1);
                pos++;
                begin = pos;
                tmpNode = root;
                word = new StringBuilder();
            } else {
                // 匹配上向后移
                word.append(c);
                pos++;
            }
        }
        return resultMap;
    }
}

测试Main.class

public class Main {
    public static void main(String[] args) {
        TrieSearcher trieSearcher = new TrieSearcher();
        Stream.of("猪狗", "小狗", "小猫", "小猪", "垃圾", "狗东西").forEach(trieSearcher::addWord);
        String sentence = "你好，小狗，小猪，今天天气真好。";
        System.out.println(trieSearcher.replace(sentence, "***"));
        System.out.println(trieSearcher.find(sentence));
    }
}

输出：

你好，***，***，今天天气真好。
{小猪=1, 小狗=1}

Benchmark：

replace 1093.517 ns/op
trie&nhttp://www.devze.combsp; 200.042 ns/op

结论

在仅有短文本和小词典的情况下，通过性能测试可以看出前缀树的效率很高，随着文本和词典的增长，性能提升会非常明显。

到此这篇关于Java使用前缀树(Tire树)实现敏感词过滤、词典搜索的文章就介绍到这了,更多相关JAVA前缀树敏感词过滤内容请搜索我们以前的文章或继续浏览下面的相关文章希望大家以后多多支持我们！

继续阅读：JAVA前缀树敏感词过滤

JAVA使用前缀树(Tire树)实现敏感词过滤、词典搜索

目录

简介

Trie树

code

结论

更多精彩内容

精彩评论

最新开发

使用gopacket解析协议层中的相关数据方式

golang网络数据包捕获库gopacket详解

Go语言中的错误处理过程

Go语言中new与make的使用解读

Qt使用QSqlDatabase连接MySQL实现增删改查功能

开发排行榜

springboot后端存储富文本内容的思路与步骤(含图片内容)

PyCharm运行python测试,报错“没有发现测试”/“空套件”的解决

return base64.b64encode(b).decode(

基于C语言实现钻石棋游戏的示例代码

Sublime Text 3解决中文乱码问题（实测可用）