Tool or API needed to find text containing any word from a large dictionary of words

I'm looking for a tool (ideally) or, failing that, an API to search a large number of text files for instances of any word from a large dictionary of words. The "words" in my case are actually file names, but they won't contain spaces.

A fast algorithm might build a DFA (deterministic finite automaton) from the dictionary and then find every instance of the dictionary words in a single pass over each file, for any number of files.
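One well-known way to get this kind of automaton is the Aho-Corasick algorithm. The question does not name it, so treat the following as a minimal sketch of that technique rather than what the asker had in mind. The function names `build_automaton` and `find_all` and the sample file names are made up for illustration.

```python
from collections import deque

def build_automaton(words):
    """Build an Aho-Corasick automaton: a trie plus failure links."""
    goto = [{}]   # goto[node] maps a character to the next node
    fail = [0]    # fail[node] is the node to fall back to on a mismatch
    out = [[]]    # out[node] lists the dictionary words ending at this node

    # Phase 1: insert every word into the trie.
    for word in words:
        node = 0
        for ch in word:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(word)

    # Phase 2: compute failure links breadth-first, so shallower nodes
    # are finished before the deeper nodes that depend on them.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out

def find_all(text, automaton):
    """Return (start_index, word) for every dictionary word found in text."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            hits.append((i - len(word) + 1, word))
    return hits

# Hypothetical dictionary of file names, built once and reused for every file.
automaton = build_automaton(["main.c", "util.h", "in.c"])
print(find_all("see main.c and util.h", automaton))
```

The automaton is built once from the dictionary and can then be run over any number of files, reading each character once. Note that it reports substring matches, so `in.c` is also found inside `main.c`; if only whole tokens should count, the hits would need an extra check on the characters around each match.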

Note: I want exact text matching, not fuzzy matching as in this SO question: "Algorithm wanted: Find all words of a dictionary that are similar to words in a free text".


Have you looked at Lucene? There are Java and .NET versions.

http://lucene.apache.org/java/docs/index.html


I'd load the dictionary of words into a HashMap or "Dictionary", then read each file line by line or word by word, checking whether the map contains each word.
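A rough sketch of that idea in Python, using a `set` as the hash-based lookup. The function name `find_dictionary_words`, the whitespace tokenization, and the list of punctuation stripped from each token are assumptions for illustration, not part of the answer.

```python
def find_dictionary_words(lines, dictionary):
    """Return (line_number, word) for each token that is in the dictionary."""
    words = set(dictionary)  # hash-based lookup, average O(1) per token
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            # Assumption: drop punctuation stuck to either end of a token.
            # Dots are left alone because file names contain them.
            token = token.strip(",;:!?()[]{}\"'")
            if token in words:
                hits.append((line_no, token))
    return hits

# Hypothetical sample input; in practice `lines` could be an open file object.
sample = ["open main.c, then util.h", "nothing here", "(see util.h)"]
print(find_dictionary_words(sample, ["main.c", "util.h"]))
```

This only finds words that appear as whole whitespace-separated tokens, which suits file names without spaces but will miss a name embedded in a longer string such as a path.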
