Literature on spellchecker?

2023-03-09 07:59

I was wondering if there's a good list of literature on how to implement a spellchecker. One example I can find is Peter Norvig's How to Write a Spelling Corrector (http://norvig.com/spell-correct.html), but it seems very unrealistic.

A few things I am interested in are:

Constructing a spellchecker without resorting to a dictionary (for example, by using existing corpora or an N-gram dump such as the Google N-gram dump).
Contextual spellchecking.

Here's a classic paper: Church & Gale (1991). There has been less work on context-sensitive error correction, but two papers probably worth looking at are Golding (1995) and Carlson & Fette (2007).

Quote from link below

How does it Work?
The Basic Model
The basic technology works as follows: The documents that the search engine is providing access to are added both to the search index and to a language model. The language model stores seen phrases and maintains statistics about them. When a query is submitted, the src/QuerySpellCheck.java class looks for possible character edits, namely substitutions, insertions, replacements, transpositions, and deletions, that make the query a better fit for the language model. So if you type 'Gretski' as a query, and the underlying data is from rec.sport.hockey, the language model will be much more familiar with the mildly edited 'Gretzky' and will suggest it as an alternative.
Domain Sensitivity
The big advantage of this approach over dictionary-based spell checking is that the corrections are motivated by data in the search index. So "trt" will be corrected to "tort" in a legal domain, "tart" in a cooking domain, and "TRt" in a bio-informatics domain. On Google, there is no suggested correction, presumably because of web domains "trt.com", Thessaly Radio Television as well as Turkiye Radyo Televizyon, both aka TRT, etc.
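To make the two ideas above concrete, here is a minimal Python sketch. It is not LingPipe's implementation: the "language model" is assumed to be a plain word-count table built from the indexed documents, candidates are generated by applying up to two character edits, and fewer edits are always preferred over higher counts. The names train, edits1, and suggest, and the tiny example corpora, are made up for illustration.

```python
import re
import string
from collections import Counter


def train(documents):
    """Count lowercase word occurrences; this Counter is the toy 'language model'."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    substitutes = [left + c + right[1:]
                   for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + substitutes + inserts)


def suggest(query, counts, max_edits=2):
    """Return the most frequent known word reachable with the fewest edits."""
    word = query.lower()
    frontier = {word}
    for distance in range(max_edits + 1):
        known = [w for w in frontier if w in counts]
        if known:
            return max(known, key=lambda w: counts[w])
        if distance < max_edits:
            # Expand every string in the frontier by one more edit.
            frontier = {e for w in frontier for e in edits1(w)}
    return word  # nothing in the model is close enough; leave the query alone


# Hypothetical miniature "search indexes" for three domains.
hockey = train(["Gretzky scored twice.", "Gretzky leads the Oilers in points."])
legal = train(["The tort claim was dismissed.", "A tort is a civil wrong."])
cooking = train(["Bake the tart until golden.", "Serve the lemon tart warm."])

print(suggest("Gretski", hockey))  # gretzky (two substitutions away)
print(suggest("trt", legal))       # tort
print(suggest("trt", cooking))     # tart
```

The same query gets a different correction depending only on which documents were counted, which is the domain sensitivity described above. The sketch lowercases everything, so it would not reproduce a case-sensitive suggestion such as "TRt".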
Context-Sensitive Correction
Both Yahoo and Google perform context-sensitive correction. For instance, the query frod (an Old English term from the German meaning wise or experienced) has a suggested correction of ford (the automotive company, among others), whereas the query frod baggins has the corrected query frodo baggins (a 20th century English fictional character). That's the Yahoo behavior. Google doesn't correct frod baggins, even though there are about 785 hits for it versus 820,000 for Frodo Baggins. On the other hand, Google does correct frdo and frdo baggins. Amazon behaves similarly, but MSN corrects frd baggins to ford baggins rather than frodo baggins.
LingPipe's model supports exactly this kind of context-sensitive correction.
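A rough way to see how context can change the correction is to score whole candidate queries with a word bigram model. The sketch below is an assumption-laden toy, not what Yahoo, Google, or LingPipe actually do: candidates for each token come from Python's difflib.get_close_matches against the model vocabulary, scoring uses add-one smoothed bigram log-probabilities, and the function names and the five example "documents" are invented for illustration.

```python
import difflib
import itertools
import math
from collections import Counter


def train(sentences):
    """Collect unigram and bigram counts, with "<s>" marking the query start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    total, prev = 0.0, "<s>"
    for token in tokens:
        total += math.log((bigrams[(prev, token)] + 1)
                          / (unigrams[prev] + vocab_size))
        prev = token
    return total


def correct_query(query, unigrams, bigrams):
    """Pick the best-scoring combination of per-token candidates."""
    vocab = [w for w in unigrams if w != "<s>"]
    options = []
    for token in query.lower().split():
        # Vocabulary words that look similar to the token; a known token
        # is always among them because it matches itself exactly.
        close = difflib.get_close_matches(token, vocab, n=5, cutoff=0.7)
        options.append(close or [token])
    best = max(itertools.product(*options),
               key=lambda cand: score(cand, unigrams, bigrams))
    return " ".join(best)


# Hypothetical indexed text: "ford" is common, "frodo" only appears before "baggins".
unigrams, bigrams = train([
    "ford trucks",
    "ford dealer",
    "ford mustang",
    "frodo baggins",
    "bilbo baggins",
])

print(correct_query("frod", unigrams, bigrams))          # ford
print(correct_query("frod baggins", unigrams, bigrams))  # frodo baggins
```

On its own, "frod" goes to the more frequent "ford"; followed by "baggins", the bigram count for "frodo baggins" outweighs that, so the correction flips to "frodo". Enumerating every combination of candidates is fine for short queries but grows quickly with query length.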

Read this great tutorial.
