Using Markov chains (or something similar) to produce an IRC-bot

Asked 2023-02-21 18:01

I tried Google and found little that I could understand.

I understand Markov chains at a very basic level: it's a mathematical model where the next state depends only on the current state, so it's sort of an FSM with weighted random chances instead of different criteria?

I've heard that you can use them to generate semi-intelligent nonsense, given sentences of existing words to use as a dictionary of sorts.

I can't think of search terms to find this, so can anyone link me or explain how I could produce something that gives a semi-intelligent answer? (If you asked it about pie, it would not start going on about the Vietnam War it had heard about.)

I plan on:

Having this bot idle in IRC channels for a bit
Strip any usernames out of the string and store as sentences or whatever
Over time, use this as the basis for the above.

Yes, a Markov chain is a finite-state machine with probabilistic state transitions. To generate random text with a simple, first-order Markov chain:

Collect bigram (adjacent word pair) statistics from a corpus (collection of text).
Make a Markov chain with one state per word. Reserve a special state for end-of-text.
The probability of jumping from state/word x to y is the probability of the word y immediately following x, estimated from relative bigram frequencies in the training corpus.
Start with a random word x (perhaps determined by how often that word occurs as the first word of a sentence in the corpus). Then pick a state/word y to jump to randomly, taking into account the probability of y following x (the state transition probability). Repeat until you hit end-of-text.
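The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the function names (`train_bigrams`, `weighted_choice`, `generate`), the use of `None` as the end-of-text state, and whitespace tokenization are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

END = None  # assumed marker for the special end-of-text state


def train_bigrams(sentences):
    """Count sentence-initial words and how often each word follows another."""
    starts = Counter()
    transitions = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()  # naive whitespace tokenization
        if not words:
            continue
        starts[words[0]] += 1
        # Pair every word with its successor; the last word is followed by END.
        for x, y in zip(words, words[1:] + [END]):
            transitions[x][y] += 1
    return starts, transitions


def weighted_choice(counter, rng):
    """Pick a key with probability proportional to its count."""
    return rng.choices(list(counter.keys()), weights=list(counter.values()), k=1)[0]


def generate(starts, transitions, rng=random, max_words=50):
    """Walk the chain from a sentence-initial word until END (or max_words)."""
    word = weighted_choice(starts, rng)
    out = []
    while word is not END and len(out) < max_words:
        out.append(word)
        word = weighted_choice(transitions[word], rng)
    return " ".join(out)
```

Calling `generate(*train_bigrams(corpus))` with `corpus` as a list of sentences returns one random sentence. The `max_words` cap is only a safety net against long loops in the chain.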

If you want to get something semi-intelligent out of this, then your best shot is to train it on lots of carefully collected texts. The "lots" part makes it produce proper sentences (or plausible IRC speak) with high probability; the "carefully collected" part means you control what it talks about. Introducing higher-order Markov chains also helps in both areas, but takes more storage to store the necessary statistics. You may also look into things like statistical smoothing.
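To make the storage trade-off concrete, here is a sketch of a higher-order chain in which each state is a tuple of the previous `order` words, plus one simple form of smoothing. The padding markers `<s>` and `</s>`, the function names, and the choice of additive (Laplace) smoothing are assumptions for illustration; other smoothing schemes exist and may suit a real corpus better.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical padding markers


def train(sentences, order=2):
    """Map each tuple of `order` preceding words to a Counter of next words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = [START] * order + sentence.split() + [END]
        for i in range(order, len(words)):
            model[tuple(words[i - order:i])][words[i]] += 1
    return model


def generate(model, order=2, rng=random, max_words=50):
    """Generate text; every state reached here was observed during training."""
    state = (START,) * order
    out = []
    while len(out) < max_words:
        followers = model[state]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == END:
            break
        out.append(word)
        state = state[1:] + (word,)  # slide the context window
    return " ".join(out)


def smoothed_prob(model, state, word, vocab_size, alpha=1.0):
    """Additive smoothing: unseen followers get a small nonzero probability."""
    followers = model.get(state, Counter())
    return (followers[word] + alpha) / (sum(followers.values()) + alpha * vocab_size)
```

The number of distinct states grows quickly with `order`, which is where the extra storage goes. With a small corpus, a high order mostly reproduces the training sentences verbatim.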

However, having your IRC bot actually respond to what is said to it takes a lot more than Markov chains. It may be done by doing text categorization (aka topic spotting) on what is said, then picking a domain-specific Markov chain for text generation. Naïve Bayes is a popular model for topic spotting.
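As a rough sketch of the topic-spotting step, a multinomial Naive Bayes classifier with add-one smoothing fits in a few lines. The function names, the `(topic, text)` training format, and the choice to ignore words never seen in training are assumptions for this example; a real bot would need far more labeled text per topic.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """labeled_docs: iterable of (topic, text) pairs. Returns the raw counts."""
    doc_counts = Counter()              # number of documents per topic
    word_counts = defaultdict(Counter)  # word frequencies per topic
    vocab = set()
    for topic, text in labeled_docs:
        words = text.lower().split()
        doc_counts[topic] += 1
        word_counts[topic].update(words)
        vocab.update(words)
    return doc_counts, word_counts, vocab


def classify(text, doc_counts, word_counts, vocab):
    """Return the topic with the highest log posterior for `text`."""
    total_docs = sum(doc_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in doc_counts:
        score = math.log(doc_counts[topic] / total_docs)  # log prior
        total_words = sum(word_counts[topic].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing keeps unseen (topic, word) pairs above zero.
            score += math.log((word_counts[topic][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic
```

The returned topic could then be used as a key into a dictionary of per-topic Markov chains, so that a question about pie is answered from the chain trained on food talk.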

Kernighan and Pike in The Practice of Programming explore various implementation strategies for Markov chain algorithms. These, and natural language generation in general, are covered in great depth by Jurafsky and Martin, Speech and Language Processing.

You want to look for Ian Barber's Text Generation article (phpir.com). Unfortunately the site is down or offline. I have a copy of his text and can send it to you.

It seems to me you are trying multiple things at the same time:

extracting words/sentences by idling in IRC
building a knowledge base
listening to some chat, parsing keywords
generating sentences related to the keywords

Those are basically very different tasks. Markov models are often used for machine learning. I don't see much learning in your tasks though.

larsmans' answer shows how you generate sentences from word-based Markov models. You can also train the weights to favor those word pairs that other IRC users used. But nonetheless this will not generate keyword-related sentences, because building/refining a Markov model is not the same as "driving" it.
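Refining the weights from channel traffic could look something like the sketch below, which strips nicknames and then bumps the bigram counts for each incoming line. The `<nick> message` log format, the `nick:` addressing convention, and the function names are assumptions; actual IRC libraries and log formats differ, so the parsing would need adapting.

```python
import re
from collections import Counter, defaultdict

# Assumed log format: "<nick> message text". Join/part notices will not match.
LOG_LINE = re.compile(r"^<(?P<nick>[^>]+)>\s*(?P<text>.*)$")


def clean_line(line, known_nicks):
    """Return the message words without nicknames, or None for non-chat lines.

    Also records the speaker in `known_nicks` so later lines addressed to
    them ("alice: ...") can have that prefix removed.
    """
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    known_nicks.add(match.group("nick"))
    words = match.group("text").split()
    # Drop a leading "nick:" or "nick," when it names someone seen earlier.
    if words and words[0].rstrip(":,") in known_nicks:
        words = words[1:]
    return words


def update(transitions, line, known_nicks):
    """Add the bigrams of one chat line to the running counts."""
    words = clean_line(line, known_nicks)
    if not words:
        return
    # None marks end-of-text after the last word.
    for x, y in zip(words, words[1:] + [None]):
        transitions[x][y] += 1
```

Here `transitions` is a `defaultdict(Counter)` and `known_nicks` is a `set`, both kept for the lifetime of the bot. Because the counts only ever grow, word pairs that the channel uses often end up with higher transition probabilities.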

You might try hidden Markov models (HMMs) where the visible output is the keywords and the hidden states are made from those word pairs. You could then dynamically favor sentences that are more appropriate to specific keywords.
