Java: How to think about Modelling a Markov Chain?

2023-03-01 15:20

I am writing a program that is meant to be a Markov text generator. I plan on splitting some text up at a set interval and then storing each piece in a class. The problem I don't know how to solve is how to name the instances of the class I am going to create. I was planning on generating the instances in a for loop. The user will pass the method some amount of text (the length of which is not known beforehand). Pseudo-code below:

    create vector for sets and tail letter;
    for (int c = 0; c < text.length; c++) {
        Check to make sure overflow doesn't happen;
        Create instance of set named c;
        store set and tailLetter into vector;
    }

    public class set {
        String characters;
        char tailLetter;
    }

I'm sorry if that's not clear enough. I'm teaching myself Java and this is my first post here.

If you are learning Java, I'd suggest that you first focus on how to model the problem with Java's classes and methods.

A Markov chain is a statistical model of the seed text, right? When used to model a text, it normally describes how often each word is followed by each other word (normally you'd split the text on word boundaries). That feels like it needs a class; it might be called MarkovChain.

Within the MarkovChain class, you need something that holds each word that occurs in the text and maps that word to the words that follow it, along with a frequency count for each of those following words.

Suppose the word is "and". In the text, "and" is followed by "the" four times and by "then" three times. So you'd need some data structure to hold something like this:

 and --> 
        the (4)
        then (3)

One way to do this is to use an ArrayList to hold all words, then a Map<T1,T2> that holds the relationship between words and the frequency of following words. In this case T1 is probably a String, and T2 is probably an ArrayList of pairs, each pair being a string and the (integer) count for that string.

But wait, now you don't need the base ArrayList<> to store the words, because they are just the keys in the map.
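As a rough sketch of the structure described above, the following stores the follower counts in a nested map. Note that this swaps the "ArrayList of pairs" for an inner Map<String, Integer>, which is a substitution made here for brevity, not something the answer prescribes. The class name FollowerCounts and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the name and methods are illustrative assumptions.
class FollowerCounts {
    // word -> (following word -> number of times it followed that word)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one observation of `next` appearing directly after `word`.
    void record(String word, String next) {
        counts.computeIfAbsent(word, k -> new HashMap<>())
              .merge(next, 1, Integer::sum);
    }

    // How many times `next` was seen directly after `word` (0 if never).
    int count(String word, String next) {
        return counts.getOrDefault(word, Collections.<String, Integer>emptyMap())
                     .getOrDefault(next, 0);
    }
}
```

With the "and" example, calling record("and", "the") four times and record("and", "then") three times would leave the counts 4 and 3 under the key "and", and the words themselves live only as map keys.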

...and so on. The next step would be to figure out how to populate that data structure. That's probably an internal (private) method that gets called when a caller instantiates the MarkovChain class with a seed text.

Probably you also want that MarkovChain class to expose another method, a public one, that callers invoke when they want to generate some random sequence from the chain, relying on probabilities based on the frequency counts.
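One possible shape for such a class is sketched below, assuming the text is split on whitespace and that the caller supplies the starting word and a java.util.Random. The method names build, generate, and pickNext, and the nested-map storage, are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// A sketch only: layout, names, and whitespace splitting are assumptions.
class MarkovChain {
    // word -> (following word -> count)
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();
    private final Random random;

    MarkovChain(String seedText, Random random) {
        this.random = random;
        build(seedText);
    }

    // Private: count, for each word, how often each other word follows it.
    private void build(String seedText) {
        String trimmed = seedText.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            followers.computeIfAbsent(words[i], k -> new HashMap<>())
                     .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Public: walk the chain from `start`, choosing each next word with
    // probability proportional to its count. Stops early at a dead end.
    public List<String> generate(String start, int maxWords) {
        List<String> out = new ArrayList<>();
        String current = start;
        while (out.size() < maxWords && current != null) {
            out.add(current);
            current = pickNext(current);
        }
        return out;
    }

    // Weighted random choice among the recorded followers of `word`.
    private String pickNext(String word) {
        Map<String, Integer> options = followers.get(word);
        if (options == null) {
            return null; // no recorded follower for this word
        }
        int total = 0;
        for (int c : options.values()) {
            total += c;
        }
        int r = random.nextInt(total);
        for (Map.Entry<String, Integer> e : options.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        return null; // not reached while all counts are positive
    }
}
```

A caller might then write something like new MarkovChain(text, new Random()).generate("and", 20). Passing the Random in makes the output repeatable with a fixed seed, which helps when testing.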

...

This is just one way to think about the modelling of the problem.

Anyway I would focus on that modelling/design exercise, before writing code.

Can't you use a Map<String, Set> where the key is the generated name?

You can use an ArrayList to manage the instances. I like the Map idea better so you can dynamically set the names instead of trying to access instances by an index number.
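To illustrate the ArrayList suggestion: instances created in a loop do not need individual variable names, because the list index identifies them. The sketch below assumes a sliding window of fixed size that advances one character at a time, which is one reading of the question's pseudo-code. The class is renamed from set to the hypothetical Chunk to avoid confusion with java.util.Set, and Chunker.split is likewise an invented name.

```java
import java.util.ArrayList;
import java.util.List;

// Stands in for the question's `set` class; the name Chunk is hypothetical.
class Chunk {
    final String characters;
    final char tailLetter;

    Chunk(String characters, char tailLetter) {
        this.characters = characters;
        this.tailLetter = tailLetter;
    }
}

class Chunker {
    // Slide a window of `size` characters over the text, pairing each
    // window with the single character that follows it.
    static List<Chunk> split(String text, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<Chunk> chunks = new ArrayList<>();
        // The loop bound keeps c + size inside the text, so charAt cannot overflow.
        for (int c = 0; c + size < text.length(); c++) {
            chunks.add(new Chunk(text.substring(c, c + size), text.charAt(c + size)));
        }
        return chunks;
    }
}
```

Each instance is then reachable as chunks.get(i). If lookup by the characters themselves is what is really needed, a Map keyed on that string would serve instead of the list.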

I don't see the point of the names:

If they are just so that 'set' objects will have some distinct String for debugging, the default toString() implementation will give you that.
If you specifically need to do lookup of these 'set' objects, then a numeric identifier or a sequence number will work better.

If you explained the purpose of the names, and how you intend to use them, maybe we could give you better advice.

