
Java: How to think about Modelling a Markov Chain?

I am trying to write a Markov text generator. I plan to split some text up at a set interval and store each piece in an instance of a class. The problem I don't know how to solve is how to name the instances I am going to create. I was planning on generating the instances in a for loop. The user will pass the method some amount of text (the length of which is not known beforehand). Pseudo-code below:

    create vector for sets and tail letter;
    for (int c = 0; c < text.length; c++) {
        Check to make sure overflow doesn't happen;
        Create instance of set named c;
        store set and tailLetter into vector;
    }

    public class set {
        String characters;
        char tailLetter;
    }

I'm sorry if that's not clear enough. I'm teaching myself Java and this is my first post here.


If you are learning Java, I'd suggest that you first focus on how to model the problem with Java's classes and methods.

A Markov chain is a statistical model of the seed text, right? When used to model text, it normally describes how often each word is followed by each other word (normally you'd split the text on word boundaries). That feels like it needs a class; it might be called MarkovChain.

Within the MarkovChain class, you need something that holds each word occurring in the text and maps it to the words that follow it, along with a count of how often each one follows.

Suppose the word is 'and'. In the text, 'and' is followed by 'the' four times and by 'then' three times. So you'd need some data structure to hold something like this:

 and --> 
        the (4)
        then (3) 

One way to do this is to use an ArrayList to hold all the words, then a Map<T1,T2> that holds the relationship between each word and the frequency of the words that follow it. In this case T1 is probably a String, and T2 is probably an ArrayList of pairs, each pair being a String and the (integer) count for that String.

But wait, now you don't need the base ArrayList<> to store the words, because they are just the keys in the map.
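As a rough sketch of that data structure: the version below swaps the ArrayList of pairs for a nested Map, which is an assumption made here to keep count updates short, not something required by the design above. The class name FollowerCounts and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for "word -> (following word -> count)".
class FollowerCounts {
    // Outer key: a word. Inner map: each word seen right after it, with its count.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one observation of `next` appearing directly after `word`.
    void record(String word, String next) {
        counts.computeIfAbsent(word, k -> new HashMap<>())
              .merge(next, 1, Integer::sum);
    }

    // How many times `next` was seen directly after `word` (0 if never).
    int count(String word, String next) {
        return counts.getOrDefault(word, Collections.<String, Integer>emptyMap())
                     .getOrDefault(next, 0);
    }
}
```

Calling record("and", "the") four times and record("and", "then") three times reproduces the 'and' example above; the words themselves are just the keys of the outer map.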

...and so on. The next step would be to figure out how to populate that data structure. That's probably an internal (private) method that gets called when a caller instantiates the MarkovChain class with a seed text.

Probably you also want that MarkovChain class to expose another method, a public one, that callers invoke when they want to generate some random sequence from the chain, relying on probabilities based on the frequency counts.
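Putting those two methods together, a minimal sketch might look like the following. It assumes whitespace-separated words, a nested Map for the counts, and a caller-supplied Random so that runs can be reproduced; the method names populate, generate, count and pick are illustrative choices, not a fixed API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class MarkovChain {
    // word -> (following word -> how often it follows)
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();
    private final Random random;

    MarkovChain(String seedText, Random random) {
        this.random = random;
        populate(seedText);
    }

    // Private: builds the frequency table from the seed text.
    private void populate(String seedText) {
        String[] words = seedText.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            followers.computeIfAbsent(words[i], k -> new HashMap<>())
                     .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // How many times `next` directly followed `word` in the seed text.
    int count(String word, String next) {
        Map<String, Integer> options = followers.get(word);
        return options == null ? 0 : options.getOrDefault(next, 0);
    }

    // Public: walks the chain from `start`, emitting at most `maxWords` words.
    // Stops early when the current word has no recorded followers.
    String generate(String start, int maxWords) {
        List<String> out = new ArrayList<>();
        if (maxWords <= 0) {
            return "";
        }
        String current = start;
        out.add(current);
        while (out.size() < maxWords) {
            Map<String, Integer> options = followers.get(current);
            if (options == null) {
                break;
            }
            current = pick(options);
            out.add(current);
        }
        return String.join(" ", out);
    }

    // Chooses a follower with probability proportional to its count.
    private String pick(Map<String, Integer> options) {
        int total = 0;
        for (int c : options.values()) {
            total += c;
        }
        int r = random.nextInt(total);
        for (Map.Entry<String, Integer> e : options.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("counts are always positive, so this is not reached");
    }
}
```

A caller would write something like new MarkovChain(seedText, new Random()).generate("and", 20). Passing in the Random is only a convenience for testing; the class could just as well create its own.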

...

This is just one way to think about the modelling of the problem.

Anyway I would focus on that modelling/design exercise, before writing code.


Can't you use a Map<String, Set>, where the key is the generated name and the value is the instance of your class?


You can use an ArrayList to manage the instances. I like the Map idea better so you can dynamically set the names instead of trying to access instances by an index number.
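A minimal sketch of the ArrayList approach, for comparison: instances created in a loop need no variable names of their own, since the list index identifies them. The class is renamed Chunk here (a hypothetical name) to avoid confusion with java.util.Set, and the sketch assumes that tailLetter means the character immediately following each piece of text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rename of the question's `set` class.
class Chunk {
    final String characters;
    final char tailLetter;

    Chunk(String characters, char tailLetter) {
        this.characters = characters;
        this.tailLetter = tailLetter;
    }
}

class Chunker {
    // Slides a window of `interval` characters over the text and pairs each
    // window with the character that follows it (an assumed meaning of tailLetter).
    static List<Chunk> split(String text, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be at least 1");
        }
        List<Chunk> chunks = new ArrayList<>();
        // The loop bound keeps c + interval a valid index, so nothing overflows.
        for (int c = 0; c + interval < text.length(); c++) {
            String piece = text.substring(c, c + interval);
            chunks.add(new Chunk(piece, text.charAt(c + interval)));
        }
        return chunks;
    }
}
```

Chunker.split("hello", 2).get(0) is then the first instance, with no generated name involved. If lookup by the characters themselves is what is needed, a Map keyed on that String would serve instead of the list.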


I don't see the point of the names:

  • If they are just so that 'set' objects will have some distinct String for debugging, the default toString() implementation will give you that.

  • If you specifically need to do lookup of these 'set' objects, then a numeric identifier or a sequence number will work better.

If you explained the purpose of the names, and how you intend to use them, maybe we could give you better advice.
