How to make this random text generator more efficient in Python?

2023-01-14 07:40 问答作者：

I'm working on a random text generator -without using Markov chains开发者_开发问答- and currently it works without too many problems. Firstly, here is my code flow:

Enter a sentence as input -this is called trigger string, is assigned to a variable-
Get longest word in trigger string
Search all Project Gutenberg database for sentences that contain this word -regardless of uppercase lowercase-
Return the longest sentence that has the word I spoke about in step 3
Append the sentence in Step 1 and Step4 together
Assign the sentence in Step 4 as the new 'trigger' sentence and repeat the process. Note that I have to get the longest word in second sentence and continue like that and so on-

And here is my code:

import nltk
from nltk.corpus import gutenberg
from random import choice

triggerSentence = raw_input("Please enter the trigger sentence: ")#get input str
longestLength = 0
longestString = ""
listOfSents = gutenberg.sents() #all sentences of gutenberg are assigned -list of  list format-
listOfWords = gutenberg.words()# all words in gutenberg books -list format-

while triggerSentence:
    #so this is run every time through the loop
    split_str = triggerSentence.split()#split the sentence into words

    #code to find the longest word in the trigger sentence input
    for piece in split_str:
        if len(piece) > longestLength:
            longestString = piece
            longestLength = len(piece)

    #code to get the sentences containing the longest word, then selecting
    #random one of these sentences that are longer than 40 characters
    sets = []
    for sentence in listOfSents:
        if sentence.count(longestString):
            sents= " ".join(sentence)
            if len(sents) > 40:
            sets.append(" ".join(sentence))

    triggerSentence = choice(sets)
    print triggerSentence

My concern is, the loop mostly reaches to a point where the same sentence is printed over and over again. Since it is the longest sentence that has the longest word. To counter getting the same sentence over and over again, I thought of the following:

*If the longest word in the current sentence is the same as it was in the last sentence, simply delete this longest word from the current sentence and look for the next longest word.

I tried some implementations for this but failed to apply the solution above since it involves lists and list of lists -due to words and sentences from gutenberg module-. Any suggestions about how to find the second longest word ? I seem to be unable to do this with parsing a simple string input since .sents() and .words() functions of NLTK's Gutenberg module yield list of list and list respectively. Thanks in advance.

Some suggested improvements:

The while loop will run forever, you should probably remove it.
Use max and generator expressions to generate the longest word in a memory-efficient manner.
You should generate a list of sentences with a length greater than 40 characters that include longestWord with a list comprehension. This should also be removed from the while loop, as it only happens.

sents = [" ".join(sent) for sent in listOfSents if longestWord in sent and len(sent) > 40]
If you want to print out every sentence that is found in a random order, then you could try shuffling the list you just created:

for sent in random.shuffle(sents): print sent

This is how the code could look with these changes:

import nltk
from nltk.corpus import gutenberg
from random import shuffle

listOfSents = gutenberg.sents()
triggerSentence = raw_input("Please enter the trigger sentence: ")

longestWord = max(triggerSentence.split(), key=len)
longSents = [" ".join(sent) for sent in listOfSents 
                 if longestWord in sent 
                 and len(sent) > 40]

for sent in shuffle(longSents):
    print sent

If all you need is generate random text (I guess, with requirement that it should contain meaningful sentences) you can do it much simpler: Just generate random numbers and use them as index to retrieve sentences from your text database (be it Project Gutenberg or whatever).

继续阅读：nltk python random text

How to make this random text generator more efficient in Python?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？