An intelligent chicken soup quote generator with machine learning, in 20 lines of Python code

"Don't think of the overwhelming majority of the impossible."

"Don't think about the impossible"

"Grew up your bliss and the world."

"Try to win your own happiness and the world"

"What we would end create, creates the ground and you are the one to warm it"

"The creation we want to end has created the earth, only you hold warmth"

"Look and give up in miracles"

"Looking up to miracles, giving up fantasy"

In fact, all of the chicken soup sentences above were generated by a computer, and the program that generated them is less than 20 lines of Python code.

When it comes to natural language generation, people usually assume it requires a very advanced AI system built on very advanced mathematics. But that is not the case. In this article, I (the original author, Ramtin Alami; translator's note) will use Markov chains and a small chicken soup dataset to generate new chicken soup text.

Markov chain

A Markov chain is a stochastic model that predicts an event based solely on the previous event. As a simple example, let me explain it with my cat's daily states. My cat is always eating, sleeping, or playing with toys. She sleeps most of the time, but occasionally wakes up to eat. Usually after a meal she is more energetic and starts playing with her toys; when she has played enough she goes back to sleep, and later wakes up to eat again.

A Markov chain can easily simulate my cat's life, because she decides what to do next based on her previous state. She usually doesn't go straight from waking up to playing with toys, but after eating there is a high probability that she will play for a while. These state transitions can also be expressed as a graph:

[Figure: transition graph of the cat's three states (sleeping, eating, playing)]

Each circle is a state, each arrow points to the next state, and the number next to an arrow is the probability of moving from one state to the other. Note that the probability of a transition depends only on the previous state.
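
To make this concrete, here is a minimal sketch of such a chain in Python. The transition probabilities below are assumptions for illustration, since the figure's exact numbers are not reproduced in the text:

import random

# Hypothetical transition probabilities (assumed values; the original
# figure's exact numbers are not reproduced in the text).
transitions = {
    'sleep': {'sleep': 0.6, 'eat': 0.3, 'play': 0.1},
    'eat':   {'sleep': 0.3, 'eat': 0.1, 'play': 0.6},
    'play':  {'sleep': 0.7, 'eat': 0.2, 'play': 0.1},
}

def next_state(current):
    # The next state depends only on the current one: the Markov property.
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return random.choices(states, weights=weights)[0]

# Simulate a stretch of the cat's day.
state = 'sleep'
for _ in range(10):
    state = next_state(state)
    print(state)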

Using Markov chains to generate text

Generating text with a Markov chain uses the same concept: it tries to find the probability of one word appearing after another. To estimate these transition probabilities, we train the model on some example sentences.

For example, we train the model with the following two sentences:

I like to eat apples.  You eat oranges.

From these two training sentences, we can see that "i", "like", "to", and "eat" always appear in that order, and "you" is always followed by "eat". On the other hand, "oranges" and "apples" appear after the word "eat" with equal probability. The following transition graph illustrates this:

[Figure: word transition graph for the two training sentences]
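
As a quick sanity check of that claim, one can count the words that follow "eat" in the two training sentences (a small illustrative snippet, separate from the generator code shown later):

from collections import Counter

sentences = ["i like to eat apples", "you eat oranges"]

# Count which words follow "eat" across the training sentences.
follows = Counter()
for sentence in sentences:
    words = sentence.split()
    for i, word in enumerate(words[:-1]):
        if word == 'eat':
            follows[words[i + 1]] += 1

total = sum(follows.values())
for word, count in follows.items():
    print(word, count / total)
# apples 0.5
# oranges 0.5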

These two training sentences can only yield two new sentences, but that is not always the case. I trained another model with the following four sentences, and the results were quite different:

My friend makes the best raspberry pies in town.  I think apple pies are the best pies.  Steve thinks Apple makes the best computers in the world.  I own two computers and they're not apple because I am not steve or rich.

The transition graph of the model trained on these four sentences is much larger. Although it looks very different from a typical Markov chain transition graph, the main idea behind both is the same.

[Figure: word transition graph for the four training sentences]

A path starts from the START node and randomly selects the next word, all the way to the END node. The width of the edge between two words indicates the probability of that word being selected.

Although it was trained on only four sentences, the model above can generate hundreds of different sentences.

Code

The code for this text generator is very simple: apart from Python's random module, it needs no additional modules or libraries. The code consists of two parts, one for training and one for generation.

Training

The training code builds the model that we will later use to generate chicken soup sentences. I used a dictionary as the model: it holds words as keys, and for each key a list of possible following words as the value. For example, the dictionary of the model trained on the two sentences "I like to eat apples" and "You eat oranges" looks like this:

{'START': ['i', 'you'], 'i': ['like'], 'like': ['to'], 'to': ['eat'], 'you': ['eat'], 'eat': ['apples', 'oranges'], 'END': ['apples', 'oranges']}

 

We don't need to compute the probabilities of the following words: if a word is more likely, it simply appears more times in the list of possible following words. For example, if we add a third training sentence, "we eat apples", the word "apples" now appears after "eat" in two sentences, so it is more likely to be chosen. In the model dictionary, it appears twice in the "eat" list, which gives it twice the chance of being picked.

{'START': ['i', 'we', 'you'], 'i': ['like'], 'like': ['to'], 'to': ['eat'], 'you': ['eat'], 'we': ['eat'], 'eat': ['apples', 'oranges', 'apples'], 'END': ['apples', 'oranges', 'apples']}
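
A short sketch shows why storing duplicates is enough: sampling with random.choice from a list that contains repeats is already weighted sampling (the exact counts below will vary from run to run):

import random
from collections import Counter

# 'apples' appears twice in the list, so random.choice picks it about
# twice as often as 'oranges': duplicates act as implicit weights.
follow_words = ['apples', 'oranges', 'apples']
samples = Counter(random.choice(follow_words) for _ in range(10000))
print(samples)  # roughly Counter({'apples': 6666, 'oranges': 3334})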

 

In addition, the model dictionary has two special keys, "START" and "END", which mark the first and last words of a generated sentence.

model = {}  # maps each word to a list of possible following words

# dataset_file is assumed to be an open text file with one sentence per line
for line in dataset_file:
    line = line.lower().split()
    for i, word in enumerate(line):
        if i == len(line) - 1:
            # last word of the sentence: record it as a possible ending word
            model['END'] = model.get('END', []) + [word]
        else:
            if i == 0:
                # first word of the sentence: record it as a possible starting word
                model['START'] = model.get('START', []) + [word]
            # record the word that follows the current word
            model[word] = model.get(word, []) + [line[i + 1]]
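
As a self-contained check, one can run the same loop over the two example sentences as an in-memory list standing in for the dataset file (an assumption for illustration; punctuation is omitted so the keys match the dictionary shown earlier):

model = {}
for line in ["I like to eat apples", "You eat oranges"]:
    line = line.lower().split()
    for i, word in enumerate(line):
        if i == len(line) - 1:
            model['END'] = model.get('END', []) + [word]
        else:
            if i == 0:
                model['START'] = model.get('START', []) + [word]
            model[word] = model.get(word, []) + [line[i + 1]]

print(model)
# {'START': ['i', 'you'], 'i': ['like'], 'like': ['to'], 'to': ['eat'],
#  'eat': ['apples', 'oranges'], 'END': ['apples', 'oranges'], 'you': ['eat']}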

 

Generating chicken soup sentences

The generator part consists of a loop. It first picks a random starting word and appends it to a list. Then it looks up the list of potential following words in the dictionary, randomly picks one of them, and appends the newly chosen word to the list. It keeps choosing random following words until it reaches an ending word, then stops the loop and outputs the generated sentence, the so-called "famous quote".

import random

generated = []
while True:
    if not generated:
        # start the sentence with a random starting word
        words = model['START']
    elif generated[-1] in model['END']:
        # the last word is a known ending word: stop generating
        break
    else:
        # look up the words that can follow the last generated word
        words = model[generated[-1]]
    generated.append(random.choice(words))

print(' '.join(generated))  # output the generated "famous quote"
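
To print several quotes in a row, the same loop can be wrapped in a function. This is a small refactor for convenience, not code from the original article; the toy model below is the one built from the two example sentences:

import random

# Toy model from the two-sentence example earlier.
model = {'START': ['i', 'you'], 'i': ['like'], 'like': ['to'], 'to': ['eat'],
         'you': ['eat'], 'eat': ['apples', 'oranges'], 'END': ['apples', 'oranges']}

def generate(model):
    # Walk the chain from a starting word until an ending word is reached.
    generated = []
    while True:
        if not generated:
            words = model['START']
        elif generated[-1] in model['END']:
            break
        else:
            words = model[generated[-1]]
        generated.append(random.choice(words))
    return ' '.join(generated)

for _ in range(5):
    print(generate(model))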

 

I used Markov chains to generate lots of chicken soup text, but as a text generator, you can feed it any text and have it generate similar sentences.

Another cool thing you can do with a Markov chain text generator is to mix different text types. For example, in my favorite TV series "Rick and Morty" there is a character called "Abradolf Lincler", whose name is a mix of "Abraham Lincoln" and "Adolf Hitler".

You can do the same: feed the names of some celebrities into the Markov chain and let it generate amusing mixed names, for example:

[Figure: "Gorda Statham"]

[Figure: "Nicholas Zhao Si"]
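
The article does not show its name-mixing code, but one plausible sketch is a character-level chain trained on two names, with characters taking the place of words:

import random

names = ["abraham lincoln", "adolf hitler"]

# Build a character-level model: each character maps to the characters
# that can follow it in the training names.
model = {}
for name in names:
    model.setdefault('START', []).append(name[0])
    for i in range(len(name) - 1):
        model.setdefault(name[i], []).append(name[i + 1])
    model.setdefault('END', []).append(name[-1])

# Walk the chain; a minimum length keeps the output name-like.
generated = [random.choice(model['START'])]
while generated[-1] not in model['END'] or len(generated) < 8:
    generated.append(random.choice(model[generated[-1]]))
print(''.join(generated))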

You can even go a step further and feed famous quotes from Lincoln's and Hitler's speeches into a Markov chain to generate speeches in a new, mixed style.

Markov chains can be applied in almost every field. Text generation is not the most useful application, but I do find it very interesting. And who knows: maybe the chicken soup text you generate will one day attract more fans than Mi Meng.
