Evolutionary Algorithm to guess a string, messed up by replication
i am working on a python script to test out genetic programming. As an exercise i have made a simple Script that tries to guess a string without the whole population part.
My Code is:
# acts as a gene
# it has three operations:
# Mutation : One character is changed
# Replication: a sequencepart is duplicated
# Extinction : A sequencepart is lost
# Crossover : the sequence is crossed with another Sequence
import random
class StringGene:
def __init__(self, s):
self.sequence = s
self.allowedChars = "ABCDEFGHIJKLMOPQRSTUVWXYZ/{}[]*()+-"
def __str__(self):
return self.sequence
def Mutation(self):
x = random.randint(0, len(self.sequence)-1)
r = random.randint(0, len(self.allowedChars)-1)
d = self.sequence
self.sequence = d[:x-1]+ self.allowedChars[r] + d[x:]
def Replication(self):
x1 = random.randint(0, len(self.sequence)-1)
x2 = random.randint(0, len(self.sequence)-1)
self.sequence =self.sequence[:x1]+ self.sequence[x1:x2] + self.sequence[x2:]
self.sequence = self.sequence[:32]
def Extinction(self):
x1 = random.randint(0, len(self.sequence)-1)
x2 = random.randint(0, len(self.sequence)-1)
self.sequence = self.sequence[:x1] + self.sequence[x2:]
def CrossOver(self, 开发者_开发问答s):
x1 = random.randint(0, len(self.sequence)-1)
x2 = random.randint(0, len(s)-1)
self.sequence = self.sequence[:x1+1]+ s[x2:]
#x1 = random.randint(0, len(self.sequence)-1)
#self.sequence = s[:x2 ] + self.sequence[x1+1:]
if __name__== "__main__":
import itertools
def hamdist(str1, str2):
if (len(str2)>len(str1)):
str1, str2 = str2, str1
str2 = str2.ljust(len(str1))
return sum(itertools.imap(str.__ne__, str1, str2))
g = StringGene("Hi there, Hello World !")
g.Mutation()
print "gm: " + str(g)
g.Replication()
print "gr: " + str(g)
g.Extinction()
print "ge: " + str(g)
h = StringGene("Hello there, partner")
print "h: " + str(h)
g.CrossOver(str(h))
print "gc: " + str(g)
change = 0
oldres = 100
solutionstring = "Hello Daniel. Nice to meet you."
best = StringGene("")
res = 100
print solutionstring
while (res > 0):
g.Mutation()
g.Replication()
g.Extinction()
res = hamdist(str(g), solutionstring)
if res<oldres:
print "'"+ str(g) + "'"
print "'"+ str(best) + "'"
best = g
oldres = res
else :
g = best
change = change + 1
print "Solution:" + str(g)+ " " + str(hamdist(solutionstring, str(g))) + str (change)
I have a crude hamming distance as a measure how far the solution string differs from the current one. However i want to be able to have a varying length in the guessing, so i introduced replication and deletion of parts of the string.
Now, however the string grows infinitely and the Solution String is never found. Can you point out, where i went wrong?
Can you suggest improvements?
cheers
Your StringGene
objects are mutable, which means that when you do an operation like best = g
, you are making both g
and best
reference the same object. Since after that first step you only have a single object, every mutation gets applied permanently, whether or not it's successful, and all comparisons between g
and best
are comparisons between the same object.
You either need to implement a copy operator, or make instances immutable, and have each mutation operator return a modified version of the 'gene'.
Also, if the first mutation fails to improve the string, you set g
to best
, which is an empty string, throwing away your starting string entirely.
Finally, the canonical test string is "Methinks it is like a weasel".
The simplest thing might be to limit how long the guessed string is allowed to be. Don't allow guesses above a certain length.
I had a look at your code and I'm not good enough in Python to find any bugs, but it might be that you're simply referencing or indexing the array incorrectly, resulting in always adding new characters to the guess-string, so your string is always increasing in length... I don't know if that's the bug, but things like that have happened to me before, so double-check your array indicies. ;)
I think your fitness function is too simple. I would play with using two variables, one the size distance and the other your "hamdist". The further the size difference is, the more it effects the total fitness. So add the two together with some percentage constant.
I'm also not very familiar with python, but it looks to me that this is not what you're doing.
First of all, what you are doing is a genetic algorithm, not genetic programming (which is a related, but a different concept).
I don't know Python, but it looks you have a major problem in your extinction function. As far as I can tell, if x1 > x2 it causes the string to increase in size instead of decreasing (the part between x1 and x2 is effectively doubled). What would happen in the replication function when x1 > x2, I can't tell without knowing Python.
Also keep in mind, that maintaining a population is key to effectively solving problems with genetic algorithms. Crossovers are the essential part of the algorithm, and they make little or no sense if they are not made between population members (also, the more varied the population is, the better, most of the time). The code you presented is dependant on mutations of a single specimen to achieve your expected result, and thus highly unlikely to produce anything useful faster than a simple brute force method.
精彩评论