Simplifying for-if messes with better structure? [closed]
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this questionPlease, move this question to Code Review -area. It is better suited there because I know the code below is junk and I want critical feedback to complete rewrite. I am pretty much reinventing the wheel.
# Description: you are given a bitwise pattern and a string
# you need to find the number of times the pattern matches in the string.
# The pattern is determined by markov chain.
# For simplicity, suppose the ones and zeros as unbiased coin flipping
# that stops as it hits the pattern, below.
#
# Any one liner or simple pythonic solution?
import random
def matchIt(yourString, yourPattern):
"""find the number of times yourPattern occurs in yourString"""
count = 0
matchTimes = 0
# How can you simplify the for-if structures?
# THIS IS AN EXAMPLE HOW NOT TO DO IT, hence Code-Smell-label
# please, read clarifications in [Update]
for coin in yourString:
#return to base
if count == len(pattern):
matchTimes = matchTimes + 1
count = 0
#special case to return to 2, there could be more this type of conditions
#so this type of if-conditionals are screaming for a havoc
if count == 2 and pattern[count] == 1:
count = count - 1
#the work horse
#it could be simpler by breaking the intial string of lenght 'l'
#to blocks of pattern-length, the number of them is 'l - len(pattern)-1'
if coin == pattern[count]:
count=count+1
average = len(yourString)/matchTimes
return [average, matchTimes]
# Generates the list
myString =[]
for x in range(10000):
myString= myString + [int(random.random()*2)]
pattern = [1,0,0]
result = matchIt(myString, pattern)
print("The sample had "+str(result[1])+" matches and its size was "+str(len(myString))+".\n" +
"So it took "+str(result[0])+" steps in average.\n" +
"RESULT: "+str([a for a in "FAILURE" if result[0] != 8]))
# Sample Output
#
# The sample had 1656 matches and its size was 10000.
# So it took 6 steps in average.
# RESULT: ['F', 'A', 'I', 'L', 'U', 'R', 'E']
[Update]
I will explain here a bit about theory, perhaps, the problem can be simplified that way. The above code try to construct the markov chain with transition matrix A
below. The pattern 100
that you can imagine as coin flipping corresponds to it.
>>> Q=numpy.matrix('0.5 0.5 0; 0 0.5 0.5; 0 0.5 0')
>>> I=numpy.identity(3)
>>> I
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> Q
matrix([[ 0.5, 0.5, 0. ],
[ 0. , 0.5, 0.5],
[ 0. , 0.5, 0. ]])
>>> A=numpy.matrix('0.5 0.5 0 0; 0 0.5 0.5 0; 0 0.5 0 0.5; 0 0 0 1')
>>> A
matrix([[ 0.5, 0.5, 0. , 0. ],
[ 0. , 0.5, 0.5, 0. ],
[ 0. , 0.5, 0. , 0.5],
[ 0. , 0. , 0. , 1. ]])
The average
8
in the question becomes the sum of values on the first row in the matrix N=(I-Q)^-1
where Q
above.
>>> (I-Q)**-1
matrix([[ 2., 4., 2.],
[ 0., 4., 2.],
[ 0., 2., 2.]])
>>> numpy.sum(((I-Q)**-1)[0])
8.0
Now, you can probably see that this apparently-only-pattern-matching-problem becomes a markov chain. I cannot see a reason why you could not substitute the messy for-while-if conditions with something similar to matrices or matrices. I don't know how to implement them but iterators could be a way to go, researching, particularly with开发者_JAVA技巧 more states where you need to decompose.
But a problem emerged with Numpy, what are the things -Inf
and NaN
for? Check the values to which they should converge, above, from (I-Q)**-1
matrix. The N
is from N=I+Q+Q^2+Q^3+...=\frac{I-Q^{n}}{I-Q}
.
>>> (I-Q**99)/(I-Q)
matrix([[ 2.00000000e+00, 1.80853571e-09, -Inf],
[ NaN, 2.00000000e+00, 6.90799171e-10],
[ NaN, 6.90799171e-10, 1.00000000e+00]])
>>> (I-Q**10)/(I-Q)
matrix([[ 1.99804688, 0.27929688, -Inf],
[ NaN, 1.82617188, 0.10742188],
[ NaN, 0.10742188, 0.96679688]])
def matchIt(yourString, yourPattern):
"""find the number of times yourPattern occurs in yourString"""
Are you allowed to use the following?
yourString.count(yourPattern)
In your case you could create myString
as a real string of 10000 characters and the pattern
also as a string and then count the occurence in a simple pythonic way.
EDIT
A one-liner that gives you the number of (overlapping) occurences of pattern
in text
(which can be either a string or a list), could look like this:
nbOccurences = sum(1 for i in xrange(len(text)-len(pattern)) if text[i:i+len(pattern)] == pattern)
Ok - a standard(-ish) string search:
def matchIt(needle, haystack):
"""
@param needle: string, text to seek
@param haystack: string, text to search in
Return number of times needle is found in haystack,
allowing overlapping instances.
Example: matchIt('abab','ababababab') -> 4
"""
lastSeenAt = -1
timesSeen = 0
while True:
nextSeen = haystack.find(needle, lastSeenAt+1)
if nextSeen==-1:
return timesSeen
else:
lastSeenAt = nextSeen
timesSeen += 1
but you want to do this to a list of numbers? No problem; we just need to make a list class with a find() method, like so:
import itertools
class FindableList(list):
def find(self, sub, start=None, end=None):
"""
@param sub: list, pattern to look for in self
@param start: int, first possible start-of-list
If not specified, start at first item
@param: end: int, last+1 possible start-of-list
If not specified, end such that entire self is searched
Returns;
Starting offset if a match is found, else -1
"""
if start is None or start < 0:
start = 0
# N.B. If end is allowed to be too high,
# zip() will silently truncate the list comparison
# and you will probably get extra spurious matches.
lastEnd = len(self) - len(sub) + 1
if end is None or end > lastEnd:
end = lastEnd
rng = xrange if xrange else range
iz = itertools.izip
isl = itertools.islice
for pos in rng(start, end):
if all(a==b for a,b in iz(sub, isl(self, pos, end))):
return pos
# no match found
return -1
then the example looks like
matchIt([1,2,1,2], FindableList([1,2,1,2,1,2,1,2,1,2])) -> 4
and your code becomes:
# Generate a list
randIn = lambda x: int(x*random.random())
myString =[randIn(2) for i in range(10000)]
pattern = [1,0,0]
result = matchIt(pattern, myString)
print("The sample had {0} matches and its size was {1}.\n".format(result, len(myString)))
This is not ready.
Similar question but main focus on graph libraries here and similar question but in C#, maybe useful.
The files that are relevant to this question are ./networkx/generators/degree_seq.py
(997 lines, about generating graps with given degree sequence) and ./networkx/algorithms/mixing.py (line 20, function degree_assortativity(G) about probability based graphs)
and also note that its source code refer to 92 references, not sure whether you want to reinvent the wheel. For igraph, please, read the line 835 of the file convert.c
about weighted edges. You can get the source for Networkx here and the source for igraph here. Please, note that the former is under BSD license and done in Python while igraph under GNU (GPL) and done in C.
To get started with Networkx, a helpful line about creating weighted graph from its jUnits test_convert_scipy.py
-file:
def create_weighted(self, G):
g = cycle_graph(4)
e = g.edges()
source = [u for u,v in e]
dest = [v for u,v in e]
weight = [s+10 for s in source]
ex = zip(source, dest, weight)
G.add_weighted_edges_from(ex)
return G
So to create your Markov Chain, help about directed weighted graph here, something like this perhaps:
>>> DG=nx.DiGraph()
>>> DG.add_weighted_edges_from([(0,0,0.5),(1,1,0.5),(3,3,1),(0,1,0.5),(1,2,0.5),(2,3,0.5), (2,1,0.5)])
or perhaps there is some ready markov chain generation tool as there are for some other stochastic processes, more here. Cannot find an algorithm to analyze the graph with excepted value or do trials with different sets as in your example, perhaps there is none, and you must stick with other repliers' solutions.
精彩评论