It's probably simpler in awk, but how can I say this in Python?
I have:
Rutsch is for rutterman ramping his roe
which is a phrase from Finnegans Wake. The epic riddle book is full of leitmotifs like this, such as 'take off that white hat,' and 'tip,' all of which get mutated into similar-sounding words depending on where you are in the book itself. All I want is a way to find obvious occurrences of this particular leitmotif, i.e.
[word1] is for [word2] [word-part1]ing his [word3]
You can do this with regular expressions in Python:
import re
pattern = re.compile(r'(?P<word>.*) is for (?P=word) (?P=word)ing his (?P=word)')
words = pattern.findall(text)
That won't match your example, because the backreference (?P=word) requires the same text in every slot, but it will match [word] is for [word] [word]ing his [word]. Add seasoning to taste. You can find more details in the re module docs.
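Here is a minimal sketch of how the named backreference behaves. The sample sentence is made up for illustration, and I've swapped `.*` for `\w+` so the group can't span several words; that swap is my assumption, not part of the answer above. Note that `findall` returns only the captured group when the pattern has a single group, so `finditer` is used to recover the whole phrase.

```python
import re

# Made-up sample: the same word fills every slot, as the backreference requires.
text = "tip is for tip tiping his tip"

# (?P<word>...) captures a word; (?P=word) must match that exact text again.
pattern = re.compile(r'(?P<word>\w+) is for (?P=word) (?P=word)ing his (?P=word)')

# With one group in the pattern, findall returns the group, not the full match.
groups = pattern.findall(text)

# finditer yields match objects; group(0) is the entire matched phrase.
phrases = [m.group(0) for m in pattern.finditer(text)]

print(groups)
print(phrases)
```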
import re
# read the book into a variable 'text'
matches = re.findall(r'\w+ is for \w+ \w+ing his \w+', text)
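To see what this looser pattern picks up, here is a small sketch run against the phrase from the question. The second sentence in `text` is invented filler, standing in for the rest of the book.

```python
import re

# The phrase from the question plus an unrelated, made-up sentence.
text = "Rutsch is for rutterman ramping his roe. Nothing to see here."

# \w+ matches any run of word characters, so no alliteration is enforced.
matches = re.findall(r'\w+ is for \w+ \w+ing his \w+', text)

print(matches)
```

Because there are no capturing groups, `findall` returns the full matched phrases here.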
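This solution is for your example, not for your description: only the first letter is alliterative.
# 'fin' is assumed to hold the text of the book as a single string
pairs = re.findall(r'((.)\w* is for \2\w* \2\w*ing his \2\w*)', fin, re.IGNORECASE)
# each result is a tuple (whole phrase, first letter); keep only the phrase
matches = [ p[0] for p in pairs ]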
To search for cases matching your description, just replace (.) with (\w+), and remove all instances of \w*.
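As a rough check of the first-letter approach, here is a sketch run on the phrase from the question next to a made-up sentence that has the same shape but no alliteration. It uses `finditer` and a single group instead of the nested groups above, and adds `\b` so the captured letter starts a word; both are my own choices for illustration.

```python
import re

# The phrase from the question, plus an invented non-alliterative sentence.
text = "Rutsch is for rutterman ramping his roe. Tip is for mother making his bed."

# (\w) captures the first letter; \1 requires each later word to start with it.
# re.IGNORECASE lets the capital 'R' match the lowercase 'r' in the backreference.
first_letter = re.compile(r'\b(\w)\w* is for \1\w* \1\w*ing his \1\w*', re.IGNORECASE)

hits = [m.group(0) for m in first_letter.finditer(text)]

print(hits)
```

Only the alliterative phrase is kept; the second sentence fails at `\1` because "mother" does not start with "T".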