Translating a play in HTML to Python
So, I'd like to represent one of Shakespeare's plays, Hamlet, as the following objects (maybe this isn't the best representation; if so, please tell me):
class Play():
    acts = []
    ...
    def add_act(self, act): acts.append(act)

class Act():
    scenes = []
    ...
    def add_scene(self, scene): scenes.append(scene)

class Scene():
    elems = []
    def __init__(self, title, setting=""): ...
    def add_elem(self, elem): elems.append(elem)
    ...

class StageDirection(): # elem
    def __init__(self, text): ...

class Line(): # elem
    def __init__(self, id, text, character = None): ...
    # A None character represents a continuation from the previous line
    # id could be, for example, 1.1.1
There are other methods, of course, for printing and such in each of the classes.
The question is, how do I get a structure based on these classes (or something like them) from HTML 4 code that looks like this:
<H3>ACT I</h3>
<h3>SCENE I. Elsinore. A platform before the castle.</h3>
<p><blockquote>
<i>FRANCISCO at his post. Enter to him BERNARDO</i>
</blockquote>
<A NAME=speech1><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.1>Who's there?</A><br>
</blockquote>
<A NAME=speech2><b>FRANCISCO</b></a>
<blockquote>
<A NAME=1.1.2>Nay, answer me: stand, and unfold yourself.</A><br>
</blockquote>
<A NAME=speech3><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.3>Long live the king!</A><br>
</blockquote>
<A NAME=speech4><b>FRANCISCO</b></a>
<blockquote>
<A NAME=1.1.4>Bernardo?</A><br>
</blockquote>
<A NAME=speech5><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.5>He.</A><br>
</blockquote> <!-- for more, see the source of shakespeare.mit.edu/hamlet/full.html -->
translating that into something like this:
play = Play()
actI = Act()
sceneI = Scene("Scene I", "Elsinore. A platform before the castle.")
sceneI.add_elem(StageDirection("Francisco at his post. Enter to him Bernardo."))
sceneI.add_elem(Line("1.1.1", "Who's there?", "Bernardo"))
...
Of course, I don't expect all the code, but which libraries should I use, and what logic should I use where no library applies?
Thanks.
(This is for a future open-source project and for me to learn Python for fun; it is not homework.)
Use lxml or a similar parser. Such a parser reads your HTML (XML?) into a document tree, which is basically a more generic version of the data structure you have written.
You can then iterate over the tree generated and prune it or rebuild another tree in memory that looks the way you want to. But the HTML -> data structure step is a solved problem.
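As a rough illustration of the "parse, then rebuild your own structure" idea, here is a minimal sketch. To keep it dependency-free it uses the standard library's html.parser instead of lxml; that parser is event-based rather than tree-based, so the sketch builds the Play directly from tag events. The dataclasses are simplified stand-ins for the question's classes, and PlayBuilder, SAMPLE, and the field names are hypothetical. It only handles the shape of the fragment quoted in the question, not necessarily the full page.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional

# Simplified stand-ins for the question's classes (names are assumptions).
@dataclass
class StageDirection:
    text: str

@dataclass
class Line:
    id: str
    text: str
    character: Optional[str] = None  # None = continuation of the same speech

@dataclass
class Scene:
    title: str
    setting: str = ""
    elems: list = field(default_factory=list)

@dataclass
class Act:
    title: str
    scenes: list = field(default_factory=list)

@dataclass
class Play:
    acts: list = field(default_factory=list)

class PlayBuilder(HTMLParser):
    """Builds a Play from the <h3>/<i>/<a name=...> pattern in the sample."""
    CAPTURED = ("h3", "i", "a")

    def __init__(self):
        super().__init__()
        self.play = Play()
        self.capture = None  # tag whose text is currently being collected
        self.anchor = ""     # NAME attribute of the current <a>
        self.speaker = None  # speaker waiting to be attached to the next line
        self.buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if self.capture is None and tag in self.CAPTURED:
            self.capture, self.buf = tag, []
            self.anchor = dict(attrs).get("name") or ""

    def handle_data(self, data):
        if self.capture is not None:
            self.buf.append(data)

    def _add(self, elem):
        # Ignore anything that appears before the first scene heading.
        if self.play.acts and self.play.acts[-1].scenes:
            self.play.acts[-1].scenes[-1].elems.append(elem)

    def handle_endtag(self, tag):
        if tag != self.capture:
            return
        text = " ".join("".join(self.buf).split())  # normalise whitespace
        self.capture = None
        if tag == "h3" and text.startswith("ACT"):
            self.play.acts.append(Act(text))
        elif tag == "h3" and text.startswith("SCENE") and self.play.acts:
            title, _, setting = text.partition(". ")
            self.play.acts[-1].scenes.append(Scene(title, setting))
        elif tag == "i":
            self._add(StageDirection(text))
        elif tag == "a" and self.anchor.startswith("speech"):
            self.speaker = text
        elif tag == "a" and self.anchor:
            self._add(Line(self.anchor, text, self.speaker))
            self.speaker = None  # later lines of this speech are continuations

SAMPLE = """<H3>ACT I</h3>
<h3>SCENE I. Elsinore. A platform before the castle.</h3>
<p><blockquote>
<i>FRANCISCO at his post. Enter to him BERNARDO</i>
</blockquote>
<A NAME=speech1><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.1>Who's there?</A><br>
</blockquote>
<A NAME=speech2><b>FRANCISCO</b></a>
<blockquote>
<A NAME=1.1.2>Nay, answer me: stand, and unfold yourself.</A><br>
</blockquote>
"""

builder = PlayBuilder()
builder.feed(SAMPLE)
builder.close()
scene = builder.play.acts[0].scenes[0]
for elem in scene.elems:
    print(elem)
```

With lxml the same logic would be a walk over the parsed tree instead of a set of callbacks; the mapping from h3, i, and named anchors to acts, scenes, stage directions, and lines would stay the same.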
Wait, do you want to generate the actual Python code? Why on earth would you want that?
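BTW, your code won't do what you want. The list acts is a class attribute, so it is shared by every Play instance, and acts.append(act) inside the method raises a NameError because the name is not qualified with self: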
class Play():
    acts = []
    ...
    def add_act(self, act): acts.append(act)
Try this instead:
class Play():
    def __init__(self):
        self.acts = []
    ...
    def add_act(self, act):
        self.acts.append(act)
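To see why the list belongs in __init__, here is a small sketch comparing the two layouts. SharedPlay and OwnPlay are hypothetical names used only for this comparison. Note that even if the class-level version wrote self.acts.append(act), the list would still be shared between instances.

```python
class SharedPlay:
    acts = []  # class attribute: one list shared by every instance

    def add_act(self, act):
        self.acts.append(act)  # mutates the shared class-level list

class OwnPlay:
    def __init__(self):
        self.acts = []  # instance attribute: a fresh list per object

    def add_act(self, act):
        self.acts.append(act)

hamlet, macbeth = SharedPlay(), SharedPlay()
hamlet.add_act("ACT I")
print(macbeth.acts)   # ['ACT I'] -- macbeth sees hamlet's act

hamlet2, macbeth2 = OwnPlay(), OwnPlay()
hamlet2.add_act("ACT I")
print(macbeth2.acts)  # [] -- each play keeps its own list
```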