Translating a play in HTML to Python

So, I'd like to represent one of Shakespeare's plays, Hamlet, as the following objects (maybe this isn't the best representation; if so, please tell me):

class Play():
  acts = []
  ...
  def add_act(self, act): acts.append(act)

class Act():
  scenes = []
  ...
  def add_scene(self, scene): scenes.append(scene)

class Scene():
  elems = []
  def __init__(self, title, setting=""): ...
  def add_elem(self, elem): elems.append(elem)
  ...

class StageDirection(): # elem
  def __init__(self, text): ...

class Line(): # elem
  def __init__(self, id, text, character = None): ...
  # A None character represents a continuation from the previous line
  # id could be, for example, 1.1.1

There are other methods, of course, for printing and such in each of the classes.

The question is, how do I get a structure based on these classes (or something like them) from HTML 4 code that looks like this:

<H3>ACT I</h3> 
<h3>SCENE I. Elsinore. A platform before the castle.</h3> 
<p><blockquote> 
<i>FRANCISCO at his post. Enter to him BERNARDO</i> 
</blockquote> 

<A NAME=speech1><b>BERNARDO</b></a> 
<blockquote> 
<A NAME=1.1.1>Who's there?</A><br> 
</blockquote> 

<A NAME=speech2><b>FRANCISCO</b></a> 
<blockquote> 
<A NAME=1.1.2>Nay, answer me: stand, and unfold yourself.</A><br> 
</blockquote> 

<A NAME=speech3><b>BERNARDO</b></a> 
<blockquote> 
<A NAME=1.1.3>Long live the king!</A><br> 
</blockquote> 

<A NAME=speech4><b>FRANCISCO</b></a> 
<blockquote> 
<A NAME=1.1.4>Bernardo?</A><br> 
</blockquote> 

<A NAME=speech5><b>BERNARDO</b></a> 
<blockquote> 
<A NAME=1.1.5>He.</A><br> 
</blockquote>  <!-- for more, see the source of shakespeare.mit.edu/hamlet/full.html -->

translating that into something like this:

play = Play()
actI = Act()
sceneI = Scene("Scene I", "Elsinore. A platform before the castle.")
sceneI.add_elem(StageDirection("Francisco at his post. Enter to him Bernardo."))
sceneI.add_elem(Line("1.1.1", "Who's there?", "Bernardo"))
...

Of course, I don't expect all the code, but which libraries should I use and, where no library applies, what logic?

Thanks.

(This is for a future open-source project and for me learning Python for fun, not homework.)


Use lxml or a similar parser. It will read your HTML (XML?) into a document tree, which is basically a more generic version of the data structure you have written.

You can then iterate over the generated tree and either prune it or build another tree in memory that looks the way you want. The HTML -> data structure step itself is a solved problem.
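
To make the "parse, then rebuild" idea concrete, here is a minimal sketch. It swaps lxml for the standard library's html.parser, which is event-based rather than tree-based, so the sketch runs without extra installs. It assumes the markup follows exactly the pattern shown in the question: act and scene titles in h3, stage directions in i, speakers in a name=speechN anchors, and lines in a name=N.N.N anchors. PlayParser, SAMPLE and the helper methods are hypothetical names, and the model classes are filled in only far enough to run.

```python
import re
from html.parser import HTMLParser

# Minimal versions of the question's classes (attributes set per instance).
class Play:
    def __init__(self):
        self.acts = []

class Act:
    def __init__(self, title):
        self.title, self.scenes = title, []

class Scene:
    def __init__(self, title, setting=""):
        self.title, self.setting, self.elems = title, setting, []

class StageDirection:
    def __init__(self, text):
        self.text = text

class Line:
    def __init__(self, id, text, character=None):
        self.id, self.text, self.character = id, text, character

class PlayParser(HTMLParser):  # hypothetical name
    # Which closing tag ends each kind of text we collect.
    END = {"heading": "h3", "direction": "i", "speaker": "a", "line": "a"}

    def __init__(self):
        super().__init__()
        self.play = Play()
        self.mode, self.buf = None, []   # what we are collecting, and its text
        self.speaker, self.new_speech = None, False
        self.line_id = None

    def _start(self, mode):
        self.mode, self.buf = mode, []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        name = dict(attrs).get("name") or ""
        if tag == "h3":
            self._start("heading")
        elif tag == "i":
            self._start("direction")
        elif tag == "a" and name.startswith("speech"):
            self._start("speaker")
        elif tag == "a" and re.fullmatch(r"\d+\.\d+\.\d+", name):
            self._start("line")
            self.line_id = name

    def handle_data(self, data):
        if self.mode:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if self.mode is None or tag != self.END[self.mode]:
            return
        text = " ".join("".join(self.buf).split())  # collapse whitespace
        mode, self.mode = self.mode, None
        if mode == "heading":
            self._heading(text)
        elif mode == "speaker":
            self.speaker, self.new_speech = text, True
        else:
            # Assumes an act and a scene heading came before any content.
            scene = self.play.acts[-1].scenes[-1]
            if mode == "direction":
                scene.elems.append(StageDirection(text))
            else:
                # Only the first line of a speech carries the character.
                character = self.speaker if self.new_speech else None
                self.new_speech = False
                scene.elems.append(Line(self.line_id, text, character))

    def _heading(self, text):
        if text.upper().startswith("ACT"):
            self.play.acts.append(Act(text))
        elif text.upper().startswith("SCENE") and self.play.acts:
            title, _, setting = text.partition(". ")
            self.play.acts[-1].scenes.append(Scene(title, setting))

SAMPLE = """<H3>ACT I</h3>
<h3>SCENE I. Elsinore. A platform before the castle.</h3>
<p><blockquote>
<i>FRANCISCO at his post. Enter to him BERNARDO</i>
</blockquote>
<A NAME=speech1><b>BERNARDO</b></a>
<blockquote>
<A NAME=1.1.1>Who's there?</A><br>
</blockquote>
"""

parser = PlayParser()
parser.feed(SAMPLE)
parser.close()
scene = parser.play.acts[0].scenes[0]
```

The same walk could be done over an lxml tree instead of parser events. Either way, the full play will probably contain markup this sketch does not anticipate, so treat the tag rules as a starting point rather than a complete mapping.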


Wait, do you want to generate the actual Python code? Why on earth would you want that?


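BTW, your code won't do what you want. acts is a class attribute, so every Play would share a single list, and the bare name acts inside add_act is not in scope, so calling the method raises a NameError: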

class Play():
  acts = []
  ...
  def add_act(self, act): acts.append(act)

Try this instead:

class Play():
  def __init__(self):
    self.acts = []
  ...
  def add_act(self, act): 
    self.acts.append(act)
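
A small sketch of why the difference matters. BrokenPlay and SharedPlay are hypothetical names used only for this comparison: the first mirrors the code from the question, the second shows what happens if only the missing self is added.

```python
class BrokenPlay:
    acts = []

    def add_act(self, act):
        acts.append(act)       # bare name: class-body names are not in method scope


class SharedPlay:
    acts = []                  # class attribute: one list shared by all instances

    def add_act(self, act):
        self.acts.append(act)  # mutates that shared list


class Play:
    def __init__(self):
        self.acts = []         # instance attribute: a fresh list per object

    def add_act(self, act):
        self.acts.append(act)


# 1. The original version fails as soon as add_act is called.
try:
    BrokenPlay().add_act("Act I")
    error = None
except NameError as exc:
    error = exc

# 2. With only `self.` added, two plays still end up sharing their acts.
shared_hamlet, shared_macbeth = SharedPlay(), SharedPlay()
shared_hamlet.add_act("Hamlet, Act I")

# 3. Creating the list in __init__ keeps each play's acts separate.
hamlet, macbeth = Play(), Play()
hamlet.add_act("Hamlet, Act I")
```

After running this, shared_macbeth.acts contains the act that was added to shared_hamlet, while macbeth.acts stays empty. The same change applies to Act.scenes and Scene.elems.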