Specific doubts about the kgp.py program in the Dive Into Python book

2023-03-25 09:58

Dive Into Python: XML Processing

Here I am referring to a portion of the kgp.py program:

def getDefaultSource(self):
  xrefs = {}
  for xref in self.grammar.getElementsByTagName("xref"):
    xrefs[xref.attributes["id"].value] = 1
  xrefs = xrefs.keys()
  standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
  if not standaloneXrefs:
    raise NoSourceError, "can't guess source, and no source specified"
  return '<xref id="%s"/>' % random.choice(standaloneXrefs)

self.grammar is the parsed XML representation (using xml.dom.minidom) of:

<?xml version="1.0" ?>
<grammar>
<ref id="bit">
  <p>0</p>
  <p>1</p>
</ref>
<ref id="byte">
  <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\
<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>

self.refs is a cache of all the <ref> elements in the above XML, keyed by their id.
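To make the question concrete, here is a minimal Python 3 port of getDefaultSource run against the grammar above. It is a sketch and not the book's code. The method is turned into a plain function, and the names GRAMMAR_XML, build_refs and get_default_source are made up for illustration. The trailing backslash in the byte rule is dropped so that all eight xrefs sit on one line. "bit" is the only id referenced by an <xref>, so "byte" is the only standalone ref and is always the one chosen.

```python
import random
from xml.dom import minidom

# The sample grammar from the question (hypothetical constant name),
# with the eight <xref id="bit"/> elements on a single line.
GRAMMAR_XML = """<?xml version="1.0" ?>
<grammar>
<ref id="bit">
  <p>0</p>
  <p>1</p>
</ref>
<ref id="byte">
  <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>"""


class NoSourceError(Exception):
    pass


def build_refs(grammar):
    """Cache every <ref> element keyed by its id (plays the role of self.refs)."""
    return {ref.attributes["id"].value: ref
            for ref in grammar.getElementsByTagName("ref")}


def get_default_source(grammar, refs):
    """Python 3 port of getDefaultSource, written as a plain function."""
    xrefs = {}
    for xref in grammar.getElementsByTagName("xref"):
        xrefs[xref.attributes["id"].value] = 1
    xrefs = list(xrefs.keys())
    # Refs that no <xref> points at are the top-level symbols.
    standalone_xrefs = [e for e in refs.keys() if e not in xrefs]
    if not standalone_xrefs:
        raise NoSourceError("can't guess source, and no source specified")
    return '<xref id="%s"/>' % random.choice(standalone_xrefs)


grammar = minidom.parseString(GRAMMAR_XML)
refs = build_refs(grammar)
print(get_default_source(grammar, refs))  # <xref id="byte"/>
```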

I have two doubts about this code:

Doubt 1:

  for xref in self.grammar.getElementsByTagName("xref"):
    xrefs[xref.attributes["id"].value] = 1
  xrefs = xrefs.keys()

Eventually xrefs holds the id values in a list. Couldn't we have done this simply with:

  xrefs = [xref.attributes["id"].value 
           for xref in self.grammar.getElementsByTagName("xref")]

Doubt 2:

  standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
  ...
  return '<xref id="%s"/>' % random.choice(standaloneXrefs)

Here, we collect the refs from self.refs that do NOT appear in our computed xrefs. But then, instead of creating a <ref> element, we create an <xref> with the same id. This seems like a step backward: later we will have to resolve this computed <xref> anyway, and only then reach the <ref>. We could have just started with that <ref> in the first place.

Disclaimer

I am in no way trying to criticize the book. I am not even qualified to do that.

I am loving every moment of reading this book. I realize a few chapters have become outdated, but I love Mark Pilgrim's writing style and I cannot stop reading.

Dive Into Python is seven years old now (published 2004), and doesn't always contain the most modern code. So you need to go easy on it: Dive Into Python 3 might be a better bet.

Your suggestion for doubt 1 changes the meaning of the code: putting the ids into the keys of a dictionary and then getting them out again eliminates duplicates, whereas your list comprehension includes duplicates. The modern approach would be to use a set comprehension:

 xrefs = {xref.attributes["id"].value 
          for xref in self.grammar.getElementsByTagName("xref")}

but this wasn't available in 2004.
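You can see the difference by feeding all three versions a document in which the same id is referenced twice. This Python 3 sketch uses a small made-up document. The sorted() calls only make the printed order predictable.

```python
from xml.dom import minidom

# Made-up document: "bit" is referenced twice, "byte" once.
doc = minidom.parseString(
    '<grammar><p><xref id="bit"/><xref id="bit"/><xref id="byte"/></p></grammar>'
)
xref_elements = doc.getElementsByTagName("xref")

# The question's list comprehension keeps every occurrence.
as_list = [x.attributes["id"].value for x in xref_elements]

# The book's dict-keys trick and a set comprehension both drop duplicates.
as_dict_keys = {}
for x in xref_elements:
    as_dict_keys[x.attributes["id"].value] = 1
as_set = {x.attributes["id"].value for x in xref_elements}

print(as_list)               # ['bit', 'bit', 'byte']
print(sorted(as_dict_keys))  # ['bit', 'byte']
print(sorted(as_set))        # ['bit', 'byte']
```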

On your doubt 2, I'm not entirely sure I see the problem. Yes, in some sense this is a waste, but on the other hand the code already has a handler for the xref case, so it makes sense to re-use that handler rather than add an extra special case.
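Here is a rough sketch of that design, built on a heavily simplified expander. The parse and expand functions, REFS, and the toy grammar are hypothetical stand-ins, not kgp.py's actual classes. The default source is just an <xref> string, so it is parsed and expanded by the same xref branch that handles every other reference. No separate "start from a <ref>" path is needed.

```python
import random
from xml.dom import minidom

# Toy grammar (hypothetical): "word" is referenced by an <xref>,
# "greeting" is not, so "greeting" is the only standalone ref.
GRAMMAR = minidom.parseString(
    '<grammar>'
    '<ref id="word"><p>hello</p><p>hi</p></ref>'
    '<ref id="greeting"><p><xref id="word"/> world</p></ref>'
    '</grammar>'
)
REFS = {r.attributes["id"].value: r for r in GRAMMAR.getElementsByTagName("ref")}


def parse(node, out):
    """Expand one DOM node, appending generated text to out."""
    if node.nodeType == node.DOCUMENT_NODE:
        parse(node.documentElement, out)
    elif node.nodeType == node.TEXT_NODE:
        out.append(node.data)
    elif node.tagName == "xref":
        # The single xref handler: look up the ref and expand it.
        parse(REFS[node.attributes["id"].value], out)
    elif node.tagName == "ref":
        # A ref lists alternatives; pick one <p> element at random.
        choices = [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]
        parse(random.choice(choices), out)
    else:
        # Any other element (here just <p>): expand its children in order.
        for child in node.childNodes:
            parse(child, out)


def expand(source):
    """Expand a source given as an XML string, such as '<xref id="x"/>'."""
    out = []
    parse(minidom.parseString(source), out)
    return "".join(out)


# Default source, chosen roughly like getDefaultSource: a standalone ref id
# wrapped in an <xref> string.
referenced = {x.attributes["id"].value for x in GRAMMAR.getElementsByTagName("xref")}
default_source = '<xref id="%s"/>' % random.choice(sorted(set(REFS) - referenced))

# No special case: the default source goes through the same xref branch
# as any other reference.
print(expand(default_source))  # "hello world" or "hi world"
```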

There are several other bits of code in that example that could be modernized. For example,

source and source or self.getDefaultSource()

would now be source or self.getDefaultSource(). And the line

standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]

would be better expressed as a set difference operation, something like:

standaloneXrefs = set(self.refs) - set(xrefs)

But that's what happens as languages become more expressive: old code starts to look rather inelegant.
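The sketch below shows both modernizations, with plain dicts and lists standing in for self.refs and xrefs (made-up data). One caveat applies to the set-difference version: random.choice expects an indexable sequence, and passing it a set raises TypeError. The result of the difference therefore has to be converted, for example with sorted(), before it goes to random.choice.

```python
import random

# Made-up stand-ins for self.refs (id -> element) and the referenced ids.
refs = {"bit": "<ref id='bit'>", "byte": "<ref id='byte'>"}
xrefs = ["bit", "bit", "bit"]

# Set difference instead of the list comprehension.
standalone = set(refs) - set(xrefs)
print(standalone)  # {'byte'}

# random.choice needs an indexable sequence; a set raises TypeError,
# so convert first (sorted() also gives a reproducible order).
chosen = random.choice(sorted(standalone))
print(chosen)  # byte


def pick_source(source, default):
    # "source and source or default" reduces to "source or default":
    # both return source when it is truthy and default otherwise.
    return source or default


print(pick_source(None, "default"))        # default
print(pick_source("file.xml", "default"))  # file.xml
```

In the real call the default is written inline (source or self.getDefaultSource()). Because "or" short-circuits, getDefaultSource is only evaluated when no source was given.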

Your doubts are totally justified: that code doesn't look very good to me at all. For example, it uses 1 as a boolean value where True would have sufficed and been clearer.

Doubt 1:

These two snippets don't do the same thing. If there are duplicates, the original code filters them out, but your alternative doesn't. On the other hand, your code preserves the original ordering, whereas the original returns the elements in an arbitrary order.

To be fully equivalent, we could use the set builtin:

xrefs = list(set([xref.attributes["id"].value for xref in self.grammar.getElementsByTagName("xref")]))

(It might not make sense to convert back to a list, though.)
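You may want both properties at once: duplicates removed and the original document order kept. Assuming Python 3.7 or later, where dicts are guaranteed to preserve insertion order, dict.fromkeys gives you that. The document and ids below are made up.

```python
from xml.dom import minidom

# Made-up document: "b" is referenced twice; first-seen order is b, a, c.
doc = minidom.parseString(
    '<grammar><p><xref id="b"/><xref id="a"/><xref id="b"/><xref id="c"/></p></grammar>'
)
ids = [x.attributes["id"].value for x in doc.getElementsByTagName("xref")]
print(ids)  # ['b', 'a', 'b', 'c']

# dict keys are unique and (Python 3.7+) keep insertion order.
unique_in_order = list(dict.fromkeys(ids))
print(unique_in_order)  # ['b', 'a', 'c']
```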

Doubt 2:

Out of time, gotta run, sorry...

for xref in self.grammar.getElementsByTagName("xref"):
  xrefs[xref.attributes["id"].value] = 1
xrefs = xrefs.keys()

This is an extremely crude way to construct a set. This should be written as

set(xref.attributes["id"].value
    for xref in self.grammar.getElementsByTagName("xref"))

or even (in Python 2.7+):

{xref.attributes["id"].value
 for xref in self.grammar.getElementsByTagName("xref")}

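If avoiding duplicates is not an issue, your solution (constructing a list) works too. Note, however, that xrefs is tested for membership once per ref, so it has to be a reusable container. A one-shot iterator would be partly consumed by each membership test and give wrong results.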
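A reusable container and a one-shot iterator behave differently as soon as the membership test runs more than once. In this Python 3 sketch (made-up ids), the generator version silently returns a wrong answer, because each "in" check consumes part of the generator.

```python
# Made-up data: all ref ids, and the ids that some <xref> points at.
ref_ids = ["bit", "byte", "nibble"]
referenced = ["bit", "nibble"]

# A set can be tested any number of times (and "in" on a set is fast).
xrefs_set = set(referenced)
correct = [e for e in ref_ids if e not in xrefs_set]
print(correct)  # ['byte']

# A generator is consumed as "in" scans it, so later tests see less data.
xrefs_gen = (x for x in referenced)
broken = [e for e in ref_ids if e not in xrefs_gen]
print(broken)  # ['byte', 'nibble']  ("nibble" is wrongly included)
```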

standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
...
return '<xref id="%s"/>' % random.choice(standaloneXrefs)

This code breaks if the id contains a special character such as " or &, because the id is pasted into the XML string without any escaping. However, in principle it is correct to construct an <xref> element here, since this must be the same format that an external source has. getDefaultSource is called as

self.loadSource(source and source or self.getDefaultSource())
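One way to avoid the escaping problem uses the standard library's xml.sax.saxutils.quoteattr. In this sketch, make_xref is a hypothetical helper name, and the id value is made up to trigger the bug.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import quoteattr


def make_xref(ref_id):
    # quoteattr escapes &, < and > and wraps the value in quotes,
    # switching to single quotes when the value contains a double quote.
    return "<xref id=%s/>" % quoteattr(ref_id)


bad_id = 'say "hi" & bye'

naive = '<xref id="%s"/>' % bad_id  # what the book's code would produce
try:
    minidom.parseString(naive)
except ExpatError as err:
    print("naive version rejected:", err)

safe = make_xref(bad_id)
print(safe)  # <xref id='say "hi" &amp; bye'/>

# The escaped version parses and round-trips the original id.
element = minidom.parseString(safe).documentElement
print(element.attributes["id"].value)  # say "hi" & bye
```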

Both code excerpts are examples of bad programming and should not be included in a book that intends to teach people how to program. Dive Into Python 3 has better XML examples and code.
