Specific doubts on kgp.py program in dive into python book
Dive into Python: XML Processing -
Here I am re开发者_StackOverflow社区ferring to a portion of kgp.py
program -
def getDefaultSource(self):
xrefs = {}
for xref in self.grammar.getElementsByTagName("xref"):
xrefs[xref.attributes["id"].value] = 1
xrefs = xrefs.keys()
standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
if not standaloneXrefs:
raise NoSourceError, "can't guess source, and no source specified"
return '<xref id="%s"/>' % random.choice(standaloneXrefs)
self.grammar:
parsed XML
representation (using xml.dom.minidom
) of -
<?xml version="1.0" ?>
<grammar>
<ref id="bit">
<p>0</p>
<p>1</p>
</ref>
<ref id="byte">
<p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\
<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>
self.refs:
is the caching of all the refs
of the above XML
key'd by their id
I have two doubts with this code:
Doubt 1:
for xref in self.grammar.getElementsByTagName("xref"):
xrefs[xref.attributes["id"].value] = 1
xrefs = xrefs.keys()
eventaully xrefs
holds the id values in a list. Couldn't we have done this simply by -
xrefs = [xref.attributes["id"].value
for xref in self.grammar.getElementsByTagName("xref")]
Doubt 2:
standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
...
return '<xref id="%s"/>' % random.choice(standaloneXrefs)
Here, we are saving the ref
from self.refs
which we do NOT see in our computed xrefs
. But next instead of creating a <ref>
element, we are creating a <xref>
with the same ID. This takes us one step backward, since later we are anyway going to find the cross reference for this computed <xref>
and eventually reach the <ref>
. We could have just started with this <ref>
in the first place.
Disclaimer
I am in no way trying to make a remark on the book. I am not even qualified for that.
I am loving every moment of reading this book. I realize few chapters have gone outdated, but I love Mark Pilgrim's writing style and I cannot stop reading.
Dive Into Python is seven years old now (published 2004), and doesn't always contain the most modern code. So you need to go easy on it: Dive Into Python 3 might be a better bet.
Your suggestion for doubt 1 changes the meaning of the code: putting the ids into the keys of a dictionary and then getting them out again eliminates duplicates, whereas your list comprehension includes duplicates. The modern approach would be to use a set comprehension:
xrefs = {xref.attributes["id"].value
for xref in self.grammar.getElementsByTagName("xref")}
but this wasn't available in 2004.
On your doubt 2, I'm not entirely sure I see the problem. Yes, in some sense this is a waste, but on the other hand the code already has a handler for the xref
case, so it makes sense to re-use that handler rather than add an extra special case.
There are several other bits of code in that example that could be modernized. For example,
source and source or self.getDefaultSource()
would now be source or self.getDefaultSource()
. And the line
standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
would be better expressed as a set difference operation, something like:
standaloneXrefs = set(self.refs) - set(xrefs)
But that's what happens as languages become more expressive: old code starts to look rather inelegant.
Your doubts are totally justified: that code doesn't look very good to me at all. For example, it uses 1
as a boolean value where True
would have sufficed and been clearer.
Doubt 1:
These two snippets don't do the same. If there are duplicates, the original code will filter them out, but your alternative won't. On the other hand, your code preserves the original ordering whereas the original returns the elements in an arbitrary order.
To be fully equivalent, we could use the set
builtin:
xrefs = list(set([xref.attributes["id"].value for xref in self.grammar.getElementsByTagName("xref")]))
(It might not make sense to convert back to a list, though.)
Doubt 2:
Out of time, gotta run, sorry...
for xref in self.grammar.getElementsByTagName("xref"):
xrefs[xref.attributes["id"].value] = 1
xrefs = xrefs.keys()
This is an extremely crude way to construct a set. This should be written as
set(xref.attributes["id"].value
for xref in self.grammar.getElementsByTagName("xref"))
or even (in Python 2.7+):
{xref.attributes["id"].value
for xref in self.grammar.getElementsByTagName("xref")) }
If avoiding duplicates is not an issue, your solution (constructing a list) works too. Since xref is iterated over anyway, one could even generate an iterator.
standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
...
return '<xref id="%s"/>' % random.choice(standaloneXrefs)
This code is completely broken if xref contains a special character such as "
or &
.
However, in principle, it is correct to construct an <xref>
element here, since this must be the same format that the external source
has (getDefaultSource
is called as
self.loadSource(source and source or self.getDefaultSource())
).
Both code excerpts are examples of bad programming and should not be included in a book that intends to teach people how to program. Dive Into Python3 has better XML examples and code.
精彩评论