Undesired python feedparser instantiation relic
Question: How do I kill an instantiation or insure i'm creating a new instantiation of the python universal feedparser?
Info:
I'm working on a program right now that downloads and catalogs large numbers of blogs. It has worked well so for except for an unfortunate bug. My code is set up to take a list of blog urls and run them through a for loop. each run it picks a url and sends it down to a separate class which manages the downloading, extracting, and saving of the data to a file.
The first url works just fine. It downloads the entirety of the blog and saves it to a file. But the second blog it downloads will have all the data from the first one as well, I'm totally clueless as to why.
Code snippets:
class BlogHarvester:
def __init__(self,folder):
f = open(folder,'r')
stop = folder[len(folder)-1]
while stop != '/':
folder = folder[0:len(folder)-1]
stop = f开发者_如何学Goolder[len(folder)-1]
blogs = []
for line in f:
blogs.append(line)
for herf in blogs:
blog = BlogParser(herf)
sPath = ""
uid = newguid()##returns random hash.
sPath = uid
sPath = sPath + " - " + blog.posts[0].author[1:5] + ".blog"
print sPath
blog.storeAsFile(sPath)
class BlogParser:
def __init__(self, blogherf='null', path='null', posts = []):
self.blogherf = blogherf
self.blog = feedparser.parse(blogherf)
self.path = path
self.posts = posts
if blogherf != 'null':
self.makeList()
elif path != 'null':
self.loadFromFile()
class BlogPeices:
def __init__(self,title,author,post,date,publisher,rights,comments):
self.author = author
self.title = title
self.post = post
self.date = date
self.publisher = publisher
self.rights = rights
self.comments = comments
I included snippets I figured that would probably be useful. Sorry if there are any confusing artifacts. This program has been a pain in the butt.
The problem is posts=[]
. Default arguments are calculated at compile time, not runtime, so mutations to the object remain for the lifetime of the class. Instead use posts=None
and test:
if posts is None:
self.posts = []
As what Ignacio said, any mutations that happen to the default arguments in the function list will stay for the life of the class.
From http://docs.python.org/reference/compound_stmts.html#function-definitions
Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function.
But this brings up sort of a gotcha, you are modifying a reference... So you may be modifying a list that the consumer of the class that wasn't expected to be modified:
For example:
class A:
def foo(self, x = [] ):
x.append(1)
self.x = x
a = A()
a.foo()
print a.x
# prints: [1]
a.foo()
print a.x
# prints: [1,1] # !!!! Consumer would expect this to be [1]
y = [1,2,3]
a.foo(y)
print a.x
# prints: [1, 2, 3, 1]
print y
# prints: [1, 2, 3, 1] # !!!! My list was modified
If you were to copy it instead: (See http://docs.python.org/library/copy.html )
import copy
class A:
def foo(self, x = [] ):
x = copy.copy(x)
x.append(1)
self.x = x
a = A()
a.foo()
print a.x
# prints: [1]
a.foo()
print a.x
# prints: [1] # !!! Much better =)
y = [1,2,3]
a.foo(y)
print a.x
# prints: [1, 2, 3, 1]
print y
# prints: [1, 2, 3] # !!!! My list is how I made it
精彩评论