CSV Writer writing over itself
I am trying to create a CSV file with a list of URLs.
I am pretty new to programming, so please excuse any sloppy code.
I have a loop that runs through a list of places to get the list of URLs.
I then have a loop within that loop that exports the data to a CSV file.
import urllib, csv, re
from BeautifulSoup import BeautifulSoup
list_of_URLs = csv.reader(open("file_location_for_URLs_to_parse"))
for row in list_of_URLs:
    row_string = "".join(row)
    file = urllib.urlopen(row_string)
    page_HTML = file.read()
    soup = BeautifulSoup(page_HTML)  # parsing HTML
    Thumbnail_image = soup.findAll("div", {"class": "remositorythumbnail"})
    Thumbnail_image_string = str(Thumbnail_image)
    soup_3 = BeautifulSoup(Thumbnail_image_string)
    Thumbnail_image_URL = soup_3.findAll('a', attrs={'href': re.compile("^http://")})
This is the part that isn't working for me:
out = csv.writer(open("file_location", "wb"), delimiter=";")
for tag in soup_3.findAll('a', href=True):
    out.writerow(tag['href'])
Basically the writer keeps writing over itself. Is there a way to jump to below the first empty row in the CSV and start writing there?
Don't put this inside any loop:
out = csv.writer(open("file_location", "wb"), delimiter=";")
Instead:
with open("file_location", "wb") as fout:
    out = csv.writer(fout, delimiter=";")
    # put for-loop here
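Here is a minimal, self-contained sketch of that structure (written for Python 3, so `"w"` with `newline=""` replaces `"wb"`; the `scraped_hrefs` list is a hypothetical stand-in for the tags pulled out of each page):

```python
import csv

# Hypothetical stand-in for the hrefs scraped from the pages.
scraped_hrefs = [
    "http://example.com/a.jpg",
    "http://example.com/b.jpg",
]

# Open the output file ONCE, before the loop, so every row
# accumulates in the same file. Note that writerow() expects a
# sequence: each href is wrapped in a list, because passing a bare
# string would write one character per column.
with open("urls.csv", "w", newline="") as fout:
    out = csv.writer(fout, delimiter=";")
    for href in scraped_hrefs:
        out.writerow([href])
```

After the with-block ends the file is closed and flushed, and it contains one row per URL.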
Notes:
- open("file_location", "wb") creates a new file, destroying any old file of the same name. This is why it looks like the writer is overwriting old lines.
- Use with open(...) as ... because it automatically closes the file for you when the with-block ends. This makes explicit when the file is closed. Otherwise, the file remains open (and maybe not completely flushed) until out is deleted or reassigned to a new value. It's not really your main problem here, but with is too useful not to mention.
Are you closing the file after every write, or re-opening the file before every write? Check that.
Also, try using "ab" mode instead of "wb". "ab" appends to the file instead of truncating it.
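The difference between the two modes is easy to see with a toy example (Python 3 spelling, where the text modes "w"/"a" with newline="" play the role of "wb"/"ab"):

```python
import csv
import os

path = "demo.csv"

# "w" truncates on every open: re-opening inside the loop means
# only the last row survives.
for url in ["http://a", "http://b"]:
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=";").writerow([url])
with open(path) as f:
    print(f.read())  # only http://b

# "a" appends: every row survives, even across re-opens.
os.remove(path)
for url in ["http://a", "http://b"]:
    with open(path, "a", newline="") as f:
        csv.writer(f, delimiter=";").writerow([url])
with open(path) as f:
    print(f.read())  # http://a, then http://b
```

Appending does fix the overwriting, but note that running the script twice will then double up the rows, which is why opening the file once outside the loop is the cleaner fix.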
The open("file_location", "wb") call, which you are making once for every URL, wipes out whatever you previously wrote to that file. Move it outside your for loop so the file is opened only once for all the URLs.