
CSV Writer writing over itself

I am trying to create a CSV file with a list of URLs.

I am pretty new to programming, so please excuse any sloppy code.

I have a loop that runs through a list of places to get the list of URLs.

I then have a loop within that loop that exports the data to a CSV file.

import urllib, csv, re
from BeautifulSoup import BeautifulSoup
list_of_URLs = csv.reader(open("file_location_for_URLs_to_parse"))
for row in list_of_URLs:
    row_string = "".join(row)
    file = urllib.urlopen(row_string)
    page_HTML = file.read()
    soup = BeautifulSoup(page_HTML) # parsing HTML
    Thumbnail_image = soup.findAll("div", {"class": "remositorythumbnail"})
    Thumbnail_image_string = str(Thumbnail_image)
    soup_3 = BeautifulSoup(Thumbnail_image_string)
    Thumbnail_image_URL = soup_3.findAll('a', attrs={'href': re.compile("^http://")})

This is the part that isn't working for me:

    out  = csv.writer(open("file_location", "wb"), delimiter=";")
    for tag in soup_3.findAll('a', href=True):   
        out.writerow(tag['href'])

Basically the writer keeps writing over itself. Is there a way to jump to below the first empty row in the CSV and start writing?


Don't put this inside any loop:

out  = csv.writer(open("file_location", "wb"), delimiter=";")

Instead:

with open("file_location", "wb") as fout:
    out = csv.writer(fout, delimiter=";")
    # put for-loop here

Notes:

  1. open("file_location", "wb") creates a new file, destroying any old file of the same name. This is why it looks like the writer is overwriting old lines.
  2. Use with open(...) as ... because it automatically closes the file for you when the with-block ends. This makes it explicit when the file is closed. Otherwise, the file stays open (and maybe not completely flushed) until out is deleted or reassigned. It's not really your main problem here, but using with is too useful not to mention.
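
For concreteness, here is a rough sketch of how the whole script might look with the writer created once, outside the loop. It keeps the question's Python 2 / BeautifulSoup 3 style and placeholder file paths, and it additionally wraps each URL in a list for writerow (which expects a sequence). Treat it as a sketch under those assumptions, not a tested implementation:

import urllib, csv
from BeautifulSoup import BeautifulSoup

list_of_URLs = csv.reader(open("file_location_for_URLs_to_parse"))

# Open the output file once, before the loop, so it is not truncated per URL.
with open("file_location", "wb") as fout:
    out = csv.writer(fout, delimiter=";")
    for row in list_of_URLs:
        row_string = "".join(row)
        page_HTML = urllib.urlopen(row_string).read()
        soup = BeautifulSoup(page_HTML)          # parse the fetched page
        thumbnails = soup.findAll("div", {"class": "remositorythumbnail"})
        soup_3 = BeautifulSoup(str(thumbnails))  # re-parse just the thumbnail divs
        for tag in soup_3.findAll('a', href=True):
            # writerow expects a sequence; the list keeps the whole URL in one
            # column instead of spreading it one character per column.
            out.writerow([tag['href']])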


Are you reopening the file before every write, or closing it after every write? Check that first.
Also, try using "ab" mode instead of "wb": "ab" appends to the file instead of truncating it.
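
For example, a rough sketch of that variant inside the existing per-URL loop (keeping the question's Python 2 style and placeholder path) might be:

fout = open("file_location", "ab")    # "ab" appends instead of truncating
out = csv.writer(fout, delimiter=";")
for tag in soup_3.findAll('a', href=True):
    out.writerow([tag['href']])
fout.close()                          # flush before the next URL is processed

Note that with "ab" the file also keeps growing across separate runs of the script, so opening it once outside the loop (as in the answer above) is usually the cleaner fix.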


The open("file_location", "wb") call, which you are doing once for every URL, is wiping out what you did to that file previously. Move it outside your for loop so that it is only opened once for all the URLs.
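
A minimal sketch of that change, using the question's own names (fetching and parsing kept exactly as before, and the URL wrapped in a list for writerow):

out = csv.writer(open("file_location", "wb"), delimiter=";")  # opened once
for row in list_of_URLs:
    # ... fetch the page and build soup_3 as in the question ...
    for tag in soup_3.findAll('a', href=True):
        out.writerow([tag['href']])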
