开发者

How can I organize each scraped item into a csv row?

What is the best way to organize scraped data into a csv? More specifically each item is in this form

url
"firstName middleInitial, lastName - level - word1 word2 word3, & wordN practice officeCity."
JD, schoolName, date

Example:

http://www.examplefirm.com/jang
"Joe E. Ang - partner - privatization mergers, media & technology practice New York."
JD, University of Chicago Law School, 1985

I want to put this item in this form:

(http://www.examplefirm.com/jang, Joe, E., Ang, partner, privatization mergers, media & technology, New York, University of Chicago Law Scho开发者_如何转开发ol, 1985)

so that I can write it into a csv file to import to a django db.

What would be the best way of doing this?

Thank you.


There's really no short cut on this. Line 1 is easy. Just assign it to url. Line 3 can probably be split on , without any ill effects, but line 2 will have to be manually parsed. What do you know about word1-wordN? Are you sure "practice" will never be a "word". Are you sure the words are only one word long? Can they be quoted? Can they contain dashes?

Then I would parse out the beginning and end bits, so you're left with a list of words, split it by commas and/or & (is there a consistent comma before &? Your format says yes, but your example says no.) If there are a variable number of words, you don't want to inline them in your tuple like that, because you don't know how to get them out. Create a list from your words, and add that as one element of the tuple.

>>> tup = (url, first, middle, last, rank, words, city, school, year)
>>> tup
('http://www.examplefirm.com/jang', 'Joe', 'E.', 'Ang', 'partner', 
['privatization mergers', 'media & technology'], 'New York', 
'University of Chicago Law School', '1985')

More specifically? You're on your own there.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜