开发者

How Do I Extract a List of Sheets from a Google Spreadsheets API XML Feed in R?

I'm trying to grab a list of sheets in a public Google Spreadsheet from a spreadsheets API XML feed using R, and just don't have a feel for how to do it...

More by luck (and cut and paste!) than judgement, I've got as far as importing the feed, and (using a crib from开发者_StackOverflow http://www.mail-archive.com/r-help@r-project.org/msg108087.html regarding default namespaces) got some sort of list of spreadsheet items item:

ssKey='0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc'
ssURL=paste(sep="",'http://spreadsheets.google.com/feeds/worksheets/',ssKey,'/public/basic')
ssd=xmlTreeParse(ssURL,useInternal=TRUE)
nodes=getNodeSet(ssd,"//x:entry","x")
titles=sapply(nodes, function(x) xmlSApply(x, xmlValue))

Here's what I get as a result:

[,1]

id "https://spreadsheets.google.com/feeds/worksheets/0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc/public/basic/od6" updated "2011-08-28T16:03:50.190Z"

category ""

title "FP1 times"

content "FP1 times"

link ""

link ""

link ""

link ""

[,2]

id "https://spreadsheets.google.com/feeds/worksheets/0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc/public/basic/od7" updated "2011-08-28T16:03:50.190Z"

category ""

title "FP1 Classification"

content "FP1 Classification"

link ""

link ""

link ""

link ""

[,3]

id

The question is, how do I create a data frame, or print out a tidy list, that displays sheet titles and IDs, something like:

sheetName           sheet.Id
FP1 times            od6
FP1 Classification   od7

where od6 and od7 are identifiers that appear at the end of the id element.


Running your code, it seems all the relevant information is there, stored in the 9 x 13 character matrix called titles. It looks like you want the information from particular rows of this matrix. We can extract it as follows:

library(stringr)
data.frame(sheetName = titles['content',], 
           sheetId = str_sub(titles['id',],-3,-1))

            sheetName sheetId
1           FP1 times     od6
2  FP1 Classification     od7
3           FP2 times     od4
4  FP2 Classification     od5
5           FP3 Times     oda
6  FP3 Classification     odb
7   Practice Combined     od8
8         Quali Times     od9
9         Quali Stats     ocy
10         Race times     ocz
11       Race History     ocw
12     Race Proximity     ocx
13         Race Stats     od2

Note that I'm being simplistic here in assuming that the sheet id will always be the last three characters of the url. In general that may not be the case, in which case you'll need a more sophisticated strategy for extracting the relevant piece of the url, perhaps involving regexp.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜