How Do I Extract a List of Sheets from a Google Spreadsheets API XML Feed in R?
I'm trying to grab a list of sheets in a public Google Spreadsheet from a spreadsheets API XML feed using R, and just don't have a feel for how to do it...
More by luck (and cut and paste!) than judgement, I've got as far as importing the feed, and (using a crib from开发者_StackOverflow http://www.mail-archive.com/r-help@r-project.org/msg108087.html regarding default namespaces) got some sort of list of spreadsheet items item:
ssKey='0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc'
ssURL=paste(sep="",'http://spreadsheets.google.com/feeds/worksheets/',ssKey,'/public/basic')
ssd=xmlTreeParse(ssURL,useInternal=TRUE)
nodes=getNodeSet(ssd,"//x:entry","x")
titles=sapply(nodes, function(x) xmlSApply(x, xmlValue))
Here's what I get as a result:
[,1] id "https://spreadsheets.google.com/feeds/worksheets/0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc/public/basic/od6" updated "2011-08-28T16:03:50.190Z" category "" title "FP1 times" content "FP1 times" link "" link "" link "" link "" [,2] id "https://spreadsheets.google.com/feeds/worksheets/0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc/public/basic/od7" updated "2011-08-28T16:03:50.190Z" category "" title "FP1 Classification" content "FP1 Classification" link "" link "" link "" link "" [,3] id
The question is, how do I create a data frame, or print out a tidy list, that displays sheet titles and IDs, something like:
sheetName sheet.Id
FP1 times od6
FP1 Classification od7
where od6 and od7 are identifiers that appear at the end of the id element.
Running your code, it seems all the relevant information is there, stored in the 9 x 13 character matrix called titles
. It looks like you want the information from particular rows of this matrix. We can extract it as follows:
library(stringr)
data.frame(sheetName = titles['content',],
sheetId = str_sub(titles['id',],-3,-1))
sheetName sheetId
1 FP1 times od6
2 FP1 Classification od7
3 FP2 times od4
4 FP2 Classification od5
5 FP3 Times oda
6 FP3 Classification odb
7 Practice Combined od8
8 Quali Times od9
9 Quali Stats ocy
10 Race times ocz
11 Race History ocw
12 Race Proximity ocx
13 Race Stats od2
Note that I'm being simplistic here in assuming that the sheet id will always be the last three characters of the url. In general that may not be the case, in which case you'll need a more sophisticated strategy for extracting the relevant piece of the url, perhaps involving regexp.
精彩评论