Time Stamps, Qplot and strptime in R

This is a follow-up question. As hadley pointed out, unless I fix the problem with the time stamps, the graphs I produce will be incorrect. With this in mind, I am working on fixing the issues in my code. Following the answers to my earlier questions, I have stopped using the attach() function in favour of dataSet.df$variableName. I am now having problems drawing the graph from the strptime time stamps. Below is all the code I am using, along with the XML file from which the data set is parsed (how to parse it was also answered in an earlier question).

<?xml version = "1.0"?>
<Company>
  <shareprice>
    <timeStamp> 12:00:00.01</timeStamp>
    <Price>  25.02</Price>
  </shareprice>
  <shareprice>
    <timeStamp> 12:00:00.02</timeStamp>
    <Price>  15</Price>
  </shareprice>
  <shareprice>
    <timeStamp> 12:00:00.025</timeStamp>
    <Price>  15.02</Price>
  </shareprice>
  <shareprice>
    <timeStamp> 12:00:00.031</timeStamp>
    <Price>  18.25</Price>
  </shareprice>
  <shareprice>
    <timeStamp> 12:00:00.039</timeStamp>
    <Price>  18.54</Price>
  </shareprice>
  <shareprice>
    <timeStamp> 12:00:00.050</timeStamp>
    <Price> 16.52</Price>
  </shareprice>
  <shareprice>
    <timeStamp> 12:00:01.01</timeStamp>
    <Price>  17.50</Price>
  </shareprice>
</Company>

The R code I have currently is as follows:

library(ggplot2)
library (XML)
test.df <- xmlToDataFrame("c:/Users/user/Desktop/shares.xml")
test.df 
timeStampParsed <- strptime(as.character(test.df$timeStamp), "%H:%M:%OS")
test.df$Price <- as.numeric(as.character(test.df$Price))
summary (test.df)
mean(test.df$Price)
sd (test.df$Price)
mean(timeStampParsed)
par(mfrow=c(1,2))
plot(timeStampParsed, test.df$Price)
qplot(timeStampParsed,Price,data=test.df,geom=c("point","line"), 
      scale_y_continuous(limits = c(10,26)))

The plot command produces a graph, but it is not very pleasant looking. The qplot command returns the following error message:

Error in sprintf(gettext(fmt, domain = domain), ...) : 
invalid type of argument[1]: 'symbol'

In the interest of getting this right (and cutting down on the number of questions I ask), is there a tutorial or website that I can use? Once again, thanks very much for your help.


You still make some of the mistakes in the code I corrected in my two previous answers to you. So let's try this again, more explicitly:

library(ggplot2)
library (XML)
df <- xmlToDataFrame("/tmp/anthony.xml")   # assign to df, shorter to type
df
sapply(df, class)          # shows everything is a factor
summary(df)                # summary for factor: counts !
df$timeStamp <- strptime(as.character(df$timeStamp), "%H:%M:%OS")   # read from df, not test.df
df$Price <- as.numeric(as.character(df$Price))                      # factor -> character -> numeric
sapply(df, class)          # shows both columns converted
options("digits.secs"=3)   # make sure we show sub-seconds
summary (df)               # real summary
with(df, plot(timeStamp, Price))    # with is an elegant alternative to attach()
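
The conversion steps above can be tried without the XML file. The following is only a sketch: raw is a hypothetical stand-in for what xmlToDataFrame() returned in this session (all columns as factors, values with the leading spaces from the XML), and the names raw, codes, price, parsed and clean are made up for illustration. It shows why the detour through as.character() matters and how %OS picks up the fractional seconds.

```r
# Hypothetical stand-in for the xmlToDataFrame() result: every column is a
# factor and the values keep the leading spaces found in the XML.
raw <- data.frame(
  timeStamp = c(" 12:00:00.01", " 12:00:00.02", " 12:00:00.025",
                " 12:00:00.031", " 12:00:00.039", " 12:00:00.050",
                " 12:00:01.01"),
  Price     = c("  25.02", "  15", "  15.02", "  18.25",
                "  18.54", " 16.52", "  17.50"),
  stringsAsFactors = TRUE
)

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers the labels spell out.
codes <- as.numeric(raw$Price)

# Going through character first recovers the real values.
price <- as.numeric(trimws(as.character(raw$Price)))

# On input, %OS reads fractional seconds; strptime() fills in today's date.
# tz = "UTC" is an assumption made here to keep the example reproducible.
parsed <- strptime(trimws(as.character(raw$timeStamp)), "%H:%M:%OS", tz = "UTC")

# Store the time stamps as POSIXct, which sits more comfortably in a data frame.
clean <- data.frame(timeStamp = as.POSIXct(parsed), Price = price)

options(digits.secs = 3)   # show sub-seconds when printing
print(sapply(clean, function(col) class(col)[1]))
print(range(clean$Price))
print(difftime(max(clean$timeStamp), min(clean$timeStamp), units = "secs"))
```

The last line prints the span of the time axis, which for this data is about one second.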

I also get an error with qplot() but you may simply have too little of a range in your data. So let's try this:

R> set.seed(42)               # fix random number generator
R> df$timeStamp <- df[1,"timeStamp"] + cumsum(runif(7)*60)
R> summary(df)                # new timestamps spanning larger range
   timeStamp                          Price     
 Min.   :2010-07-14 12:00:54.90   Min.   :15.0  
 1st Qu.:2010-07-14 12:01:59.71   1st Qu.:15.8  
 Median :2010-07-14 12:02:58.12   Median :17.5  
 Mean   :2010-07-14 12:02:55.54   Mean   :18.0  
 3rd Qu.:2010-07-14 12:03:52.20   3rd Qu.:18.4  
 Max.   :2010-07-14 12:04:51.96   Max.   :25.0  
R> qplot(timeStamp,Price, data=df, geom=c("point","line"), 
+  scale_y_continuous(limits = c(10,26)))
R> 

Now qplot() works.

So in sum, the data you were using did not fulfil some minimum requirement of the qplot function you were using, such as having a time axis spanning more than a second.
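
For comparison, the same plot can also be written with ggplot() layers. This is only a sketch under stated assumptions: it assumes a recent ggplot2 release, where a scale is normally added with + rather than passed as an argument, and it uses made-up time stamps spread over a few minutes, in the spirit of the session above. The names start, plot_df and p are illustrative.

```r
library(ggplot2)

set.seed(42)   # fix the random number generator, as in the session above

# Made-up time stamps spread over a few minutes; the prices are the XML values.
start <- as.POSIXct("2010-07-14 12:00:00", tz = "UTC")
plot_df <- data.frame(
  timeStamp = start + cumsum(runif(7) * 60),
  Price     = c(25.02, 15, 15.02, 18.25, 18.54, 16.52, 17.50)
)

# Points and lines as separate layers; the y scale is added with `+`.
p <- ggplot(plot_df, aes(x = timeStamp, y = Price)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(limits = c(10, 26))

print(p)
```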

In general, you may want to start with An Introduction to R (it comes with the program) or another introductory text. You jumped head-first into advanced material (datetime data types, reading from XML, factors, ...) and got burned. First steps first.
