Density Value for each Return
I have a dataframe "foo" looking like this
Date Return
1998-01-01 0.02
1998-01-02 0.04
1998-01-03 -0.02
1998-01-04 -0.01
1998-01-05 0.02
...
1998-02-01 0.1
1998-02-02 -0.2
1998-02-03 -0.1
etc.
I would like to add to this dataframe a new column showing me the density value of the corresponding return. I tried:
foo$density <- for(i in 1:length(foo$Return)) density(foo$Return,
from = foo$Return[i], to = foo$Return[i], n = 1)$y
But it didn't work. I really have difficulty applying a "function" to each row. But maybe there is also another way to do it, not using density()?
What I essentially would like to do is to extract the fitted density values from density() to the returns in foo. If I just do plot(density(foo$Return)) it gives me the curve, however I would like to have the density values attached to the returns.
@Joris:
foo$density <- density(foo$Return, n=nrow(foo$Return))$y
calculates something, however seems to return wrong density values.
Thank you for helping me out! Dani
On second thought, forget about the density() function; I suddenly realized what you wanted to do. Most density functions return values on a grid, so they don't give you the estimate at the exact data points. If you want that, you can, e.g., use the sm package:
require(sm)
foo <- data.frame(Return=rpois(100,5))
foo$density <- sm.density(foo$Return,eval.points=foo$Return)$estimate
# the plot
id <- order(foo$Return)
hist(foo$Return,freq=F)
lines(foo$Return[id],foo$density[id],col="red")
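If installing sm is not an option, the estimate at the exact data points can also be computed directly in base R. The following is a minimal sketch, assuming a Gaussian kernel and the bandwidth rule bw.nrd0() that density() uses by default; kde_at() is a hypothetical helper name, not part of any package, and the data are dummy values:

```r
set.seed(1)
foo <- data.frame(Return = rnorm(200))   # dummy returns, for illustration only

# Hypothetical helper: Gaussian kernel density estimate evaluated at the
# points x0, given the sample `data` and the bandwidth `bw`
kde_at <- function(x0, data, bw) {
  vapply(x0, function(x) mean(dnorm(x, mean = data, sd = bw)), numeric(1))
}

h <- bw.nrd0(foo$Return)                 # default bandwidth rule of density()
foo$density <- kde_at(foo$Return, foo$Return, h)
head(foo)
```

Every return is compared with every other return, so the cost grows quadratically with the number of rows; that is usually fine for a few thousand observations.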
If the number of distinct values is small, you can instead count how often each value occurs with ave():
foo$counts <- ave(foo$Return,foo$Return,FUN=length)
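As a small illustration of what ave() produces here, the sketch below uses the first five returns from the question; the share column is a hypothetical addition that turns the counts into relative frequencies:

```r
foo <- data.frame(Return = c(0.02, 0.04, -0.02, -0.01, 0.02))

# For each row, the number of rows sharing the same Return value
foo$counts <- ave(foo$Return, foo$Return, FUN = length)   # 2 1 1 1 2

# Hypothetical extra column: relative frequency of each value
foo$share <- foo$counts / nrow(foo)
foo
```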
If the purpose is to plot a density function, there's no need to calculate it like you did. Just use
plot(density(foo$Return))
Or, to add a histogram underneath (mind the option freq=F):
hist(foo$Return,freq=F)
lines(density(foo$Return),col="red")
An alternative to sm.density is to evaluate the density on a finer grid than the default, and use approx or approxfun to give the interpolated values of the density at the Returns you want. Here is an example with dummy data:
set.seed(1)
foo <- data.frame(Date = seq(as.Date("2010-01-01"), as.Date("2010-12-31"),
by = "days"),
Returns = rnorm(365))
head(foo)
## compute the density, on a fine grid (512*8 points)
dens <- with(foo, density(Returns, n = 512 * 8))
At this point, we could use approx() to interpolate the x and y components of the returned density, but I prefer approxfun(), which does the same thing but returns a function that we can then use to do the interpolation. First, generate the interpolation function:
## x and y are components of dens, see str(dens)
BAR <- with(dens, approxfun(x = x, y = y))
Now you can use BAR() to return the interpolated density at any point you wish, e.g. for the first value of Returns:
> with(foo, BAR(Returns[1]))
[1] 0.3268715
To finish the example, add the density for each datum in Returns:
> foo <- within(foo, Density <- BAR(Returns))
> head(foo)
Date Returns Density
1 2010-01-01 -0.6264538 0.3268715
2 2010-01-02 0.1836433 0.3707068
3 2010-01-03 -0.8356286 0.2437966
4 2010-01-04 1.5952808 0.1228251
5 2010-01-05 0.3295078 0.3585224
6 2010-01-06 -0.8204684 0.2490127
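For comparison, the same values can be obtained with approx() in a single call instead of building a function first. This is a sketch on the same kind of dummy data; the variable names are illustrative only:

```r
set.seed(1)
returns <- rnorm(365)                      # dummy data, as in the example above
dens <- density(returns, n = 512 * 8)      # density on a fine grid

# approx() interpolates linearly and returns a list; its y component holds
# the interpolated density at the points given in xout
dens_at_returns <- approx(x = dens$x, y = dens$y, xout = returns)$y

# approxfun() gives the same numbers, packaged as a reusable function
BAR <- approxfun(x = dens$x, y = dens$y)
all.equal(dens_at_returns, BAR(returns))
```

approxfun() is convenient when the density is needed at several different sets of points; approx() is enough for a one-off lookup.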
To see how well the interpolation is doing, we can plot the density and the interpolated version and compare. Note that we have to sort Returns because, to achieve the effect we want, lines() needs to see the data in increasing order:
plot(dens)
with(foo, lines(sort(Returns), BAR(sort(Returns)), col = "red"))
Which gives something like this:
As long as the density is evaluated at a sufficiently fine set of points (512*8 in the above example), you shouldn't have any problems and will be hard pushed to tell the difference between the interpolated version and the real thing. If you have "gaps" in the values of your Returns, then, because lines() just joins the points you ask it to plot, you might find that straight line segments do not follow the black density at the locations of the gaps. This is just an artefact of the gaps and how lines() works, not a problem with the interpolation.
If we ignore the density issue, which @Joris expertly answers, you don't seem to have grasped how to set up a loop. What your loop returns is the value NULL. This is the value that is being assigned to foo$density, and that won't work because NULL is an empty component, i.e. it doesn't exist as far as R is concerned, so no column is created. See ?'for' for further details.
> bar <- for(i in 1:10) {
+ i + 1
+ }
> bar
NULL
> foo <- data.frame(A = 1:10, B = LETTERS[1:10])
> foo$density <- for(i in seq_len(nrow(foo))) {
+ i + 1
+ }
> head(foo) ## No `density`
A B
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
6 6 F
If you want to store the value computed in each iteration of the loop, you must do the assignment inside the loop, and that means you should pre-allocate the storage space before you enter the loop. For example, if we wanted i + 1 for i in 1,...,10, we could do this:
> bar <- numeric(length = 10)
> for(i in seq_along(bar)) {
+ bar[i] <- i + 1
+ }
> bar
[1] 2 3 4 5 6 7 8 9 10 11
Of course, you would not do such a calculation as this via a loop, because R is vectorized and will work with vectors of numbers rather than you having to code each computation element by element as you might in C or other programming languages.
> bar <- 1:10 + 1
> bar
[1] 2 3 4 5 6 7 8 9 10 11
Notice that R has turned 1 into a vector of 1s of sufficient length to allow the computation to proceed, something known as recycling in R-speak.
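Recycling is not limited to single numbers. As a small sketch, a shorter vector is repeated as often as needed to match the longer one:

```r
# c(10, 20) is recycled to 10 20 10 20 10 20 before the addition
x <- 1:6 + c(10, 20)
x
# [1] 11 22 13 24 15 26
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign of a mistake.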
Sometimes, you might need to iterate over an object with a loop or using one of the s|l|t|apply() family, but most often you will find a function that works for an entire vector of data in one go. This is one of the advantages of R over other programming languages, but does require you to get your head into vectorized mode.
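To tie this back to the question, here is a sketch of the three styles applied to the density problem. It assumes dummy data and an interpolation function built with approxfun() on a fine density() grid (the name dens_fun is illustrative); all three give the same column:

```r
set.seed(1)
foo <- data.frame(Return = rnorm(50))            # dummy returns
dens <- density(foo$Return, n = 512 * 8)
dens_fun <- approxfun(dens$x, dens$y)            # illustrative name

# 1. Explicit loop: pre-allocate, then assign inside the loop
loop_density <- numeric(nrow(foo))
for (i in seq_len(nrow(foo))) {
  loop_density[i] <- dens_fun(foo$Return[i])
}

# 2. apply family: one call per element, result collected for you
apply_density <- vapply(foo$Return, dens_fun, numeric(1))

# 3. Vectorized: the function accepts the whole vector in one go
foo$density <- dens_fun(foo$Return)

all.equal(loop_density, foo$density)
all.equal(apply_density, foo$density)
```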
Use this to obtain density values.
foo$density <- density(foo$Return, n=length(foo$Return))$y
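It can be worth checking what this call returns before relying on it. In the sketch below (dummy data, not the asker's), the y values belong to the equally spaced grid stored in the x component of the result, not to the individual returns, so the i-th density value does not generally correspond to the i-th return:

```r
set.seed(1)
foo <- data.frame(Return = rnorm(100))           # dummy returns
d <- density(foo$Return, n = length(foo$Return))

head(d$x)          # equally spaced grid points, starting below min(foo$Return)
head(foo$Return)   # the returns themselves, in data order

# d$y[i] is the estimate at d$x[i]; compare the grid with the returns
isTRUE(all.equal(d$x, foo$Return))
```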