Random projection algorithm pseudo code
I am trying to apply the Random Projections method to a very sparse dataset. I have found papers and tutorials about the Johnson-Lindenstrauss method, but every one of them is full of equations that don't give me a meaningful explanation. For example, this document on Johnson-Lindenstrauss.
Unfortunately, from this document I can get no idea about the implementation steps of the algorithm. It's a long shot, but is there anyone who can give me the plain-English version or very simple pseudocode of the algorithm? Or where should I start to dig into these equations? Any suggestions?
For example, what I understand from the algorithm by reading this paper concerning Johnson-Lindenstrauss is that:
- Assume we have an AxB matrix where A is the number of samples and B is the number of dimensions, e.g. 100x5000. And I want to reduce its dimension to 500, which will produce a 100x500 matrix.
As far as I understand: first, I need to construct a 100x500 matrix and fill the entries randomly with +1 and -1 (each with 50% probability).
Edit:
Okay, I think I am starting to get it. So we have a matrix A which is mxn. We want to reduce it to E, which is mxk.
What we need to do is construct a matrix R which is nxk, and fill it with 0, -1, or +1, with probabilities 2/3, 1/6, and 1/6 respectively.
After constructing this R, we simply do the matrix multiplication AxR to find our reduced matrix E. But we don't need to do a full matrix multiplication, because if an element of R is 0, we don't need to do the calculation; we simply skip it. If it is +1, we just add the column, and if it's -1, we just subtract it. So we use summation rather than multiplication to find E, and that is what makes this method very fast.
It turned out to be a very neat algorithm, although I feel too stupid to have gotten the idea.
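To check my understanding, here is a rough NumPy sketch of the procedure above (the names A, R, E and the 1/6, 2/3, 1/6 probabilities come straight from the description; I have left out the normalization factor that the answer below mentions):

```python
import numpy as np

def sparse_random_projection(A, k, seed=0):
    """Reduce an (m x n) matrix A to an (m x k) matrix E using a sparse
    sign matrix R with entries -1, 0, +1 (probabilities 1/6, 2/3, 1/6)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Build R: each entry is -1, 0 or +1 with probabilities 1/6, 2/3, 1/6.
    R = rng.choice([-1, 0, 1], size=(n, k), p=[1/6, 2/3, 1/6])

    # The "summation instead of multiplication" trick: for each output column,
    # add the input columns where R is +1 and subtract those where R is -1;
    # entries equal to 0 are skipped entirely.
    E = np.zeros((m, k))
    for j in range(k):
        plus = np.flatnonzero(R[:, j] == 1)
        minus = np.flatnonzero(R[:, j] == -1)
        E[:, j] = A[:, plus].sum(axis=1) - A[:, minus].sum(axis=1)
    return E
```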
You have the idea right. However, as I understand random projection, the rows of your matrix R should have unit length. I believe that's approximately what the normalizing by 1/sqrt(k) is for: to compensate for the fact that they're not unit vectors.
It isn't a projection, but it's nearly a projection; R's rows aren't orthonormal, but in a much higher-dimensional space they very nearly are. In fact, the dot product of any two of those vectors you choose will be pretty close to 0. This is why it is generally a good approximation of actually finding a proper basis for projection.
The mapping from high-dimensional data A to low-dimensional data E is given in the statement of theorem 1.1 in the latter paper - it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in section 7.1, you don't need to use a full matrix multiplication algorithm.
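As a minimal sketch of that mapping (assuming the dense +1/-1 variant of R; the exact distribution of R depends on which version of the theorem you follow), the scalar multiplication followed by a matrix multiplication looks like this:

```python
import numpy as np

def dense_random_projection(A, k, seed=0):
    """Map the rows of A (m x n) to k dimensions: E = (1/sqrt(k)) * A * R,
    where R (n x k) has i.i.d. +1/-1 entries."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    R = rng.choice([-1.0, 1.0], size=(n, k))
    return (A @ R) / np.sqrt(k)   # scalar multiplication, then matrix multiplication
```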
If your dataset is sparse, then sparse random projections will not work well. You have a few options here:
Option A:
Step 1. Apply a structured dense random projection (the so-called fast Hadamard transform is typically used). This is a special projection which is very fast to compute but otherwise has the properties of a normal dense random projection.
Step 2. Apply a sparse projection on the "densified" data (sparse random projections are useful for dense data only).
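A rough sketch of Step 1 in NumPy/SciPy follows; the padding to a power of two and the explicit Hadamard matrix are simplifications of my own, and a real implementation would use an O(n log n) in-place Walsh-Hadamard transform. Step 2 would then apply a sparse projection (like the one sketched earlier) to the result.

```python
import numpy as np
from scipy.linalg import hadamard

def densifying_projection(A, seed=0):
    """Structured dense projection: randomly flip column signs, then mix all
    columns with an orthonormal Hadamard matrix so every output feature
    depends on every input feature ("densification")."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Pad the feature count up to a power of two, as the Hadamard matrix requires.
    n_pad = 1 << (n - 1).bit_length()
    A_pad = np.zeros((m, n_pad))
    A_pad[:, :n] = A
    D = rng.choice([-1.0, 1.0], size=n_pad)     # random sign flips per feature
    H = hadamard(n_pad) / np.sqrt(n_pad)        # orthonormal Hadamard matrix
    return (A_pad * D) @ H
```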
Option B: Apply SVD to the sparse data. If the data is sparse but has some structure, SVD is better. Random projection preserves the distances between all points; SVD better preserves the distances between dense regions, which in practice is more meaningful. People also use random projections to compute the SVD on huge datasets. Random projections give you efficiency, but not necessarily the best quality of embedding in a low dimension. If your data has no structure, then use random projections.
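A minimal sketch of this option, assuming a SciPy sparse matrix and a truncated SVD (k is whatever target dimension you chose):

```python
from scipy.sparse.linalg import svds

def svd_projection(A_sparse, k):
    """Truncated SVD embedding of a sparse (m x n) matrix into k dimensions."""
    U, s, Vt = svds(A_sparse, k=k)   # top-k singular triplets
    return U * s                     # (m x k) low-dimensional representation
```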
Option C:
For data points for which SVD has little error, use SVD; for the rest of the points, use random projection.
Option D: Use a random projection based on the data points themselves. This makes it very easy to understand what is going on. It looks something like this:
create an n by k matrix `scores` (n = number of data points, k = new dimension)
for i from 0 to k-1 do:                       # generate k random projection vectors
    randomized_combination = feature vector of zeros (length = number of features)
    sample_point_ids = select a sample of point ids
    for each point_id in sample_point_ids do:
        random_sign = +1 or -1 with prob. 1/2
        randomized_combination += random_sign * feature_vector[point_id]   # vector operation
    normalize randomized_combination to unit length
    # note that the normal random projection vector would instead be:
    # randomized_combination = [+/-1, +/-1, ...] (one +/-1 per feature; if you want it
    # sparse, randomly set a fraction to 0; also good to normalize by length)
    # to project the data points onto this random feature just do:
    for each point_id in dataset:
        scores[point_id, i] = dot_product(feature_vector[point_id], randomized_combination)
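A runnable NumPy version of that pseudocode might look like the sketch below; the function name and the sample_size parameter are my own choices for illustration:

```python
import numpy as np

def data_based_random_projection(X, k, sample_size=32, seed=0):
    """Project the rows of X (n x d) onto k random features, each built as a
    random +/-1 combination of a small sample of the data points themselves."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.zeros((n, k))
    for i in range(k):
        sample_ids = rng.choice(n, size=min(sample_size, n), replace=False)
        signs = rng.choice([-1.0, 1.0], size=len(sample_ids))
        combo = signs @ X[sample_ids]      # random +/-1 combination of sampled points
        norm = np.linalg.norm(combo)
        if norm > 0:
            combo /= norm                  # normalize to unit length
        scores[:, i] = X @ combo           # project every point onto this feature
    return scores
```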
If you are still looking to solve this problem, write a message here and I can give you more pseudocode.
The way to think about it is that a random projection is just a random pattern, and the dot product between a data point and the pattern (i.e. projecting the data point onto it) gives you the overlap between them. So if two data points have similar overlaps with many random patterns, those points are similar. Therefore, random projections preserve similarity while using less space, but they also add random fluctuations to the pairwise similarities. What the JLT tells you is that to keep the fluctuations around 0.1 (eps) you need about 100*log(n) dimensions.
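To get a feel for those numbers, the Johnson-Lindenstrauss bound is available directly in scikit-learn; the n_samples and eps values below are just illustrative:

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Minimum number of dimensions needed to preserve pairwise distances
# within a factor of (1 +/- eps) for a given number of points.
print(johnson_lindenstrauss_min_dim(n_samples=10_000, eps=0.1))
```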
Good Luck!
An R package to perform Random Projection using the Johnson-Lindenstrauss lemma: RandPro