Archive: March 2008

How to compute SVD and conquer the world before breakfast?

17 March, 2008 11:00

Digg, Face Book and My Space all rolled into one.

Imagine you a serial entrepreneur and would like to conquer the world with a new, kicking ass web application. Digg, Face Book and My Space all rolled into one. You are going to a build perfect tool that will discover a relevant content for you and your friends all over the whooooooole web. It will be based on your intentions and intentions of your community. Your idea is great!!!. You can already see yourself on the cover of Wired magazine. You will be more popular than David Heinemaier Hanson, Britney Spears and David Beckham all together. You will be a super star. You ARE a super star!!!

You open you machine, start coding, implement the first pages of your application and then… Boom, bang, boom! The first problem crops up. How to group web sites, blog posts based on their content? How to retrieve semantics from them? How to cluster all this stuff? You google for the solution and after a while you find out that sth called Singular Value Decomposition may help. Old uncle Google says it definitely sounds like a good “candidate” to investigate. Unfortunately, you have to refresh your knowledge about linear algebra. It is awful but if you want to reach for the stars… No pain no gain!

Linear algebra comes with help - SVD

Today I would like to continue with a linear algebra refresher and introduce matrix decomposition with SVD.

Singular Value Decomposition is one of the cluster analysis and noise reduction technique introduced in 1965 by G. Golub and W. Raman. When we compute the SVD of a matrix the goal is to reduce matrix dimensionality by its decomposition into three components and keeping first k singular values. It will give us an approximated representation of the original matrix. A kind of compressed version of the matrix.

There are a lot of practical areas where SVD can be applied. In one of the next posts I would like to present the application of SVD in Information Retrieval where it is used in the process of Latent Semantic Indexing in order to search for documents that meet given query. It is one of the methods that can help in the analysis of text collections. For instance, blog post or other web content exposed to your next generation web application.

Before I share with you code to do all the stuff please grab your pencil plus a notebook and let me share with you in the next few lines how is singular values decomposition computed and approximation of the matrix built.

SVD decomposes matrix into three components: A = USVT

  • U - matrix whose columns are the eigenvectors of the AAT matrix - called the left eigenvectors
  • S - matrix whose diagonal elements are the singular values of A. Please note it is diagonal matrix so its all non-diagonal elements are zero
  • V - matrix whose column are the eigenvectors of the ATA matrix - called the right eigenvectors

Okey-doke… nice definition but how does it work? Let’s say we have matrix A

A-matrix.png

The first step is to compute a AT. If you would like to refresh your memory about matrices and matters related to them please have a look at Matrix cookbook Part 1 and 2. All you have to know is how to transpose matrix and multiply two matrices.

As soon as we have AT and ATA the next step is to obtain singular values of the matrix. In order to do it we have to compute the eigen values of ATA, sort absolute values of them in descending order and just after that place them on the diagonal of S matrix. Don’t worry, it is easy. Actually, all you need to do is to solve a polynomial equation. Let’s have a look at all the calculations.

calculation-1.png

Singular values of the matrix are computed. We can construct S matrix.

s.png

Fresh coffe on the table and now we are ready to compute V

We will use eigen values of the ATA matrix we have just computed and construct eigen vectors.

Let’s do it for x = 13.32

V1.png

Let’s repeat the exercise but this time for x = 0.67

V2.png

There we go… We are almost done. Now we have to place eigenvectors on V matrix and transpose it.

V.png

At this point we have to the same calculation but for AAT matrix and compute U matrix.

U1.png

…time for U2

U2.png

Here we go. Matrix reloaded

U.png

We have computed U, S and VT. Now we have to write unit test… hmm… Please grab your pencil again and check with me if USVT is the approximation of matrix A.

USV.png

It looks good. We have decomposed matrix A into three matrices. But… why do it? How can we benefit from it? We will benefit from it as soon as we built compressed version of the matrix A. In our simple example the matrix is 2-dimensional. Imagine you deal with a matrix with huge dimensionality. Imagine “gazillions” of dimensions and a lot of noise there. It is a situation where you definitely would prefer to deal with approximated version. Fortunately, building an approximation of the original matrix is piece of cake. It is done by truncating the three matrices obtained from a full SVD. All we keep are the first k columns of U, the first k rows of VT and the first k rows and columns of S. As soon as we have done it, we get a compressed version of the matrix A.

There is more to come…

In the next post I will share with you how to apply SVD in the process of Latent Semantic Indexing. You won’t even need a pencil or a notebook. This time we will write some code. Let’s apply theory in practice.

I also plan to continue an idea of building the next generation web content filter and focus on data mining techniques that can be useful on the way.

Stay tuned. There is more to come.

Bootstrap 8.3

11 March, 2008 07:42

Almost one year ago the group of Ruby on Rails enthusiast met in a strange place somewhere in the center of Warsaw. People had a different background, experience and one thing in common-the mindset. All shared the same ethusiasm for web development. That is how Bootstrap was born!

Bootstrap is a gathering of passionate hackers, software engineers who want to interact and discuss ideas, present new technologies and share their experience. It is also a meeting point for entrepreneurs and like-minded people interested in technology and start-ups.

Most of the meetings consist of a technical presentation and as well as a business one. The last meeting was an exception to the rule because the whole meetup was dedicated to “social lending”. We invited all major players on Polish market Sebastian Wojdyła-kokos.pl, Rafał Agnieszczak-finansowo.pl and Łukasz Banach-monetto.pl.

The next meetup will take place on Saturday (15 March 2008). There are two presentation on the agenda.

  • Marcin Kaszyński will present Greasemonkey
  • Maciej Jarzębowski and Mariusz Ciepły will tell the story of LiveChat Software SA before it was taken over by Gadu-Gadu S.A

After those two presentations there will be time for everybody to promote and test projects. It is new on Bootstrap. We call it Bootstrap Elevator Speech. Each of the participants will get 5 minutes to present his or her ideas for a new start-up. It is a great challenge for entrepreneurs. Imagine how hard it is to make an impression and deliver a convincing presentation in 5 minutes. Personally I think it is a great test of self-confidence. Good luck guys!

Come and enjoy an informal, hacking atmosphere, discuss your latest ideas and learn about trends in the web market. There’s no entry fee and all web enthusiast are welcome.

Posted in: Life | No Comments » tags: