Risk Latte - Cleaning a Correlation Matrix of Asset Returns - Spreadsheet Example

Team latte
February 24, 2007

(If you are familiar with the theory of eigenvalues and eigenvectors and how to use them to clean a correlation matrix, skip the theory and scroll down to the numerical example.)

Correlation matrices generated from historical data are often nonsensical. What does this mean? It means that the correlation between a pair of asset returns is unstable, or simply incorrect. Sometimes you may hear risk managers talking about "spurious correlations". It is quite likely that the two return series behind a correlation were drawn from different time periods (don't be surprised!), so even if the series have the same length the exact trading dates could be different. Or some correlations may have been invented, meaning certain return assumptions were made to fill in an empty block of dates (where no prices exist).

A nonsensical correlation matrix - or an "unstable" or "invalid" correlation matrix - cannot be used in risk management calculations or multi-asset pricing. Why? Because the Cholesky decomposition of the matrix does not exist (if you are using an Excel sheet to compute the Cholesky matrix from the original matrix you'll see #VALUE! in the output cells). The underlying reason is that the correlation matrix is not positive semi-definite: it has at least one negative eigenvalue. A nonsensical correlation matrix needs to be cleaned before it can be used for either derivatives pricing or risk analysis.

How do we know that a correlation matrix is nonsensical or invalid?

If the Cholesky decomposition of a correlation matrix does not exist then it is not a usable correlation matrix. A nonsensical or invalid correlation matrix has at least one negative eigenvalue; we say that it is not positive semi-definite. All workable, valid correlation matrices must be positive semi-definite, which means that none of their eigenvalues is negative. (Strictly, the Cholesky decomposition needs all eigenvalues to be positive; a zero eigenvalue is the borderline case.) An n x n correlation matrix has n eigenvalues. If any one of them is negative then the correlation matrix is invalid and its Cholesky decomposition will not exist. A quick and dirty check is the determinant of the matrix, which equals the product of the eigenvalues. If the determinant is negative then at least one eigenvalue is negative, so the Cholesky decomposition will not exist and the correlation matrix is not positive semi-definite: you are in trouble. The converse does not hold, because a matrix with an even number of negative eigenvalues has a positive determinant, so a positive determinant alone does not prove validity. An invalid correlation matrix must be cleaned to make it a valid one.
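As a rough illustration of these checks outside a spreadsheet, here is a minimal Python/NumPy sketch. The 3 x 3 matrix is a made-up example (it is not the article's data), and the helper names `is_positive_semidefinite` and `cholesky_exists` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def is_positive_semidefinite(m, tol=1e-10):
    """Return True if no eigenvalue of the symmetric matrix m is below -tol."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(m, dtype=float))  # ascending order
    return bool(eigenvalues[0] >= -tol)


def cholesky_exists(m):
    """Return True if NumPy can compute a Cholesky factor of m."""
    try:
        np.linalg.cholesky(np.asarray(m, dtype=float))
        return True
    except np.linalg.LinAlgError:
        return False


# Hypothetical "correlation" matrix, invented for illustration only.
# Each pairwise value looks plausible, but the three are jointly inconsistent.
m = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

eigenvalues = np.linalg.eigvalsh(m)   # one eigenvalue is negative
determinant = np.linalg.det(m)        # product of the eigenvalues, negative here

print("eigenvalues:", eigenvalues)
print("determinant:", determinant)
print("positive semi-definite:", is_positive_semidefinite(m))
print("Cholesky exists:", cholesky_exists(m))
```

The eigenvalue test is the most informative of the three, since it also catches the case of an even number of negative eigenvalues, where the determinant is positive.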

How do we make a nonsensical correlation matrix valid?

How do we clean an invalid correlation matrix? Essentially, we need a correlation matrix that is positive semi-definite, and if the matrix we have is not, we make it so using Principal Components Analysis (PCA), that is, the eigenvalue decomposition of the matrix. Here is a brief and approximate (but workable) algorithm to clean a correlation matrix and make it positive semi-definite.

  1. After calculating the eigenvalues of the correlation matrix, you'll find that at least one of them is negative;

  2. Set the negative eigenvalue(s) to zero; this gives a new row vector of eigenvalues;

  3. Arrange the new vector of eigenvalues as a diagonal matrix; this is the new lambda matrix, $\Lambda'$;

  4. Then perform the following matrix operation using the new lambda matrix and the existing eigenvectors matrix $W$:

$$M' = W \Lambda' W^{T}$$

Remember that one can retrieve any correlation matrix $M$ by using the following matrix transformation:

$$M = W \Lambda W^{T}$$

where $W$ is the n x n matrix of eigenvectors of the correlation matrix (one unit-length eigenvector per column) and $\Lambda$ is the diagonal matrix of eigenvalues of the correlation matrix.
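The steps above can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the function names `clean_correlation` and `rescale_to_unit_diagonal` are hypothetical, the input matrix is invented (not the article's data), and the rescaling helper is an optional extra that the algorithm above does not include. It is there because zeroing an eigenvalue generally moves the diagonal entries away from exactly 1.

```python
import numpy as np


def clean_correlation(m):
    """Zero out negative eigenvalues and rebuild the matrix as W * Lambda' * W^T."""
    m = np.asarray(m, dtype=float)
    # eigh is meant for symmetric matrices; the columns of w are orthonormal eigenvectors.
    eigenvalues, w = np.linalg.eigh(m)
    # Step 2: set negative eigenvalues to zero.
    new_eigenvalues = np.maximum(eigenvalues, 0.0)
    # Step 3: arrange them as a diagonal (lambda) matrix.
    new_lambda = np.diag(new_eigenvalues)
    # Step 4: recombine with the existing eigenvector matrix.
    return w @ new_lambda @ w.T


def rescale_to_unit_diagonal(c):
    """Optional extra (not part of the steps above): force the diagonal back to 1."""
    d = np.sqrt(np.diag(c))
    return c / np.outer(d, d)


# Hypothetical invalid matrix, invented for illustration only.
m = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

cleaned = clean_correlation(m)
rescaled = rescale_to_unit_diagonal(cleaned)

print("cleaned:\n", cleaned)
print("rescaled to unit diagonal:\n", rescaled)
print("smallest eigenvalue after cleaning:", np.linalg.eigvalsh(cleaned)[0])
```

Rescaling by the square roots of the diagonal keeps the matrix positive semi-definite, so it can be applied after cleaning whenever an exact unit diagonal matters.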

Example:

(This is a Spreadsheet example).

Say we had a correlation matrix M of four asset returns (a real life example) as follows:

This correlation matrix is not a valid correlation matrix. If you calculate the determinant of this matrix (we can do that very easily in an Excel spreadsheet) it comes out to be -0.02652. That is, $\det(M) = -0.02652$. If we were to compute the Cholesky decomposition of this matrix (which can also be done very easily in an Excel spreadsheet), we would get #VALUE! in all the cells of the decomposed matrix. A quant will save a lot of time by performing this check first to see whether the correlations are valid or not. Thus we need to clean this matrix using PCA to make it valid.

The eigenvectors of the above correlation matrix are:

And the eigenvalues of the correlation matrix are:

So the second eigenvalue of the correlation matrix is negative. This is what makes the correlation matrix invalid. To clean the correlation matrix and retrieve a valid matrix, we set this second eigenvalue to zero in the above vector. The new vector of eigenvalues is then:

Then create a diagonal matrix of these eigenvalues, the new lambda matrix $\Lambda'$, and perform the following matrix operation with the eigenvector matrix $W$:

$$M' = W \Lambda' W^{T}$$

This gives us the new valid correlation matrix as:

Thus the new (modified) but valid correlation matrix is only slightly different from the original correlation matrix. This is a coincidence (and an actual real life case). Sometimes, the difference between the new correlation matrix and the original one could be significant.

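If you now calculate the determinant of this new matrix you'll find that it is very close to zero but still positive (a very, very small positive value). In exact arithmetic it would be zero, since one eigenvalue has been set to zero; the small positive value is a rounding effect.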
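To see how far a cleaned matrix has moved from the original, and to confirm that its determinant is essentially zero, something like the following Python/NumPy sketch can be used. The input matrix is invented for illustration (it is not the article's four-asset matrix), and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical invalid matrix, invented for illustration only.
original = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

# Clean it: zero the negative eigenvalues and recombine with the eigenvectors.
eigenvalues, w = np.linalg.eigh(original)
cleaned = w @ np.diag(np.maximum(eigenvalues, 0.0)) @ w.T

# Size of the adjustment, entry by entry and overall.
max_abs_change = np.max(np.abs(cleaned - original))
frobenius_change = np.linalg.norm(cleaned - original)  # Frobenius norm for 2-D input

# Determinant and smallest eigenvalue of the cleaned matrix.
det_cleaned = np.linalg.det(cleaned)
min_eig_cleaned = np.linalg.eigvalsh(cleaned)[0]

print("largest change in any entry:", max_abs_change)
print("Frobenius norm of the change:", frobenius_change)
print("determinant after cleaning:", det_cleaned)
print("smallest eigenvalue after cleaning:", min_eig_cleaned)
```

This made-up matrix is deliberately extreme, so the change is large. Because the determinant of the cleaned matrix is zero up to rounding, its computed sign is not reliable, and the smallest eigenvalue is the safer thing to inspect.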
