Risk-based Portfolios

Shrinking the sample covariance matrix

Prof. Carlos Trucíos
ctrucios@unicamp.br

Instituto de Matemática, Estatística e Computação Científica (IMECC),
Universidade Estadual de Campinas (UNICAMP).

Motivation

What are the two most commonly used statistics overall?

What are the multivariate versions of those statistics?

Motivation

Which is the best estimator of the population mean in a multivariate setting (\(N \geq 3\))?

Stein (1956) and James and Stein (1961) answer this question with an unexpected result: it is not the sample mean!
In dimensions \(N \geq 3\), a better estimator than the sample mean (in terms of total mean squared error) can be obtained by shrinking the sample mean toward a target vector.
This result was so unexpected and revolutionary that it took a while to be digested and embraced by the academic community.
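A minimal simulation sketch of this result, under assumptions that are not in the slides: each coordinate is observed once with known unit variance, the target vector is the origin, and the positive-part variant of the James-Stein estimator is used. The helper name `james_stein` and all numeric settings are illustrative only.

```python
import numpy as np

def james_stein(x, target=None):
    """Positive-part James-Stein estimate from one draw x ~ N(theta, I_N).

    Assumes known unit variance; `target` defaults to the zero vector.
    """
    x = np.asarray(x, dtype=float)
    target = np.zeros_like(x) if target is None else np.asarray(target, dtype=float)
    n = x.size
    dist2 = np.sum((x - target) ** 2)
    # Shrinkage factor, truncated at zero (positive-part variant).
    shrink = max(0.0, 1.0 - (n - 2) / dist2)
    return target + shrink * (x - target)

rng = np.random.default_rng(0)
N, reps = 10, 5000            # illustrative dimension and number of replications
theta = np.ones(N)            # hypothetical true mean vector
draws = rng.normal(loc=theta, scale=1.0, size=(reps, N))

# Total squared error of the unshrunk observation versus the shrunk estimate.
mse_sample = np.mean(np.sum((draws - theta) ** 2, axis=1))
mse_js = np.mean([np.sum((james_stein(x) - theta) ** 2) for x in draws])
print(f"sample mean: {mse_sample:.2f}, James-Stein: {mse_js:.2f}")
```

In this setup the shrunk estimate should show a smaller average total squared error than the unshrunk one, even though the target (the origin) is not the true mean.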

Motivation

Years later, a similar idea was applied (mainly by Olivier Ledoit and Michael Wolf) to the covariance matrix in large dimensions, giving rise to the shrinkage estimators for the covariance matrix.
Ledoit and Wolf have devoted almost 20 years of their careers to the development of shrinkage estimators, which have been successfully applied in several fields:
- Chemistry
- Electromagnetics
- Genetics
- Neuroscience
- Psychology
- Image and Speech recognition
- Economics and Finance

Shrinkage Estimators

Any Shrinkage estimator for the large-dimensional covariance matrix has three ingredients:

An estimator with no structure (\(S\))
An estimator with a lot of structure (\(F\))
A Shrinkage constant (\(\delta\)) (also known as Shrinkage intensity)

S, F and \(\delta\)

The estimator with no structure: 🏄‍♀️ the sample covariance matrix.
The Shrinkage constant: 💻 data-driven.
The estimator with a lot of structure: 🤷
- Should involve only a small number of free parameters (i.e., a lot of structure)
- Should reflect important characteristics of the unknown quantity to be estimated.
- Many alternatives

Shrinkage Estimators

Shrinkage estimators for the covariance matrix can be divided into two groups:

Linear Shrinkage Estimators

Easy to understand
Easy to prove
Easy to implement

Non-Linear Shrinkage Estimators

Hard to understand
Hard to prove
Hard to implement
More flexible

Notation

\(\Sigma\): The true (and unknown) covariance matrix
\(S\): The sample covariance matrix
\(\mathbb{I}\): The identity matrix
\(F\): A target matrix (with a lot of structure)
\(\hat{\Sigma}\): The shrinkage estimator of the covariance matrix
\(T\): Sample size
\(N\): Number of variables (assets)

Linear Shrinkage Estimators

A linear shrinkage estimator is given by:

\[\hat{\Sigma} = (1 - \delta)S + \delta F\]

\(F\)	Reference
\(\sigma^2 \times \mathbb{I}\)	A well-conditioned estimator for large-dimensional covariance matrices (Ledoit and Wolf 2004a)
Single factor of Sharpe (1963) (CAPM)	Improved estimation of the covariance matrix of stock returns with an application to portfolio selection (Ledoit and Wolf 2003)
\(f_{ij} = \sqrt{\sigma_{ii}\sigma_{jj}}\rho\)	Honey, I shrunk the sample covariance matrix (Ledoit and Wolf 2004b)
\(f_{ij} = \eta\) and \(f_{ii} = \sigma^2\)	Essays on risk and return in the stock market (Ledoit 1995)

\(\sigma^2\): common variance across assets
\(\rho\): common correlation across assets
\(\eta\): common covariance across assets
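A minimal sketch of the linear shrinkage formula with two of the targets in the table. The function names are hypothetical, the common variance and common correlation are estimated here by simple sample averages (an assumption of this sketch), and \(\delta = 0.3\) is an arbitrary illustrative value rather than an optimal intensity.

```python
import numpy as np

def identity_target(S):
    """F = sigma^2 * I, with sigma^2 taken as the average sample variance."""
    N = S.shape[0]
    return np.mean(np.diag(S)) * np.eye(N)

def constant_correlation_target(S):
    """f_ii = s_ii and f_ij = rho * sqrt(s_ii * s_jj), rho = average sample correlation."""
    N = S.shape[0]
    sd = np.sqrt(np.diag(S))
    corr = S / np.outer(sd, sd)
    rho = (corr.sum() - N) / (N * (N - 1))   # mean of the off-diagonal correlations
    F = rho * np.outer(sd, sd)
    np.fill_diagonal(F, np.diag(S))          # keep the sample variances on the diagonal
    return F

def linear_shrinkage(S, F, delta):
    """Convex combination (1 - delta) * S + delta * F."""
    return (1.0 - delta) * S + delta * F

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))             # simulated returns: T = 60, N = 5
S = np.cov(X, rowvar=False)

Sigma_identity = linear_shrinkage(S, identity_target(S), delta=0.3)
Sigma_constcorr = linear_shrinkage(S, constant_correlation_target(S), delta=0.3)
print(np.round(Sigma_constcorr, 3))
```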

Linear Shrinkage Estimators

Proof

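We want to minimize \[\mathbb{E} || \hat{\Sigma} - \Sigma ||^2,\] where \(\hat{\Sigma} = \delta \nu I + (1 - \delta)S\). Here \(||A||^2 = \mathrm{tr}(AA')/N\) is the squared Frobenius norm normalized by \(N\), with associated inner product \(\langle A, B \rangle = \mathrm{tr}(AB')/N\), so that \(||I||^2 = 1\).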

\[\mathbb{E} || \hat{\Sigma} - \Sigma ||^2\]

\[= \mathbb{E} || \delta \nu I + (1 - \delta)S - \Sigma ||^2\]

\[= \mathbb{E} || \delta \times (\nu I - \Sigma) + (1 - \delta) \times(S - \Sigma) ||^2\]

Linear Shrinkage Estimators

Recall

\(|| \alpha A + \beta B ||^2 = \alpha^2 ||A||^2 + \beta^2 ||B||^2 + 2\alpha \beta \langle A, B \rangle\)

\[= \mathbb{E} || \delta \times (\nu I - \Sigma) + (1 - \delta) \times(S - \Sigma) ||^2\]

\[= \mathbb{E} \Big[ \delta^2 ||\nu I - \Sigma||^2 + (1 - \delta)^2 || S - \Sigma ||^2 + 2 \delta (1 - \delta) \langle \nu I - \Sigma, S - \Sigma \rangle \Big]\]

\[= \delta^2 ||\nu I - \Sigma||^2 + (1 - \delta)^2 \mathbb{E} || S - \Sigma ||^2 + \underbrace{2 \delta (1 - \delta) \langle \nu I - \Sigma, \underbrace{\mathbb{E} (S - \Sigma)}_{0} \rangle}_{0}\]

Linear Shrinkage Estimators

\[= \delta^2 ||\nu I - \Sigma||^2 + (1 - \delta)^2 \mathbb{E} || S - \Sigma ||^2\]

Remark

The optimal \(\nu\) does not depend on \(\delta\). Thus, we minimize \(||\nu I - \Sigma||^2 =\nu^2 ||I||^2 + ||\Sigma||^2 - 2\nu \langle I, \Sigma \rangle\).

Differentiating with respect to \(\nu\) and setting the derivative to zero:

\[2 \nu \underbrace{||I||^2}_{1} = 2 \langle I, \Sigma \rangle\]

\[\nu = \langle I, \Sigma \rangle = \mu\]
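A small numerical check of this step, assuming the norm is the Frobenius norm scaled by \(1/N\) (the scaling under which \(||I||^2 = 1\)). The matrix `Sigma` below is a made-up example, and a grid search stands in for the calculus.

```python
import numpy as np

def norm2(A):
    """Squared Frobenius norm scaled by 1/N, so that norm2(I) equals 1."""
    return np.sum(A ** 2) / A.shape[0]

# Hypothetical "true" covariance matrix, for illustration only.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
N = Sigma.shape[0]
mu = np.trace(Sigma) / N      # <I, Sigma> under the scaled inner product

# Evaluate ||nu * I - Sigma||^2 on a grid and locate the minimizer.
grid = np.linspace(0.0, 5.0, 501)
losses = np.array([norm2(nu * np.eye(N) - Sigma) for nu in grid])
nu_best = grid[np.argmin(losses)]
print(f"mu = {mu:.4f}, grid minimizer = {nu_best:.2f}")
```

The grid minimizer should land on the grid point closest to the average of the diagonal of `Sigma`, which is what \(\mu\) reduces to under this scaling.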

Substituting \(\nu\) by its optimal value:

Linear Shrinkage Estimators

\[= \delta^2 ||\mu I - \Sigma||^2 + (1 - \delta)^2 \mathbb{E} || S - \Sigma ||^2\]

Lemma

Let \(\mu = \langle I, \Sigma \rangle\), \(\alpha^2 = ||\Sigma - \mu I||^2\), \(\beta^2 = \mathbb{E} ||S - \Sigma||^2\) and \(\rho^2 = \mathbb{E} || S - \mu I ||^2\). Then:

\[\alpha^2 + \beta^2 = \rho^2.\]

\[= \delta^2 \alpha^2 + (1 - \delta)^2 \beta^2\]

Linear Shrinkage Estimators

\[= \delta^2 \alpha^2 + (1 - \delta)^2 \beta^2\]

Differentiating with respect to \(\delta\):

\[2 \delta \alpha^2 - 2 (1 - \delta) \beta^2.\]

Setting to zero:

\(\delta \alpha^2 = \beta^2 - \delta \beta^2 \quad \rightarrow \quad \delta = \dfrac{\beta^2}{\alpha^2 + \beta^2} = \dfrac{\beta^2}{\rho^2}\)

Note

\[\delta = \dfrac{\beta^2}{\alpha^2 + \beta^2} = \dfrac{\mathbb{E} ||S - \Sigma||^2}{\mathbb{E} || S - \mu I ||^2} \quad \text{and} \quad 1-\delta = \dfrac{\alpha^2}{\alpha^2 + \beta^2} = \dfrac{||\Sigma - \mu I||^2}{\mathbb{E} || S - \mu I ||^2}\]
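The lemma and the resulting \(\delta\) can be illustrated by Monte Carlo when \(\Sigma\) is known. The sketch below makes several assumptions that are not in the slides: Gaussian data with known zero mean (so that \(S = X'X/T\) is unbiased), a made-up diagonal \(\Sigma\), and the Frobenius norm scaled by \(1/N\). Expectations are replaced by averages over simulated samples, so the identities hold only approximately.

```python
import numpy as np

def norm2(A):
    """Squared Frobenius norm scaled by 1/N."""
    return np.sum(A ** 2) / A.shape[0]

rng = np.random.default_rng(1)
Sigma = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical true covariance
N, T, reps = 5, 20, 2000                     # illustrative sizes
I = np.eye(N)
mu = np.trace(Sigma) / N
alpha2 = norm2(Sigma - mu * I)               # known exactly in this toy setup

samples = []
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(N), Sigma, size=T)
    samples.append(X.T @ X / T)              # sample covariance with known zero mean

beta2 = np.mean([norm2(S - Sigma) for S in samples])    # approximates E||S - Sigma||^2
rho2 = np.mean([norm2(S - mu * I) for S in samples])    # approximates E||S - mu I||^2
delta = beta2 / rho2                                    # oracle shrinkage intensity

# Average loss of the oracle linear shrinkage estimator versus S itself.
loss_shrunk = np.mean([norm2(delta * mu * I + (1 - delta) * S - Sigma) for S in samples])
print(f"alpha2 + beta2 = {alpha2 + beta2:.3f}, rho2 = {rho2:.3f}")
print(f"delta = {delta:.3f}, loss(S) = {beta2:.3f}, loss(shrunk) = {loss_shrunk:.3f}")
```

The two sides of the lemma should agree up to simulation noise, and the shrunk estimator should show a visibly smaller average loss than \(S\).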

Linear Shrinkage Estimators

The optimal \(\delta\) is an oracle: it depends on the unknown \(\Sigma\), so in practice it must be estimated.

It can be proved that

\(m^2 - \mu^2 \rightarrow_{L2} 0\)
\(d^2 - \rho^2 \rightarrow_{L2} 0\)
\(b^2 - \beta^2 \rightarrow_{L2} 0\)
\(a^2 - \alpha^2 \rightarrow_{L2} 0\)
\(\hat{\Sigma} = \dfrac{b^2}{d^2} m I + \dfrac{a^2}{d^2}S\)

where \(m = \langle S, I \rangle\), \(d^2 = ||S - mI||^2\), \(\bar{b}^2 = \dfrac{1}{T^2} \sum_{k = 1}^T ||x_k x_k' - S||^2\), \(b^2 = \min(\bar{b}^2, d^2)\) and \(a^2 = d^2 - b^2\). Here \(x_k\) denotes the \(k\)-th observation (an \(N \times 1\) vector).
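The quantities above translate almost line by line into code. The following is a sketch only, with assumptions flagged: the function name `ledoit_wolf_identity` is hypothetical, the rows of `X` are the observations \(x_k\) and are assumed to have mean zero (so \(S = X'X/T\)), and the norm is the Frobenius norm scaled by \(1/N\).

```python
import numpy as np

def ledoit_wolf_identity(X):
    """Feasible linear shrinkage toward m * I (sketch).

    X is a T x N array whose rows are observations, assumed to have mean zero.
    Returns the shrunk covariance estimate and the estimated intensity b2 / d2.
    """
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    I = np.eye(N)
    S = X.T @ X / T                                    # sample covariance (zero mean assumed)
    m = np.trace(S) / N                                # m = <S, I>
    d2 = np.sum((S - m * I) ** 2) / N                  # d^2 = ||S - m I||^2
    b2_bar = sum(np.sum((np.outer(x, x) - S) ** 2) / N for x in X) / T**2
    b2 = min(b2_bar, d2)                               # truncation keeps the weights in [0, 1]
    a2 = d2 - b2
    Sigma_hat = (b2 / d2) * m * I + (a2 / d2) * S
    return Sigma_hat, b2 / d2

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))                      # T = 20 < N = 50, so S is singular
Sigma_hat, intensity = ledoit_wolf_identity(X)
S = X.T @ X / X.shape[0]
print(f"estimated intensity: {intensity:.3f}")
print(f"smallest eigenvalue of S: {np.linalg.eigvalsh(S).min():.2e}")
print(f"smallest eigenvalue of the shrunk estimate: {np.linalg.eigvalsh(Sigma_hat).min():.3f}")
```

With more assets than observations the sample covariance matrix is singular, while the shrunk estimate should remain positive definite and keep the same trace as \(S\).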

Non-Linear Shrinkage Estimators

The different formulations of \(F\) are essentially improvements of the generic linear shrinkage.
Such improvements are obtained by incorporating prior information or structural assumptions about the target covariance matrix.
Is it possible to improve the generic linear shrinkage with no previous knowledge about the target covariance matrix?

Non-Linear Shrinkage Estimators

First Idea

Instead of using a common shrinkage intensity (\(\delta\)), why not use different intensities for different entries of \(S\)? 😄

The number of shrinkage intensities increases on the order of \(N^2\) 😿
Allowing different shrinkage intensities no longer guarantees that the final estimator is positive semi-definite 😢
So… that doesn’t really look like the right way, does it?
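A tiny counterexample makes the second objection concrete. The matrices below are made up for illustration: `S` is a valid correlation matrix, the target is the identity, and only one pair of entries is shrunk (fully) while the rest are left untouched.

```python
import numpy as np

# A positive definite "sample" matrix with all correlations equal to 0.9.
S = np.array([[1.0, 0.9, 0.9],
              [0.9, 1.0, 0.9],
              [0.9, 0.9, 1.0]])
F = np.eye(3)                          # target: identity

# Entry-wise intensities: shrink only the (1, 2) and (2, 1) entries, fully.
D = np.zeros((3, 3))
D[0, 1] = D[1, 0] = 1.0

Sigma_hat = (1.0 - D) * S + D * F      # element-wise (Hadamard) combination
print("eigenvalues of S:        ", np.round(np.linalg.eigvalsh(S), 4))
print("eigenvalues of Sigma_hat:", np.round(np.linalg.eigvalsh(Sigma_hat), 4))
```

Both `S` and `F` are positive definite, yet the entry-wise combination has a negative eigenvalue, so it is not a valid covariance matrix.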

Non-Linear Shrinkage Estimators

Let \(\lambda_1, \cdots, \lambda_N\) be the eigenvalues of \(S\) and let

\[(1 - \delta) S + \delta F = \hat{\Sigma} = U \Lambda^{\ast} U'\] be the spectral decomposition of \(\hat{\Sigma}\).

It can be proved that, for the target \(F = \sigma^2 \mathbb{I}\), the elements \(\lambda_1^{\ast}, \cdots, \lambda_N^{\ast}\) of the diagonal matrix \(\Lambda^{\ast}\) are equal to \[\lambda_i^{\ast} = \delta \sigma^2 + (1 - \delta) \lambda_i\]

This means that \(\hat{\Sigma}\) has the same eigenvectors as \(S\), but different eigenvalues.
Now a generalization of the generic linear shrinkage seems more obvious: let’s use different shrinkage intensities for the different sample eigenvalues!
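The eigenvalue view can be checked numerically. In this sketch the helper `shrink_eigenvalues` is hypothetical, \(\sigma^2\) is taken as the average sample variance, and the eigenvalue-specific intensities in the last step are arbitrary numbers chosen only to illustrate the idea; they are not the nonlinear shrinkage formula of Ledoit and Wolf.

```python
import numpy as np

def shrink_eigenvalues(S, transform):
    """Keep the eigenvectors of S and replace its eigenvalues by transform(eigenvalues)."""
    lam, U = np.linalg.eigh(S)                 # S = U diag(lam) U'
    return U @ np.diag(transform(lam)) @ U.T

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))               # simulated data: T = 40, N = 6
S = X.T @ X / 40
N = S.shape[0]
sigma2 = np.trace(S) / N                       # assumed common variance for the target
delta = 0.4                                    # arbitrary illustrative intensity

# Linear shrinkage toward sigma2 * I, computed directly and via the eigenvalues.
direct = (1 - delta) * S + delta * sigma2 * np.eye(N)
via_eig = shrink_eigenvalues(S, lambda lam: delta * sigma2 + (1 - delta) * lam)
print("max abs difference:", np.max(np.abs(direct - via_eig)))

# Generalization: one intensity per eigenvalue (illustrative values only).
deltas = np.linspace(0.8, 0.1, N)
nonlinear = shrink_eigenvalues(S, lambda lam: deltas * sigma2 + (1 - deltas) * lam)
print("eigenvalues after eigenvalue-specific shrinkage:",
      np.round(np.linalg.eigvalsh(nonlinear), 3))
```

Because only the eigenvalues change and they stay positive, the result remains symmetric and positive definite, unlike the entry-wise approach.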

Non-Linear Shrinkage Estimators

Nowadays, there are several alternatives to the sample covariance matrix.
A great source of code is https://github.com/MikeWolf007/covShrinkage
Some R packages are also available:
- nlshrink
- cvCovEst
- ShrinkCovMat
Recent methodological contributions include the works of Ledoit and Wolf (2020), Ledoit and Wolf (2022), De Nard (2022), among others.
In a recent paper (Trucíos 2025), I conduct a large-scale empirical comparison using real-world data in the context of portfolio allocation.

References

De Nard, Gianluca. 2022. “Oops! I Shrunk the Sample Covariance Matrix Again: Blockbuster Meets Shrinkage.” Journal of Financial Econometrics 20 (4): 569–611.

James, William, and Charles Stein. 1961. “Estimation with Quadratic Loss.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:361–79. University of California Press.

Ledoit, Olivier. 1995. “Essays on Risk and Return in the Stock Market.” PhD thesis, Massachusetts Institute of Technology.

Ledoit, Olivier, and Michael Wolf. 2003. “Improved Estimation of the Covariance Matrix of Stock Returns with an Application to Portfolio Selection.” Journal of Empirical Finance 10 (5): 603–21.

Ledoit, Olivier, and Michael Wolf. 2004a. “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices.” Journal of Multivariate Analysis 88 (2): 365–411.

Ledoit, Olivier, and Michael Wolf. 2004b. “Honey, I Shrunk the Sample Covariance Matrix.” The Journal of Portfolio Management 30 (4): 110–19.

Ledoit, Olivier, and Michael Wolf. 2020. “Analytical Nonlinear Shrinkage of Large-Dimensional Covariance Matrices.” The Annals of Statistics 48 (5): 3043–65.

Ledoit, Olivier, and Michael Wolf. 2022. “Quadratic Shrinkage for Large Covariance Matrices.” Bernoulli 28 (3): 1519–47.

Sharpe, William F. 1963. “A Simplified Model for Portfolio Analysis.” Management Science 9 (2): 277–93.

Stein, Charles. 1956. “Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution.” In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1:197–206.

Trucíos, Carlos. 2025. “Hierarchical Risk Clustering Versus Traditional Risk-Based Portfolios: An Empirical Out-of-Sample Comparison.” Available at SSRN 5247627.