Note
Download this Jupyter notebook and all data (unzip next to the ipynb file!). You will need a Gurobi license to run this notebook; please follow the license instructions.
Mean-Variance Data Preparation
Import time series data
[2]:
import pandas as pd
import numpy as np
import scipy.linalg as la
The primary data used for mean-variance portfolio analysis are estimates of the return and risk of the involved assets. Assuming that the excess returns of the assets are normally distributed, we need to estimate the parameters of the underlying distribution. The methods used for this vary greatly and depend on the nature of the available data. Commonly used return/risk models are:
Estimates from time series
Single-factor models such as CAPM
Commercial multi-factor models, such as those from Axioma, Barra, or others
Our primary goal here is to generate sensible, realistic data for demonstrating the modeling and solving features of Gurobi. To keep things simple, we will use a simple variant of option 1 above and reuse the resulting data throughout most of our examples.
The starting point for our estimators is the weekly closing prices of a set of 462 stocks from the S&P 500 index, collected over a time span of ten years:
[3]:
df_ts = pd.read_pickle("subset_weekly_closings_10yrs.pkl")
df_ts.head()
[3]:
| | APTV | DVN | HSY | CAG | HST | LUV | MMC | BA | VRSK | TSLA | ... | GL | CPB | STLD | BG | TDG | AEE | AAPL | AIZ | UNP | K |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2013-01-07 | 29.253906 | 37.565445 | 57.688988 | 16.892731 | 11.112709 | 9.571073 | 28.727692 | 63.796139 | 51.355377 | 2.289333 | ... | 31.923758 | 26.039112 | 11.071469 | 55.641228 | 76.824593 | 21.690004 | 16.001547 | 28.292595 | 50.840179 | 36.513779 |
2013-01-14 | 28.817282 | 38.202770 | 57.806736 | 16.965885 | 11.456966 | 10.055566 | 28.361572 | 62.120152 | 52.218086 | 2.194000 | ... | 32.215347 | 25.665024 | 11.382249 | 55.746841 | 74.700508 | 21.788601 | 15.325012 | 28.348913 | 51.520897 | 36.726971 |
2013-01-21 | 29.330509 | 38.431908 | 59.792229 | 17.483582 | 11.395001 | 10.128694 | 28.304636 | 62.296135 | 52.189018 | 2.260000 | ... | 32.464428 | 26.156462 | 11.584254 | 57.157295 | 74.183243 | 22.091415 | 14.841517 | 29.201626 | 51.987900 | 36.843266 |
2013-01-28 | 29.606276 | 38.875881 | 60.592682 | 17.911243 | 11.567130 | 10.384659 | 28.564873 | 62.011208 | 52.848152 | 2.346000 | ... | 33.065845 | 26.589228 | 11.654181 | 58.703533 | 75.316818 | 22.429447 | 13.435313 | 30.528965 | 52.585495 | 37.579758 |
2013-02-04 | 29.613934 | 40.787815 | 61.047863 | 18.316059 | 11.298610 | 10.220111 | 28.654951 | 61.717922 | 53.129269 | 2.500667 | ... | 33.642952 | 26.523214 | 11.747411 | 59.254131 | 74.529922 | 22.830845 | 13.509834 | 30.697912 | 51.912697 | 37.534512 |
5 rows × 462 columns
Next, we convert the absolute closing prices to relative changes (“excess returns”), in units of percent. We use the Pandas pct_change function to compute all the ratios in one pass; this results in a leading row containing NaN, which we clean up by simply dropping it. Depending on the context, it may be more meaningful to use log returns instead of the plain returns here. For the illustrative purpose of this data, however, this technical aspect doesn’t make a difference.
[4]:
df_r = df_ts.pct_change().iloc[1:] * 100
df_r.head()
[4]:
| | APTV | DVN | HSY | CAG | HST | LUV | MMC | BA | VRSK | TSLA | ... | GL | CPB | STLD | BG | TDG | AEE | AAPL | AIZ | UNP | K |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2013-01-14 | -1.492534 | 1.696573 | 0.204109 | 0.433053 | 3.097871 | 5.062058 | -1.274448 | -2.627098 | 1.679881 | -4.164230 | ... | 0.913394 | -1.436640 | 2.807031 | 0.189812 | -2.764850 | 0.454571 | -4.227933 | 0.199057 | 1.338936 | 0.583867 |
2013-01-21 | 1.780971 | 0.599793 | 3.434708 | 3.051396 | -0.540850 | 0.727237 | -0.200751 | 0.283295 | -0.055667 | 3.008203 | ... | 0.773174 | 1.914816 | 1.774741 | 2.530105 | -0.692452 | 1.389784 | -3.154939 | 3.007920 | 0.906434 | 0.316647 |
2013-01-28 | 0.940203 | 1.155221 | 1.338724 | 2.446077 | 1.510563 | 2.527130 | 0.919414 | -0.457376 | 1.262974 | 3.805308 | ... | 1.852543 | 1.654528 | 0.603640 | 2.705233 | 1.528075 | 1.530150 | -9.474801 | 4.545429 | 1.149489 | 1.998987 |
2013-02-04 | 0.025866 | 4.918046 | 0.751215 | 2.260120 | -2.321409 | -1.584529 | 0.315347 | -0.472955 | 0.531932 | 6.592802 | ... | 1.745325 | -0.248271 | 0.799964 | 0.937930 | -1.044781 | 1.789601 | 0.554666 | 0.553400 | -1.279437 | -0.120400 |
2013-02-11 | -0.905307 | 1.983842 | 2.866628 | 2.136201 | 0.670313 | 0.804988 | 3.286669 | 3.014293 | -0.948740 | 1.652877 | ... | -0.631966 | 1.963475 | 0.529126 | -8.197570 | 2.953356 | 0.586077 | 3.398002 | 0.602701 | 0.632769 | 0.688484 |
5 rows × 462 columns
For example, the weekly return rate for Aptiv PLC (“APTV”) closing on 2013-01-14 was roughly -1.49%, and the following week +1.78%.
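As an aside, if log returns were preferred over plain percentage changes, they could be computed in much the same way. The following sketch is purely illustrative and is not used in the remainder of this notebook:
# Illustrative only: weekly log returns in percent (not used further below)
df_r_log = np.log(df_ts).diff().iloc[1:] * 100
df_r_log.head()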
Compute mean and covariance matrix
With the time series data for the stocks at hand, we use the Pandas mean
and cov
functions to compute the average returns and their covariance matrix.
[5]:
mu = df_r.mean()
sigma = df_r.cov()
print(mu.head())
print(sigma.head())
APTV 0.308637
DVN 0.247540
HSY 0.246571
CAG 0.144669
HST 0.153303
dtype: float64
APTV DVN HSY CAG HST LUV \
APTV 26.597463 14.778097 3.064101 2.264344 11.601311 11.322953
DVN 14.778097 42.022493 2.305614 3.309097 11.857115 8.160965
HSY 3.064101 2.305614 6.338661 3.081948 2.445438 2.462402
CAG 2.264344 3.309097 3.081948 10.795454 2.251297 2.013963
HST 11.601311 11.857115 2.445438 2.251297 15.706045 9.121264
MMC BA VRSK TSLA ... GL CPB \
APTV 6.545988 15.774890 6.146144 17.156187 ... 8.749196 0.351038
DVN 5.611923 12.794991 4.319937 10.399214 ... 9.758414 0.809340
HSY 2.577923 3.890774 2.662696 4.427763 ... 2.459988 3.089025
CAG 2.219121 3.173212 2.048475 3.609194 ... 2.640858 4.392320
HST 4.629432 10.313413 3.824482 7.838277 ... 6.979635 0.363426
STLD BG TDG AEE AAPL AIZ \
APTV 12.383185 7.251853 13.716921 2.616640 7.353120 7.046057
DVN 15.489979 10.290904 10.162485 1.991456 5.870277 7.621633
HSY 2.458087 2.602136 3.051838 2.729324 2.244011 2.425771
CAG 1.946567 2.809631 2.010134 2.376624 2.817637 2.035238
HST 8.820236 5.097961 8.612474 2.802346 4.067069 6.014143
UNP K
APTV 8.246912 2.312723
DVN 10.036320 2.279012
HSY 2.248228 3.139629
CAG 2.040563 4.261462
HST 5.758568 1.797050
[5 rows x 462 columns]
The vector \(\mu\) and matrix \(\Sigma\) could be used directly in a typical Markowitz mean-variance optimization model. However, the uncertainty and sensitivity, especially of the return estimator \(\mu\), pose a problem: Optimization models exploit differences among the estimated returns, but they don’t have a holistic view of the data that could differentiate between significant differences and sampling or estimation noise. Next, we will present a standard technique that mitigates this effect to some degree.
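To get a feeling for how noisy the plain return estimates are, we can compare them to the standard error of the weekly mean return. This small check is an illustrative addition and not part of the original workflow:
# Illustrative addition: standard error of the estimated mean returns
std_err = df_r.std() / np.sqrt(len(df_r))
print(pd.DataFrame({"mean": mu, "std_err": std_err}).head())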
Applying shrinkage to the return estimator
The basic idea of shrinking the return estimator is to combine it with another uniformly biased but noiseless estimator. Mathematically, we will choose a scalar \(0 < w < 1\) (shrinkage weight) and define the shrinkage estimator as
\begin{equation*} \mu_s = (1 - w) \mu + w \mu_0 1_n \end{equation*}
where \(1_n\) is the all-ones vector of appropriate dimension. The vector \(\mu_0 1_n\) is called the shrinkage target. There are plenty of established ways to choose shrinkage weight and target; here, we follow the work of Jorion [1] (see also [2, Sec. 7.2]).
For the shrinkage target \(\mu_0 1_n\) we use the value
\begin{equation*} \mu_0 = \frac{\mu^\top \Sigma^{-1} 1_n}{ 1_n^\top \Sigma^{-1} 1_n} \tag{1} \end{equation*}
and compute the shrinkage weight as
\begin{equation*} w = \frac{n+2}{n + 2 + t (\mu - \mu_0 1_n)^\top \Sigma^{-1} (\mu - \mu_0 1_n)} \tag{2} \end{equation*}
A technical problem with this approach is that it assumes that \(\Sigma\) is the actual covariance matrix of the return. But generally, this is not known exactly and is estimated from the time series, too. In order to reduce the bias towards the estimation error in \(\Sigma\), a common remedy is to use the scaled matrix
\begin{equation*} \frac{t - 1}{t - n - 2} \Sigma \end{equation*}
instead of \(\Sigma\) itself (\(t\) is the number of time samples, \(n\) is the number of assets). As the ratio between data points and assets increases and the estimation becomes more reliable, this scaling factor approaches 1, so the scaled matrix approaches the estimate \(\Sigma\) itself. Hence we start the computation of \(\mu_0\) and \(w\) as follows:
[6]:
t, n = df_r.shape
sigma_shrink = (t - 1) / (t - n - 2) * sigma
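As a quick sanity check (an illustrative addition), we can confirm that \(t > n + 2\), so that the scaling factor is well defined and positive:
# Illustrative check: the scaling requires more time samples than assets + 2
assert t > n + 2, "need t > n + 2 for the scaling to be meaningful"
print(f"t = {t}, n = {n}, scaling factor = {(t - 1) / (t - n - 2):.3f}")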
Next, we compute \(\mu_0\). As a first step, we compute the intermediate quantity
\begin{equation*} z = \Sigma^{-1} 1_n \end{equation*}
Then, following formula (1), we can obtain \(\mu_0\) as
\begin{equation*} \mu_0 = \frac{\mu^\top z}{1_n^\top z} \end{equation*}
[7]:
z = la.solve(sigma_shrink, np.ones(n), assume_a="pos")
mu_0 = mu @ z / z.sum()
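Because \(z\) comes from a linear solve rather than an explicit matrix inverse, it can be reassuring to verify the residual; this check is again just an illustrative addition:
# Illustrative check: z should satisfy sigma_shrink @ z = 1_n
print(np.allclose(sigma_shrink @ z, np.ones(n)))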
We split apart the computation of \(w\) in (2) above by first computing the temporary vectors
\begin{align*} d & = \mu - \mu_0 1_n \\ y & = \Sigma^{-1} d \end{align*}
so that \(w\) can be computed as
\begin{equation*} w = \frac{n+2}{n + 2 + t\, d^\top y} \end{equation*}
[8]:
d = mu - mu_0 * np.ones(n)
y = la.solve(sigma_shrink, d, assume_a="pos")
w = (n + 2) / (n + 2 + t * d @ y)
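Before applying the weight, it can be helpful to inspect it; a value close to 1 pulls the estimator strongly towards the shrinkage target, while a value close to 0 leaves it almost unchanged (again, just an illustrative addition):
# Illustrative: inspect the shrinkage weight
print(f"Shrinkage weight w = {w:.4f}")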
Finally, we compute \(\mu_s\) as follows:
[9]:
mu_s = (1 - w) * mu + w * mu_0 * np.ones(n)
[10]:
print(f"Total change in mu: {la.norm(mu - mu_s, 2) / la.norm(mu, 2)}")
print(f"Max change in mu: {(mu - mu_s).abs().max()}")
Total change in mu: 0.3272560076737803
Max change in mu: 0.541640891206755
For this particular data set, this shrinkage results in a relative change of about 33% to the return estimator.
For future use, we save the data as Python pickle files.
[11]:
mu_s.to_pickle("mu.pkl")
sigma.to_pickle("sigma.pkl")
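In later notebooks, these files can simply be read back with pd.read_pickle; a minimal sketch:
# Sketch: reading the saved estimates back in a follow-up notebook
mu = pd.read_pickle("mu.pkl")
sigma = pd.read_pickle("sigma.pkl")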
Literature
[1] Jorion, Philippe, 1986. “Bayes-Stein Estimation for Portfolio Analysis,” Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 21(3), pages 279-292, September.
[2] Cornuéjols, G., Peña, J., Tütüncü, R. 2018. Optimization Methods in Finance. Second Edition. Cambridge University Press.
[3] Markowitz, H.M. 1952. “Portfolio Selection.” The Journal of Finance, 7 (1): 77–91.