Note: Download this Jupyter notebook and all accompanying data (unzip the data files next to the ipynb file). You will need a Gurobi license to run this notebook; please follow the license instructions.
Mean-Variance Data Preparation
Import time series data
[2]:
import pandas as pd
import numpy as np
import scipy.linalg as la
The primary data used for mean-variance portfolio analysis are estimates of the return and risk of the involved assets. Assuming that the excess returns of the assets are normally distributed, we need to estimate the parameters of the underlying distribution. The methods used for this vary greatly and depend on the nature of the available data. Commonly used return/risk models are:
Estimates from time series
Single-factor models such as CAPM
Commercial multi-factor models such as those from Axioma or Barra
Our primary goal here is to generate sensible, realistic data for demonstrating the modeling and solving features of Gurobi. To keep things simple, we will use a simplified variant of the first option above and reuse the resulting data throughout most of our examples.
The starting point for our estimators is a set of weekly closing prices for 462 stocks from the S&P 500 index, collected over a time span of ten years:
[3]:
df_ts = pd.read_pickle("subset_weekly_closings_10yrs.pkl")
df_ts.head()
[3]:
APTV | DVN | HSY | CAG | HST | LUV | MMC | BA | VRSK | TSLA | ... | GL | CPB | STLD | BG | TDG | AEE | AAPL | AIZ | UNP | K | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2013-01-07 | 29.253906 | 37.565445 | 57.688988 | 16.892731 | 11.112709 | 9.571073 | 28.727692 | 63.796139 | 51.355377 | 2.289333 | ... | 31.923758 | 26.039112 | 11.071469 | 55.641228 | 76.824593 | 21.690004 | 16.001547 | 28.292595 | 50.840179 | 36.513779 |
2013-01-14 | 28.817282 | 38.202770 | 57.806736 | 16.965885 | 11.456966 | 10.055566 | 28.361572 | 62.120152 | 52.218086 | 2.194000 | ... | 32.215347 | 25.665024 | 11.382249 | 55.746841 | 74.700508 | 21.788601 | 15.325012 | 28.348913 | 51.520897 | 36.726971 |
2013-01-21 | 29.330509 | 38.431908 | 59.792229 | 17.483582 | 11.395001 | 10.128694 | 28.304636 | 62.296135 | 52.189018 | 2.260000 | ... | 32.464428 | 26.156462 | 11.584254 | 57.157295 | 74.183243 | 22.091415 | 14.841517 | 29.201626 | 51.987900 | 36.843266 |
2013-01-28 | 29.606276 | 38.875881 | 60.592682 | 17.911243 | 11.567130 | 10.384659 | 28.564873 | 62.011208 | 52.848152 | 2.346000 | ... | 33.065845 | 26.589228 | 11.654181 | 58.703533 | 75.316818 | 22.429447 | 13.435313 | 30.528965 | 52.585495 | 37.579758 |
2013-02-04 | 29.613934 | 40.787815 | 61.047863 | 18.316059 | 11.298610 | 10.220111 | 28.654951 | 61.717922 | 53.129269 | 2.500667 | ... | 33.642952 | 26.523214 | 11.747411 | 59.254131 | 74.529922 | 22.830845 | 13.509834 | 30.697912 | 51.912697 | 37.534512 |
5 rows × 462 columns
Next, we convert the absolute closing prices to relative changes (“excess returns”), in units of percent. We use the Pandas pct_change function to compute all the ratios in one pass; this leaves a leading row of NaN values, which we clean up by simply dropping it. Depending on the context, it may be more meaningful to use log returns instead of the plain returns here. For the illustrative purpose of this data, however, this technical distinction makes no difference.
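As a side note, a log-return variant can be computed just as compactly; the sketch below (with the hypothetical name df_r_log) is not used in the remainder of this notebook.
[ ]:
# Alternative, not used below: weekly log returns, in percent.
df_r_log = np.log(df_ts).diff().iloc[1:] * 100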
[4]:
df_r = df_ts.pct_change().iloc[1:] * 100
df_r.head()
[4]:
APTV | DVN | HSY | CAG | HST | LUV | MMC | BA | VRSK | TSLA | ... | GL | CPB | STLD | BG | TDG | AEE | AAPL | AIZ | UNP | K | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2013-01-14 | -1.492534 | 1.696573 | 0.204109 | 0.433053 | 3.097871 | 5.062058 | -1.274448 | -2.627098 | 1.679881 | -4.164230 | ... | 0.913394 | -1.436640 | 2.807031 | 0.189812 | -2.764850 | 0.454571 | -4.227933 | 0.199057 | 1.338936 | 0.583867 |
2013-01-21 | 1.780971 | 0.599793 | 3.434708 | 3.051396 | -0.540850 | 0.727237 | -0.200751 | 0.283295 | -0.055667 | 3.008203 | ... | 0.773174 | 1.914816 | 1.774741 | 2.530105 | -0.692452 | 1.389784 | -3.154939 | 3.007920 | 0.906434 | 0.316647 |
2013-01-28 | 0.940203 | 1.155221 | 1.338724 | 2.446077 | 1.510563 | 2.527130 | 0.919414 | -0.457376 | 1.262974 | 3.805308 | ... | 1.852543 | 1.654528 | 0.603640 | 2.705233 | 1.528075 | 1.530150 | -9.474801 | 4.545429 | 1.149489 | 1.998987 |
2013-02-04 | 0.025866 | 4.918046 | 0.751215 | 2.260120 | -2.321409 | -1.584529 | 0.315347 | -0.472955 | 0.531932 | 6.592802 | ... | 1.745325 | -0.248271 | 0.799964 | 0.937930 | -1.044781 | 1.789601 | 0.554666 | 0.553400 | -1.279437 | -0.120400 |
2013-02-11 | -0.905307 | 1.983842 | 2.866628 | 2.136201 | 0.670313 | 0.804988 | 3.286669 | 3.014293 | -0.948740 | 1.652877 | ... | -0.631966 | 1.963475 | 0.529126 | -8.197570 | 2.953356 | 0.586077 | 3.398002 | 0.602701 | 0.632769 | 0.688484 |
5 rows × 462 columns
For example, the weekly return rate for Aptiv PLC (“APTV”) closing on 2013-01-14 was roughly -1.49%, and the following week +1.78%.
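As a purely illustrative sanity check (not part of the original workflow), that first APTV return can be reproduced directly from the raw closings:
[ ]:
# Recompute APTV's first weekly return by hand: (p1 / p0 - 1) * 100.
p0, p1 = df_ts["APTV"].iloc[0], df_ts["APTV"].iloc[1]
print((p1 / p0 - 1) * 100)    # approx. -1.49
print(df_r["APTV"].iloc[0])   # same value, as computed by pct_change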
Compute mean and covariance matrix¶
With the time series data for the stocks at hand, we use the Pandas mean and cov functions to compute the average returns and their covariance matrix.
[5]:
mu = df_r.mean()
sigma = df_r.cov()
print(mu.head())
print(sigma.head())
APTV 0.308637
DVN 0.247540
HSY 0.246571
CAG 0.144669
HST 0.153303
dtype: float64
APTV DVN HSY CAG HST LUV \
APTV 26.597463 14.778097 3.064101 2.264344 11.601311 11.322953
DVN 14.778097 42.022493 2.305614 3.309097 11.857115 8.160965
HSY 3.064101 2.305614 6.338661 3.081948 2.445438 2.462402
CAG 2.264344 3.309097 3.081948 10.795454 2.251297 2.013963
HST 11.601311 11.857115 2.445438 2.251297 15.706045 9.121264
MMC BA VRSK TSLA ... GL CPB \
APTV 6.545988 15.774890 6.146144 17.156187 ... 8.749196 0.351038
DVN 5.611923 12.794991 4.319937 10.399214 ... 9.758414 0.809340
HSY 2.577923 3.890774 2.662696 4.427763 ... 2.459988 3.089025
CAG 2.219121 3.173212 2.048475 3.609194 ... 2.640858 4.392320
HST 4.629432 10.313413 3.824482 7.838277 ... 6.979635 0.363426
STLD BG TDG AEE AAPL AIZ \
APTV 12.383185 7.251853 13.716921 2.616640 7.353120 7.046057
DVN 15.489979 10.290904 10.162485 1.991456 5.870277 7.621633
HSY 2.458087 2.602136 3.051838 2.729324 2.244011 2.425771
CAG 1.946567 2.809631 2.010134 2.376624 2.817637 2.035238
HST 8.820236 5.097961 8.612474 2.802346 4.067069 6.014143
UNP K
APTV 8.246912 2.312723
DVN 10.036320 2.279012
HSY 2.248228 3.139629
CAG 2.040563 4.261462
HST 5.758568 1.797050
[5 rows x 462 columns]
The vector $\mu$ and the matrix $\Sigma$ computed this way are our raw estimates of the expected returns and their covariances.
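A quick, purely illustrative way to confirm that the dimensions line up and that the covariance matrix is symmetric (not part of the original notebook):
[ ]:
# mu is a length-462 Series; sigma is a symmetric 462 x 462 DataFrame.
print(mu.shape, sigma.shape)
print(np.allclose(sigma, sigma.T))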
Applying shrinkage to the return estimator
The basic idea of shrinking the return estimator is to combine it with another uniformly biased but noiseless estimator. Mathematically, we will choose a scalar $w \in [0, 1]$ and replace $\mu$ by the convex combination

$$\mu_s = (1 - w)\,\mu + w\,\mu_0 \mathbf{1},$$

where $\mu_0$ is a scalar shrinkage target and $\mathbf{1}$ denotes the all-ones vector.

For the shrinkage target $\mu_0$, we follow [1] and use the mean return of the minimum-variance portfolio,

$$\mu_0 = \frac{\mathbf{1}^T \Sigma^{-1} \mu}{\mathbf{1}^T \Sigma^{-1} \mathbf{1}}, \qquad (1)$$

and compute the shrinkage weight as

$$w = \frac{n + 2}{(n + 2) + T \,(\mu - \mu_0 \mathbf{1})^T \Sigma^{-1} (\mu - \mu_0 \mathbf{1})},$$

where $n$ is the number of assets and $T$ the number of time periods.
A technical problem with this approach is that it assumes that the true covariance matrix $\Sigma$ is known, while in practice we only have the sample estimate. Following [1], we account for this by using the rescaled matrix

$$\Sigma_s = \frac{T - 1}{T - n - 2}\,\Sigma$$

instead of $\Sigma$ in the formulas above; note that this requires $T > n + 2$ observations.
[6]:
t, n = df_r.shape
sigma_shrink = (t - 1) / (t - n - 2) * sigma
Next, we compute the vector $z = \Sigma_s^{-1} \mathbf{1}$ by solving the linear system $\Sigma_s z = \mathbf{1}$ (exploiting that $\Sigma_s$ is positive definite). Then, following formula (1), we can obtain $\mu_0 = \mu^T z \,/\, \mathbf{1}^T z$:
[7]:
z = la.solve(sigma_shrink, np.ones(n), assume_a="pos")
mu_0 = mu @ z / z.sum()
We split apart the computation of $w$ into the difference vector $d = \mu - \mu_0 \mathbf{1}$ and the solution $y = \Sigma_s^{-1} d$ of another linear system, so that the quadratic form in the denominator can be evaluated as $d^T \Sigma_s^{-1} d = d^T y$ without explicitly inverting $\Sigma_s$:
[8]:
d = mu - mu_0 * np.ones(n)
y = la.solve(sigma_shrink, d, assume_a="pos")
w = (n + 2) / (n + 2 + t * d @ y)
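As an illustrative consistency check (not in the original notebook), the quadratic form could also be evaluated via an explicit inverse; solving the linear system as above is cheaper and numerically safer, which is why la.solve is used here.
[ ]:
# Illustrative only: d' inv(Sigma_s) d should match d @ y up to rounding.
q = d @ la.inv(sigma_shrink) @ d
print(np.isclose(q, d @ y))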
Finally, we compute the shrunk return estimator $\mu_s = (1 - w)\,\mu + w\,\mu_0 \mathbf{1}$:
[9]:
mu_s = (1 - w) * mu + w * mu_0 * np.ones(n)
[10]:
print(f"Total change in mu: {la.norm(mu - mu_s, 2) / la.norm(mu, 2)}")
print(f"Max change in mu: {(mu - mu_s).abs().max()}")
Total change in mu: 0.3272560076737803
Max change in mu: 0.541640891206755
For this particular data set, the shrinkage results in a relative change of about 33% (in the Euclidean norm) to the return estimator.
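To see the effect qualitatively, one can put a few entries side by side; every sample mean is pulled toward the common target $\mu_0$ with the same weight $w$. This print is illustrative and not part of the original notebook.
[ ]:
# Each shrunk mean lies between its sample mean and the target mu_0.
print(f"w = {w:.4f}, mu_0 = {mu_0:.4f}")
print(pd.DataFrame({"sample": mu, "shrunk": mu_s}).head())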
For future use, we save the data as Python pickle files.
[11]:
mu_s.to_pickle("mu.pkl")
sigma.to_pickle("sigma.pkl")
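Subsequent notebooks can then read the estimates back with read_pickle; a minimal usage sketch:
[ ]:
# Reload the saved estimates in a later notebook.
mu_s = pd.read_pickle("mu.pkl")
sigma = pd.read_pickle("sigma.pkl")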
Literature
[1] Jorion, Philippe, 1986. “Bayes-Stein Estimation for Portfolio Analysis,” Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 21(3), pages 279-292, September.
[2] Cornuéjols, G., Peña, J., Tütüncü, R. 2018. Optimization Methods in Finance. Second Edition. Cambridge University Press.
[3] Markowitz, H.M. 1952. “Portfolio Selection.” The Journal of Finance, 7(1): 77–91.