{ "cells": [ { "cell_type": "markdown", "id": "3657705c", "metadata": {}, "source": [ "# Mean-Variance Data Preparation" ] }, { "cell_type": "markdown", "id": "9e135fd4", "metadata": {}, "source": [ "## Import time series data" ] }, { "cell_type": "code", "execution_count": 1, "id": "ef31b229", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:44.552541Z", "iopub.status.busy": "2025-01-31T10:05:44.552311Z", "iopub.status.idle": "2025-01-31T10:05:45.314273Z", "shell.execute_reply": "2025-01-31T10:05:45.313528Z" }, "nbsphinx": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: numpy in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (2.2.2)\r\n", "Requirement already satisfied: scipy in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (1.15.1)\r\n", "Requirement already satisfied: gurobipy in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (11.0.3)\r\n", "Requirement already satisfied: pandas in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (2.2.3)\r\n", "Requirement already satisfied: matplotlib in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (3.10.0)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from pandas) (2.9.0.post0)\r\n", "Requirement already satisfied: pytz>=2020.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from pandas) (2025.1)\r\n", "Requirement already satisfied: tzdata>=2022.7 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from pandas) (2025.1)\r\n", "Requirement already satisfied: contourpy>=1.0.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (1.3.1)\r\n", "Requirement already satisfied: cycler>=0.10 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (0.12.1)\r\n", "Requirement already satisfied: fonttools>=4.22.0 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (4.55.8)\r\n", "Requirement already satisfied: kiwisolver>=1.3.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (1.4.8)\r\n", "Requirement already satisfied: packaging>=20.0 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (24.2)\r\n", "Requirement already satisfied: pillow>=8 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (11.1.0)\r\n", "Requirement already satisfied: pyparsing>=2.3.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (3.2.1)\r\n", "Requirement already satisfied: six>=1.5 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "# Install dependencies\n", "%pip install numpy scipy gurobipy pandas matplotlib" ] }, { "cell_type": "code", "execution_count": 2, "id": "ca6c9e2f", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.316410Z", "iopub.status.busy": "2025-01-31T10:05:45.316207Z", "iopub.status.idle": "2025-01-31T10:05:45.728385Z", "shell.execute_reply": "2025-01-31T10:05:45.727812Z" } }, 
"outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import scipy.linalg as la" ] }, { "cell_type": "markdown", "id": "93d4a2eb", "metadata": {}, "source": [ "The primary data used for mean-variance portfolio analysis are _estimates_ of the return and risk of the involved assets. Assuming that the excess returns of the assets are normally distributed, we need to estimate the parameters of the underlying distribution. The methods used for this vary greatly and depend on the nature of the available data. Commonly used return/risk models are:\n", "\n", "1. Estimates from time series\n", "2. Single-factor models such as CAPM\n", "3. Commercial multiple-factor models like from Axioma, Barra or others\n", "\n", "Our primary goal here is to generate sensible, realistic data for demonstrating the modeling and solving features of Gurobi. In order to keep things simple, we will discuss a simplistic variation of option 1 from above and reuse this data through most of our examples." ] }, { "cell_type": "markdown", "id": "fcd9cffd", "metadata": {}, "source": [ "The starting point for our estimators are the weekly closing values for a set of 462 stocks from the S&P500 index, collected over a time span of ten years:" ] }, { "cell_type": "code", "execution_count": 3, "id": "9c651267", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.731104Z", "iopub.status.busy": "2025-01-31T10:05:45.730813Z", "iopub.status.idle": "2025-01-31T10:05:45.750368Z", "shell.execute_reply": "2025-01-31T10:05:45.749669Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
APTVDVNHSYCAGHSTLUVMMCBAVRSKTSLA...GLCPBSTLDBGTDGAEEAAPLAIZUNPK
2013-01-0729.25390637.56544557.68898816.89273111.1127099.57107328.72769263.79613951.3553772.289333...31.92375826.03911211.07146955.64122876.82459321.69000416.00154728.29259550.84017936.513779
2013-01-1428.81728238.20277057.80673616.96588511.45696610.05556628.36157262.12015252.2180862.194000...32.21534725.66502411.38224955.74684174.70050821.78860115.32501228.34891351.52089736.726971
2013-01-2129.33050938.43190859.79222917.48358211.39500110.12869428.30463662.29613552.1890182.260000...32.46442826.15646211.58425457.15729574.18324322.09141514.84151729.20162651.98790036.843266
2013-01-2829.60627638.87588160.59268217.91124311.56713010.38465928.56487362.01120852.8481522.346000...33.06584526.58922811.65418158.70353375.31681822.42944713.43531330.52896552.58549537.579758
2013-02-0429.61393440.78781561.04786318.31605911.29861010.22011128.65495161.71792253.1292692.500667...33.64295226.52321411.74741159.25413174.52992222.83084513.50983430.69791251.91269737.534512
\n", "

5 rows × 462 columns

\n", "
" ], "text/plain": [ " APTV DVN HSY CAG HST LUV \\\n", "2013-01-07 29.253906 37.565445 57.688988 16.892731 11.112709 9.571073 \n", "2013-01-14 28.817282 38.202770 57.806736 16.965885 11.456966 10.055566 \n", "2013-01-21 29.330509 38.431908 59.792229 17.483582 11.395001 10.128694 \n", "2013-01-28 29.606276 38.875881 60.592682 17.911243 11.567130 10.384659 \n", "2013-02-04 29.613934 40.787815 61.047863 18.316059 11.298610 10.220111 \n", "\n", " MMC BA VRSK TSLA ... GL \\\n", "2013-01-07 28.727692 63.796139 51.355377 2.289333 ... 31.923758 \n", "2013-01-14 28.361572 62.120152 52.218086 2.194000 ... 32.215347 \n", "2013-01-21 28.304636 62.296135 52.189018 2.260000 ... 32.464428 \n", "2013-01-28 28.564873 62.011208 52.848152 2.346000 ... 33.065845 \n", "2013-02-04 28.654951 61.717922 53.129269 2.500667 ... 33.642952 \n", "\n", " CPB STLD BG TDG AEE AAPL \\\n", "2013-01-07 26.039112 11.071469 55.641228 76.824593 21.690004 16.001547 \n", "2013-01-14 25.665024 11.382249 55.746841 74.700508 21.788601 15.325012 \n", "2013-01-21 26.156462 11.584254 57.157295 74.183243 22.091415 14.841517 \n", "2013-01-28 26.589228 11.654181 58.703533 75.316818 22.429447 13.435313 \n", "2013-02-04 26.523214 11.747411 59.254131 74.529922 22.830845 13.509834 \n", "\n", " AIZ UNP K \n", "2013-01-07 28.292595 50.840179 36.513779 \n", "2013-01-14 28.348913 51.520897 36.726971 \n", "2013-01-21 29.201626 51.987900 36.843266 \n", "2013-01-28 30.528965 52.585495 37.579758 \n", "2013-02-04 30.697912 51.912697 37.534512 \n", "\n", "[5 rows x 462 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_ts = pd.read_pickle(\"subset_weekly_closings_10yrs.pkl\")\n", "df_ts.head()" ] }, { "cell_type": "markdown", "id": "a0d8ddd1", "metadata": {}, "source": [ "Next, we convert the absolute closing prices to relative changes (\"excess returns\"), in units of percent. We use the Pandas `pct_change` function in order to compute all the ratios in one pass, resulting in a leading row containing `NaN`, which we clean up by simply dropping it. Depending on the context, it may be more meaningful to use _log normal_ returns instead of the plain returns here. For the illustrative purpose of this data, however, this technical aspect doesn't make a difference." ] }, { "cell_type": "code", "execution_count": 4, "id": "ad9d5195", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.752253Z", "iopub.status.busy": "2025-01-31T10:05:45.752051Z", "iopub.status.idle": "2025-01-31T10:05:45.802026Z", "shell.execute_reply": "2025-01-31T10:05:45.801351Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
APTVDVNHSYCAGHSTLUVMMCBAVRSKTSLA...GLCPBSTLDBGTDGAEEAAPLAIZUNPK
2013-01-14-1.4925341.6965730.2041090.4330533.0978715.062058-1.274448-2.6270981.679881-4.164230...0.913394-1.4366402.8070310.189812-2.7648500.454571-4.2279330.1990571.3389360.583867
2013-01-211.7809710.5997933.4347083.051396-0.5408500.727237-0.2007510.283295-0.0556673.008203...0.7731741.9148161.7747412.530105-0.6924521.389784-3.1549393.0079200.9064340.316647
2013-01-280.9402031.1552211.3387242.4460771.5105632.5271300.919414-0.4573761.2629743.805308...1.8525431.6545280.6036402.7052331.5280751.530150-9.4748014.5454291.1494891.998987
2013-02-040.0258664.9180460.7512152.260120-2.321409-1.5845290.315347-0.4729550.5319326.592802...1.745325-0.2482710.7999640.937930-1.0447811.7896010.5546660.553400-1.279437-0.120400
2013-02-11-0.9053071.9838422.8666282.1362010.6703130.8049883.2866693.014293-0.9487401.652877...-0.6319661.9634750.529126-8.1975702.9533560.5860773.3980020.6027010.6327690.688484
\n", "

5 rows × 462 columns

\n", "
" ], "text/plain": [ " APTV DVN HSY CAG HST LUV \\\n", "2013-01-14 -1.492534 1.696573 0.204109 0.433053 3.097871 5.062058 \n", "2013-01-21 1.780971 0.599793 3.434708 3.051396 -0.540850 0.727237 \n", "2013-01-28 0.940203 1.155221 1.338724 2.446077 1.510563 2.527130 \n", "2013-02-04 0.025866 4.918046 0.751215 2.260120 -2.321409 -1.584529 \n", "2013-02-11 -0.905307 1.983842 2.866628 2.136201 0.670313 0.804988 \n", "\n", " MMC BA VRSK TSLA ... GL CPB \\\n", "2013-01-14 -1.274448 -2.627098 1.679881 -4.164230 ... 0.913394 -1.436640 \n", "2013-01-21 -0.200751 0.283295 -0.055667 3.008203 ... 0.773174 1.914816 \n", "2013-01-28 0.919414 -0.457376 1.262974 3.805308 ... 1.852543 1.654528 \n", "2013-02-04 0.315347 -0.472955 0.531932 6.592802 ... 1.745325 -0.248271 \n", "2013-02-11 3.286669 3.014293 -0.948740 1.652877 ... -0.631966 1.963475 \n", "\n", " STLD BG TDG AEE AAPL AIZ \\\n", "2013-01-14 2.807031 0.189812 -2.764850 0.454571 -4.227933 0.199057 \n", "2013-01-21 1.774741 2.530105 -0.692452 1.389784 -3.154939 3.007920 \n", "2013-01-28 0.603640 2.705233 1.528075 1.530150 -9.474801 4.545429 \n", "2013-02-04 0.799964 0.937930 -1.044781 1.789601 0.554666 0.553400 \n", "2013-02-11 0.529126 -8.197570 2.953356 0.586077 3.398002 0.602701 \n", "\n", " UNP K \n", "2013-01-14 1.338936 0.583867 \n", "2013-01-21 0.906434 0.316647 \n", "2013-01-28 1.149489 1.998987 \n", "2013-02-04 -1.279437 -0.120400 \n", "2013-02-11 0.632769 0.688484 \n", "\n", "[5 rows x 462 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_r = df_ts.pct_change().iloc[1:] * 100\n", "df_r.head()" ] }, { "cell_type": "markdown", "id": "70db81ce", "metadata": {}, "source": [ "For example, the weekly return rate for Aptiv PLC (\"APTV\") closing on 2013-01-14 was roughly -1.49%, and the following week +1.78%." ] }, { "cell_type": "markdown", "id": "96795aaa", "metadata": {}, "source": [ "## Compute mean and covariance matrix" ] }, { "cell_type": "markdown", "id": "0d24eb49", "metadata": {}, "source": [ "With the time series data for the stocks at hand, we use the Pandas `mean` and `cov` functions to compute the average returns and their covariance matrix." ] }, { "cell_type": "code", "execution_count": 5, "id": "c0a82cbc", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.803966Z", "iopub.status.busy": "2025-01-31T10:05:45.803780Z", "iopub.status.idle": "2025-01-31T10:05:45.821436Z", "shell.execute_reply": "2025-01-31T10:05:45.820879Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "APTV 0.308637\n", "DVN 0.247540\n", "HSY 0.246571\n", "CAG 0.144669\n", "HST 0.153303\n", "dtype: float64\n", " APTV DVN HSY CAG HST LUV \\\n", "APTV 26.597463 14.778097 3.064101 2.264344 11.601311 11.322953 \n", "DVN 14.778097 42.022493 2.305614 3.309097 11.857115 8.160965 \n", "HSY 3.064101 2.305614 6.338661 3.081948 2.445438 2.462402 \n", "CAG 2.264344 3.309097 3.081948 10.795454 2.251297 2.013963 \n", "HST 11.601311 11.857115 2.445438 2.251297 15.706045 9.121264 \n", "\n", " MMC BA VRSK TSLA ... GL CPB \\\n", "APTV 6.545988 15.774890 6.146144 17.156187 ... 8.749196 0.351038 \n", "DVN 5.611923 12.794991 4.319937 10.399214 ... 9.758414 0.809340 \n", "HSY 2.577923 3.890774 2.662696 4.427763 ... 2.459988 3.089025 \n", "CAG 2.219121 3.173212 2.048475 3.609194 ... 2.640858 4.392320 \n", "HST 4.629432 10.313413 3.824482 7.838277 ... 
6.979635 0.363426 \n", "\n", " STLD BG TDG AEE AAPL AIZ \n", "APTV 12.383185 7.251853 13.716921 2.616640 7.353120 7.046057 \n", "DVN 15.489979 10.290904 10.162485 1.991456 5.870277 7.621633 \n", "HSY 2.458087 2.602136 3.051838 2.729324 2.244011 2.425771 \n", "CAG 1.946567 2.809631 2.010134 2.376624 2.817637 2.035238 \n", "HST 8.820236 5.097961 8.612474 2.802346 4.067069 6.014143 \n", "\n", " UNP K \n", "APTV 8.246912 2.312723 \n", "DVN 10.036320 2.279012 \n", "HSY 2.248228 3.139629 \n", "CAG 2.040563 4.261462 \n", "HST 5.758568 1.797050 \n", "\n", "[5 rows x 462 columns]\n" ] } ], "source": [ "mu = df_r.mean()\n", "sigma = df_r.cov()\n", "print(mu.head())\n", "print(sigma.head())" ] }, { "cell_type": "markdown", "id": "7ff2a36e", "metadata": {}, "source": [ "The vector $\mu$ and matrix $\Sigma$ _could_ be used directly in a typical Markowitz mean-variance optimization model. However, the uncertainty and sensitivity, especially of the return estimator $\mu$, pose a problem: optimization models exploit differences among the estimated returns, but they cannot distinguish significant differences from mere sampling and estimation noise. Next, we will present a standard technique that mitigates this effect to some degree." ] }, { "cell_type": "markdown", "id": "f56ec528", "metadata": {}, "source": [ "## Applying shrinkage to the return estimator" ] }, { "cell_type": "markdown", "id": "d82e36db", "metadata": {}, "source": [ "The basic idea of _shrinking_ the return estimator is to combine it with another uniformly biased but noiseless estimator. Mathematically, we choose a scalar $0 < w < 1$ (the shrinkage weight) and define the shrinkage estimator as\n", "\n", "$$\n", "\mu_s = (1 - w) \mu + w \mu_0 1_n\n", "$$\n", "\n", "where $1_n$ is the all-ones vector of appropriate dimension. The vector $\mu_0 1_n$ is called the _shrinkage target_. There are plenty of established ways to choose the shrinkage weight and target; here, we follow the work of Jorion [1] (see also [2, Sec. 7.2])." ] }, { "cell_type": "markdown", "id": "69b7dfff", "metadata": {}, "source": [ "For the shrinkage target $\mu_0 1_n$ we use the value\n", "\n", "\begin{equation*}\n", "\mu_0 = \frac{\mu^\top \Sigma^{-1} 1_n}{ 1_n^\top \Sigma^{-1} 1_n} \tag{1}\n", "\end{equation*}\n", "\n", "and compute the shrinkage weight as\n", "\n", "\begin{equation*}\n", "w = \frac{n+2}{n + 2 + t (\mu - \mu_0 1_n)^\top \Sigma^{-1} (\mu - \mu_0 1_n)} \tag{2}\n", "\end{equation*}\n", "\n", "A technical problem with this approach is that it assumes that $\Sigma$ is the actual covariance matrix of the returns. But generally, this matrix is not known exactly and is estimated from the time series, too. To reduce the bias introduced by the estimation error in $\Sigma$, a common remedy is to use the scaled matrix\n", "\n", "\begin{equation*}\n", "\frac{t - 1}{t - n - 2} \Sigma\n", "\end{equation*}\n", "\n", "instead of $\Sigma$ itself ($t$ is the number of time samples, $n$ is the number of assets). As the ratio of data points to assets increases and the estimate becomes more reliable, the scaling factor approaches one and the scaled matrix approaches the estimate itself. 
Hence we start the computation of $\\mu_0$ and $w$ as follows:" ] }, { "cell_type": "code", "execution_count": 6, "id": "06e22847", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.823599Z", "iopub.status.busy": "2025-01-31T10:05:45.823410Z", "iopub.status.idle": "2025-01-31T10:05:45.827536Z", "shell.execute_reply": "2025-01-31T10:05:45.827033Z" } }, "outputs": [], "source": [ "t, n = df_r.shape\n", "sigma_shrink = (t - 1) / (t - n - 2) * sigma" ] }, { "cell_type": "markdown", "id": "b5345dcd", "metadata": {}, "source": [ "Next, we compute $\\mu_0$. As a first step, we compute the intermediate quantity\n", "\n", "\\begin{equation*}\n", "z = \\Sigma^{-1} 1_n\n", "\\end{equation*}\n", "\n", "Then, following formula (1), we can obtain $\\mu_0$ as\n", "\n", "\\begin{equation*}\n", "\\mu_0 = \\frac{\\mu^\\top z}{1_n^\\top z}\n", "\\end{equation*}" ] }, { "cell_type": "code", "execution_count": 7, "id": "be3ef941", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.830366Z", "iopub.status.busy": "2025-01-31T10:05:45.829632Z", "iopub.status.idle": "2025-01-31T10:05:45.839619Z", "shell.execute_reply": "2025-01-31T10:05:45.839147Z" } }, "outputs": [], "source": [ "z = la.solve(sigma_shrink, np.ones(n), assume_a=\"pos\")\n", "mu_0 = mu @ z / z.sum()" ] }, { "cell_type": "markdown", "id": "18bdd6a8", "metadata": {}, "source": [ "We split apart the computation of $w$ in (2) above by first computing the temporary vectors\n", "\n", "\\begin{align*}\n", "d & = \\mu - \\mu_0 1_n \\\\\n", "y & = \\Sigma^{-1} d\n", "\\end{align*}\n", "\n", "so that $w$ can be computed as\n", "\n", "$$\n", "w = \\frac{n + 2}{n + 2 + t d^\\top y}\n", "$$" ] }, { "cell_type": "code", "execution_count": 8, "id": "071af1c4", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.841782Z", "iopub.status.busy": "2025-01-31T10:05:45.841597Z", "iopub.status.idle": "2025-01-31T10:05:45.925952Z", "shell.execute_reply": "2025-01-31T10:05:45.925413Z" } }, "outputs": [], "source": [ "d = mu - mu_0 * np.ones(n)\n", "y = la.solve(sigma_shrink, d, assume_a=\"pos\")\n", "w = (n + 2) / (n + 2 + t * d @ y)" ] }, { "cell_type": "markdown", "id": "fab94b46", "metadata": {}, "source": [ "Finally, we compute $\\mu_s$ as follows:" ] }, { "cell_type": "code", "execution_count": 9, "id": "a4e4132a", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.929090Z", "iopub.status.busy": "2025-01-31T10:05:45.928240Z", "iopub.status.idle": "2025-01-31T10:05:45.932434Z", "shell.execute_reply": "2025-01-31T10:05:45.931966Z" } }, "outputs": [], "source": [ "mu_s = (1 - w) * mu + w * mu_0 * np.ones(n)" ] }, { "cell_type": "code", "execution_count": 10, "id": "2fa6e858", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.934533Z", "iopub.status.busy": "2025-01-31T10:05:45.934357Z", "iopub.status.idle": "2025-01-31T10:05:45.939450Z", "shell.execute_reply": "2025-01-31T10:05:45.938946Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total change in mu: 0.3272560076737803\n", "Max change in mu: 0.541640891206755\n" ] } ], "source": [ "print(f\"Total change in mu: {la.norm(mu - mu_s, 2) / la.norm(mu, 2)}\")\n", "print(f\"Max change in mu: {(mu - mu_s).abs().max()}\")" ] }, { "cell_type": "markdown", "id": "a67b5472", "metadata": {}, "source": [ "For this particular data set, this shrinkage results in a relative change of about 30% to the return estimator." 
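] }, { "cell_type": "markdown", "id": "9f3e21aa", "metadata": {}, "source": [ "The individual steps above can also be bundled into a single helper function, which is convenient if the same preparation is repeated for other data sets. The sketch below is purely illustrative: the function name `shrink_return_estimator` and its signature are our own choice, not part of any library, and it simply repeats the computations carried out in the cells above." ] }, { "cell_type": "code", "execution_count": null, "id": "b7d45c0e", "metadata": {}, "outputs": [], "source": [ "# Illustrative helper bundling the Jorion shrinkage steps shown above.\n", "# Name and signature are our own convention, not a library API.\n", "def shrink_return_estimator(returns):\n", "    # returns: DataFrame of (percentage) returns, one column per asset\n", "    t, n = returns.shape\n", "    mu = returns.mean()\n", "    sigma = returns.cov()\n", "    # Scale the covariance estimate to account for its own estimation error\n", "    sigma_shrink = (t - 1) / (t - n - 2) * sigma\n", "    # Shrinkage target mu_0, formula (1)\n", "    z = la.solve(sigma_shrink, np.ones(n), assume_a=\"pos\")\n", "    mu_0 = mu @ z / z.sum()\n", "    # Shrinkage weight w, formula (2)\n", "    d = mu - mu_0 * np.ones(n)\n", "    y = la.solve(sigma_shrink, d, assume_a=\"pos\")\n", "    w = (n + 2) / (n + 2 + t * d @ y)\n", "    # Shrunk return estimator mu_s\n", "    return (1 - w) * mu + w * mu_0 * np.ones(n), w, mu_0\n", "\n", "# Example usage, equivalent to the step-by-step computation above:\n", "# mu_s, w, mu_0 = shrink_return_estimator(df_r)"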
] }, { "cell_type": "markdown", "id": "7555c36f", "metadata": {}, "source": [ "For future use, we save the data as Python pickle files." ] }, { "cell_type": "code", "execution_count": 11, "id": "2a6e34c3", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.941878Z", "iopub.status.busy": "2025-01-31T10:05:45.941703Z", "iopub.status.idle": "2025-01-31T10:05:45.947901Z", "shell.execute_reply": "2025-01-31T10:05:45.947423Z" } }, "outputs": [], "source": [ "mu_s.to_pickle(\"mu.pkl\")\n", "sigma.to_pickle(\"sigma.pkl\")" ] }, { "cell_type": "markdown", "id": "ee4ec2df", "metadata": {}, "source": [ "### Literature\n", "\n", "[1] Jorion, Philippe, 1986. \"Bayes-Stein Estimation for Portfolio Analysis,\" Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 21(3), pages 279-292, September.\n", "\n", "[2] Cornuéjols G., Peña J., Tütüncü R. 2018. Optimization Methods in Finance. Cambridge University Press. Second Edition. Markowitz H.M. 1952. Portfolio Selection. The Journal of Finance. 7 (1): 77–91." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }