{ "cells": [ { "cell_type": "markdown", "id": "3657705c", "metadata": {}, "source": [ "# Mean-Variance Data Preparation" ] }, { "cell_type": "markdown", "id": "9e135fd4", "metadata": {}, "source": [ "## Import time series data" ] }, { "cell_type": "code", "execution_count": 1, "id": "ef31b229", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:44.552541Z", "iopub.status.busy": "2025-01-31T10:05:44.552311Z", "iopub.status.idle": "2025-01-31T10:05:45.314273Z", "shell.execute_reply": "2025-01-31T10:05:45.313528Z" }, "nbsphinx": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: numpy in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (2.2.2)\r\n", "Requirement already satisfied: scipy in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (1.15.1)\r\n", "Requirement already satisfied: gurobipy in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (11.0.3)\r\n", "Requirement already satisfied: pandas in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (2.2.3)\r\n", "Requirement already satisfied: matplotlib in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (3.10.0)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from pandas) (2.9.0.post0)\r\n", "Requirement already satisfied: pytz>=2020.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from pandas) (2025.1)\r\n", "Requirement already satisfied: tzdata>=2022.7 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from pandas) (2025.1)\r\n", "Requirement already satisfied: contourpy>=1.0.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (1.3.1)\r\n", "Requirement already satisfied: cycler>=0.10 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (0.12.1)\r\n", "Requirement already satisfied: fonttools>=4.22.0 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (4.55.8)\r\n", "Requirement already satisfied: kiwisolver>=1.3.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (1.4.8)\r\n", "Requirement already satisfied: packaging>=20.0 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (24.2)\r\n", "Requirement already satisfied: pillow>=8 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (11.1.0)\r\n", "Requirement already satisfied: pyparsing>=2.3.1 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from matplotlib) (3.2.1)\r\n", "Requirement already satisfied: six>=1.5 in /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "# Install dependencies\n", "%pip install numpy scipy gurobipy pandas matplotlib" ] }, { "cell_type": "code", "execution_count": 2, "id": "ca6c9e2f", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.316410Z", "iopub.status.busy": "2025-01-31T10:05:45.316207Z", "iopub.status.idle": "2025-01-31T10:05:45.728385Z", "shell.execute_reply": "2025-01-31T10:05:45.727812Z" } }, 
"outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import scipy.linalg as la" ] }, { "cell_type": "markdown", "id": "93d4a2eb", "metadata": {}, "source": [ "The primary data used for mean-variance portfolio analysis are _estimates_ of the return and risk of the involved assets. Assuming that the excess returns of the assets are normally distributed, we need to estimate the parameters of the underlying distribution. The methods used for this vary greatly and depend on the nature of the available data. Commonly used return/risk models are:\n", "\n", "1. Estimates from time series\n", "2. Single-factor models such as CAPM\n", "3. Commercial multiple-factor models like from Axioma, Barra or others\n", "\n", "Our primary goal here is to generate sensible, realistic data for demonstrating the modeling and solving features of Gurobi. In order to keep things simple, we will discuss a simplistic variation of option 1 from above and reuse this data through most of our examples." ] }, { "cell_type": "markdown", "id": "fcd9cffd", "metadata": {}, "source": [ "The starting point for our estimators are the weekly closing values for a set of 462 stocks from the S&P500 index, collected over a time span of ten years:" ] }, { "cell_type": "code", "execution_count": 3, "id": "9c651267", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.731104Z", "iopub.status.busy": "2025-01-31T10:05:45.730813Z", "iopub.status.idle": "2025-01-31T10:05:45.750368Z", "shell.execute_reply": "2025-01-31T10:05:45.749669Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
APTVDVNHSYCAGHSTLUVMMCBAVRSKTSLA...GLCPBSTLDBGTDGAEEAAPLAIZUNPK
2013-01-0729.25390637.56544557.68898816.89273111.1127099.57107328.72769263.79613951.3553772.289333...31.92375826.03911211.07146955.64122876.82459321.69000416.00154728.29259550.84017936.513779
2013-01-1428.81728238.20277057.80673616.96588511.45696610.05556628.36157262.12015252.2180862.194000...32.21534725.66502411.38224955.74684174.70050821.78860115.32501228.34891351.52089736.726971
2013-01-2129.33050938.43190859.79222917.48358211.39500110.12869428.30463662.29613552.1890182.260000...32.46442826.15646211.58425457.15729574.18324322.09141514.84151729.20162651.98790036.843266
2013-01-2829.60627638.87588160.59268217.91124311.56713010.38465928.56487362.01120852.8481522.346000...33.06584526.58922811.65418158.70353375.31681822.42944713.43531330.52896552.58549537.579758
2013-02-0429.61393440.78781561.04786318.31605911.29861010.22011128.65495161.71792253.1292692.500667...33.64295226.52321411.74741159.25413174.52992222.83084513.50983430.69791251.91269737.534512
\n", "

5 rows × 462 columns

\n", "
" ], "text/plain": [ " APTV DVN HSY CAG HST LUV \\\n", "2013-01-07 29.253906 37.565445 57.688988 16.892731 11.112709 9.571073 \n", "2013-01-14 28.817282 38.202770 57.806736 16.965885 11.456966 10.055566 \n", "2013-01-21 29.330509 38.431908 59.792229 17.483582 11.395001 10.128694 \n", "2013-01-28 29.606276 38.875881 60.592682 17.911243 11.567130 10.384659 \n", "2013-02-04 29.613934 40.787815 61.047863 18.316059 11.298610 10.220111 \n", "\n", " MMC BA VRSK TSLA ... GL \\\n", "2013-01-07 28.727692 63.796139 51.355377 2.289333 ... 31.923758 \n", "2013-01-14 28.361572 62.120152 52.218086 2.194000 ... 32.215347 \n", "2013-01-21 28.304636 62.296135 52.189018 2.260000 ... 32.464428 \n", "2013-01-28 28.564873 62.011208 52.848152 2.346000 ... 33.065845 \n", "2013-02-04 28.654951 61.717922 53.129269 2.500667 ... 33.642952 \n", "\n", " CPB STLD BG TDG AEE AAPL \\\n", "2013-01-07 26.039112 11.071469 55.641228 76.824593 21.690004 16.001547 \n", "2013-01-14 25.665024 11.382249 55.746841 74.700508 21.788601 15.325012 \n", "2013-01-21 26.156462 11.584254 57.157295 74.183243 22.091415 14.841517 \n", "2013-01-28 26.589228 11.654181 58.703533 75.316818 22.429447 13.435313 \n", "2013-02-04 26.523214 11.747411 59.254131 74.529922 22.830845 13.509834 \n", "\n", " AIZ UNP K \n", "2013-01-07 28.292595 50.840179 36.513779 \n", "2013-01-14 28.348913 51.520897 36.726971 \n", "2013-01-21 29.201626 51.987900 36.843266 \n", "2013-01-28 30.528965 52.585495 37.579758 \n", "2013-02-04 30.697912 51.912697 37.534512 \n", "\n", "[5 rows x 462 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_ts = pd.read_pickle(\"subset_weekly_closings_10yrs.pkl\")\n", "df_ts.head()" ] }, { "cell_type": "markdown", "id": "a0d8ddd1", "metadata": {}, "source": [ "Next, we convert the absolute closing prices to relative changes (\"excess returns\"), in units of percent. We use the Pandas `pct_change` function in order to compute all the ratios in one pass, resulting in a leading row containing `NaN`, which we clean up by simply dropping it. Depending on the context, it may be more meaningful to use _log normal_ returns instead of the plain returns here. For the illustrative purpose of this data, however, this technical aspect doesn't make a difference." ] }, { "cell_type": "code", "execution_count": 4, "id": "ad9d5195", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.752253Z", "iopub.status.busy": "2025-01-31T10:05:45.752051Z", "iopub.status.idle": "2025-01-31T10:05:45.802026Z", "shell.execute_reply": "2025-01-31T10:05:45.801351Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
APTVDVNHSYCAGHSTLUVMMCBAVRSKTSLA...GLCPBSTLDBGTDGAEEAAPLAIZUNPK
2013-01-14-1.4925341.6965730.2041090.4330533.0978715.062058-1.274448-2.6270981.679881-4.164230...0.913394-1.4366402.8070310.189812-2.7648500.454571-4.2279330.1990571.3389360.583867
2013-01-211.7809710.5997933.4347083.051396-0.5408500.727237-0.2007510.283295-0.0556673.008203...0.7731741.9148161.7747412.530105-0.6924521.389784-3.1549393.0079200.9064340.316647
2013-01-280.9402031.1552211.3387242.4460771.5105632.5271300.919414-0.4573761.2629743.805308...1.8525431.6545280.6036402.7052331.5280751.530150-9.4748014.5454291.1494891.998987
2013-02-040.0258664.9180460.7512152.260120-2.321409-1.5845290.315347-0.4729550.5319326.592802...1.745325-0.2482710.7999640.937930-1.0447811.7896010.5546660.553400-1.279437-0.120400
2013-02-11-0.9053071.9838422.8666282.1362010.6703130.8049883.2866693.014293-0.9487401.652877...-0.6319661.9634750.529126-8.1975702.9533560.5860773.3980020.6027010.6327690.688484
\n", "

5 rows × 462 columns

\n", "
" ], "text/plain": [ " APTV DVN HSY CAG HST LUV \\\n", "2013-01-14 -1.492534 1.696573 0.204109 0.433053 3.097871 5.062058 \n", "2013-01-21 1.780971 0.599793 3.434708 3.051396 -0.540850 0.727237 \n", "2013-01-28 0.940203 1.155221 1.338724 2.446077 1.510563 2.527130 \n", "2013-02-04 0.025866 4.918046 0.751215 2.260120 -2.321409 -1.584529 \n", "2013-02-11 -0.905307 1.983842 2.866628 2.136201 0.670313 0.804988 \n", "\n", " MMC BA VRSK TSLA ... GL CPB \\\n", "2013-01-14 -1.274448 -2.627098 1.679881 -4.164230 ... 0.913394 -1.436640 \n", "2013-01-21 -0.200751 0.283295 -0.055667 3.008203 ... 0.773174 1.914816 \n", "2013-01-28 0.919414 -0.457376 1.262974 3.805308 ... 1.852543 1.654528 \n", "2013-02-04 0.315347 -0.472955 0.531932 6.592802 ... 1.745325 -0.248271 \n", "2013-02-11 3.286669 3.014293 -0.948740 1.652877 ... -0.631966 1.963475 \n", "\n", " STLD BG TDG AEE AAPL AIZ \\\n", "2013-01-14 2.807031 0.189812 -2.764850 0.454571 -4.227933 0.199057 \n", "2013-01-21 1.774741 2.530105 -0.692452 1.389784 -3.154939 3.007920 \n", "2013-01-28 0.603640 2.705233 1.528075 1.530150 -9.474801 4.545429 \n", "2013-02-04 0.799964 0.937930 -1.044781 1.789601 0.554666 0.553400 \n", "2013-02-11 0.529126 -8.197570 2.953356 0.586077 3.398002 0.602701 \n", "\n", " UNP K \n", "2013-01-14 1.338936 0.583867 \n", "2013-01-21 0.906434 0.316647 \n", "2013-01-28 1.149489 1.998987 \n", "2013-02-04 -1.279437 -0.120400 \n", "2013-02-11 0.632769 0.688484 \n", "\n", "[5 rows x 462 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_r = df_ts.pct_change().iloc[1:] * 100\n", "df_r.head()" ] }, { "cell_type": "markdown", "id": "70db81ce", "metadata": {}, "source": [ "For example, the weekly return rate for Aptiv PLC (\"APTV\") closing on 2013-01-14 was roughly -1.49%, and the following week +1.78%." ] }, { "cell_type": "markdown", "id": "96795aaa", "metadata": {}, "source": [ "## Compute mean and covariance matrix" ] }, { "cell_type": "markdown", "id": "0d24eb49", "metadata": {}, "source": [ "With the time series data for the stocks at hand, we use the Pandas `mean` and `cov` functions to compute the average returns and their covariance matrix." ] }, { "cell_type": "code", "execution_count": 5, "id": "c0a82cbc", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.803966Z", "iopub.status.busy": "2025-01-31T10:05:45.803780Z", "iopub.status.idle": "2025-01-31T10:05:45.821436Z", "shell.execute_reply": "2025-01-31T10:05:45.820879Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "APTV 0.308637\n", "DVN 0.247540\n", "HSY 0.246571\n", "CAG 0.144669\n", "HST 0.153303\n", "dtype: float64\n", " APTV DVN HSY CAG HST LUV \\\n", "APTV 26.597463 14.778097 3.064101 2.264344 11.601311 11.322953 \n", "DVN 14.778097 42.022493 2.305614 3.309097 11.857115 8.160965 \n", "HSY 3.064101 2.305614 6.338661 3.081948 2.445438 2.462402 \n", "CAG 2.264344 3.309097 3.081948 10.795454 2.251297 2.013963 \n", "HST 11.601311 11.857115 2.445438 2.251297 15.706045 9.121264 \n", "\n", " MMC BA VRSK TSLA ... GL CPB \\\n", "APTV 6.545988 15.774890 6.146144 17.156187 ... 8.749196 0.351038 \n", "DVN 5.611923 12.794991 4.319937 10.399214 ... 9.758414 0.809340 \n", "HSY 2.577923 3.890774 2.662696 4.427763 ... 2.459988 3.089025 \n", "CAG 2.219121 3.173212 2.048475 3.609194 ... 2.640858 4.392320 \n", "HST 4.629432 10.313413 3.824482 7.838277 ... 
6.979635 0.363426 \n", "\n", " STLD BG TDG AEE AAPL AIZ \n", "APTV 12.383185 7.251853 13.716921 2.616640 7.353120 7.046057 \n", "DVN 15.489979 10.290904 10.162485 1.991456 5.870277 7.621633 \n", "HSY 2.458087 2.602136 3.051838 2.729324 2.244011 2.425771 \n", "CAG 1.946567 2.809631 2.010134 2.376624 2.817637 2.035238 \n", "HST 8.820236 5.097961 8.612474 2.802346 4.067069 6.014143 \n", "\n", " UNP K \n", "APTV 8.246912 2.312723 \n", "DVN 10.036320 2.279012 \n", "HSY 2.248228 3.139629 \n", "CAG 2.040563 4.261462 \n", "HST 5.758568 1.797050 \n", "\n", "[5 rows x 462 columns]\n" ] } ], "source": [ "mu = df_r.mean()\n", "sigma = df_r.cov()\n", "print(mu.head())\n", "print(sigma.head())" ] }, { "cell_type": "markdown", "id": "7ff2a36e", "metadata": {}, "source": [ "The vector $\mu$ and matrix $\Sigma$ _could_ be used directly in a typical Markowitz mean-variance optimization model. However, the uncertainty and sensitivity, especially of the return estimator $\mu$, pose a problem: optimization models exploit differences among the estimated returns, but they cannot distinguish significant differences from mere sampling and estimation noise. Next, we will present a standard technique that mitigates this effect to some degree." ] }, { "cell_type": "markdown", "id": "f56ec528", "metadata": {}, "source": [ "## Applying shrinkage to the return estimator" ] }, { "cell_type": "markdown", "id": "d82e36db", "metadata": {}, "source": [ "The basic idea of _shrinking_ the return estimator is to combine it with another uniformly biased but noiseless estimator. Mathematically, we choose a scalar $0 < w < 1$ (the shrinkage weight) and define the shrinkage estimator as\n", "\n", "$$\n", "\mu_s = (1 - w) \mu + w \mu_0 1_n\n", "$$\n", "\n", "where $1_n$ is the all-ones vector of appropriate dimension. The vector $\mu_0 1_n$ is called the _shrinkage target_. There are plenty of established ways to choose the shrinkage weight and target; here, we follow the work of Jorion [1] (see also [2, Sec. 7.2])." ] }, { "cell_type": "markdown", "id": "69b7dfff", "metadata": {}, "source": [ "For the shrinkage target $\mu_0 1_n$ we use the value\n", "\n", "\begin{equation*}\n", "\mu_0 = \frac{\mu^\top \Sigma^{-1} 1_n}{ 1_n^\top \Sigma^{-1} 1_n} \tag{1}\n", "\end{equation*}\n", "\n", "and compute the shrinkage weight as\n", "\n", "\begin{equation*}\n", "w = \frac{n+2}{n + 2 + t (\mu - \mu_0 1_n)^\top \Sigma^{-1} (\mu - \mu_0 1_n)} \tag{2}\n", "\end{equation*}\n", "\n", "A technical problem with this approach is that it assumes that $\Sigma$ is the actual covariance matrix of the returns. But generally, this matrix is not known exactly and is estimated from the time series, too. To reduce the bias introduced by the estimation error in $\Sigma$, a common remedy is to use the scaled matrix\n", "\n", "\begin{equation*}\n", "\frac{t - 1}{t - n - 2} \Sigma\n", "\end{equation*}\n", "\n", "instead of $\Sigma$ itself ($t$ is the number of time samples, $n$ is the number of assets). As the ratio of data points to assets increases and the estimate becomes more reliable, the scaling factor approaches one and the scaled matrix approaches the estimate itself. 
Hence we start the computation of $\\mu_0$ and $w$ as follows:" ] }, { "cell_type": "code", "execution_count": 6, "id": "06e22847", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.823599Z", "iopub.status.busy": "2025-01-31T10:05:45.823410Z", "iopub.status.idle": "2025-01-31T10:05:45.827536Z", "shell.execute_reply": "2025-01-31T10:05:45.827033Z" } }, "outputs": [], "source": [ "t, n = df_r.shape\n", "sigma_shrink = (t - 1) / (t - n - 2) * sigma" ] }, { "cell_type": "markdown", "id": "b5345dcd", "metadata": {}, "source": [ "Next, we compute $\\mu_0$. As a first step, we compute the intermediate quantity\n", "\n", "\\begin{equation*}\n", "z = \\Sigma^{-1} 1_n\n", "\\end{equation*}\n", "\n", "Then, following formula (1), we can obtain $\\mu_0$ as\n", "\n", "\\begin{equation*}\n", "\\mu_0 = \\frac{\\mu^\\top z}{1_n^\\top z}\n", "\\end{equation*}" ] }, { "cell_type": "code", "execution_count": 7, "id": "be3ef941", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.830366Z", "iopub.status.busy": "2025-01-31T10:05:45.829632Z", "iopub.status.idle": "2025-01-31T10:05:45.839619Z", "shell.execute_reply": "2025-01-31T10:05:45.839147Z" } }, "outputs": [], "source": [ "z = la.solve(sigma_shrink, np.ones(n), assume_a=\"pos\")\n", "mu_0 = mu @ z / z.sum()" ] }, { "cell_type": "markdown", "id": "18bdd6a8", "metadata": {}, "source": [ "We split apart the computation of $w$ in (2) above by first computing the temporary vectors\n", "\n", "\\begin{align*}\n", "d & = \\mu - \\mu_0 1_n \\\\\n", "y & = \\Sigma^{-1} d\n", "\\end{align*}\n", "\n", "so that $w$ can be computed as\n", "\n", "$$\n", "w = \\frac{n + 2}{n + 2 + t d^\\top y}\n", "$$" ] }, { "cell_type": "code", "execution_count": 8, "id": "071af1c4", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.841782Z", "iopub.status.busy": "2025-01-31T10:05:45.841597Z", "iopub.status.idle": "2025-01-31T10:05:45.925952Z", "shell.execute_reply": "2025-01-31T10:05:45.925413Z" } }, "outputs": [], "source": [ "d = mu - mu_0 * np.ones(n)\n", "y = la.solve(sigma_shrink, d, assume_a=\"pos\")\n", "w = (n + 2) / (n + 2 + t * d @ y)" ] }, { "cell_type": "markdown", "id": "fab94b46", "metadata": {}, "source": [ "Finally, we compute $\\mu_s$ as follows:" ] }, { "cell_type": "code", "execution_count": 9, "id": "a4e4132a", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.929090Z", "iopub.status.busy": "2025-01-31T10:05:45.928240Z", "iopub.status.idle": "2025-01-31T10:05:45.932434Z", "shell.execute_reply": "2025-01-31T10:05:45.931966Z" } }, "outputs": [], "source": [ "mu_s = (1 - w) * mu + w * mu_0 * np.ones(n)" ] }, { "cell_type": "code", "execution_count": 10, "id": "2fa6e858", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.934533Z", "iopub.status.busy": "2025-01-31T10:05:45.934357Z", "iopub.status.idle": "2025-01-31T10:05:45.939450Z", "shell.execute_reply": "2025-01-31T10:05:45.938946Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total change in mu: 0.3272560076737803\n", "Max change in mu: 0.541640891206755\n" ] } ], "source": [ "print(f\"Total change in mu: {la.norm(mu - mu_s, 2) / la.norm(mu, 2)}\")\n", "print(f\"Max change in mu: {(mu - mu_s).abs().max()}\")" ] }, { "cell_type": "markdown", "id": "a67b5472", "metadata": {}, "source": [ "For this particular data set, this shrinkage results in a relative change of about 30% to the return estimator." 
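] }, { "cell_type": "markdown", "id": "9f3e21aa", "metadata": {}, "source": [ "The individual steps above can also be bundled into a single helper function, which is convenient if the same preparation is repeated for other data sets. The sketch below is purely illustrative: the function name `shrink_return_estimator` and its signature are our own choice, not part of any library, and it simply repeats the computations carried out in the cells above." ] }, { "cell_type": "code", "execution_count": null, "id": "b7d45c0e", "metadata": {}, "outputs": [], "source": [ "# Illustrative helper bundling the Jorion shrinkage steps shown above.\n", "# Name and signature are our own convention, not a library API.\n", "def shrink_return_estimator(returns):\n", "    # returns: DataFrame of (percentage) returns, one column per asset\n", "    t, n = returns.shape\n", "    mu = returns.mean()\n", "    sigma = returns.cov()\n", "    # Scale the covariance estimate to account for its own estimation error\n", "    sigma_shrink = (t - 1) / (t - n - 2) * sigma\n", "    # Shrinkage target mu_0, formula (1)\n", "    z = la.solve(sigma_shrink, np.ones(n), assume_a=\"pos\")\n", "    mu_0 = mu @ z / z.sum()\n", "    # Shrinkage weight w, formula (2)\n", "    d = mu - mu_0 * np.ones(n)\n", "    y = la.solve(sigma_shrink, d, assume_a=\"pos\")\n", "    w = (n + 2) / (n + 2 + t * d @ y)\n", "    # Shrunk return estimator mu_s\n", "    return (1 - w) * mu + w * mu_0 * np.ones(n), w, mu_0\n", "\n", "# Example usage, equivalent to the step-by-step computation above:\n", "# mu_s, w, mu_0 = shrink_return_estimator(df_r)"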
] }, { "cell_type": "markdown", "id": "7555c36f", "metadata": {}, "source": [ "For future use, we save the data as Python pickle files." ] }, { "cell_type": "code", "execution_count": 11, "id": "2a6e34c3", "metadata": { "execution": { "iopub.execute_input": "2025-01-31T10:05:45.941878Z", "iopub.status.busy": "2025-01-31T10:05:45.941703Z", "iopub.status.idle": "2025-01-31T10:05:45.947901Z", "shell.execute_reply": "2025-01-31T10:05:45.947423Z" } }, "outputs": [], "source": [ "mu_s.to_pickle(\"mu.pkl\")\n", "sigma.to_pickle(\"sigma.pkl\")" ] }, { "cell_type": "markdown", "id": "ee4ec2df", "metadata": {}, "source": [ "### Literature\n", "\n", "[1] Jorion, Philippe, 1986. \"Bayes-Stein Estimation for Portfolio Analysis,\" Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 21(3), pages 279-292, September.\n", "\n", "[2] Cornuéjols G., Peña J., Tütüncü R. 2018. Optimization Methods in Finance. Cambridge University Press. Second Edition. Markowitz H.M. 1952. Portfolio Selection. The Journal of Finance. 7 (1): 77–91." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }