Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

## Computation of the Loss Distribution not only for Operational Risk Managers

In Operational Risk Management, given a number of risk types and/or business line combinations, the quest is all about providing the risk management board with an estimate of the losses the bank (or any other financial institution, hedge fund, etc.) may suffer. If you think about it for a second, the spectrum of things that can go wrong is wide, e.g. the failure of a computer system, internal or external fraud, clients, products, business practices, damage to physical assets, and so on. Blended with the business lines, e.g. corporate finance, trading and sales, retail banking, commercial banking, payment and settlement, agency services, asset management, or retail brokerage, they return over 50 combinations of operational risk factors one needs to consider. Separately and carefully. And it's a tough one.

Why? A good question, "why?"! Simply because of two main reasons. First, for an operational risk manager the sample of data describing a given risk is usually insufficient (statistically speaking: the sample is small over the life period of the financial organisation). Secondly, when something goes wrong, the next event of the same kind may take place in the not-too-distant future, or in the far-distant future. The biggest problem the operational risk manager faces in daily work is the prediction of any losses due to operational failures. Therefore, the time of the (next) event enters the equation as an independent variable: the loss frequency distribution. The second player in the game is the loss severity distribution, i.e. if the worst strikes, how much can the bank, financial body, investor, or trader lose?

From a trader's perspective, we know well that Value-at-Risk (VaR) and the Expected Shortfall are two quantitative risk measures that address similar questions. From the viewpoint of operational risk, however, the estimation of losses requires a different approach.
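For reference, both measures can be estimated from a return sample in a few lines. This is a minimal sketch on simulated Gaussian returns; the confidence level and the sample itself are illustrative assumptions, not data from this post:

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0.0005, 0.02, 100_000)  # simulated daily returns

alpha = 0.95
# VaR: the loss threshold exceeded with probability 1-alpha
var = -np.percentile(returns, 100*(1 - alpha))
# Expected Shortfall: the average loss beyond VaR
es = -returns[returns <= -var].mean()

print("VaR = %.4f, ES = %.4f" % (var, es))
```

By construction, ES is always at least as large as VaR at the same confidence level, since it averages the tail beyond the VaR threshold.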

In this post, following Hull (2015), we present an algorithm in Python for the computation of the loss distribution given the best estimates of the loss frequency and loss severity distributions. Though designed with operational risk analysts in mind, in the end we argue its usefulness for any algo-trader and/or portfolio risk manager.

1. Operational Losses: Case Study of the Vanderloo Bank

Access to operational loss data is much, much harder to obtain than for stocks traded on an exchange. The data usually stay within the walls of the bank, with internal access only. A recommended practice for operational risk managers around the world is to share those unique data despite confidentiality. Only then can we build a broader knowledge and understanding of risks and of the losses incurred due to operational activities.

Let's consider a case study of a hypothetical Vanderloo Bank. The bank was founded in 1988 in the Netherlands, and its main line of business was concentrated around building unique customer relationships and loans for small local businesses. Despite a vivid vision and firmly set goals for the future, Vanderloo Bank could not avoid a number of operational roadblocks that led to substantial operational losses:

```
    Year    Month  Day   Business Line           Risk Category         Loss ($M)
0   1989.0   1.0   13.0  Trading and Sales       Internal Fraud        0.530597
1   1989.0   2.0    9.0  Retail Brokerage        Process Failure       0.726702
2   1989.0   4.0   14.0  Trading and Sales       System Failure        1.261619
3   1989.0   6.0   11.0  Asset Management        Process Failure       1.642279
4   1989.0   7.0   23.0  Corporate Finance       Process Failure       1.094545
5   1990.0  10.0   21.0  Trading and Sales       Employment Practices  0.562122
6   1990.0  12.0   24.0  Payment and Settlement  Process Failure       4.009160
7   1991.0   8.0   23.0  Asset Management        Business Practices    0.495025
8   1992.0   1.0   28.0  Asset Management        Business Practices    0.857785
9   1992.0   3.0   14.0  Commercial Banking      Damage to Assets      1.257536
10  1992.0   5.0   26.0  Retail Banking          Internal Fraud        1.591007
11  1992.0   8.0    9.0  Corporate Finance       Employment Practices  0.847832
12  1993.0   1.0   11.0  Corporate Finance       System Failure        1.314225
13  1993.0   1.0   19.0  Retail Banking          Internal Fraud        0.882371
14  1993.0   2.0   24.0  Retail Banking          Internal Fraud        1.213686
15  1993.0   6.0   12.0  Commercial Banking      System Failure        1.231784
16  1993.0   6.0   16.0  Agency Services         Damage to Assets      1.316528
17  1993.0   7.0   11.0  Retail Banking          Process Failure       0.834648
18  1993.0   9.0   21.0  Retail Brokerage        Process Failure       0.541243
19  1993.0  11.0   11.0  Asset Management        Internal Fraud        1.380636
20  1994.0  11.0   22.0  Retail Banking          External Fraud        1.426433
21  1995.0   2.0   14.0  Commercial Banking      Process Failure       1.051281
22  1995.0  11.0   21.0  Commercial Banking      External Fraud        2.654861
23  1996.0   8.0   17.0  Agency Services         Process Failure       0.837237
24  1997.0   7.0   13.0  Retail Brokerage        Internal Fraud        1.107019
25  1997.0   7.0   24.0  Agency Services         External Fraud        1.513146
26  1997.0   8.0    8.0  Retail Banking          Process Failure       1.002040
27  1997.0   9.0    2.0  Agency Services         Damage to Assets      0.646596
28  1997.0   9.0   12.0  Retail Banking          Employment Practices  0.966086
29  1998.0   1.0    8.0  Retail Banking          Internal Fraud        0.938803
30  1998.0   1.0   12.0  Retail Banking          System Failure        0.922069
31  1998.0   2.0    5.0  Asset Management        Process Failure       1.042259
32  1998.0   4.0   18.0  Commercial Banking      External Fraud        0.969562
33  1998.0   5.0   12.0  Retail Banking          External Fraud        0.683715
34  1999.0   1.0    3.0  Trading and Sales       Internal Fraud        2.035785
35  1999.0   4.0   27.0  Retail Brokerage        Business Practices    1.074277
36  1999.0   5.0    8.0  Retail Banking          Employment Practices  0.667655
37  1999.0   7.0   10.0  Agency Services         System Failure        0.499982
38  1999.0   7.0   17.0  Retail Brokerage        Process Failure       0.803826
39  2000.0   1.0   26.0  Commercial Banking      Business Practices    0.714091
40  2000.0   7.0   23.0  Trading and Sales       System Failure        1.479367
41  2001.0   6.0   16.0  Retail Brokerage        System Failure        1.233686
42  2001.0  11.0    5.0  Agency Services         Process Failure       0.926593
43  2002.0   5.0   14.0  Payment and Settlement  Damage to Assets      1.321291
44  2002.0  11.0   11.0  Retail Banking          External Fraud        1.830254
45  2003.0   1.0   14.0  Corporate Finance       System Failure        1.056228
46  2003.0   1.0   28.0  Asset Management        System Failure        1.684986
47  2003.0   2.0   28.0  Commercial Banking      Damage to Assets      0.680675
48  2004.0   1.0   11.0  Asset Management        Process Failure       0.559822
49  2004.0   6.0   19.0  Commercial Banking      Internal Fraud        1.388681
50  2004.0   7.0    3.0  Retail Banking          Internal Fraud        0.886769
51  2004.0   7.0   21.0  Retail Brokerage        Employment Practices  0.606049
52  2004.0   7.0   27.0  Asset Management        Employment Practices  1.634348
53  2004.0  11.0   26.0  Asset Management        Damage to Assets      0.983355
54  2005.0   1.0    9.0  Corporate Finance       Damage to Assets      0.969710
55  2005.0   9.0   17.0  Commercial Banking      System Failure        0.634609
56  2006.0   2.0   24.0  Agency Services         Business Practices    0.637760
57  2006.0   3.0   21.0  Retail Banking          Employment Practices  1.072489
58  2006.0   6.0   25.0  Payment and Settlement  System Failure        0.896459
59  2006.0  12.0   25.0  Trading and Sales       Process Failure       0.731953
60  2007.0   6.0    9.0  Commercial Banking      System Failure        0.918233
61  2008.0   1.0    5.0  Corporate Finance       External Fraud        0.929702
62  2008.0   2.0   14.0  Retail Brokerage        System Failure        0.640201
63  2008.0   2.0   14.0  Commercial Banking      Internal Fraud        1.580574
64  2008.0   3.0   18.0  Corporate Finance       Process Failure       0.731046
65  2009.0   2.0    1.0  Agency Services         System Failure        0.630870
66  2009.0   2.0    6.0  Retail Banking          External Fraud        0.639761
67  2009.0   4.0   14.0  Payment and Settlement  Internal Fraud        1.022987
68  2009.0   5.0   25.0  Retail Banking          Business Practices    1.415880
69  2009.0   7.0    8.0  Retail Banking          Business Practices    0.906526
70  2009.0  12.0   26.0  Agency Services         System Failure        1.463529
71  2010.0   2.0   13.0  Asset Management        Damage to Assets      0.664935
72  2010.0   3.0   24.0  Payment and Settlement  Process Failure       1.848318
73  2010.0  10.0   16.0  Commercial Banking      External Fraud        1.020736
74  2010.0  12.0   27.0  Retail Banking          Employment Practices  1.126265
75  2011.0   2.0    5.0  Retail Brokerage        Process Failure       1.549890
76  2011.0   6.0   24.0  Corporate Finance       Damage to Assets      2.153238
77  2011.0  11.0    6.0  Asset Management        System Failure        0.601332
78  2011.0  12.0    1.0  Payment and Settlement  External Fraud        0.551183
79  2012.0   2.0   21.0  Corporate Finance       External Fraud        1.866740
80  2013.0   4.0   22.0  Retail Brokerage        External Fraud        0.672756
81  2013.0   6.0   27.0  Payment and Settlement  Employment Practices  1.119233
82  2013.0   8.0   17.0  Commercial Banking      System Failure        1.034078
83  2014.0   3.0    1.0  Asset Management        Employment Practices  2.099957
84  2014.0   4.0    4.0  Retail Brokerage        External Fraud        0.929928
85  2014.0   6.0    5.0  Retail Banking          System Failure        1.399936
86  2014.0  11.0   17.0  Asset Management        Process Failure       1.299063
87  2014.0  12.0    3.0  Agency Services         System Failure        1.787205
88  2015.0   2.0    2.0  Payment and Settlement  System Failure        0.742544
89  2015.0   6.0   23.0  Commercial Banking      Employment Practices  2.139426
90  2015.0   7.0   18.0  Trading and Sales       System Failure        0.499308
91  2015.0   9.0    9.0  Retail Banking          Employment Practices  1.320201
92  2015.0   9.0   18.0  Corporate Finance       Business Practices    2.901466
93  2015.0  10.0   21.0  Commercial Banking      Internal Fraud        0.808329
94  2016.0   1.0    9.0  Retail Banking          Internal Fraud        1.314893
95  2016.0   3.0   28.0  Asset Management        Business Practices    0.702811
96  2016.0   3.0   25.0  Payment and Settlement  Internal Fraud        0.840262
97  2016.0   4.0    6.0  Retail Banking          Process Failure       0.465896
```

Having a record of 98 events (indexed 0 to 97), we can now begin building a statistical picture of the loss frequency and loss severity distributions.

2. Loss Frequency Distribution

For loss frequency, the natural probability distribution to use is a Poisson distribution. It assumes that losses happen randomly through time, so that in any short period of time $\Delta t$ there is a probability $\lambda \Delta t$ of a loss occurring. The probability of $n$ losses in time $T$ [years] is:
$$
\mbox{Pr} = \exp{(-\lambda T)} \frac{(\lambda T)^n}{n!}
$$
where the parameter $\lambda$ can be estimated as the average number of losses per year (Hull 2015). Given our table in pandas' DataFrame format, df, we code:

```python
# Computation of the Loss Distribution not only for Operational Risk Managers
# (c) 2016 QuantAtRisk.com, Pawel Lachowicz

from scipy.stats import lognorm, norm, poisson
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# read the Vanderloo Bank operational loss data
df = pd.read_hdf('vanderloo.h5', 'df')

# count the number of loss events in each year
fre = df.groupby("Year").size()
print(fre)
```

where the last operation groups and displays the number of losses in each year:

```
Year
1989.0    5
1990.0    2
1991.0    1
1992.0    4
1993.0    8
1994.0    1
1995.0    2
1996.0    1
1997.0    5
1998.0    5
1999.0    5
2000.0    2
2001.0    2
2002.0    2
2003.0    3
2004.0    6
2005.0    2
2006.0    4
2007.0    1
2008.0    4
2009.0    6
2010.0    4
2011.0    4
2012.0    1
2013.0    3
2014.0    5
2015.0    6
2016.0    4
dtype: int64
```

The estimation of Poisson's $\lambda$ requires solely the computation of:

```python
# estimate the lambda parameter
lam = np.sum(fre.values) / (df.Year[df.shape[0]-1] - df.Year[0])
print(lam)
```

```
3.62962962963
```

which informs us that during the 1989-2016 period, i.e. over the past 27 years, there were $\lambda \approx 3.63$ losses per year on average.
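As a sanity check (an illustrative snippet, not part of the original post), the closed-form Poisson probability above can be verified against scipy.stats.poisson.pmf using the estimated $\lambda$:

```python
from math import factorial
import numpy as np
from scipy.stats import poisson

lam, T = 3.63, 1.0  # ~3.63 losses per year (as estimated above), 1-year horizon
ns = np.arange(6)

# Pr(n losses in T years) from the formula exp(-lam*T) (lam*T)^n / n!
pr_formula = np.array([np.exp(-lam*T) * (lam*T)**n / factorial(n) for n in ns])
pr_scipy = poisson.pmf(ns, lam*T)

print(np.allclose(pr_formula, pr_scipy))  # True
```

The two agree to machine precision; e.g. the probability of a loss-free year is $e^{-3.63} \approx 2.7\%$.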
Assuming the Poisson distribution as the best descriptor for the loss frequency, we model the probability of operational losses of the Vanderloo Bank in the following way:

```python
# draw random variables from a Poisson distribution with lambda=lam
prvs = poisson.rvs(lam, size=(10000))

# plot the pdf (loss frequency distribution)
h = plt.hist(prvs, bins=range(0, 11))
plt.close("all")
y = h[0]/np.sum(h[0])
x = h[1]
plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y, width=0.7, align='center', color="#2c97f1")
plt.xlim([-1, 11])
plt.ylim([0, 0.25])
plt.ylabel("Probability", fontsize=12)
plt.title("Loss Frequency Distribution", fontsize=14)
plt.savefig("f01.png")
```

revealing:

3. Loss Severity Distribution

The data collected in the last column of df allow us to plot and estimate the best fit of the loss severity distribution. In the practice of operational risk managers, the lognormal distribution is a common choice:

```python
c = .7, .7, .7  # define grey color

plt.figure(figsize=(10, 6))
plt.hist(df["Loss ($M)"], bins=25, color=c, normed=True)
plt.xlabel("Incurred Loss ($M)", fontsize=12)
plt.ylabel("N", fontsize=12)
plt.title("Loss Severity Distribution", fontsize=14)

x = np.arange(0, 5, 0.01)
sig, loc, scale = lognorm.fit(df["Loss ($M)"])
pdf = lognorm.pdf(x, sig, loc=loc, scale=scale)
plt.plot(x, pdf, 'r')
plt.savefig("f02.png")

print(sig, loc, scale)  # lognormal pdf's parameters
```

```
0.661153638163 0.328566816132 0.647817560825
```

where the lognormal distribution probability density function (pdf) we use follows SciPy's loc/scale convention (note the $1/scale$ Jacobian factor for the shifted and rescaled variable):
$$
p(y;\ \sigma, loc, scale) = \frac{1}{scale} \cdot \frac{1}{x\sigma\sqrt{2\pi}} \exp{ \left[ -\frac{1}{2} \left(\frac{\log{x}}{\sigma} \right)^2 \right] }
$$
where $x = (y - loc)/scale$. The fit of the pdf to the data is shown in the chart (f02.png).
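To confirm this parameterization numerically (an illustrative check, with the fitted values rounded), we can compare the written density, including SciPy's $1/scale$ factor, against lognorm.pdf directly:

```python
import numpy as np
from scipy.stats import lognorm

sig, loc, scale = 0.661, 0.329, 0.648  # rounded fit results from above
y = np.linspace(0.5, 4.0, 50)
x = (y - loc) / scale

# pdf in y: the standardized lognormal density divided by scale (Jacobian)
manual = np.exp(-0.5*(np.log(x)/sig)**2) / (x*sig*np.sqrt(2*np.pi)) / scale

print(np.allclose(manual, lognorm.pdf(y, sig, loc=loc, scale=scale)))  # True
```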

4. Loss Distribution

The loss frequency distribution must be combined with the loss severity distribution for each risk type/business line combination in order to determine a loss distribution. The most common assumption here is that loss severity is independent of loss frequency. Hull (2015) suggests the following steps to be taken in building the Monte Carlo simulation leading to modelling of the loss distribution:

1. Sample from the frequency distribution to determine the number of loss events ($n$)
2. Sample $n$ times from the loss severity distribution to determine the loss experienced for each loss event ($L_1, L_2, \dots, L_n$)
3. Determine the total loss experienced ($= L_1 + L_2 + \dots + L_n$)
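The three steps above can also be sketched in compact vectorized form. This is a minimal illustration, not the post's implementation; the parameter values are assumed (rounded from the fits in Sections 2 and 3), and NumPy's random generator stands in for SciPy's rvs calls for speed:

```python
import numpy as np

lam = 3.63                              # Poisson loss frequency [events/year]
sig, loc, scale = 0.661, 0.329, 0.648   # lognormal severity parameters

rng = np.random.default_rng(7)
trials = 10_000

ns = rng.poisson(lam, size=trials)                    # step 1: number of events
sev = loc + scale*rng.lognormal(0.0, sig, ns.sum())   # step 2: one severity per event
idx = np.repeat(np.arange(trials), ns)                # map each draw to its trial
total = np.bincount(idx, weights=sev, minlength=trials)  # step 3: total loss per trial

print("mean annual loss [$M]: %.2f" % total.mean())
```

With these assumed parameters, the mean annual loss lands near $\lambda$ times the mean severity, as expected for a compound Poisson sum.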

When many simulation trials are used, we obtain a total distribution for losses of the type being considered. In Python we code those steps in the following way:

```python
def loss(r, loc, sig, scale, lam):
    X = []
    for x in range(11):  # up to 10 loss events considered
        if(r < poisson.cdf(x, lam)):  # x denotes a loss number
            out = 0
        else:
            out = lognorm.rvs(s=sig, loc=loc, scale=scale)
        X.append(out)
    return np.sum(X)  # = L_1 + L_2 + ... + L_n


# run 1e5 Monte Carlo simulations
losses = []
for _ in range(100000):
    r = np.random.random()
    losses.append(loss(r, loc, sig, scale, lam))

h = plt.hist(losses, bins=range(0, 16))
_ = plt.close("all")
y = h[0]/np.sum(h[0])
x = h[1]

plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y, width=0.7, align='center', color="#ff5a19")
plt.xlim([-1, 16])
plt.ylim([0, 0.20])
plt.title("Modelled Loss Distribution", fontsize=14)
plt.xlabel("Loss ($M)", fontsize=12)
plt.ylabel("Probability of Loss", fontsize=12)
plt.savefig("f03.png")
```

revealing:

The loss function has been designed so that it considers up to 10 loss events. We run $10^5$ simulations. In each trial we first draw a random number r from a uniform distribution. If it is less than the value of the Poisson cumulative distribution function (with $\lambda = 3.63$) for loss number x ($x = 0, 1, \dots, 10$), we assume a zero loss incurred; otherwise, we draw a random variable from the lognormal distribution (given by the parameters found via the fitting procedure a few lines earlier). Simple as that. The resultant loss distribution, as shown in the chart above, describes the expected severity of future losses (due to "fatal" operational activities of the Vanderloo Bank) together with the corresponding probabilities.

5. Beyond Operational Risk Management

A natural next step for the numerical procedure applied here is the modelling of, e.g., the anticipated (predicted) loss distribution for any portfolio of $N$ assets. One can estimate it based on the up-to-date track record of losses incurred in trading.
By doing so, we gain an additional tool in our arsenal of quantitative risk measures and modelling. Stay tuned, as a new post will illustrate that case.

DOWNLOAD
vanderloo.h5

REFERENCES
Hull, J., 2015, Risk Management and Financial Institutions, 4th Ed.

## Probability of Black Swan Events at NYSE

Featured by the Russian website Financial One (Apr 26, 2016)

The prediction of extreme rare events (EREs) in the financial markets remains one of the toughest problems. Firstly, because of the very limited knowledge we have of their distribution and of the underlying correlations across the markets. Literally, we walk in the dark, hoping it won't happen today, not to the stocks we hold in our portfolios. But is that darkness really so dark?

In this post we will use textbook knowledge of classical statistics to analyze the universe of 2500+ stocks traded at the New York Stock Exchange (NYSE) that experienced the most devastating daily loss in their history (1 data point per stock). Based on that, we will try to shed new light on the most mind-boggling questions: when will the next ERE happen, how long do we need to wait for it, and if it hits, what will its magnitude "of devastation" be?

1. Extreme Rare Events of NYSE

An ERE meeting certain criteria has its own name in the financial world: a black swan. We will define it in the next Section. Historically speaking, at some point people were convinced that only white swans existed, until someone noticed a population of black swans in Western Australia. The impossible became possible.

The same we observe in the stock markets. Black Monday on Oct 19, 1987 is no doubt the greatest historical event, with the markets suffering 20%+ losses. The event was so rare that no risk analyst could predict it and/or protect against it. History repeated itself in 2000, when the Internet bubble caused many IT stocks to lose a lot over a period of a few weeks.
The events of 9/11 in New York forced the closure of the US stock exchange for 6 days, and on Sep 17, as a consequence of fear, about 25 stocks at NYSE lost more than 15% in one day. The uncertainty cannot be avoided, but it can be minimized. Is it so? The financial crisis of 2007-2008 triggered a significant number of black swans to fly again, and fly high... Did we learn enough from that? Do we protect more efficiently against black swans after the 2009 market recovery? Will you be surprised if I tell you "no"?!

Let's conduct a data analysis of the most extreme EREs ever spotted at NYSE among all its stocks. To do that, we will use the Python language and the Yahoo! Finance database. The data quality is not required to be superbly accurate, and we will download only adjusted-close prices with the corresponding dates. A list of companies (inter alia their tickers) traded at NYSE can be extracted from a .xls file available for download at http://www.nasdaq.com/screening/company-list.aspx. It contains 3226 tickers, while, as we will see below, Yahoo! Finance recognizes only 2716 of them. Since 2716 is a huge data set meeting a lot of requirements for statistical significance, I'm more than happy to go along with it.

In the process of stock data download, we look for information on the most extreme daily loss per stock and collect them all in the pandas DataFrame dfblack. It will contain a Date, the magnitude of the ERE (Return), and Delta0. The latter we calculate on the spot as the number of business days between some initial point in time (here, 1900-01-01, matching the code below) and the day of the ERE occurrence. The processing time may take up to 45 minutes depending on your Internet speed. Be patient, it will be worth the wait. Using HDF5, we save the dfblack DataFrame to disk.
```python
# Probability of Black Swan Events at NYSE
# (c) 2016 by Pawel Lachowicz, QuantAtRisk.com

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import datetime as dt
from scipy.stats import gumbel_l, t, norm, expon, poisson
from math import factorial

# Constants
c = (.7, .7, .7)  # grey color

col = ['Date', 'Return', 'Delta0']
dfblack = pd.DataFrame(columns=col)  # create an empty DataFrame

i = n = 0
for lstline in open("nyseall.lst", 'r').readlines():
    ticker = lstline[:-1]
    i += 1
    print(i, ticker)
    try:
        ok = True
        data = web.DataReader(str(ticker), data_source='yahoo',
                              start='1900-01-01', end='2016-04-14')
    except:
        ok = False
    if(ok):
        n += 1
        data['Return'] = data['Adj Close'] / data['Adj Close'].shift(1) - 1
        data = data[1:]  # skip first row
        df = data['Return'].copy()
        dfs = df.sort_values()
        start = dt.date(1900, 1, 1)
        end = dt.date(dfs.index[0].year, dfs.index[0].month, dfs.index[0].day)
        delta0 = np.busday_count(start, end)
        tmin = end
        rmin = dfs[0]
        dfblack.loc[len(dfblack)] = [tmin, rmin, delta0]  # add new row to dfblack
    else:
        print(' not found')

dfblack.set_index(dfblack.Date, inplace=True)  # set index by Date
dfblack.sort_index(inplace=True)  # sort DataFrame by Date
del dfblack['Date']  # remove Date column

print("No. of stocks with valid data: %g" % n)

# saving to file
store = pd.HDFStore('dfblack.h5')
store['dfblack'] = dfblack
store.close()

# reading from file
store = pd.HDFStore('dfblack.h5')
dfblack = pd.read_hdf('dfblack.h5', 'dfblack')
store.close()

dfblack0 = dfblack.copy()  # a backup copy
```

where nyseall.lst is a plain text file listing all NYSE tickers:

```
DDD
MMM
WBAI
WUBA
AHC
...
```

The first instinct of a savvy risk analyst would be to plot the distribution of the magnitudes of EREs extracted for the NYSE universe.
We achieve that by:

```python
plt.figure(num=1, figsize=(9, 5))
plt.hist(dfblack.Return, bins=50, color='grey')  # a histogram
plt.xlabel('Daily loss', fontsize=12)
plt.xticks(fontsize=12)
plt.savefig('fig01.png', format='png')
```

which reveals a distribution peaked around -18%, with a long left tail of losses reaching nearly -98%. Welcome to the devil's playground!

But that's just the beginning. The information on the dates corresponding to the occurrence of those EREs allows us to build a 2D picture:

```python
# Black Monday data sample
df_bm = dfblack[dfblack.index == dt.date(1987, 10, 19)]

# Reopening of NYSE after 9/11 data sample
df_wtc = dfblack[dfblack.index == dt.date(2001, 9, 17)]

# Financial Crisis of 2008 data sample
df_cre = dfblack[(dfblack.index >= dt.date(2008, 9, 19)) &
                 (dfblack.index <= dt.date(2009, 3, 6))]

# Drawdown of 2015/2016 data sample
df_16 = dfblack[(dfblack.index >= dt.date(2015, 12, 1)) &
                (dfblack.index <= dt.date(2016, 2, 11))]

plt.figure(figsize=(13, 6))
plt.plot(dfblack.Return*100, '.k')
plt.plot(df_bm.Return*100, '.r')
plt.plot(df_wtc.Return*100, '.b')
plt.plot(df_cre.Return*100, '.m')
plt.plot(df_16.Return*100, '.y')
plt.xlabel('Time Interval [%s to %s]' % (dfblack.index[0], dfblack.index[-1]),
           fontsize=12)
plt.ylabel('Magnitude [%]', fontsize=12)
plt.title('Time Distribution of the Most Extreme Daily Losses of %g NYSE '
          'Stocks (1 point per stock)' % (len(dfblack)), fontsize=13)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.xlim([dfblack.index[0], dt.date(2016, 4, 14)])
plt.savefig('fig02.png', format='png')
```

delivering:

Keeping in mind that every stock (out of 2716) is represented by only 1 data point, the figure reveals a dramatic state of play: a significantly lower number of EREs before 1995 and a progressing increase after that, with a surprisingly huge number in the past 5 years! As discussed in my previous post, Detecting Human Fear in Electronic Trading, the surge in EREs could be explained by the domination of algorithmic trading over manual trades.
It may also suggest that algorithmic risk management for many stocks is either not as good as we want it to be, or very sensitive to global stock buying/selling pressure across the markets (the butterfly effect). On the other hand, the number of new stocks being introduced to NYSE may cause them "to fight for life", and those most vulnerable, short-lived, of unestablished position, reputation, financial history, or least liquidity may be subject to ERE occurrence more often. Regardless of the primary cause, there are more and more "sudden drops in the altitude" now than ever. I find it alarming.

For clarity, in the above figure, the historical events of Black Monday 1987-10-19 (89 points), the reopening of NYSE after 9/11 (25 points), the Financial Crisis of 2007-2008 (858 points), and the drawdown of 2015/2016 (256 points!) have been marked by red, blue, purple, and dark yellow colours, respectively.

2. Black Swans of NYSE

The initial distribution of Extreme Rare Events of NYSE (Figure 1) provides us with an opportunity to define a black swan event. There is a statistical theory, Extreme Value Theory (EVT), delivering an appealing description of the behaviour of rare events. We have discussed it in detail in the Extreme VaR for Portfolio Managers and Black Swan and Extreme Loss Modeling articles. Thanks to that, we gain the knowledge and tools so needed to describe what we observe. As previously, in this post we will only be considering the left-skewed Gumbel distribution $G$ (as implemented in SciPy's gumbel_l, appropriate for minima/losses), with the corresponding probability density function (pdf) $g$, given by:
$$
G(z;\ a, b) = 1 - e^{-e^{z}} \ \ \mbox{for}\ \ z = \frac{x-a}{b}, \ x\in\Re
$$
and
$$
g(z;\ a, b) = b^{-1}\, e^{z} e^{-e^{z}} \ ,
$$
where $a$ and $b$ are the location parameter and scale parameter, respectively.
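Note that SciPy's gumbel_l (used in the fit below) is the left-skewed Gumbel variant, whose standardized pdf is $e^{z}e^{-e^{z}}$. A quick numerical check (an illustrative snippet; the location and scale values are rounded from the fit reported later in the post):

```python
import numpy as np
from scipy.stats import gumbel_l

a, b = -0.152, 0.093  # location and scale (rounded fit values from the post)
x = np.linspace(-0.9, 0.0, 200)
z = (x - a) / b

# g(z; a, b) = b^{-1} e^{z} e^{-e^{z}}
manual = np.exp(z - np.exp(z)) / b

print(np.allclose(manual, gumbel_l.pdf(x, loc=a, scale=b)))  # True
```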
You can convince yourself that fitting the $g(z;\ a,b)$ function truly works in practice:

```python
# fitting
locG, scaleG = gumbel_l.fit(dfblack.Return)  # location, scale parameters

dx = 0.0001
x = [dx*i for i in range(-int(1/dx), int(1/dx)+1)]
x2 = x.copy()

plt.figure(num=3, figsize=(9, 4))
plt.hist(dfblack.Return, bins=50, normed=True, color=c,
         label="NYSE Data of 2716 EREs")
pdf1 = gumbel_l.pdf(x2, loc=locG, scale=scaleG)
y = pdf1.copy()
plt.plot(x2, y, 'r', linewidth=2, label="Gumbel PDF fit")
plt.xlim([-1, 0])
plt.xlabel('Daily loss [%]', fontsize=12)
plt.xticks(fontsize=12)
plt.legend(loc=2)
plt.savefig('fig03.png', format='png')
```

By deriving:

```python
print(locG, scaleG, locG-scaleG, locG+scaleG)

# Pr(Loss < locG-scaleG)
Pr_bs = gumbel_l.cdf(locG-scaleG, loc=locG, scale=scaleG)
print(Pr_bs)
```

we see that

```
-0.151912998711 0.0931852781844 -0.245098276896 -0.058727720527
0.307799372445
```

i.e. the Gumbel pdf's location parameter (peak) is at -15.2%, with a scale of 9.3%. Using an analogy to the Normal distribution, we define two brackets, namely:
$$
a-b \ \ \mbox{and} \ \ a+b
$$
to be -24.5% and -5.9%, respectively. Given that, we define a black swan event as a daily loss of magnitude
$$
L < a-b \ .
$$
Based on the NYSE data sample, we find that 30.8% of all EREs (i.e. those with losses greater than 24.5%) meet that criterion.
We re-plot their time-magnitude history as follows:

```python
# skip data after 2015-04-10 (last 252 business days)
dfb = dfblack0[dfblack0.index < dt.date(2015, 4, 10)]

# re-fit Gumbel (to remain politically correct in-sample ;)
locG, scaleG = gumbel_l.fit(dfb.Return)
print(locG, locG-scaleG)  # -0.16493276305 -0.257085820659

# extract black swans from EREs data set
dfb = dfb[dfb.Return < locG-scaleG]
dfb0 = dfb.copy()

plt.figure(figsize=(13, 6))
plt.plot(dfb.Return*100, '.r')
plt.xlabel('Time Interval [%s to %s]' % (dfb.index[0], dfb.index[-1]),
           fontsize=14)
plt.ylabel('Black Swan Magnitude [%]', fontsize=14)
plt.title('Time Distribution of Black Swans of %g NYSE Stocks (L < %.1f%%)' %
          (len(dfb), (locG-scaleG)*100), fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.ylim([-100, 0])
plt.xlim([dfb.index[0], dt.date(2016, 4, 14)])
plt.savefig('fig04.png', format='png')
```

where we deliberately skipped the last 252+ data points to allow ourselves some backtesting later on (out-of-sample). That reduces the number of black swans from 807 down to 627 (in-sample).

3. Modelling the Probability of an Event Based on Observed Occurrences

Let's be honest. It's nearly impossible to predict in advance the exact day when the next black swan will appear, for which stocks, and whether it will be isolated or trigger other stocks to fall on the same day. Therefore, what follows from now on you should consider as common sense in the quantitative modelling of black swans, backed by our knowledge of statistics. And yes, despite the opinions of Taleb (2010) in his book The Black Swan: The Impact of the Highly Improbable, we may derive some risk uncertainties and become more aware of the impact if the worst strikes.

The classical guideline can be found in Vose's book Quantitative Risk Analysis: A Guide to Monte Carlo Simulation Modelling (1996). David delivers not only solid explanations of many statistical distributions but also excellent hints on their practical usage for risk analysts.
Since my discovery of this book in 2010 in Singapore, it has been my favourite reference manual of all time.

The modelling of the probability of an event based on observed occurrences may follow two distinct processes, analogous to the difference between discrete and continuous distributions: (1) an event that may occur only among a set of specific discrete opportunities, and (2) an event that may occur over a continuum of opportunity. Considering black swan events, we are not limited by a finite number of instances; therefore, we will take into account continuous exposure probability modelling.

A continuous exposure process is characterized by the mean interval between events (MIBE) $\beta$. The underlying assumption refers to the uncertainty displaying the properties of a Poisson process, i.e. the probability of an event is independent of however many events have occurred in the past or how recently. And this is our starting point towards a better understanding of NYSE black swans.

3.1. Mean Interval Between Black Swan Events (MIBE)

The MIBE is the average interval between $n$ observed occurrences of a black swan event. Its true value can be estimated from the observed occurrences using the central limit theorem:
$$
\mbox{MIBE}\ \beta = \mbox{Normal} \left( \bar{t}, \frac{\sigma}{\sqrt{n-1}} \right)
$$
where $\bar{t}$ is the average of the $n-1$ observed intervals $t_i$ between the $n$ observed contiguous black swan events, and $\sigma$ is the standard deviation of the $t_i$ intervals. The larger the value of $n$, the greater our confidence in knowing its true value.

First, we estimate $\bar{t}$ from the data. Using NumPy's histogram function, as a bonus, we confirm that the plain mean is the same as the expected value, $E(\Delta t)$, derived from the histogram,
$$
E(\Delta t) = \sum_{i} p_i x_i \ ,
$$
i.e. $\bar{t} = E(\Delta t)$, where there is probability $p_i$ of observing $x_i$.
In Python we execute that check as follows:

```python
deltat = np.diff(dfb.Delta0)  # time differences between points

print('tbar = %.2f [days]' % np.mean(deltat))
print('min, max = %.2f, %.2f [days]' % (np.min(deltat), np.max(deltat)))
print()

# confirm the mean using histogram data (mean = the expected value)
y, x = np.histogram(deltat, bins=np.arange(np.ceil(np.max(deltat)+2)),
                    density=True)
hmean = np.sum(x[:-1]*y)
print('E(dt) = %.2f [days]' % hmean)

# plot histogram with a fitted Exp PDF
plt.figure(figsize=(9, 5))
plt.bar(x[:-1], y, width=1, color=c)
plt.xlim([0, 20])
plt.title('Time Differences between Black Swan Events', fontsize=12)
plt.xlabel('Days', fontsize=12)
plt.savefig('fig05.png', format='png')
```

which returns

```
tbar = 17.38 [days]
min, max = 0.00, 950.00 [days]
E(dt) = 17.38 [days]
```

What is worth noticing is the fact that the largest recorded separation between historical black swans at NYSE was 950 days, i.e. 2.6 years. The plotted distribution of time differences can be read out more clearly if we type, e.g.:

```python
# the first 5 data points from the histogram
print(y[0:5])
print(x[0:5])
```

returning

```
[ 0.30990415  0.12300319  0.08466454  0.04472843  0.03354633]
[ 0.  1.  2.  3.  4.]
```

which means that our data suggest a probability of 31.0% of observing black swan events on the same day, 12.3% with a 1-day gap, 8.5% with a 2-day separation, and so on. Since the MIBE $\beta$ is not given by a single number, the recipe we have for its estimation (see the formula above) allows us solely for the modelling of the Mean Interval Between Black Swan Events, as follows.
First, we calculate the corresponding standard deviation (note that ne and tbar must be defined before use):

```python
ne = len(dfb)  # 627, the number of black swans in the data sample
tbar = np.mean(deltat)
sig = np.std(deltat, ddof=1)  # standard deviation

print(tbar, sig)
print(tbar, sig/np.sqrt(ne-1))
```

```
15.8912 57.5940828519
15.8912 2.30192251209
```

and next we model the MIBE itself:

```python
betaMIBE_sample = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=10000)

plt.figure(figsize=(9, 5))
plt.hist(betaMIBE_sample, bins=25, color=c)
plt.xlabel("MIBE beta [days]", fontsize=12)
plt.title("Mean Interval Between Black Swan Events", fontsize=12)
plt.savefig('fig06.png', format='png')
```

3.2. Time Till Next Black Swan Event

Once we have estimated the MIBE $\beta$, it is possible to use its value to estimate the time till the next black swan event. It obeys
$$
\mbox{TTNE} = \mbox{Expon}(\beta)
$$
i.e. the exponential distribution given by the probability density function of:
$$
f(x;\ \beta) = \lambda \exp(-\lambda x) = \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right) \ .
$$
In Python we model TTNE in the following way:

```python
betaMIBE_one = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=1)

# the time till next event = Expon(beta)
ttne = expon.rvs(loc=0, scale=betaMIBE_one, size=100000)

y, x = np.histogram(ttne, bins=np.arange(int(np.max(ttne))), density=True)

plt.figure(figsize=(9, 5))
plt.bar(x[:-1], y, width=1, color=c, edgecolor=c)
plt.xlim([0, 120])
plt.title("Time Till Next Black Swan Event", fontsize=12)
plt.xlabel("Days", fontsize=12)
plt.savefig('fig07.png', format='png')
```

where by checking

```python
print(y[0:5])
print(x[0:5])
```

we get

```
[ 0.05198052  0.0498905   0.04677047  0.04332043]
[1 2 3 4]
```

i.e. there is a 5.20% probability that the next black swan event at NYSE will occur on the next day, 4.99% in 2 days, 4.68% in 3 days, etc.

3.3. The Expected Number of Black Swans per Business Day

The distribution of the number of black swan events occurring at NYSE per unit time (here, per business day) can be modelled using the Poisson distribution:
$$
\mbox{Poisson}(\lambda) \ \ \mbox{where} \ \ \lambda = \frac{1}{\beta} \ .
$$
The Poisson distribution models the number of occurrences of an event in a time $T$ when the time between successive events follows a Poisson process. If $\beta$ is the mean time between events, as used by the Exponential distribution, then $\lambda = T/\beta$. We will use it in the "out-of-sample" backtesting in Section 4.

Let's turn words into action by executing the following Python code:

```python
# generate a set of Poisson rvs
nep = poisson.rvs(1/betaMIBE_one, size=100000)

y, x = np.histogram(nep, bins=np.arange(np.max(nep)+1), density=True)

plt.figure(figsize=(9, 5))
plt.title("The Expected Number of Black Swans per Business Day", fontsize=12)
plt.xlabel("N", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.xticks(x)
plt.xlim([-0.1, 2.1])
plt.bar(x[:-1], y, width=0.05, color='r', edgecolor='r', align='center')
plt.savefig('fig08.png', format='png')
```

The derived probabilities:

```python
print(y)
print(x[:-1])
```

```
[ 0.95465  0.04419  0.00116]
[0 1 2]
```

inform us that there is a 95.5% chance of no black swan event occurring on a given business day, 4.4% that we will record exactly one, and 0.12% that we may find 2 out of the 2500+ NYSE stocks suffering a loss greater than 25.7% on the next business day (unit time).

3.4. Probability of the Occurrence of Several Events in an Interval

The Poisson$(1/\beta)$ distribution calculates the distribution of the number of events that will occur in a single unit interval.
The probability of exactly $k$ black swan events in such an interval is given by: $$\mbox{Pr}[X = k] = \frac{\lambda^k}{k!} \exp(-\lambda) = \frac{1}{\beta^k k!} \exp\left(-\frac{1}{\beta}\right)$$ with $\beta$ rescaled to reflect the period of exposure, e.g.: $$\beta_{yr} = \beta/252$$ where 252 stands for the number of business days in a calendar year. Therefore, the number of black swan events in the next business year can be modelled by: $$N = \mbox{Poisson}(\lambda) = \mbox{Poisson}(1/\beta_{yr}) = \mbox{Poisson}(252/\mbox{Normal}(\bar{t}, \sigma/\sqrt{n-1}))$$ which we can achieve in Python, coding:

```python
inNdays = 252
N = poisson.rvs(inNdays/norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1)),
                size=100000)
exN = np.round(np.mean(N))
stdN = np.round(np.std(N, ddof=1))

y, x = np.histogram(N, bins=np.arange(int(np.max(N))), density=True)

_ = plt.figure(figsize=(9, 5))
_ = plt.title("The Expected Number of Black Swans in next %g Business Days"
              % inNdays, fontsize=12)
_ = plt.xlabel("N", fontsize=12)
_ = plt.ylabel("Probability", fontsize=12)
_ = plt.bar(x[:-1], y, width=1, color=c)
plt.savefig('fig09.png', format='png')

# probability that <N> events will occur in next 'inNdays' days
print('E(N) = %g' % exN)
print('stdN = %g' % stdN)
tmp = [(i, j) for i, j in zip(x, y) if i == exN]
print('Pr(X=%g) = %.4f' % (exN, tmp[0][1]))
tmp = [(i, j) for i, j in zip(x, y) if i == 0]
print('Pr(X=%g) = %.4f' % (0, tmp[0][1]))
```

Our Monte Carlo simulation may return, for instance, the following calculated measures:

```
E(N) = 16
stdN = 4
Pr(X=16) = 0.0992
Pr(X=0) = 0.0000
```

The interpretation is straightforward. On average, we expect to record $16\pm 4$ black swan events in the next 252 business days among the 2500+ stocks traded at NYSE as for Apr 10, 2015 COB. The probability of observing exactly 16 black swans is 9.92%. It is natural, based on our data analysis, that the resultant probability of the "extreme luck" of not having any black swan at NYSE, Pr$(X=0)$, in the following trading year is zero. But, miracles happen!
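That "miracle" probability, reported as 0.0000 by the Monte Carlo run, is small but not exactly zero. A quick analytic cross-check (a sketch, assuming only the in-sample mean interval of $\bar{t} = 17.38$ days quoted earlier) uses the Poisson pmf directly:

```python
import numpy as np
from scipy.stats import poisson

tbar = 17.38      # in-sample mean interval between events [days]
lam = 252 / tbar  # expected number of events per business year, ~14.5

# probability of zero black swans in the next 252 business days
pr0 = poisson.pmf(0, lam)
print('lambda  = %.2f' % lam)
print('Pr(X=0) = %.2e' % pr0)
```

The chance of a completely swan-free year turns out to be of the order of $10^{-7}$, i.e. a few in ten million, which a 100,000-sample simulation simply cannot resolve.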
9.92% is derived based on a single run of the simulation. Using the analytical formula for Pr$(X=N)$ given above, we compute the uncertainty of $$\mbox{Pr}[X = (N=16)] = \frac{1}{(\beta_{yr})^{16}\ 16!} \exp\left(-\frac{1}{\beta_{yr}}\right) \ ,$$

```python
from scipy.special import factorial

Pr = []
for nsim in range(100000):
    betayr = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=1) / inNdays
    p = 1/(betayr**exN * factorial(exN)) * np.exp(-1/betayr)
    Pr.append(p)

print('Pr[X = E(N) = %g] = %.2f +- %.2f' % (exN, np.mean(Pr),
                                            np.std(Pr, ddof=1)))
```

which yields

```
Pr[X = E(N) = 16] = 0.08 +- 0.02
```

and stays in agreement with our result.

4. Prediction and Out-of-Sample Backtesting

All right, time to verify the theoretical statistics in practice. Our goal is to analyse the sample of black swan data skipped so far but available to us. This procedure is usually known as "out-of-sample" backtesting, i.e. we know "the future" as it has already happened but we pretend it is unknown. In the first step, we look for a suitable time interval of exactly 252 business days "in the future".
It turns out to be:

```python
print(np.busday_count(dt.date(2015, 4, 10), dt.date(2016, 3, 29)))  # 252
```

Next, we extract the out-of-sample (oos) DataFrame and illustrate it in a similar fashion as previously for the in-sample data:

```python
oos = dfblack0[(dfblack0.index >= dt.date(2015, 4, 10)) &
               (dfblack0.index <= dt.date(2016, 3, 29))]

# extract black swans
oos = oos[oos.Return < locG-scaleG]

plt.figure(figsize=(13, 6))
plt.plot(oos.Return*100, '.b')
plt.xlabel('Time Interval [%s to %s]' % (dfblack.index[0], oos.index[-1]),
           fontsize=14)
plt.ylabel('Black Swan Magnitude [%]', fontsize=14)
plt.title('Time Distribution of Out-of-Sample Black Swans of %g NYSE Stocks '
          '(L < %.1f%%)' % (len(oos), (locG-scaleG)*100), fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.ylim([-100, 0])
plt.xlim([dfblack.index[0], oos.index[-1]])
plt.savefig('fig10.png', format='png')
```

Based on that, we compare the MIBE as predicted with what the next 252 business days reveal:

```python
deltat_oos = np.diff(oos.Delta0)  # time differences between points
tbar_oos = np.mean(deltat_oos)
print('MIBE (predicted)   = %.2f +- %.2f [days]' % (tbar, sig/np.sqrt(ne-1)))
print('MIBE out-of-sample = %.2f [days]' % tbar_oos)
print()

y, x = np.histogram(deltat_oos, bins=np.arange(np.ceil(np.max(deltat)+2)),
                    density=True)
print(y[0:5])  # [ 0.3814433  0.20618557  0.17525773  0.06185567  0.05154639]
print(x[0:5])  # [ 0.  1.  2.  3.  4.]
# predicted probabilities were:
# [ 0.30990415  0.12300319  0.08466454  0.04472843  0.03354633]
```

The outcome is surprising, i.e.

```
MIBE (predicted)   = 17.38 +- 2.30 [days]
MIBE out-of-sample = 2.32 [days]
```

clearly indicating that the Apr 2015–Apr 2016 period was nearly 7.5 times more "active" than the "all-time" NYSE black swan data would predict. Have I already said "it's alarming"?! Well, at least scary enough for all who trade on a daily basis holding portfolios rich in NYSE assets. The probability of having a black swan on the same day increased from 31% to 38%. A new, pure definition of sophisticated gambling.
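How improbable is such a regime change under the in-sample model? A rough back-of-the-envelope check (a sketch, assuming the predicted mean interval of 17.38 days; the 98 out-of-sample swans are counted below) asks for the probability of recording at least that many events in 252 business days:

```python
from scipy.stats import poisson

tbar = 17.38      # predicted MIBE [days]
lam = 252 / tbar  # expected number of swans per 252 business days
n_obs = 98        # swans actually recorded out-of-sample

# probability of observing at least n_obs events under the in-sample model
p_tail = poisson.sf(n_obs - 1, lam)
print('lambda = %.1f' % lam)
print('Pr(N >= %d) = %.2e' % (n_obs, p_tail))
```

Under a stationary in-sample model, 98 swans in one year is essentially impossible, which suggests the event rate itself changed rather than that we merely drew an unlucky sample.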
A verification of the time till the next black swan event requires a gentle modification of the oos DataFrame. Namely, we need to eliminate all data points with the same Delta0's:

```python
oos2 = oos.Delta0.drop_duplicates(inplace=False)
tdiff = np.diff(oos2)

y, x = np.histogram(tdiff, bins=np.arange(int(np.max(tdiff))), density=True)

_ = plt.figure(figsize=(9, 5))
_ = plt.bar(x[:-1], y, width=1, color=c, edgecolor=c)
_ = plt.xlim([1, 30])
_ = plt.title("Time Till Next Black Swan Event (Out-of-Sample)", fontsize=12)
_ = plt.xlabel("Days", fontsize=12)
_ = plt.ylabel("Probability", fontsize=12)
plt.savefig('fig11.png', format='png')

print(y[1:5])
print(x[1:5])
```

Again, from our modelling of TTNE $= \mbox{Expon}(\beta)$ we expected about a 5% chance of waiting one, two, or three days for the next black swan event. The out-of-sample data suggest,

```
[0.33898305 0.28813559 0.10169492 0.08474576]
[1 2 3 4]
```

that we are 6.3 times more likely to record a black swan event at NYSE on the next business day. Terrifying.

The calculation of the expected number of black swans per business day in the out-of-sample data I'm leaving with you as homework in Python programming. Our earlier prediction of those events in the next 252 business days returned $16\pm 4$. As we have already seen, in the considered time interval after Apr 10, 2015 we recorded 98 swans, which is $6.1^{+2.1}_{-1.2}$ times more than forecasted based on the broad sample. In the next post of this series, we will try to develop a methodology for a quantitative derivation of the probability of a black swan for an $N$-asset portfolio.

Reflection

Investors invest in stocks with the anticipation of earning profit. As we have witnessed above, the birds become a serious threat to flying high with a significant increase in this investment air-traffic. Your plane (stock) can be grounded and your airline (portfolio) can suffer huge losses. Hoping is not the solution.
Better ways of protecting against the unavoidable should be implemented for investments while your money is in the game. But what should you do when a black swan strikes? Denzel Washington once said: "If you pray for rain, you gotta deal with the mud too."

## Predicting Heavy and Extreme Losses in Real-Time for Portfolio Holders (2)

This part is awesome. Trust me!

Previously, in Part 1, we examined two independent methods for estimating the probability of a very rare event (heavy or extreme loss) that an asset could experience on the next day. The first one was based on the computation of the tail probability, i.e.: given the upper threshold $L_{thr}$, find the corresponding $\alpha$ such that $\mbox{Pr}(L < -L_{thr}) = \alpha$, where the random variable $L$ is referred to as a daily asset return (in percent). In this case, we showed that this approach was strongly dependent on the number of similar events in the past that would contribute to the changes of the $\alpha$'s in time. The second method was based on the classical concept of conditional probability. For example, given that the index (benchmark, etc.) drops by $L'$, what is the probability that an asset will lose $L\le L'$? We found that this method fails if the benchmark never experienced a loss of $L'$. In both cases, the lack of information on the occurrence of a rare event was the major problem. Therefore, is there any tool we can use to predict an event that has not happened yet?! This post addresses an appealing solution and delivers a black-box that every portfolio holder can use as an integrated part of a risk management computer system.

1. Bayesian Conjugates

Bayesian statistics treats the problem of finding the probability of an event a little bit differently than what we know from school. For example, if we flip a fair coin, in the long run we expect the same proportion of heads and tails. Formally, this is known as the frequentist interpretation.
The Bayesian interpretation assumes nothing about the coin (is it fair or weighted?) and, with every toss of the coin, it builds a grand picture of the underlying probability distribution. And it's fascinating because we can start with the assumption that the coin is made such that we should observe heads more frequently whereas, after $N$ tosses, we may find that the coin is fair. Let me skip the theoretical part on that aspect of Bayesian statistics, which has been beautifully explained by Mike Halls-Moore within his two articles (Bayesian Statistics: A Beginner's Guide and Bayesian Inference Of A Binomial Proportion – The Analytical Approach) that I recommend you study before reading this post further down.

Looking along the return-series of any asset (spanning $M$ days), we may denote by $x_i = 1$ ($i=1,...,M$) a day when a rare event took place (e.g. the asset lost more than 16% or so) and by $x_i = 0$ a lack of such an event. Treating days as a series of Bernoulli trials with unknown probability $p$, if we start with an uninformed prior, such as Beta($\alpha, \beta$) with $\alpha=\beta=1$, then the concept of Bayesian conjugates can be applied in order to update the prior belief. The update is given by: $$\alpha' = \alpha + n_e = \alpha + \sum_{i=1}^{N\le M} x_i \\ \beta' = \beta + N - n_e = \beta + N - \sum_{i=1}^{N\le M} x_i$$ which simply means that after $N$ days from some initial moment in the past, having observed a rare event $n_e$ times, the probability distribution is given by Beta($\alpha', \beta'$). Therefore, the posterior predictive mean for the Bernoulli likelihood can be interpreted as the new probability of the event taking place on the next day: $$\mbox{Pr}(L < -L_{thr}) = \frac{\alpha'}{\alpha' + \beta'}$$ And that's the key concept we are going to use now for the estimation of the probability of heavy or extreme losses that an asset can experience in trading.

As a quick visualisation, imagine you analyse the last 252 trading days of the AAPL stock.
The stock never lost more than 90% in one day. No such extreme event took place. According to our new method, the probability that AAPL will plummet 90% (or more) in the next trading session is not zero(!) but $$\mbox{Pr}(L < -90\%) = \frac{\alpha'}{\alpha' + \beta'} = \frac{1+0}{(1+0)+(1+252-0)} = 0.3937\%$$ with the 95% confidence interval: $$\left[ B^{-1}(0.05, \alpha', \beta');\ B^{-1}(0.95, \alpha', \beta') \right]$$ where $B^{-1}$ denotes the percent point function (quantile function) of the Beta distribution. The estimated number of events corresponding to a loss of $L < -L_{thr}$ during the next 252 trading days we find by: $$E(n_e) = \left\lceil \frac{252}{[B^{-1}(0.95, \alpha', \beta')]^{-1}} \right\rceil$$ where $\lceil \cdot \rceil$ denotes the operation of rounding.

2. Analysis of Facebook since its IPO

Having a tool that includes information on whether an event took place or not is powerful. You may also notice that the more information we gather (i.e. $N\to\infty$), the closer we are to the "true" probability function. It also means that the spread around the new posterior predictive mean shrinks: $$\sigma = \sqrt{ \frac{\alpha' \beta'} { (\alpha'+\beta')^2 (\alpha'+\beta'+1) } } \ \to \ 0 \ .$$ It is logical that the analysis of an asset traded daily for the past 19 years will deliver a "more trustful" estimation of $\mbox{Pr}(L < -L_{thr})$ than a situation where we take into account only a limited period of time. However, what if the stock is a newbie in the market?

Let's investigate the case of Facebook (NASDAQ:FB). The information we have a record of spans a bit over 3.5 years, back to when Facebook's initial public offering (IPO) took place on May 18th, 2012. Since that day till today (December 6, 2015) the return-series (daily data) is $M=892$ points long.
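Before coding the full FB analysis, the AAPL thought experiment above can be verified in a few lines (a minimal sketch; the 252-day window and zero observed events are taken directly from the example):

```python
from scipy.stats import beta

N, ne = 252, 0        # trading days observed, rare events recorded
alpha0 = beta0 = 1    # uninformed Beta(1, 1) prior

alpha1 = alpha0 + ne  # posterior parameters
beta1 = beta0 + N - ne

pr = alpha1/(alpha1 + beta1)  # posterior predictive mean
cl1 = beta.ppf(0.05, alpha1, beta1)
cl2 = beta.ppf(0.95, alpha1, beta1)

print("Pr(L < -90%%) = %.4f%%  [%.4f%%, %.4f%%]" % (pr*100, cl1*100, cl2*100))
```

Despite zero such crashes in the sample, the posterior predictive mean is a fraction of one percent, with an upper 95% bound a shade over 1%.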
Using the analytical formulations given above and writing a Python code:

```python
# Predicting Heavy and Extreme Losses in Real-Time for Portfolio Holders (2)
# (c) 2015 Pawel Lachowicz, QuantAtRisk.com

import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
import pandas_datareader.data as web


# Fetching Yahoo! Finance for Facebook
data = web.DataReader("FB", data_source='yahoo',
                      start='2012-05-18', end='2015-12-04')['Adj Close']
cp = np.array(data.values)  # daily adj-close prices
ret = cp[1:]/cp[:-1] - 1    # compute daily returns
N = len(ret)

# Plotting FB price- and return-series
plt.figure(num=2, figsize=(9, 6))
plt.subplot(2, 1, 1)
plt.plot(cp)
plt.axis("tight")
plt.ylabel("FB Adj Close [USD]")
plt.subplot(2, 1, 2)
plt.plot(ret, color=(.6, .6, .6))
plt.axis("tight")
plt.ylabel("Daily Returns")
plt.xlabel("Time Period [days]")
# plt.show()

# provide a threshold
Lthr = -0.21

# how many events of L < -21% occurred?
ne = np.sum(ret < Lthr)
# of what magnitude?
ev = ret[ret < Lthr]
# avgloss = np.mean(ev)  # if ev is a non-empty array

# prior
alpha0 = beta0 = 1

# posterior
alpha1 = alpha0 + ne
beta1 = beta0 + N - ne
pr = alpha1/(alpha1+beta1)
cl1 = beta.ppf(0.05, alpha1, beta1)
cl2 = beta.ppf(0.95, alpha1, beta1)
ne252 = np.round(252/(1/cl2))

print("ne = %g" % ne)
print("alpha', beta' = %g, %g" % (alpha1, beta1))
print("Pr(L < %3g%%) = %5.2f%%\t[%5.2f%%, %5.2f%%]" % (Lthr*100, pr*100,
                                                       cl1*100, cl2*100))
print("E(ne) = %g" % ne252)
```

first we visualise both the price- and return-series, and from the computations we find:

```
ne = 0
alpha', beta' = 1, 893
Pr(L < -21%) =  0.11%	[ 0.01%,  0.33%]
E(ne) = 1
```

There is no record of an event within the 892 days of FB trading, $n_e = 0$, where the stock lost 21% or more in a single day.
However, the probability that such an event may happen at the end of the next NASDAQ session is 0.11%. Non-zero but nearly zero.

A quick verification of the probability estimate returned by our Bayesian method we obtain by changing the value of the threshold from -0.21 to Lthr = 0. Then, we get:

```
alpha', beta' = 422, 472
Pr(L <   0%) = 47.20%	[44.46%, 49.95%]
E(ne) = 126
```

i.e. a 47.2% chance that on the next day the stock will close lower. It is in excellent agreement with what can be calculated based on the return-series, i.e. there were 421 days with negative daily returns, thus $421/892 = 0.4719$. Really cool, don't you think?!

Now, imagine for a second that after the first few days since the Facebook IPO we recalculate the posterior predictive mean, day-by-day, eventually reaching the most current estimate of the probability that the stock may lose -21% or more at the end of the next trading day. Such a procedure allows us to plot the change of $\mbox{Pr}(L < -L_{thr})$ during the whole trading history of FB.
We achieve that by running a modified version of the previous code:

```python
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
import pandas_datareader.data as web


data = web.DataReader("FB", data_source='yahoo',
                      start='2012-05-18', end='2015-12-05')['Adj Close']
cp = np.array(data.values)    # daily adj-close prices
returns = cp[1:]/cp[:-1] - 1  # compute daily returns

Lthr = -0.21

Pr = CL1 = CL2 = np.array([])
for i in range(2, len(returns)):
    # data window
    ret = returns[0:i]
    N = len(ret)
    ne = np.sum(ret < Lthr)
    # prior
    alpha0 = beta0 = 1
    # posterior
    alpha1 = alpha0 + ne
    beta1 = beta0 + N - ne
    pr = alpha1/(alpha1+beta1)
    cl1 = beta.ppf(0.05, alpha1, beta1)
    cl2 = beta.ppf(0.95, alpha1, beta1)
    #
    Pr = np.concatenate([Pr, np.array([pr*100])])
    CL1 = np.concatenate([CL1, np.array([cl1*100])])
    CL2 = np.concatenate([CL2, np.array([cl2*100])])

plt.figure(num=1, figsize=(10, 5))
plt.plot(Pr, "k")
plt.plot(CL1, "r--")
plt.plot(CL2, "r--")
plt.axis("tight")
plt.ylim([0, 5])
plt.grid(True)
plt.show()
```

which displays a figure where, after 892 days, we reach the value of 0.11% as found earlier. The black line tracks the change of the probability estimate while both red dashed lines denote the 95% confidence interval. The black line is smooth, i.e. because of the lack of a -21% event in the past, the recalculated probability did not change its value "drastically".

This is not the case when we consider the scenario for Lthr = -0.11, delivering a plot that stays in agreement with:

```
ne = 1
alpha', beta' = 2, 892
Pr(L < -11%) =  0.22%	[ 0.04%,  0.53%]
E(ne) = 1
```

and confirms that only one daily loss of $L < -11\%$ took place during the entire happy life of Facebook on NASDAQ so far.
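The narrowing of those red dashed lines over time illustrates the $\sigma \to 0$ property quoted in Section 2. A minimal sketch (with hypothetical event counts, chosen purely for illustration) confirms that the posterior spread shrinks as the sample grows:

```python
import numpy as np

def beta_posterior_sigma(N, ne, alpha0=1.0, beta0=1.0):
    """Std dev of the Beta posterior after N Bernoulli trials, ne events."""
    a = alpha0 + ne
    b = beta0 + N - ne
    return np.sqrt(a*b / ((a+b)**2 * (a+b+1)))

# keep the event rate fixed at 1% and grow the sample
for N in [100, 1000, 10000]:
    print(N, beta_posterior_sigma(N, ne=0.01*N))
```

With the event rate held constant, the posterior standard deviation drops roughly by a factor of $\sqrt{10}$ per decade of sample size.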
Since the computer can find the number of losses within a certain interval, that gives us an opportunity to estimate: $$\mbox{Pr}(L_1 \le L < L_2)$$ in the following way:

```python
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
import pandas_datareader.data as web


data = web.DataReader("FB", data_source='yahoo',
                      start='2012-05-18', end='2015-12-05')['Adj Close']
cp = np.array(data.values)  # daily adj-close prices
ret = cp[1:]/cp[:-1] - 1    # compute daily returns
N = len(ret)

dL = 2
for k in range(-50, 0, dL):
    ev = ret[(ret >= k/100) & (ret < (k/100+dL/100))]
    avgloss = 0
    ne = np.sum((ret >= k/100) & (ret < (k/100+dL/100)))
    #
    # prior
    alpha0 = beta0 = 1
    # posterior
    alpha1 = alpha0 + ne
    beta1 = beta0 + N - ne
    pr = alpha1/(alpha1+beta1)
    cl1 = beta.ppf(0.05, alpha1, beta1)
    cl2 = beta.ppf(0.95, alpha1, beta1)
    if(len(ev) > 0):
        avgloss = np.mean(ev)
        print("Pr(%3g%% < L < %3g%%) =\t%5.2f%%\t[%5.2f%%, %5.2f%%] ne = %4g"
              " E(L) = %.2f%%" %
              (k, k+dL, pr*100, cl1*100, cl2*100, ne, avgloss*100))
    else:
        print("Pr(%3g%% < L < %3g%%) =\t%5.2f%%\t[%5.2f%%, %5.2f%%] ne = "
              "%4g" % (k, k+dL, pr*100, cl1*100, cl2*100, ne))
```

By doing so, we visualise the distribution of potential losses (that may occur on the next trading day) along with their magnitudes:

```
Pr(-50% < L < -48%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-48% < L < -46%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-46% < L < -44%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-44% < L < -42%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-42% < L < -40%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-40% < L < -38%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-38% < L < -36%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-36% < L < -34%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-34% < L < -32%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-32% < L < -30%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-30% < L < -28%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-28% < L < -26%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-26% < L < -24%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-24% < L < -22%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-22% < L < -20%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-20% < L < -18%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-18% < L < -16%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-16% < L < -14%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-14% < L < -12%) =	 0.11%	[ 0.01%,  0.33%] ne =    0
Pr(-12% < L < -10%) =	 0.34%	[ 0.09%,  0.70%] ne =    2 E(L) = -11.34%
Pr(-10% < L <  -8%) =	 0.67%	[ 0.29%,  1.17%] ne =    5 E(L) = -8.82%
Pr( -8% < L <  -6%) =	 0.89%	[ 0.45%,  1.47%] ne =    7 E(L) = -6.43%
Pr( -6% < L <  -4%) =	 2.91%	[ 2.05%,  3.89%] ne =   25 E(L) = -4.72%
Pr( -4% < L <  -2%) =	 9.84%	[ 8.26%, 11.53%] ne =   87 E(L) = -2.85%
Pr( -2% < L <   0%) =	33.11%	[30.54%, 35.72%] ne =  295 E(L) = -0.90%
```

For example, based on the complete trading history of FB, there is a probability of 0.34% that the stock will close with a loss between -12% and -10%, and if so, the "expected" loss would be -11.3% (estimated solely from 2 events). Again: the probability, not certainty.

## Student t Distributed Linear Value-at-Risk

One of the most underestimated features of financial asset distributions is their kurtosis. A rough approximation of the asset return distribution by the Normal distribution often becomes an evident exaggeration or misinterpretation of the facts. And we know that. The problem arises if we investigate a Value-at-Risk (VaR) measure. Within the standard approach, it is computed based on the analytical formula: $$\mbox{VaR}_{h, \alpha} = \Phi^{-1}(1-\alpha)\sqrt{h}\sigma - h\mu$$ which expresses the $(1-\alpha)$ $h$-day VaR, where $\Phi^{-1}(1-\alpha)$ denotes the $(1-\alpha)$ quantile of the standardised Normal distribution and $h$ is a time horizon. The majority of stocks display significant excess kurtosis and negative skewness. In plain English that means that the distribution is peaked and likely to have a heavy left tail.
In this scenario, the best fit of the Normal probability density function (pdf) to the asset return distribution underestimates the risk accumulated in far negative territory. In this post we introduce the concept of the Student t Distributed Linear VaR, i.e. an alternative method of measuring VaR that takes into account "a correction" for the kurtosis of the asset return distribution. We will use the market stock data of IBM as an exemplary case study and investigate the difference between the standard and non-standard VaR calculations based on the parametric models. It will also be an excellent opportunity to learn how to do it in Python, quickly and effectively.

1. Leptokurtic Distributions and Student t VaR

A leptokurtic distribution is one whose density function has a higher peak and greater mass in the tails than the normal density function of the same variance. In a symmetric unimodal distribution, i.e. one whose density function has only one peak, leptokurtosis is indicated by a positive excess kurtosis (Alexander 2008). The impact of leptokurtosis on VaR is non-negligible. Firstly, assuming high significance levels, e.g. $\alpha \le 0.01$, the estimation of VaR for the Normal and a leptokurtic distribution would return: $$\mbox{VaR}_{lepto,\ \alpha\le 0.01} \gt \mbox{VaR}_{Norm,\ \alpha\le 0.01}$$ which is fine, however for low significance levels the situation may change: $$\mbox{VaR}_{lepto,\ \alpha\ge 0.05} \lt \mbox{VaR}_{Norm,\ \alpha\ge 0.05} \ .$$ A good example of a leptokurtic distribution is the Student t distribution. If it describes the data better and displays significant excess kurtosis, there is a good chance that VaR at high significance levels will be a much better measure of the tail risk. The Student t distribution is formally given by its pdf: $$f_\nu(x) = (\nu\pi)^{-1/2}\, \Gamma\left(\frac{\nu}{2}\right)^{-1} \Gamma\left(\frac{\nu+1}{2}\right) (1+\nu^{-1} x^2)^{-(\nu+1)/2}$$ where $\Gamma$ denotes the gamma function and $\nu$ the degrees of freedom.
For $\nu\gt 2$ its variance is $\nu(\nu-2)^{-1}$ and the distribution has a finite excess kurtosis of $\kappa = 6(\nu-4)^{-1}$ for $\nu \gt 4$. It is obvious that for $\nu\to \infty$ the Student t pdf approaches the Normal pdf. Large values of $\nu$ for the fitted Student t pdf in the case of a financial asset's daily return distribution are unlikely, unless the Normal distribution is indeed the best pdf model (possible to obtain for very large samples). We find the $\alpha$ quantile of the Student t distribution ($\mu=0$ and $\sigma=1$) by the integration of the pdf and usually denote it as $\sqrt{\nu^{-1}(\nu-2)}\ t_{\nu}^{-1}(\alpha)$ where $t_{\nu}^{-1}$ is the percent point function of the t distribution. Accounting for the fact that in practice we deal with random variables of the form $X = \mu + \sigma T$, where $X$ describes the asset return as a transformation of the standardised Student t random variable, we reach the formal definition of the $(1-\alpha)$ $h$-day Student t VaR, given by: $$\mbox{Student}\ t\ \mbox{VaR}_{\nu, \alpha, h} = \sqrt{\nu^{-1}(\nu-2)h}\ t_{\nu}^{-1}(1-\alpha)\sigma - h\mu$$ Having that, we are ready to use this knowledge in practice and to understand, based on data analysis, the best regimes where one should (and should not) apply the Student t VaR as a more "correct" measure of risk.

2. Normal and Student t VaR for IBM Daily Returns

Within the following case study we will make use of the Yahoo! Finance data provider in order to fetch the price-series of the IBM stock (adjusted close) and transform it into a return-series. The first task we have to complete before heading further down this road is verifying the sample skewness and kurtosis.
We begin, as usual:

```python
# Student t Distributed Linear Value-at-Risk
# The case study of VaR at the high significance levels
# (c) 2015 Pawel Lachowicz, QuantAtRisk.com

import numpy as np
import math
from scipy.stats import skew, kurtosis, kurtosistest
import matplotlib.pyplot as plt
from scipy.stats import norm, t
import pandas_datareader.data as web


# Fetching Yahoo! Finance for IBM stock data
data = web.DataReader("IBM", data_source='yahoo',
                      start='2010-12-01', end='2015-12-01')['Adj Close']
cp = np.array(data.values)  # daily adj-close prices
ret = cp[1:]/cp[:-1] - 1    # compute daily returns

# Plotting IBM price- and return-series
plt.figure(num=2, figsize=(9, 6))
plt.subplot(2, 1, 1)
plt.plot(cp)
plt.axis("tight")
plt.ylabel("IBM Adj Close [USD]")
plt.subplot(2, 1, 2)
plt.plot(ret, color=(.6, .6, .6))
plt.axis("tight")
plt.ylabel("Daily Returns")
plt.xlabel("Time Period 2010-12-01 to 2015-12-01 [days]")
```

where we cover the most recent 5 years of IBM trading. Both series visualised look as expected, where, in particular, the bottom one reveals five days when the stock experienced a 5% or higher daily loss.

In Python, a correct computation of the sample skewness and kurtosis is possible thanks to the scipy.stats module. As a test you may verify that for a huge sample (e.g. of 10 million) of random variables $X\sim N(0, 1)$, the expected skewness and kurtosis,

```python
from scipy.stats import skew, kurtosis

X = np.random.randn(10000000)
print(skew(X))
print(kurtosis(X, fisher=False))
```

are indeed equal to 0 and 3,

```
0.0003890933008049
3.0010747070253507
```

respectively. As we will convince ourselves in a second, this is not the case for our IBM return sample.
Continuing,

```python
print("Skewness = %.2f" % skew(ret))
print("Kurtosis = %.2f" % kurtosis(ret, fisher=False))

# H_0: the null hypothesis that the kurtosis of the population from which
# the sample was drawn is that of the normal distribution:
#   kurtosis = 3(n-1)/(n+1)
_, pvalue = kurtosistest(ret)

beta = 0.05
print("p-value = %.2f" % pvalue)
if(pvalue < beta):
    print("Reject H_0 in favour of H_1 at %.5f level\n" % beta)
else:
    print("Accept H_0 at %.5f level\n" % beta)
```

we compute the moments and run the (excess) kurtosis significance test. As already explained in the comment, we are interested in whether the derived value of kurtosis is significant at the $\beta=0.05$ level or not. We find that for IBM the results reveal:

```
Skewness = -0.70
Kurtosis = 8.39
p-value = 0.00
Reject H_0 in favour of H_1 at 0.05000 level
```

id est, we reject the null hypothesis that the kurtosis of the population from which the sample was drawn is that of the normal distribution, in favour of the alternative hypothesis ($H_1$) that it is not. And if not, then we have work to do!
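As a side note, kurtosistest examines the fourth moment alone; since our sample also shows negative skewness, one may prefer a joint test of skewness and kurtosis, such as the Jarque-Bera test, also available in scipy.stats. A quick sketch on synthetic heavy-tailed data (not the IBM sample) illustrates its behaviour:

```python
import numpy as np
from scipy.stats import jarque_bera, t

np.random.seed(7)

# heavy-tailed synthetic sample: Student t with 3 degrees of freedom
X = t.rvs(df=3, size=10000)

jb_stat, p = jarque_bera(X)
print("JB = %.1f, p-value = %.4f" % (jb_stat, p))
# for such a leptokurtic sample the normality hypothesis is firmly rejected
```

Running it on the IBM returns instead would, given the moments found above, reject normality just as decisively.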
In the next step we fit the IBM daily returns data sample with both the Normal pdf and the Student t pdf in the following way:

```python
# N(x; mu, sig) best fit (finding: mu, stdev)
mu_norm, sig_norm = norm.fit(ret)
dx = 0.0001  # resolution
x = np.arange(-0.1, 0.1, dx)
pdf = norm.pdf(x, mu_norm, sig_norm)
print("Integral norm.pdf(x; mu_norm, sig_norm) dx = %.2f" % (np.sum(pdf*dx)))
print("Sample mean  = %.5f" % mu_norm)
print("Sample stdev = %.5f" % sig_norm)
print()

# Student t best fit (finding: nu)
parm = t.fit(ret)
nu, mu_t, sig_t = parm
pdf2 = t.pdf(x, nu, mu_t, sig_t)
print("Integral t.pdf(x; mu, sig) dx = %.2f" % (np.sum(pdf2*dx)))
print("nu = %.2f" % nu)
print()
```

where we ensure that the complete integral over both pdfs returns 1, as expected:

```
Integral norm.pdf(x; mu_norm, sig_norm) dx = 1.00
Sample mean  = 0.00014
Sample stdev = 0.01205

Integral t.pdf(x; mu, sig) dx = 1.00
nu = 3.66
```

The best fit of the Student t pdf returns an estimate of 3.66 degrees of freedom. From this point, the last step is the derivation of two distinct VaR measures, the first corresponding to the best fit of the Normal pdf and the second to the best fit of the Student t pdf, respectively. The trick is

```python
# Compute VaR
h = 1         # days
alpha = 0.01  # significance level
StudenthVaR = (h*(nu-2)/nu)**0.5 * t.ppf(1-alpha, nu)*sig_norm - h*mu_norm
NormalhVaR = norm.ppf(1-alpha)*sig_norm - mu_norm

lev = 100*(1-alpha)
print("%g%% %g-day Student t VaR = %.2f%%" % (lev, h, StudenthVaR*100))
print("%g%% %g-day Normal VaR    = %.2f%%" % (lev, h, NormalhVaR*100))
```

that in the calculation of the $(1-\alpha)$ 1-day Student t VaR ($h=1$) according to: $$\mbox{Student}\ t\ \mbox{VaR}_{\nu, \alpha=0.01, h=1} = \sqrt{\nu^{-1}(\nu-2)}\ t_{\nu}^{-1}(0.99)\sigma - \mu$$ we have to take $\sigma$ and $\mu$ given by the sig_norm and mu_norm variables (in the code) and not the ones returned by the t.fit(ret) function above.
We find out that:

```
99% 1-day Student t VaR = 3.19%
99% 1-day Normal VaR    = 2.79%
```

i.e. VaR derived based on the Normal pdf best fit may underestimate the tail risk for IBM, as suspected in the presence of highly significant excess kurtosis. Since the beauty of the code is aesthetic, the picture is romantic. We visualise the results of our investigation, extending our code with:

```python
plt.figure(num=1, figsize=(11, 6))
grey = .77, .77, .77

# main figure
plt.hist(ret, bins=50, density=True, color=grey, edgecolor='none')
plt.axis("tight")
plt.plot(x, pdf, 'b', label="Normal PDF fit")
plt.plot(x, pdf2, 'g', label="Student t PDF fit")
plt.xlim([-0.2, 0.1])
plt.ylim([0, 50])
plt.legend(loc="best")
plt.xlabel("Daily Returns of IBM")
plt.ylabel("Normalised Return Distribution")

# inset
a = plt.axes([.22, .35, .3, .4])
plt.hist(ret, bins=50, density=True, color=grey, edgecolor='none')
plt.plot(x, pdf, 'b')
plt.plot(x, pdf2, 'g')
# Student VaR line
plt.plot([-StudenthVaR, -StudenthVaR], [0, 3], c='g')
# Normal VaR line
plt.plot([-NormalhVaR, -NormalhVaR], [0, 4], c='b')
plt.text(-NormalhVaR-0.01, 4.1, "Norm VaR", color='b')
plt.text(-StudenthVaR-0.0171, 3.1, "Student t VaR", color='g')
plt.xlim([-0.07, -0.02])
plt.ylim([0, 5])
plt.show()
```

which brings us to a figure where, in the inset, we zoom in on the left tail of the distribution, marking both VaR results. Now it is much more clearly visible why at $\alpha=0.01$ the Student t VaR is greater: the effect of the leptokurtic distribution. Moreover, the fit of the Student t pdf appears to be a far better parametric model describing the central mass (density) of the IBM daily returns. There is an ongoing debate about which of these two risk measures should be formally applied when it comes to the reporting of 1-day VaR.
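The significance-level crossover claimed in Section 1 can be checked numerically in a few lines (a sketch, reusing the fitted $\nu = 3.66$ and comparing standardised quantiles only, i.e. setting $\sigma=1$, $\mu=0$):

```python
from scipy.stats import norm, t

nu = 3.66  # degrees of freedom from the Student t fit

for alpha in [0.01, 0.05]:
    var_t = ((nu-2)/nu)**0.5 * t.ppf(1-alpha, nu)  # standardised Student t VaR
    var_n = norm.ppf(1-alpha)                      # standardised Normal VaR
    print("alpha = %.2f:  Student t %.3f  vs  Normal %.3f" %
          (alpha, var_t, var_n))
```

For $\alpha=0.01$ the Student t quantile exceeds the Normal one, while for $\alpha=0.05$ the ordering flips, exactly the regime change described earlier.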
I do hope that just by presenting the above results I have made you more aware that: (1) it is better to have a closer look at the excess kurtosis for your favourite asset (or portfolio) return distribution, and (2) normal is boring. Someone wise once said: "Don't be afraid of being different. Be afraid of being the same as everyone else." The same means normal. Dare to fit non-Normal distributions!

DOWNLOAD: tStudentVaR.py

## Predicting Heavy and Extreme Losses in Real-Time for Portfolio Holders (1)

The probability of improbable events. The simplicity amongst complexity. The purity in its best form. The ultimate cure for those who trade, for those who invest. Does it exist? Can we compute it? Is it really something impossible? In this post we challenge ourselves to the frontiers of accessible statistics and data analysis in order to find the most optimal computable solutions to this enigmatic but mind-draining problem. Since there is no certainty, everything remains shrouded in the veil of probability. The probability we have to face, hoping that something unlikely, determined to take place anyhow, eventually will not happen.

Our goal is to calculate the probability of a very rare event (e.g. a heavy and/or extreme loss) in the trading market (e.g. a stock plummeting 5% or much more) within a specified time horizon (e.g. on the next day, in one week, in one month, etc.). The probability. Not the certainty of that event.

In this Part 1, first, we look at the tail of an asset's return distribution and compress our knowledge on Value-at-Risk (VaR) to extract the essence required to understand why VaR-stuff is not the best card in our deck. Next, we move to the classical Bayes' theorem, which helps us to derive the conditional probability of a rare event given… yep, another event that (hypothetically) will take place.
Eventually, in Part 2, we will hit the bull between its eyes with an advanced concept taken from the Bayesian approach to statistics and map, in real time, the loss probabilities for any return series. Again, the probabilities, not certainties.

1. VaR (not) for Rare Events

In the framework of VaR we take into consideration $T$ days of the trading history of an asset. Next, we derive a number (VaR) that describes a loss that is likely to take place with a probability of approximately $\alpha$. "To take place" does not mean here that it will take place. In this approach we try to provide some likelihood (a quantitative measure) of the rare event in a specified time horizon (e.g. on the next day, if daily return series are under our VaR investigation; the scenario considered in this post). If by $L$ we denote the loss (in per cent) an asset can experience on the next day, then:
$$
\mbox{Pr}(L \le -\mbox{VaR}_{1-\alpha}) = \alpha
$$
would be the probability of a loss of $-L\times D$ dollars, where $-L\times D \ge -\mbox{VaR}\times D$, equal, for instance, to $\alpha=0.05$ (also referred to as the $(1-\alpha)$% VaR measure; $\mbox{VaR}_{95}$, etc.), and $D$ is the position size (the money invested in the asset in terms of physical currency, e.g. in dollars). In other words, the historical data can help us to find $\mbox{VaR}_{95}$ given $\alpha$, assuming a 5% chance that $\mbox{VaR}_{1-\alpha}$ will be exceeded on the next day.

In order to illustrate that case and its shortcomings when it comes to the analysis of rare events, let's look at the 10-year trading history of two stocks in the NASDAQ market: the highly volatile CAAS (China Automotive Systems, Inc.; market capitalisation of 247M) and the highly liquid AAPL (Apple Inc.; market capitalisation of 750B). First, we fetch their adjusted close price series from Yahoo!
Finance and derive the corresponding daily return series utilising Python:

```python
# Predicting Heavy and Extreme Losses in Real-Time for Portfolio Holders
# (c) 2015 QuantAtRisk.com, by Pawel Lachowicz
#
# heavy1.py

import pandas.io.data as web   # deprecated; in modern pandas use pandas_datareader
import matplotlib.pyplot as plt
import numpy as np
from pyvar import findvar, findalpha

# ---1. Data Processing
# fetch and download daily adjusted-close price series for CAAS
# and AAPL stocks using Yahoo! Finance public data provider
caas = web.DataReader("CAAS", data_source='yahoo',
                      start='2005-05-13', end='2015-05-13')['Adj Close']
aapl = web.DataReader("AAPL", data_source='yahoo',
                      start='2005-05-13', end='2015-05-13')['Adj Close']
CAAScp = np.array(caas.values)
AAPLcp = np.array(aapl.values)

f = open("data1.dat", "wb")
np.save(f, CAAScp)
np.save(f, AAPLcp)
f.close()

# read in the data from a file
f = open("data1.dat", "rb")
CAAScp = np.load(f)
AAPLcp = np.load(f)
f.close()

# compute return-series
retCAAS = CAAScp[1:]/CAAScp[:-1]-1
retAAPL = AAPLcp[1:]/AAPLcp[:-1]-1
```

The best way to understand the data is by plotting them:

```python
# plotting (figure #1)
# adjusted-close price-series
fig, ax1 = plt.subplots(figsize=(10, 6))
plt.xlabel("Trading days 13/05/2005-13/05/2015")
plt.plot(CAAScp, '-r', label="CAAS")
plt.axis("tight")
plt.legend(loc=(0.02, 0.8))
plt.ylabel("CAAS Adj Close Price (US$)")
ax2 = ax1.twinx()
plt.plot(AAPLcp, '-', label="AAPL")
plt.legend(loc=(0.02, 0.9))
plt.axis("tight")
plt.ylabel("AAPL Adj Close Price (US$)")

# plotting (figure #2)
# daily return-series
plt.figure(num=2, figsize=(10, 6))
plt.subplot(211)
plt.grid(True)
plt.plot(retCAAS, '-r', label="CAAS")
plt.axis("tight")
plt.ylim([-0.25, 0.5])
plt.legend(loc="upper right")
plt.ylabel("CAAS daily returns")
plt.subplot(212)
plt.grid(True)
plt.plot(retAAPL, '-', label="AAPL")
plt.legend(loc="upper right")
plt.axis("tight")
plt.ylim([-0.25, 0.5])
plt.ylabel("AAPL daily returns")
plt.xlabel("Trading days 13/05/2005-13/05/2015")
```

We obtain the price series and return series, respectively. For the latter plot, by fixing the scaling of both $y$-axes we immediately gain a chance to inspect the number of daily trades closing with heavy losses. Well, at least at first glance and for both directions of trading. In this post we will be considering the long positions only.

Having our data pre-processed, we may implement two different strategies to make use of the VaR framework in order to work out the probabilities for tail events. The first one is based on setting the $\alpha$ level and finding $-\mbox{VaR}_{1-\alpha}$. Let's assume $\alpha=0.01$; then the following piece of code

```python
# ---2. Computation of VaR given alpha

alpha = 0.01
VaR_CAAS, bardata1 = findvar(retCAAS, alpha=alpha, nbins=200)
VaR_AAPL, bardata2 = findvar(retAAPL, alpha=alpha, nbins=100)

cl = 100.*(1-alpha)
```

aims at the computation of the corresponding numbers, making use of the function:

```python
def findvar(ret, alpha=0.05, nbins=100):
    # Function computes the empirical Value-at-Risk (VaR) for return-series
    # (ret) defined as a NumPy 1D array, given alpha
    # (c) 2015 QuantAtRisk.com, by Pawel Lachowicz
    #
    # compute a normalised histogram (\int H(x)dx = 1)
    # nbins: number of bins used (recommended nbins > 50)
    hist, bins = np.histogram(ret, bins=nbins, density=True)
    wd = np.diff(bins)
    # cumulative sum from -inf to +inf
    cumsum = np.cumsum(hist * wd)
    # find an area of H(x) for computing VaR
    crit = cumsum[cumsum <= alpha]
    n = len(crit)
    # (1-alpha)VaR
    VaR = bins[n]
    # supplementary data for the bar plot
    bardata = hist, n, wd
    return VaR, bardata
```

Here, we create the histogram with a specified number of bins, and $-\mbox{VaR}_{1-\alpha}$ is found in an empirical manner.
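The histogram-based estimate can be cross-checked against the direct empirical quantile returned by np.percentile; the two agree up to the bin width. A minimal sketch on synthetic Normal returns (made-up numbers, not the CAAS/AAPL data, and not the author's findvar itself):

```python
import numpy as np

rng = np.random.default_rng(7)
ret = rng.normal(0.0005, 0.02, 10_000)   # synthetic daily returns
alpha = 0.01

# histogram-based VaR, following the same idea as findvar above
hist, bins = np.histogram(ret, bins=200, density=True)
cum = np.cumsum(hist * np.diff(bins))
var_hist = bins[np.sum(cum <= alpha)]    # bin edge where the area reaches alpha

# direct empirical quantile for comparison
var_q = np.percentile(ret, 100 * alpha)
print(var_hist, var_q)  # two similar negative numbers, close to -0.046
```

The histogram route is preferred in the post because the same `bardata` by-products feed the red-tail plots below; for a bare number, the percentile call suffices.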
For many reasons this approach is much better than fitting the Normal distribution to the data and finding VaR based on the integration of that continuous function. It is a well-known fact that such a function would underestimate the probabilities in the far tail of the return distribution. And the game is all about capturing what is going on out there, right?

The results of the computation we display as follows:

```python
print("%g%% VaR (CAAS) = %.2f%%" % (cl, VaR_CAAS*100.))
print("%g%% VaR (AAPL) = %.2f%%\n" % (cl, VaR_AAPL*100.))
```

i.e.

```
99% VaR (CAAS) = -9.19%
99% VaR (AAPL) = -5.83%
```

In order to gain a good feeling for those numbers, we display the left tails of both return distributions:

```python
# plotting (figure #3)
# histograms of daily returns; H(x)
plt.figure(num=3, figsize=(10, 6))
c = (.7, .7, .7)  # grey color (RGB)

# CAAS
ax = plt.subplot(211)
hist1, bins1 = np.histogram(retCAAS, bins=200, density=False)
widths = np.diff(bins1)
b = plt.bar(bins1[:-1], hist1, widths, color=c, edgecolor="k", label="CAAS")
plt.legend(loc=(0.02, 0.8))
# mark in red all histogram bars where int_{-infty}^{VaR} H(x)dx = alpha
hn, nb, _ = bardata1
for i in range(nb):
    b[i].set_color('r')
    b[i].set_edgecolor('k')
plt.text(-0.225, 30, "VaR$_{%.0f}$ (CAAS) = %.2f%%" % (cl, VaR_CAAS*100.))
plt.xlim([-0.25, 0])
plt.ylim([0, 50])

# AAPL
ax2 = plt.subplot(212)
hist2, bins2 = np.histogram(retAAPL, bins=100, density=False)
widths = np.diff(bins2)
b = plt.bar(bins2[:-1], hist2, widths, color=c, edgecolor="k", label="AAPL")
plt.legend(loc=(0.02, 0.8))
# mark in red all histogram bars where int_{-infty}^{VaR} H(x)dx = alpha
hn, nb, wd = bardata2
for i in range(nb):
    b[i].set_color('r')
    b[i].set_edgecolor('k')
plt.text(-0.225, 30, "VaR$_{%.0f}$ (AAPL) = %.2f%%" % (cl, VaR_AAPL*100.))
plt.xlim([-0.25, 0])
plt.ylim([0, 50])
plt.xlabel("Stock Returns (left tail only)")
plt.show()
```

where we mark our $\alpha=1$% regions in red.

As you may notice, so far we haven't done much new. A classical textbook example coded in Python. However, the last figure reveals the main players of the game. For instance, there is only 1 event of a daily loss larger than 15% for AAPL, while CAAS experienced 4 such heavy losses. The much higher 99% VaR for CAAS takes those 4 historical events into account and puts more weight on the 1-day VaR as estimated in our calculation.

Imagine now that we monitor all those tail extreme/rare events day by day. It's not too difficult to notice (based on an inspection of Figure #2) that in the case of AAPL, the stock recorded its first serious loss of $L \lt -10$% approximately 650 days after the beginning of our "monitoring", which commenced on May 13, 2005. In contrast, CAAS was much more volatile and you needed to wait only ca. 100 days to record a loss of the same magnitude. If something has not yet happened, e.g. $L \lt -10$%, the VaR-like measure is a highly inadequate measure of the probabilities of rare events. Once the event takes place, the estimation of VaR changes (is updated), but then decreases in time until a new rare event occurs. Let's illustrate it with Python. This is our second VaR strategy: finding $\alpha$ given the threshold for rare events. We write a simple function that does the job for us:

```python
def findalpha(ret, thr=1, nbins=100):
    # Function computes the probability P(X<thr)=alpha given a threshold
    # level (thr) and return-series (NumPy 1D array). X denotes the
    # returns as a rv and nbins is the number of bins used for the histogram
    # (c) 2015 QuantAtRisk.com, by Pawel Lachowicz
    #
    # compute a normalised histogram (\int H(x)dx = 1)
    hist, bins = np.histogram(ret, bins=nbins, density=True)
    # compute a default histogram
    hist1, bins1 = np.histogram(ret, bins=nbins, density=False)
    wd = np.diff(bins1)
    x = np.where(bins1 < thr)
    y = np.where(hist1 != 0)
    z = list(set(x[0]).intersection(set(y[0])))
    crit = np.cumsum(hist[z]*wd[z])
    # find alpha
    try:
        alpha = crit[-1]
    except Exception:
        alpha = 0
    # count the number of events falling into the (-inf, thr] interval
    nevents = np.sum(hist1[z])
    return alpha, nevents
```

We call it in our main program:

```python
# ---3. Computation of alpha, given the threshold for rare events

thr = -0.10
alpha1, ne1 = findalpha(retCAAS, thr=thr, nbins=200)
alpha2, ne2 = findalpha(retAAPL, thr=thr, nbins=200)

print("CAAS:")
print(" Pr( L < %.2f%% ) = %.2f%%" % (thr*100., alpha1*100.))
print(" %g historical event(s)" % ne1)
print("AAPL:")
print(" Pr( L < %.2f%% ) = %.2f%%" % (thr*100., alpha2*100.))
print(" %g historical event(s)" % ne2)
```

which returns the following results:

```
CAAS:
 Pr( L < -10.00% ) = 0.76%
 19 historical event(s)
AAPL:
 Pr( L < -10.00% ) = 0.12%
 3 historical event(s)
```

And that's great; however, all these numbers are given as a summary of 10 years of analysed data (May 13, 2005 to May 13, 2015 in our case). The final picture could be better understood if we could dynamically track the changes of both probabilities and the number of rare events in time. We achieve it with the following not-state-of-the-art code:

```python
# ---4. Mapping alphas for Rare Events

alphas = []
nevents = []
for t in range(1, len(retCAAS)-1):
    data = retCAAS[0:t]
    alpha, ne = findalpha(data, thr=thr, nbins=500)
    alphas.append(alpha)
    nevents.append(ne)

alphas2 = []
nevents2 = []
for t in range(1, len(retAAPL)-1):
    data = retAAPL[0:t]
    alpha, ne = findalpha(data, thr=thr, nbins=500)
    alphas2.append(alpha)
    nevents2.append(ne)

# plotting (figure #4)
# running probability for rare events
plt.figure(num=4, figsize=(10, 6))
ax1 = plt.subplot(211)
plt.plot(np.array(alphas)*100., 'r')
plt.plot(np.array(alphas2)*100.)
plt.ylabel("Pr( L < %.2f%% ) [%%]" % (thr*100.))
plt.axis('tight')
ax2 = plt.subplot(212)
plt.plot(np.array(nevents), 'r', label="CAAS")
plt.plot(np.array(nevents2), label="AAPL")
plt.ylabel("# of Events with L < %.2f%%" % (thr*100.))
plt.axis('tight')
plt.legend(loc="upper left")
plt.xlabel("Trading days 13/05/2005-13/05/2015")
plt.show()
```

revealing the resulting picture. It's a great way of looking at, and understanding, the far left-tail volatility of the asset under your current investigation. The probability between two consecutive rare/extreme events decreases for an obvious reason: along the time axis we include more and more data, therefore the return distribution evolves and shifts its mass to the right, leaving the left-tail events less and less probable.

The problem remains: if an asset has never experienced an extreme loss of a given magnitude, the probability of such an event, within our VaR framework, simply remains zero! For example, for losses $L \lt -20$%, CAAS displays 2 extreme events while AAPL none! Therefore, our estimation of the superbly rare daily loss of -20% (or more) for AAPL stock, based on 10 years of data, is zero, or undefined, or… completely unsound. Can we do better than this?

2. Classical Conditional Prediction for Rare Events

Let's consider a case where we want to predict the probability of the asset/stock (CAAS; traded at the NASDAQ exchange) falling down more than $-L$%.
Previously we achieved such an estimation through the integration of its probability density function. An alternative way is via derivation of the conditional probability, i.e. that CAAS will lose more than $-L$% given that the NASDAQ index drops more than $-L$% on the next day.

Formula? Well, Bayes' formula. That's all we need:
$$
\mbox{Pr}(B|R) = \frac{ \mbox{Pr}(R|B)\ \mbox{Pr}(B) } { \mbox{Pr}(R) } .
$$
Great. Now, the meaning of each term. Let $A$ denote the event of the stock's daily return being between $-L$% and 0%. Let $B$ be the event of the stock's return being $\lt -L$%. Let $R$ be the event of the NASDAQ daily return being $\lt -L$%. Given that, based on both the CAAS and NASDAQ historical time series, we are able to compute $\mbox{Pr}(A)$, $\mbox{Pr}(B)$, $\mbox{Pr}(R|A)$, and $\mbox{Pr}(R|B)$, and therefore
$$
\mbox{Pr}(R) = \mbox{Pr}(R|A)\mbox{Pr}(A) + \mbox{Pr}(R|B)\mbox{Pr}(B)
$$
as well. Here, $\mbox{Pr}(R|A)$ stands for the probability of NASDAQ falling down more than $-L$% given the observation of the CAAS return in the $(-L; 0)$% interval, and $\mbox{Pr}(R|B)$ denotes the probability of NASDAQ falling down more than $-L$% given the observation of the CAAS return also dropping more than $-L$% on the same day. With Bayes' formula we reverse the engineering, aiming at providing the answer for $\mbox{Pr}(B|R)$ given all available data. Ready to code it? Awesome. Here we go:

```python
# Predicting Heavy and Extreme Losses in Real-Time for Portfolio Holders
# (c) 2015 QuantAtRisk.com, by Pawel Lachowicz
#
# heavy2.py

import pandas.io.data as web   # deprecated; in modern pandas use pandas_datareader
import matplotlib.pyplot as plt
import numpy as np

# ---1. Data Processing
# fetch and download daily adjusted-close price series for CAAS stock
# and NASDAQ index using Yahoo! Finance public data provider
'''
caas = web.DataReader("CAAS", data_source='yahoo',
                      start='2005-05-13', end='2015-05-13')['Adj Close']
nasdaq = web.DataReader("^IXIC", data_source='yahoo',
                        start='2005-05-13', end='2015-05-13')['Adj Close']
CAAScp = np.array(caas.values)
NASDAQcp = np.array(nasdaq.values)

f = open("data2.dat", "wb")
np.save(f, CAAScp)
np.save(f, NASDAQcp)
f.close()
'''
f = open("data2.dat", "rb")
CAAScp = np.load(f)
NASDAQcp = np.load(f)
f.close()

# compute the return-series
retCAAS = CAAScp[1:]/CAAScp[:-1]-1
retNASDAQ = NASDAQcp[1:]/NASDAQcp[:-1]-1
```

It is the same code as used in Section 1, but instead of the AAPL data we fetch the NASDAQ index daily close prices. Let's plot the return series:

```python
# plotting (figure #1)
# return-series for CAAS and NASDAQ index
plt.figure(num=2, figsize=(10, 6))
plt.subplot(211)
plt.grid(True)
plt.plot(retCAAS, '-r', label="CAAS")
plt.axis("tight")
plt.ylim([-0.25, 0.5])
plt.legend(loc="upper right")
plt.ylabel("CAAS daily returns")
plt.subplot(212)
plt.grid(True)
plt.plot(retNASDAQ, '-', label="NASDAQ")
plt.legend(loc="upper right")
plt.axis("tight")
plt.ylim([-0.10, 0.15])
plt.ylabel("NASDAQ daily returns")
plt.xlabel("Trading days 13/05/2005-13/05/2015")
plt.show()
```

where we observe a different trading dynamics for the NASDAQ index between the 750th and 1000th day (as counted from May 13, 2005). Interestingly, the heaviest losses of NASDAQ are not ideally correlated with those of CAAS. That makes this case study more exciting! Please also note that CAAS is not a component of the NASDAQ index, which keeps the two events from being trivially coupled; within a fair approximation, we may think of CAAS as an asset ticking off that box here. What remains, and takes a lot of caution, is the code that "looks at" the data in the desired way.
Here is its final form:

```python
# ---2. Computations of Conditional Probabilities for Rare Events

# isolate return-series displaying negative returns solely
# set 1 for time stamps corresponding to positive returns
nretCAAS = np.where(retCAAS < 0, retCAAS, 1)
nretNASDAQ = np.where(retNASDAQ < 0, retNASDAQ, 1)

# set the threshold for rare events
thr = -0.065

# compute the sets of events
A = np.where(nretCAAS < 0, nretCAAS, 1)
A = np.where(A >= thr, A, 1)
B = np.where(nretCAAS < thr, retCAAS, 1)
R = np.where(nretNASDAQ < thr, retNASDAQ, 1)
nA = float(len(A[A != 1]))
nB = float(len(B[B != 1]))
n = float(len(nretCAAS[nretCAAS != 1]))  # n must equal nA + nB
# (optional)
print(nA, nB, n == (nA + nB))  # check; if True then proceed further
print(len(A), len(B), len(R))
print()

# compute the probabilities
pA = nA/n
pB = nB/n

# compute the conditional probabilities
pRA = np.sum(np.where(R+A < 0, 1, 0))/n
pRB = np.sum(np.where(R+B < 0, 1, 0))/n

pR = pRA*pA + pRB*pB

# display results
print("Pr(A)\t = %5.5f%%" % (pA*100.))
print("Pr(B)\t = %5.5f%%" % (pB*100.))
print("Pr(R|A)\t = %5.5f%%" % (pRA*100.))
print("Pr(R|B)\t = %5.5f%%" % (pRB*100.))
print("Pr(R)\t = %5.5f%%" % (pR*100.))

if(pR > 0):
    pBR = pRB*pB/pR
    print("\nPr(B|R)\t = %5.5f%%" % (pBR*100.))
else:
    print("\nPr(B|R) impossible to be determined. Pr(R)=0.")
```

Python's NumPy library helps us in a tremendous way with its smartly designed where function, which we employ first to build the event sets A, B, and R, and again to count the joint events (the R+A < 0 and R+B < 0 conditions). First we test a logical condition; if it evaluates to True we grab the right data (the actual returns), else we return 1. That opens up a couple of shortcuts for finding the number of specific events and making sure we are still on the right side of the force (the nA, nB, and n counts with their consistency checks). As you can see, via thr = -0.065 we specified our threshold level of $L = -6.5$%.
Given that, we derive the probabilities:

```
(1216.0, 88.0, True)
(2516, 2516, 2516)

Pr(A)    = 93.25153%
Pr(B)    = 6.74847%
Pr(R|A)  = 0.07669%
Pr(R|B)  = 0.15337%
Pr(R)    = 0.08186%

Pr(B|R)  = 12.64368%
```

The level of -6.5% is pretty arbitrary but delivers an interesting finding. Namely, based on 10 years of data there is a 12.6% chance that on May 14, 2015 CAAS will lose more than 6.5% if NASDAQ drops by the same amount. How much exactly? We don't know. It's not certain. Only probable, at 12.6%.

The outcome of $\mbox{Pr}(R|B)$, which is close to zero, may support our assumption that CAAS has a negligible "influence" on NASDAQ itself. On the other side of the rainbow, it's much more difficult to interpret $\mbox{Pr}(R)$, the probability of an "isolated" rare event (i.e., $L<-6.5$%), since its estimation is based solely on the two players in the game: CAAS and NASDAQ. However, when computed for $N\ge 100$ stocks outside the NASDAQ index but traded at NASDAQ, such a distribution of $\mbox{Pr}(R)$'s would be of great value as another alternative estimator for rare events (a topic for a separate post).

Now, back to business. There is one problem with our conditional estimation of rare-event probabilities as outlined within this Section. A quick check for $L = -10$% reveals:

```
Pr(A)    = 98.69632%
Pr(B)    = 1.30368%
Pr(R|A)  = 0.00000%
Pr(R|B)  = 0.00000%
Pr(R)    = 0.00000%

Pr(B|R) impossible to be determined. Pr(R)=0.
```

Let's recall that $R$ is the event of the NASDAQ daily return being $\lt -L$%. A quick look at the NASDAQ return series tells the whole story and addresses the question that comes to your mind: why, why, why zero? Well, simply because there was no event in the past 10-year history of the index where it slid more than 10% in a single day. And we are dead in the water.
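The mechanics of the conditional estimate can be replayed on a toy data set in a few lines; the sketch below uses made-up return vectors (not the CAAS/NASDAQ series) and estimates $\mbox{Pr}(R|A)$ and $\mbox{Pr}(R|B)$ as conditional frequencies before applying Bayes' formula:

```python
import numpy as np

# made-up daily returns for a stock (x) and a market index (y);
# toy data, not the CAAS/NASDAQ series used in the post
x = np.array([-0.12, -0.08, -0.02, 0.01, -0.03, 0.03, -0.01, -0.09])
y = np.array([-0.11, -0.01, -0.03, 0.02, -0.12, 0.01, -0.02, -0.10])
thr = -0.065                      # threshold for a "rare" loss

down = x < 0                      # days the stock closed down
B = x < thr                       # stock loss beyond the threshold
A = down & ~B                     # stock loss within (thr, 0)
R = y < thr                       # index loss beyond the threshold

n = down.sum()
pA, pB = A.sum() / n, B.sum() / n
pRA = (R & A).sum() / A.sum()     # Pr(R|A) as a conditional frequency
pRB = (R & B).sum() / B.sum()     # Pr(R|B)

pR = pRA * pA + pRB * pB          # total probability of the index event
pBR = pRB * pB / pR               # Bayes' formula: Pr(B|R)
print("Pr(B|R) = %.4f" % pBR)     # prints: Pr(B|R) = 0.6667
```

With these toy numbers, two of the three index crashes coincide with a stock crash, so the formula returns $\mbox{Pr}(B|R) = 2/3$; swapping in real return vectors reproduces the kind of estimate derived above.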
A visual representation of that problem can be obtained by executing the following piece of code:

```python
from pyvar import cpr  # not-at-all PEP8 style ;)

prob = []
for t in range(2, len(retCAAS)-1):
    ret1 = retCAAS[0:t]
    ret2 = retNASDAQ[0:t]
    pBR, _ = cpr(ret1, ret2, thr=thr)
    prob.append(pBR)

# plotting (figure #2)
# the conditional probability for rare events given the threshold
plt.figure(num=2, figsize=(10, 6))
plt.plot(np.array(prob)*100., 'r')
plt.ylabel("Pr(B|R) [%%]")
plt.axis('tight')
plt.title("L$_{thr}$ = %.2f%%" % (thr*100.))
plt.xlabel("Trading days 13/05/2005-13/05/2015")
plt.show()
```

where we moved all the Bayes-based calculations (Section 2 of the code above) into a function cpr (part of the pyvar.py local library; see the Download section below) and repeated our previous experiment, this time, however, with $\mbox{Pr}(B|R)$ changing in time given the threshold of $L=-6.5$%. In the resulting plot, before ca. the 800th day the NASDAQ index was lacking any event of $L\lt -6.5$%, therefore $\mbox{Pr}(B|R)$ could not be determined. After that period we got some heavy hits and the conditional probability could be derived. The end value (as of May 13, 2015) is 12.64%.

Hope amongst Hopelessness?

There is no better title to summarise the outcomes we have derived. Our goal was to come up with some (running) probability for the very rare event(s) that, in our case, would be expressed as a 1-day loss of $-L$% in trading for a specified financial asset. In our first attempt we engaged the VaR-related framework and the estimation of the occurrence of the rare event based on the PDF integration. In our second attempt we sought an alternative solution making use of Bayes' conditional prediction of the asset's loss of $-L$% (or more) given the observation of the same event somewhere else (the NASDAQ index).
For the former we ended up with a running probability of $\mbox{Pr}(L \le L_{thr})$, while for the latter with $\mbox{Pr}(B|R)$ given $L_{thr}$ for both the B and R events. Assuming $L_{thr}$ to be equal to -6.5%, we can present the corresponding computations in a single chart. Well, the situation looks hopeless. It doesn't get any better even if, as discussed earlier, the probability $\mbox{Pr}(R)$ is added. Both methodologies seem to deliver the answer to the same question; however, somehow, we are left confused, disappointed, misled, in tasteless despair… Stay tuned, as in Part 2 we will see the light at the end of the tunnel. I promise.

DOWNLOAD: heavy1.py, heavy2.py, pyvar.py

## Applied Portfolio VaR Decomposition. (2) Impact vs Moving Elements.

Calculations of daily Value-at-Risk (VaR) for any $N$-asset portfolio, as we have studied already in Part 1, heavily depend on the covariance matrix we need to estimate. This estimation requires historical return time series. An often neglected but superbly important question one should ask here is: How long? How long should those return series be in order to provide us with fair and sufficient information on the daily return distribution of each asset held in the portfolio? It is intuitive that taking data covering solely the past 250 trading days versus, for example, the past 2500 days (the moving element) would have a substantial impact on the calculation of the current portfolio VaR. In this short post, we will examine this relation.

Case Study (Supplementary Research)

Using the example of the 7-asset portfolio from Part 1, we modify our Matlab code so that it downloads the history of stock (adjusted) close prices covering the past 10 calendar years (ca. 3600 days):

```matlab
% Applied Portfolio VaR Decomposition. (2) Impact vs Moving Elements.
% (c) 2015 QuantAtRisk.com, by Pawel Lachowicz

clear all; close all; clc;
format short

% Read the list of the portfolio components
fileID = fopen('portfolio.lst');
tmp = textscan(fileID, '%s');
fclose(fileID);
pc=tmp{1};  % a list as a cell array

% fetch stock data for last 10 years since:
t0=735976;  % 12-Jan-2015
date2=datestr(t0,'yyyy-mm-dd');         % to
date1=datestr(t0-10*365,'yyyy-mm-dd');  % from

%{
% create an empty array for storing stock data
stockd={};
% scan the tickers and fetch the data from Quandl.com
for i=1:length(pc)
    quandlc=['WIKI/',pc{i}];
    fprintf('%4.0f %s\n',i,quandlc);
    % fetch the data of the stock from Quandl
    % using the recommended Quandl command and
    % saving them directly into Matlab's FTS object (fts)
    fts=0;
    [fts,headers]=Quandl.get(quandlc,'type','fints', ...
                  'authcode','YourQuandlCode',...
                  'start_date',date1,'end_date',date2);
    stockd{i}=fts;  % entire FTS object in an array's cell
end
save data2
%}
load data2
```

Assuming the same initial holdings (position sizes) for each asset:

```matlab
% an initial dollar expose per position
d=[   55621.00; ...
     101017.00; ...
      23409.00; ...
    1320814.00; ...
     131145.00; ...
     321124.00; ...
    1046867.00];
```

we aim at the calculation of the portfolio VaR,
$$
\mbox{VaR}_P = 1.65 \sigma_P = 1.65\left(\boldsymbol{d}' \boldsymbol{M}_2 \boldsymbol{d} \right)^{1/2} ,
$$
as a function of time or, alternatively speaking, as a function of the varying length of the return time series:

```matlab
% examine VaR_P as a function of the historical data
% taken into account
e=[];
for td=(10*250):-1:60  % loop over trading days
    rs=[];
    for i=1:length(pc)
        cp=fts2mat(stockd{i}.Adj_Close,0);
        rv=cp(2:end,1)./cp(1:end-1,1)-1;
        rs=[rs rv(end-td:end)];
    end
    rs(isnan(rs))=0.0;
    % covariance matrix
    M2=cov(rs);
    % the portfolio volatility
    vol_P=sqrt(d'*M2*d);
    % diversified portfolio VaR
    VaR_P=1.65*vol_P;
    e=[e; td VaR_P];
end
```

The meaning of the loop is straightforward: How does $\mbox{VaR}_P$ change if we take only the last 60 trading days into account, and what is its value when we include more and more data, going back in time as far as 10 years? The best way to illustrate the results is to compare the running portfolio VaR with the stock market itself, here reflected by the SPY ETF tracking the S&P 500 index:

```matlab
% download close price time-series for SPY (ETF)
% tracking S&P500 Index
[fts,headers]=Quandl.get('GOOG/NYSE_SPY','type','fints', ...
              'authcode','YourQuandlCode',...
              'start_date',date1,'end_date',date2);
SPY_cp=fts2mat(fts.Close,0);

% plot results
subplot(2,1,1)
xtmp=(2501:-1:1)./250;
plot(xtmp,SPY_cp(end-10*250:end,1));
xlim([min(xtmp) max(xtmp)]);
set(gca,'xdir','reverse');
ylabel('SPY Close Price [$]')

subplot(2,1,2)
plot(e(:,1)./250,e(:,2),'r');
set(gca,'xdir','reverse');
ylabel('VaR_P [$]')
xlabel('Time since 12-Jan-2015 [trading years]')
```

uncovering the big picture. And now all becomes clear! Firstly, $\mbox{VaR}_P$ of our 7-asset portfolio is a function of the data. Secondly, $\mbox{VaR}_P$ changes in the way that we might expect when contrasted with the market behaviour.
At this point, note an evident increase in the $\mbox{VaR}_P$ value if we include not 6 but 7 years of data. Why? The market went down in 2008 and the impact of it has been imprinted in the daily losses of single stocks. This contribution enriched the left tails of their return distributions. The inclusion of this information in our covariance matrix naturally increases the level of $\mbox{VaR}_P$.

So, how long? How much data should we include to get a fair estimation of $\mbox{VaR}_P$? It's not so easy to address. I would go with a lower bound of at least 1 year. As for the upper limit, it depends. If you are optimistic that in 2015 stocks will keep their upward momentum, look back up to 5 years. If you are a cautious investor, include 7 to 10 years of data.

Footnote

Some of you may still wonder: why Impact vs Moving Elements?! Well, I believe the inspiration for a good title may come from the most unexpected direction. It was 2013 when I heard that song for the first time and I loved it! Through the ages, beautiful women have always inspired men, therefore let me share my inspiration with you. You can become inspired too ;-)

## Visualisation of N-Asset Portfolio in Matlab

Many of you who wished to use Matlab's Financial Toolbox for portfolio visualisation most probably realised that something was wrong. Not every time did the annualised risk and return for different time series match what was expected. Below you can find an improved piece of code which does the job correctly. The input sampling of your financial time series can now be specified directly as daily, weekly, monthly, or quarterly, and the risk-return diagram will be displayed with annualised results. Otherwise, instead of an annualised value, you have the option to specify another time horizon. All smooth and easy.
```matlab
% Specialised plotting function for visualisation of portfolios
% based on the Financial Toolbox's portfolio objects
%
% Modified by Pawel Lachowicz, 2014/03/01
%
% Remarks: Essential corrections introduced in the calculation and plotting
% of the annualised risk and returns coded incorrectly by the team at
% The MathWorks, Inc.

function plotportfolio(varargin)

    plottitle = varargin{1};  % plot title, e.g. 'Efficient Frontier'
    plotfreq  = varargin{2};  % frequency of raw data (dt; or 'daily', etc.)
    plotfreq2 = varargin{3};  % e.g. 'annualised'
    plotfreq3 = varargin{4};

    plotlegend = [];

    switch plotfreq2
        case 'annualised'
            switch plotfreq
                case 'daily'
                    periods=252;
                case 'weekly'
                    periods=52;
                case 'monthly'
                    periods=12;
                case 'quarterly'
                    periods=4;
            end
        otherwise
            periods=plotfreq3;
    end

    for i = 5:nargin
        plotinfo = varargin{i};

        plottype = plotinfo{1};
        plotrsk  = plotinfo{2};
        plotret  = plotinfo{3};
        if numel(plotinfo) > 3
            plotlabel = plotinfo{4};
        else
            plotlabel = [];
        end
        if numel(plotinfo) > 4
            plotstyle = plotinfo{5};
            if isempty(plotstyle)
                plotstyle = 'b';
            end
        else
            if strcmpi(plottype,'line')
                plotstyle = 'b';
            else
                plotstyle = 'g';
            end
        end
        if numel(plotinfo) > 5
            plotline = plotinfo{6};
            if isempty(plotline)
                plotline = 1;
            end
        else
            if strcmpi(plottype,'line')
                plotline = 2;
            else
                plotline = [];
            end
        end

        % line plot
        if strcmpi(plottype,'line')
            hold on;
            for k = 1:size(plotrsk,2)
                plot(plotrsk(:,k)*sqrt(periods),((plotret(:,k)+1).^periods-1), ...
                     plotstyle, 'LineWidth', plotline);
                if i == 5 && k == 1   % first data set plotted
                    hold on
                end
                if ~isempty(plotlabel) && ~isempty(plotlabel{k})
                    plotlegend = [ plotlegend, {plotlabel{k}} ];
                end
            end
        % scatter plot
        else
            if any(plotstyle == '.')
                scatter(plotrsk*sqrt(periods),((plotret+1).^periods-1),'or','Filled');
            else
                scatter(plotrsk*sqrt(periods),((plotret+1).^periods-1),'og','Filled');
            end
            if i == 5
                hold on
            end
            if ~isempty(plotlabel)
                for k = 1:numel(plotrsk)
                    if ~isempty(plotlabel{k})
                        text(plotrsk(k)*sqrt(periods)+0.005, ...
                             ((plotret(k)+1).^periods-1), plotlabel{k},'FontSize', 9);
                    end
                end
            end
        end
    end

    if ~isempty(plotlegend)
        legend(plotlegend,'Location','SouthEast');
    end
    if strcmpi(plotfreq2,'annualised')
        xlabel('Annualised Volatility');
        ylabel('Annualised Expected Returns');
    else
        xlabel('Volatility');
        ylabel('Returns');
    end

    grid on

end
```

Now you can call it with the various parameters of your interest, matching the data directly, for instance:

```matlab
plotportfolio('Portfolio',dt,'annualised',[], ...
    {'line', prsk, pret}, ...
    {'scatter',CashRsk,CashRet, {'Cash'}}, ...
    {'scatter', sqrt(diag(p.AssetCovar)), p.AssetMean, p.AssetList,'.r'});
```

For some awesome practical applications of this function in the process of construction and analysis of an $N$-asset portfolio, grab the Applied Portfolio Optimization with Risk Management using Matlab ebook today.

## Asset Allocation for Tangent Portfolio with Risk-Free Asset in Python

Studying a new programming language is an adventurous journey. It takes time; in the case of Python, even for me. Not every concept is easy to grasp, and testing a piece of code with its endless modifications may take a longer while. We have two choices: either get a good book like the upcoming Python for Quants, or sit down and apply a copy-and-paste-and-test method. The latter approach is much longer but rewarding in the middle of our learning process. Hit and try. So, why not try?

You cannot write code which performs complex operations while omitting the necessary set of functions. There is no need to reinvent the wheel. In Python, the community works hard to deliver the best programming solutions. In this lesson, we will accelerate by conducting an investigation of Python code aimed at finding the optimised weights for a tangent-portfolio problem.

In the QaR ebook on Applied Portfolio Optimization with Risk Management using Matlab we discussed in great detail the theory and practical calculations for various cases of portfolios with different objective functions.
Markowitz in 1952 underlined that the goal of portfolio choice was either to look for a portfolio which could deliver maximum return at a given level of risk, or minimum risk for a given level of return. Based on the theoretical works of Sharpe in 1964, Lintner in 1965, and Tobin in 1958, the importance of the risk-free asset in the portfolio has been proved to equip the investor with a better control over risk. We can split the budget into a fraction of our capital designated for an investment in the risk-free option (e.g. a savings account in a bank), while the rest is delegated to other assets with diversified risk levels.

It was shown that for any portfolio with the risk-free component, the expected return is:
$$
R_P = mw^T + (1-{\bf 1}w^T)r_f \ \ \ \mbox{at} \ \ \ \sigma_P^2 = wCw^T
$$
where by $C$, $m$, and $w$ the covariance matrix, the vector of individual assets' expected returns, and the vector of asset weights have been denoted accordingly. If we aim at the variance, $\sigma_P^2$, to be minimised:
$$
\min_{w} \ \ wCw^T
$$
subject to $mw^T + (1-{\bf 1}w^T)r_f = r_{\rm target}$, we formulate the minimum variance portfolio optimisation problem.

It occurs that all minimum variance portfolios are a combination of the risk-free asset and a given risky portfolio. The latter is often called the tangent portfolio, and it has been shown that it must contain all assets available to investors (each held in quantity proportional to its market value relative to the total market value of all assets). That gives the tangent portfolio its second name: the market portfolio. The objective function of the form:
$$
\max_{w} \ \ \frac{mw^T-r_f}{\sqrt{wCw^T}}
$$
subject to ${\bf 1}w^T=1$ is called the Sharpe ratio and corresponds to the market portfolio directly. The risk-free asset is connected with the tangent portfolio by a straight line, and therefore provides an investor with a good blend of risk-controlled portfolios.
Modern Portfolio Theory tells us that the tangent portfolio is given by:
$$
{\bf w} = C^{-1}({\bf m} - {\bf 1}r_f)
$$
where the vector ${\bf w}$ stores the computed weights for each asset in portfolio $P$. Since finding the tangent portfolio given $N$ assets is, luckily, an analytical problem (i.e. without employment of the solvers), this task is straightforward to code in Python as follows:

```python
 1  # Asset Allocation for Tangent Portfolio with Risk-Free Asset in Python
 2  # Accelerated Python for Quants Tutorial, Lesson 4
 3  # (c) 2014 QuantAtRisk
 4
 5  from numpy import matrix, power
 6  from math import sqrt
 7
 8  def TangentPortfolio(m,C,rf):
 9      # find number of rows and columns for the covariance matrix
10      (nr,nc)=C.shape
11      A=matrix([[0.0] for r in xrange(nr)])
12      A=(1/C)*(m-rf)
13      (nr,nc)=A.shape
14      A=A/sum(A[r,0] for r in xrange(nr))
15      w=[A[r,0] for r in xrange(nr)]
16      pret=m.T*A
17      prsk=power(A.T*(C*A),0.5)
18      return matrix(w),pret,prsk
```

Here, we can see a new element of the Python language: the definition and usage of a function. We start its syntax with def, next write a desired function name with input parameters in the brackets, and end it with a colon. The body of the function must always be indented (min 4 space characters; don't use tabs!), and if some results are intended to be sent out of the function, a return statement should be specified at the end, listing all variables of interest.

In addition, we make use of the numpy module, from which we import only the two functions that we will be using. The first one is matrix, which allows us to implement matrix or vector notation explicitly (we avoid Python's lists or arrays at this stage). The dimensions of any matrix M can be returned as a tuple, as shown in line #10. Please note how Python eases our life. In line #15 we create a new one-row vector (matrix) referring directly to certain elements of another matrix by putting the for…in loop inside the definition of w itself. How brilliant it is!
Lastly, using the function power, we take its first argument to the power of 1/2, i.e. we compute a square root.

To see some action, let us first define an exemplary covariance matrix and a vector of expected returns corresponding to 3 assets in the portfolio:

```python
20  cov=matrix([[0.04, 0.004, 0.02],[0.004, 0.09, 0.09],[0.02,0.09,0.16]])
21  mu=matrix([[0.13],[0.11],[0.19]])
22  rf=0.05
23
24  w,ret,rsk=TangentPortfolio(mu,cov,rf)
```

where investing at the risk-free rate of 5% has been added to complete the grand picture of the problem we discuss here. Line #24 reveals the way we call our function and assign the values calculated within the function to outer variables (their names can be different). We display the results on the screen by typing:

```python
26  print("Portfolio weights")
27  print(w.T)
28  print("Expected Portfolio Return and Risk")
29  print ret,rsk
```

which returns:

```
Portfolio weights
[[ 0.46364368]
 [ 0.4292997 ]
 [ 0.10705661]]
Expected Portfolio Return and Risk
[[ 0.1278374]] [[ 0.19715402]]
```

This simply communicates that the expected portfolio return equals 12.8% at 19.7% of risk if we allocate 46%, 43%, and 11% in assets number 1, 2, and 3, respectively.
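For readers on Python 3 (where xrange and the print statement are gone), a minimal re-write of the same routine might look as below. Note that it keeps numpy's matrix type and the listing's `1/C` expression (the elementwise reciprocal of the covariance matrix) for fidelity with the original code; the function name `tangent_portfolio` is mine.

```python
# A hedged Python 3 sketch of the TangentPortfolio listing above.
import numpy as np

def tangent_portfolio(m, C, rf):
    # elementwise 1/C times (m - rf), exactly as in line #12 of the listing
    A = (1 / C) * (m - rf)
    A = A / A.sum()                       # normalise weights to sum to 1
    w = np.asarray(A).ravel()             # weights as a flat vector
    pret = float(m.T * A)                 # expected portfolio return
    prsk = float(np.sqrt(A.T * (C * A))) # portfolio risk
    return w, pret, prsk

cov = np.matrix([[0.04, 0.004, 0.02], [0.004, 0.09, 0.09], [0.02, 0.09, 0.16]])
mu = np.matrix([[0.13], [0.11], [0.19]])
w, ret, rsk = tangent_portfolio(mu, cov, 0.05)
print(w)          # [0.46364368 0.4292997  0.10705661]
print(ret, rsk)   # 0.1278... 0.1971...
```

Running it reproduces the weights, return, and risk printed above.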
We find that the Sharpe ratio,

```python
32  sharpe=(ret-rf)/rsk
33  print(sharpe)
```

equals 0.3948, which is closest to the Sharpe ratio of the first asset:

```python
35  for r in xrange(3):
36      print((mu[r,0]-rf)/sqrt(cov[r,r]))
```

```
0.4
0.2
0.35
```

Finally, if we denote by $\zeta$ a fraction of the capital we want to invest in risky assets, leaving $(1-\zeta)$ in the bank at the $r_f=5\%$ rate, then the expected portfolio return will be:
$$
\zeta wm+(1-\zeta)r_f \ \ \ \mbox{at} \ \ \ \zeta\sqrt{wCw^T}
$$
of risk. Therefore, for two different cases, for example:

```python
37  alpha=0.7
38  print(((matrix(alpha)*w)*mu)+(1-alpha)*rf)
39  print(matrix(alpha)*power(w*cov*w.T,0.5))
40
41  alpha=0.25
42  print(((matrix(alpha)*w)*mu)+(1-alpha)*rf)
43  print(matrix(alpha)*power(w*cov*w.T,0.5))
```

we get:

```
[[ 0.10448618]]
[[ 0.13800781]]
[[ 0.06945935]]
[[ 0.04928851]]
```

which confirms that putting 70% of our capital, for instance, into these three stocks should result in a 10.4% gain at 13.8% of risk, while an allocation of 75% of the capital in the bank promises a 6.9% return at 4.9% of risk, i.e. approaching the earlier defined risk-free rate of 5% pa.

Hungry for Python? Want some more? Stay tuned! A new ebook Python for Quants is coming this July! Don't miss the full introductory course to Python with many other practical examples ready to rerun and use for your projects.

## Quantitative Risk Assessment Model for Investment Options

Working in the superannuation industry in Australia has some great advantages. A nice atmosphere at work, gym sessions with colleagues during lunch time, endless talks about girls after hours. However, as Brian Tracy once said: When you go to work, you work. And this is true. And this is rewarding.

Superannuation is the Australian way of making people rich when they retire. Sadly, barely understood by many, it offers a wide palette of investment options for your super (9-12% of every salary is credited to the super account by the employer; regulated by law). Over 120 super funds across Australia, over 7000 options to choose from.
All styles of investments, from cash options, through growth and diversified fixed interest, to high growth-high risk options. Trust me, the funds' portfolio managers do their best to beat the benchmarks, the markets, and the competing super funds.

The industry establishes standards. Regulations. Smarter and unified ways of reporting. On 29 June 2010, the Australian Prudential Regulation Authority (APRA) issued a letter to superannuation trustees advising that APRA would be providing guidance on the disclosure of superannuation investment risk to fund members. The letter to trustees states that risk should be measured as the likely number of negative annual returns over any 20 year period. And this is where our story begins.

Risk Assessment

The working practice of super funds across Australia revealed that since 2013 the way of reporting risk (following the abovementioned definition) still leaves a lot behind the curtain. It's not clear how the assessment of an individual investment option's risk has been derived. APRA's guidance states that this classification system should help members 'readily distinguish the characteristics of each investment strategy'. To achieve this, there need to be sufficient categories to meaningfully differentiate between various options based on risk. The funds dare to report the risk score/label according to a seven-level classification system to provide sufficient granularity.
The Joint ASFA/FSC Working Group's analysis supports the claim that the number of negative annual periods over any 20 year period is likely to fall in the range of 0 to 7 for the majority of investment options:

Risk Band | Risk Label | Estimated number of negative annual returns over any 20 year period
--- | --- | ---
1 | Very Low | Less than 0.5
2 | Low | 0.5 to less than 1
3 | Low to Medium | 1 to less than 2
4 | Medium | 2 to less than 3
5 | Medium to High | 3 to less than 4
6 | High | 4 to less than 6
7 | Very High | 6 or greater

Having access to any investment option's performance over past years does not fully solve the problem of its risk assessment following the guidance. Firstly, not every option's lifetime is long enough to derive meaningful results; the maximal data span reaches 10 years on only a few occasions. Secondly, the data have gaps, therefore not for every option can we fetch its monthly performance figures. Thirdly, access to data requires a huge amount of labour by dedicated people who put all the grains of sand into one jar and then are willing to sell it (e.g. the research house of SuperRatings in Sydney). Lastly, even knowing an option's performance over the past 3 years, the questions arise: How should we estimate its risk score correctly? How many times in the upcoming 20 year period is a considered investment option going to denote negative annual returns? We need a model.

Model

A quantitative idea standing behind a unification of APRA's risk measure guidance for investment options can find its solution in the fundamentals of statistics. Below we will design a way of guessing the answer to the problem in question. If we recall the concept of the Binomial Distribution, we immediately recognise its potential application. Consider a time-series of monthly returns, $\{ r_i \}$, of the total length of $N$ months.
If $N$ is larger than 12, we can calculate
$$
m=(N-12)+1
$$
rolling annual returns,
$$
R_j = \left[ \prod_{i=j}^{j+11} (1+r_i) \right] - 1 \ \ \mbox{for} \ \ j = 1,…,m
$$
where the $r_i$ are given as decimals and $j$ denotes a specific period of 12 consecutive months. $R_j$ should be understood as a rolling annual return, given the statistically justified minimum requirement of $N\ge 31$ months of data for a specific option. For example, if we have 5 years of data (60 months) then we are able to get $m=(60-12)+1=49$ test annual returns based on the uninterrupted data sequences. Here, the word test is crucial.

We build our risk score model in the following way. For any option with a data length of $N$ months we construct a vector of $m$ annual returns $R$. We count the total number of negative values and denote it as $k$ out of $m$ trials. The probability of a single year closing with a negative annual return is then estimated as:
$$
p=\frac{k}{m}
$$
where $p$ can be assigned as the probability of success (in obtaining a negative annual return). Under the assumption of independent trials, one can find the probability of obtaining exactly $k$ successes out of, in general, $n$ trials making use of the Binomial Distribution, which is given by the probability mass function:
$$
\Pr(X=k) = {{n}\choose{k}} p^k (1-p)^{n-k}
$$
where obviously
$$
{{n}\choose{k}} = \frac{n!}{k!(n-k)!} \ .
$$
From the data analysis of the option's performance, we aim at calculating the probability of $k$ negative annual returns in any $n=20$ trials, namely:
$$
\Pr(X=k) = {{20}\choose{k}} p^k (1-p)^{20-k} .
$$
The model can be easily coded in Excel/VBA as a macro.
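Before turning to Excel/VBA, the whole model can be sketched in a few lines of Python. The monthly series below is a made-up toy example, and the function names are mine, not part of the original macro.

```python
# Rolling annual returns R_j, the estimate p = k/m, and the risk band chosen
# as the most probable number of negative annual returns in 20 years under
# Binomial(20, p). Toy data only; a hedged illustration of the model above.
from math import comb, prod

def rolling_annual_returns(monthly):
    """R_j for every window of 12 consecutive months (requires N >= 12)."""
    N = len(monthly)
    return [prod(1 + r for r in monthly[j:j + 12]) - 1 for j in range(N - 12 + 1)]

def risk_band(monthly, trials=20):
    """Risk band 1..7: the r maximising Pr(X = r) with p estimated as k/m."""
    R = rolling_annual_returns(monthly)
    m = len(R)
    p = sum(1 for x in R if x < 0) / m          # p = k/m
    pmf = [comb(trials, r) * p**r * (1 - p)**(trials - r) for r in range(1, 8)]
    return 1 + pmf.index(max(pmf))              # band with the largest pmf

# toy 5-year (60-month) series: steady 1% months with two -10% shocks
series = [0.01] * 60
series[5] = series[6] = -0.10
print(len(rolling_annual_returns(series)))      # m = (60 - 12) + 1 = 49
print(risk_band(series))                        # band 2 ("Low")
```

Six of the 49 rolling windows contain both shocks and close negative, so $p=6/49\approx 0.12$ and the mode of Binomial(20, p) lands at 2.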
You can grasp the core algorithm for assessing an option's risk score below:

```vb
If m > 0 Then
    'probability of success
    Dim p As Double
    p = k / m
    'probability for r=1..7
    Dim pr As Double, trials As Integer, r As Integer, newton As Double
    Dim ans As Integer, v0 As Double, v1 As Double
    v0 = -1
    v1 = -1
    trials = 20
    For r = 1 To 7
        newton = Application.Fact(trials) / _
                 (Application.Fact(r) * Application.Fact(trials - r))
        pr = newton * Application.Power(p, r) * Application.Power(1 - p, trials - r)
        If pr > v0 Then
            v0 = pr
            v1 = r
        End If
    Next r
    'a new risk score
    If v1 = 1 Then
        scoreSR = 1 ' "Very Low"
    ElseIf v1 = 2 Then
        scoreSR = 2 ' "Low"
    ElseIf v1 = 3 Then
        scoreSR = 3 ' "Low to Medium"
    ElseIf v1 = 4 Then
        scoreSR = 4 ' "Medium"
    ElseIf v1 = 5 Then
        scoreSR = 5 ' "Medium to High"
    ElseIf v1 = 6 Then
        scoreSR = 6 ' "High"
    ElseIf v1 = 7 Then
        scoreSR = 7 ' "Very High"
    Else
        scoreSR = ""
    End If
End If
```

The resulting solution may lead us to a reformulation of the risk assessment for individual investment options, as presented in the following table:

Risk Band | New Risk Label | Estimated number of negative annual returns over any 20 year period ($k$)
--- | --- | ---
1 | Very Low | 0 to 1
2 | Low | 2
3 | Low to Medium | 3
4 | Medium | 4
5 | Medium to High | 5
6 | High | 6
7 | Very High | 7 or greater

Intuitively, any Cash Option which always denotes positive monthly returns is also ranked (by our model) as an option of Very Low risk. You can apply this model to any monthly return series since we play with the numbers. This is what quants do best. Number crunching!

Nasty Assumption

The motivation behind the model sleeps in one word you can find in the definition we underlined at the very beginning: Estimated number of negative annual returns over any 20 year period. 'Any' allows us to choose any random sequence of 12 consecutive monthly returns, and that pays off in the larger number of trials under the assumption of independence. Yeah, right.
The nasty piece of the model is the independence of individual trials. This is a cornerstone when it comes to the Binomial Distribution. As a smart quant, you can question its validity within our model. I tackled this problem for a while and came to an empirical lemma supporting our line of defence.

The independence of individual trials can be justified as analogous to the following experiment. Imagine that a sheet of A4-format paper is available. The paper's colour changes gradually from left to right, from black to light grey, and the colour gradient is constant in the vertical direction. We cut the sheet horizontally into a finite number of paper strips of 1 cm height, however starting every new strip 1 cm to the right (the concept of a sliding window in time-series analysis). Any comparison of two adjacent strips of paper indicates a finite common area of the same colour as defined by the gradient: correlation. We put all the strips into one jar and churn it. Next, we randomly pull strips out of the jar. Each strip is an independent trial among all the strips placed inside the jar, as the memory of correlation between two formerly adjacent strips has been lost.

Invest smartly and don't worry. Correlations are everywhere.

## Information Ratio and its Relative Strength for Portfolio Managers

Risk and return are the two most significant quantities investigated within investment performance. We expect the return to be highest at the minimum risk. In practice, a set of various combinations is always achieved, depending on a number of factors related to the investment strategies and market behaviours, respectively. The portfolio manager or investment analyst is interested in applying the most relevant tool in order to describe the overall performance. The arsenal is rich, but we need to know the character of the outcomes we wish to underline prior to fetching the proper instruments from the workshop. In general, there are two schools of investment performance.
The first one is strictly related to the analysis of time-series on a tick-by-tick basis. All I mean by tick is the time scale of your data, e.g. minutes or months. The analyst is concerned about changes in investments and risk tracked continuously. Good examples of this sort of approach are the measurement of Value-at-Risk and Expected Shortfall, Below Target Risk, or the higher moments of Coskewness and Cokurtosis. It all supplements the n-Asset Portfolio Construction with Efficient Frontier diagnostics.

The second school in investment analysis is a macro-perspective. It includes the computation of $n$-period rolling median returns (or corresponding quantities), preferably annualised for the sake of comparison. This approach carries more meaning between the lines than a pure tick time-series analysis. For the former, we track sliding changes in accumulation and performance, while for the latter we focus more closely on tick-by-tick risks and returns. For the sake of the global picture and performance (a market-oriented vantage point), a rolling investigation of risk and return reveals mutual tracking factors between the different scenarios considered.

In this article I scratch the surface of the benchmark-oriented analysis with a special focus on the Information Ratio (IR) as an investment performance-related measure. Based on market data, I test the IR in action and introduce a new supplementary measure of Information Ratio Relative Strength, working as a second dimension for the IR results.

1. Benchmark-oriented Analysis

Let's assume you are a portfolio manager and over the last 10 years you have managed 11 investment strategies. Each strategy is different, corresponding to different markets, styles of investing, and asset allocations. You wish to get a macro-picture of how well your investments performed and whether they are beating the benchmark. You collect a set of 120 monthly returns for each investment strategy plus their benchmarks.
The easiest way to kick off the comparison process is the calculation of $n$-period median (or mean) returns.

Annualized Returns

The rolling $n$Y ($n$-Year) median return is defined as:
$$
\mbox{rR}_{j}(n\mbox{Y}) = \left[ \prod_{i=1}^{12n} (r_{i,j}+1)^{1/n} \right] -1 \ \ \ \ \ \mbox{for}\ \ \ j=12n,…,N\ \ (\mbox{at}\ \Delta j =1)
$$
where $r$ are the monthly median returns meeting the initial data selection criteria (if any) and $N$ denotes the total number of data points available (in our case, $N=120$). Any pre-selection (pre-filtering) of the input data becomes more insightful when contrasted with the benchmark performance. Hereinafter, by the benchmark we assume a relative measure we will refer to (e.g. the S\&P 500 Index for US stocks, etc.) calculated as the rolling $n$Y median. Therefore, for instance, the 1Y (rolling) returns can be compared against the corresponding benchmark (index). The selection of a good/proper benchmark belongs to us.

Active Returns

We define active returns on a monthly basis as the difference between actual returns and benchmark returns,
$$
a_i = r_i - b_i ,
$$
as available at $i$, where $i$ denotes a continuous counting of months over the past 120 months.

Annualized Tracking Risk (Tracking Error)

We assume the widely accepted definition of tracking risk as the standard deviation of the active returns. For the sake of comparison, we transform it into the annualised tracking risk in the following way:
$$
\mbox{TR}_{j}(n\mbox{Y}) = \sqrt{12} \times \sqrt{ \frac{1}{12n-1} \sum_{i=1}^{12n} \left( a_{i,j}-\langle a_{i,j} \rangle \right)^2 }
$$
where the square root of 12 secures a proper normalisation between calculations and $j=12n,…,N$.

Annualized Information Ratio (IR)

The information ratio is often referred to as a variation or generalised version of the Sharpe ratio. It tells us how much excess return ($a_i$) is generated from the amount of excess risk taken relative to the benchmark.
We calculate the annualised information ratio by dividing the annualised active returns by the annualised tracking risk as follows:
$$
\mbox{IR}_j(n\mbox{Y}) = \frac{ \left[ \prod_{i=1}^{12n} (a_{i,j}+1)^{1/n} \right] -1 } { \mbox{TR}_j(n\mbox{Y}) } \ \ \ \ \ \mbox{for}\ \ \ j=12n,…,N .
$$
We assume the following definition of investment under-performance and out-performance, $P$, as based on the IR:
$$
P_j = \begin{cases} P_j^{-} =\mbox{under-performance}, & \text{if } \mbox{IR}_j(n\mbox{Y}) \le 0 \\ P_j^{+} =\mbox{out-performance}, & \text{if } \mbox{IR}_j(n\mbox{Y}) > 0 . \end{cases}
$$

Information Ratio Relative Strength of Performance

Since $P_j$ only marks in time those periods (months) where a given quantity under investigation achieved better results than the benchmark or not, it does not tell us anything about the importance (weight; strength) of this under/out-performance. The case could be, for instance, to observe out-performance for 65\% of the time over 10 years but barely beating the benchmark in the same period. Therefore, we introduce an additional measure, namely, the IR relative strength, defined as:
$$
\mbox{IRS} = \frac{ \int P^{+} dt } { \left| \int P^{-} dt \right| } .
$$
The IRS measures the area under the $\mbox{IR}_j(n\mbox{Y})$ function when $\mbox{IR}_j(n\mbox{Y})$ is positive (out-performance) relative to the area confined between the $\mbox{IR}_j(n\mbox{Y})$ function and $\mbox{IR}_j(n\mbox{Y})=0$. Thanks to that measure, we are able to estimate the relevance of the
$$
\frac { n(P^{+}) } { n(P^{-}) + n(P^{+}) }
$$
ratio, i.e. the fraction of time when the examined quantity displayed out-performance over the specified period of time (e.g. the 10 years of data within our analysis). Here, $n(P^{+})$ counts the number of $P_j^{+}$ instances.

2. Case Study

Let's demonstrate the aforementioned theory in practice. Making use of real data, we perform a benchmark-oriented analysis for all 11 investment strategies.
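The quantities defined above can be sketched in a few lines of Python. The strategy and benchmark returns below are synthetic stand-ins for the real data used in the case study, and the function names are mine.

```python
# Active returns a_i = r_i - b_i, the annualised IR_j over rolling 12n-month
# windows, the out-performance fraction n(P+)/(n(P-)+n(P+)), and the IRS as a
# discrete stand-in for the two integrals. Synthetic data; a hedged sketch.
import numpy as np

def information_ratio(a, n=1):
    """IR_j for each rolling window of 12n monthly active returns."""
    w = 12 * n
    ir = []
    for j in range(w, len(a) + 1):
        win = a[j - w:j]
        active = np.prod(1 + win) ** (1.0 / n) - 1    # annualised active return
        tr = np.sqrt(12) * win.std(ddof=1)            # annualised tracking risk
        ir.append(active / tr)
    return np.array(ir)

def ir_relative_strength(ir):
    """Sum of IR above zero relative to |sum| of IR at or below zero."""
    neg = abs(ir[ir <= 0].sum())
    return ir[ir > 0].sum() / neg if neg > 0 else np.inf

rng = np.random.default_rng(7)
r = rng.normal(0.008, 0.040, 120)      # 120 monthly strategy returns (made up)
b = rng.normal(0.007, 0.035, 120)      # 120 monthly benchmark returns (made up)
a = r - b                              # active returns
ir = information_ratio(a, n=1)         # one IR value per j = 12, ..., 120
frac_out = (ir > 0).mean()             # n(P+) / (n(P-) + n(P+))
irs = ir_relative_strength(ir)
print(ir.shape, frac_out, irs)
```

With 120 months and a 12-month window, we obtain 109 rolling IR values, exactly as in the $j=12n,…,N$ indexing above.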
We derive 1Y rolling median returns supplemented by the corresponding 1Y tracking risk and information ratio measures. The following figure presents the results for the 1Y returns of the Balanced Options:

Interestingly, the Investment Strategy-2 (Inv-2) displays an increasing tracking risk between mid-2007 and mid-2009, corresponding to the Global Financial Crisis (GFC). This consistent surge in TR is explained by worse than average performance of the individual options, most probably dictated by higher than expected volatility of the assets within the portfolio. The 1Y returns of Inv-2 were able to beat the benchmark over only 25% of the time, i.e. about 2.25 years since Jun/30 2004. We derive that conclusion based on the quantitative inspection of the Information Ratio plot (bottom panel).

In order to understand the overall performance of all eleven investment strategies, we allow ourselves to compute the under-performance and out-performance ratios,
$$
\frac { n(P^{-}) } { n(P^{-}) + n(P^{+}) } \ \ \ \mbox{and} \ \ \ \frac { n(P^{+}) } { n(P^{-}) + n(P^{+}) } ,
$$
and plot these ratios in red and blue in the following figure, respectively:

Only two investment strategies, Inv-7 and Inv-11, managed to keep their out-performance time longer than their under-performance. Inv-8 denoted the highest ratio of under-performance to out-performance, but Inv-2 (considered above) performed equally badly.

The 3rd dimension to this picture is delivered by the investigation of the IR Relative Strength measure for all investment strategies. We derive and display the results in the next figure:

The plot reveals a close-to-linear correlation between the two considered quantities. The Inv-7 and Inv-11 strategies confirm their relative strength of out-performance, while Inv-2 and Inv-8 confirm their weakness in delivering strong returns over their short periods of out-performance. It is of great interest to compare the Inv-3, Inv-4, and Inv-10 strategies.
While the relative strength of out-performance of Inv-4 remains lower when contrasted with the other two (and can be explained by the low volatility within this sort of investment strategy), Inv-10 denotes much firmer gains over the benchmark than Inv-3, despite the fact that their out-performance periods were nearly the same ($\sim$40%). Only an additional cross-correlation of the IR Relative Strength outcomes as a function of the average year-over-year return would be able to address the drivers of this observed relation.

Discuss this topic on QaR Forum

You can discuss the current post with other quants within a new Quant At Risk Forum.

## Create a Portfolio of Stocks based on Google Finance Data fed by Quandl

There is an old saying: Be careful what you wish for. Many quants and traders wished for a long time for better quality of publicly available data. It wasn't an easy task to accomplish, but with a minimum of effort it has become possible to get everything you truly dreamt of. When I started my experience with daily backtesting of my algo strategies, I initially relied on Yahoo! data. Yeah, easily accessible but of low quality. Since the inception of Google Finance the same dream had been reborn. More precise data, covering more aspects of trading, platforms, markets, instruments. But how to get them all?

Quandl.com provides us with enormous access to an enormous database. Just visit the webpage to find out what I mean by barely putting my thoughts all together in one sentence. In fact, the portal's accessibility via different programming languages (e.g. Matlab, C++, Python, etc.) makes that database more alive than ever. The space of parameters the quants get access to becomes astronomical in number! Alleluja! Let's ride this horse. Enough of watching the online charts changing the direction of our investing moods with a one-minute time resolution. We were born to be smarter than the stochastic processes taking control over our daily strategies.
In this short post, making use of the Matlab R2013a platform, I want to show you how quickly you can construct any portfolio of US stocks taken from the Dow Jones Index universe (as an example). We will use the Google Finance stock data fed by Quandl.com.

Google Finance Stock Data for Model Feeding

Similar to Yahoo! Finance's resources, Quandl.com has its own system of coding the names of all US stocks. Therefore, we need to know it before taking any further step. The .csv file containing all of the codes for Quandl's stock data can be obtained via the API for Stock Data webpage. The file contains a header of the form:

Ticker | Stock Name | Price Code | Ratios Code | In Market?

linking the Google Finance ticker with Quandl's code (Price Code). Doing some extra job with data filtering, I created a pre-processed set of all tickers, excluding those non-active in the markets and removing all invalid data. You can download it here as an Excel workbook: QuandlStockCodeListUS.xlsx.

The second list we need is a list of the potential stocks we want to deal with in our portfolio. As for today, the current list of the Dow Jones Index's components you can download here: dowjones.lst. Let's start with it:

```matlab
 1  % Create a Portfolio from Dow Jones Stocks based on Google Finance
 2  % Data fed by Quandl.com
 3  %
 4  % (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 5
 6  close all; clear all; clc;
 7
 8  % Read the list of Dow Jones components
 9
10  fileID = fopen('dowjones.lst');
11  tmp = textscan(fileID, '%s');
12  fclose(fileID);
13  dowjc=tmp{1};  % a list as a cell array
```

The guys from MathWorks suggest making it a new and good practice to use the textscan command (lines 10-12) instead of dataread for reading data from text files. The latter will be removed from the next Matlab release.

Next, we need to import the current list of tickers for US stocks and their corresponding codes in Quandl's database.
We will make use of my pre-processed data set QuandlStockCodeListUS.xlsx, as mentioned earlier, in the following way (the quandlcode line, reading the Price Code column used later by Quandl.get, is restored here so the fetching code below runs):

```matlab
15  % Read in the list of tickers and internal codes from Quandl.com
16  [ndata, text, alldata] = xlsread('QuandlStockCodeListUS.xlsx');
17  quandlc=text(:,1);     % tickers, again as a list in a cell array
18  quandlcode=text(:,3);  % corresponding Quandl Price Codes
```

For the simplicity of our example, we will consider all 30 stocks as the complete sample set of stocks in our portfolio. A portfolio construction is not solely a selection of stocks. It usually involves a time period of the past performance of each individual stock. In the following move, we will collect daily stock returns over the last 365 calendar days (corresponding to about 252 trading days).

This framework is pretty handy. Why? If you build or run a model live, most probably your wish is to update your database when the US markets are closed. You fetch Quandl.com for data. Now, if your model employs risk management or portfolio optimisation as an integrated building block, it's very likely you are interested in risk(-return) estimation, e.g. by calculating Value-at-Risk (VaR) or similar risk measures. Of course, the choice of the time frame is up to you, based on your strategy. Having said that,

```matlab
19  % fetch stock data for last 365 days
20  date2=datestr(today,'yyyy-mm-dd')      % from
21  date1=datestr(today-365,'yyyy-mm-dd')  % to
```

employs Matlab's improved datestr command with formatting as a parameter. It's really useful when it comes to data fetching from Quandl.com, as the format of the date they like to see in your code is yyyy-mm-dd. The above two commands return the following values in Matlab:

```
date2 = 2013-11-03
date1 = 2012-11-03
```

as for the time of writing this article (string type).

Get an Access to Quandl's Resources

Excellent! Let's catch some fishes in the pond! Before doing so, we need to make sure that Quandl's software is installed and linked to our Matlab console.
In order to download the Quandl Matlab package, just visit this website and follow the installation procedure at the top of the page. Add the proper path in your Matlab environment too. The best practice is to register yourself on Quandl. That allows you to raise the daily number of data calls from 50 to 500. If you think you will need more data requests per day, you can send a request from the same webpage. When registered, in your Profile section on Quandl, you will gain access to your individual Authorization Token, for example: You will need to replace 'yourauthenticationtoken' in the Matlab code (see lines 35-37) with your token in order to get the data.

Fetching Stock Data

From this point the road stops to wind. For all 30 stocks belonging to the current composition of the Dow Jones Index, we get their Open, High, Low, and Close prices (as retrieved from Google Finance) in the following way:

```matlab
23  % create an empty array for storing stock data
24  stockd={};
25  % scan the Dow Jones tickers and fetch the data from Quandl.com
26  for i=1:length(dowjc)
27      for j=1:length(quandlc)
28          if(strcmp(dowjc{i},quandlc{j}))
29              fprintf('%4.0f %s\n',i,quandlc{j});
30              % fetch the data of the stock from Quandl
31              % using recommended Quandl's command and
32              % saving them directly into Matlab's FTS object (fts)
33              fts=0;
34
35              [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
36                  'authcode','yourauthenticationtoken',...
37                  'start_date',date1,'end_date',date2);
38              stockd{i}=fts; % entire FTS object in an array's cell
39          end
40      end
41  end
42  % use 'save' command to save all actual variables from memory on the disc,
43  % if you need it for multiple code running, e.g.: save data
```

The code in lines 35 to 37 is an example of how you call Quandl for data and fetch them into a FINTS object (the financial time series object from the Financial Toolbox in Matlab). So easy.
If you wish to change the time frequency, apply a transformation, or get the data not in FINTS but in .csv format, all of that is available as a parameter at the level of calling the Quandl.get() command (explore all possibilities here). Having that, you can easily plot the Google Finance data, for instance, of IBM stock straight away:

```matlab
% display High-Low chart for IBM
highlow(stockd{11})
title('IBM');
ylabel('Stock Price ($)');
```

to get a High-Low chart of IBM.

or prepare a return-risk plot for your further portfolio optimisation:

```matlab
ret=[];
for i=1:length(stockd)
    % extract the Close Price
    tmp=fts2mat(stockd{i}.Close,0);
    % calculate a vector of daily returns
    r=tmp(2:end)./tmp(1:end-1)-1;
    ret=[ret r];
end
% calculate the annualised mean returns for 30 stocks
% (the mean scales linearly with the number of trading days)
mret=mean(ret)*250;
% and the corresponding annualised standard deviations
% (volatility scales with the square root of time)
msig=std(ret)*sqrt(250);

% plot return-risk diagram
p=100*[msig; mret]';
plot(p(:,1),p(:,2),'b.');
xlim([0 30]);
xlabel('Annualized Risk of Stock (%)')
ylabel('Annualized Mean Return (%)');
```

providing us with the return-risk diagram for all 30 stocks.
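The annualisation used in the snippet above deserves a quick cross-check. Below is a small, language-neutral sketch in Python/NumPy (the helper name `annualise` and the toy price series are mine, purely for illustration): the mean return scales linearly with the number of trading days, while the volatility scales with its square root.

```python
import numpy as np

def annualise(prices, trading_days=250):
    """Annualised mean return and volatility from a daily price series."""
    p = np.asarray(prices, dtype=float)
    r = p[1:] / p[:-1] - 1.0                  # vector of daily simple returns
    mean_ann = r.mean() * trading_days        # mean grows linearly with time
    vol_ann = r.std(ddof=1) * np.sqrt(trading_days)  # vol grows with sqrt(time)
    return mean_ann, vol_ann

mean_ann, vol_ann = annualise([100.0, 110.0, 121.0])  # two +10% days
```

For a constant +10% daily return the annualised mean is 0.1 × 250 = 25 and the annualised volatility is zero, which confirms the two scaling laws used for `mret` and `msig` above.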


## Anxiety Detection Model for Stock Traders based on Principal Component Analysis

So, is there a way to disentangle the emotional part involved in trading from all other factors (e.g. the application of technical analysis, bad news consequences, IPOs, etc.) which are somehow easier to deduce? In this post I will try to make a quantitative attempt towards solving this problem. Although the solution will not have the final and closed form, my goal is to deliver an inspiration for quants and traders interested in the subject by putting a simple idea into practice: the application of Principal Component Analysis.

1. Principal Component Analysis (PCA)

Called by many one of the most valuable results from applied linear algebra, Principal Component Analysis delivers a simple, non-parametric method of extracting relevant information from often confusing data sets. Real-world data usually hold some relationships among their variables and, as a good approximation, in the first instance we may suspect them to be of linear (or close to linear) form. Linearity is one of the stringent but powerful assumptions standing behind PCA.

Imagine we observe the daily change of prices of $m$ stocks (being part of your portfolio or a specific market index) over the last $n$ days. We collect the data in $\boldsymbol{X}$, a matrix of size $m\times n$. Each of the $n$ columns is an $m$-long vector living in an $m$-dimensional vector space spanned by an orthonormal basis; therefore each is a linear combination of this set of unit-length basis vectors: $\boldsymbol{BX} = \boldsymbol{X}$ where the basis $\boldsymbol{B}$ is the identity matrix $\boldsymbol{I}$. Within the PCA approach we ask a simple question: is there another basis, a linear combination of the original one, that best re-expresses our data set? In other words, we look for a transformation matrix $\boldsymbol{P}$ acting on $\boldsymbol{X}$ in order to deliver its re-representation:
$$\boldsymbol{PX} = \boldsymbol{Y} \ .$$ The rows of $\boldsymbol{P}$ become a set of new basis vectors for expressing the columns of $\boldsymbol{X}$. This change of basis makes the row vectors of $\boldsymbol{P}$ in this transformation the principal components of $\boldsymbol{X}$. But how to find a good $\boldsymbol{P}$?

Consider for a moment what we can do with a set of $m$ observables spanned over $n$ days. It is not a mystery that many stocks co-vary over different periods of time, i.e. their price movements are closely correlated and follow the same direction. The statistical tool to measure the mutual relationship among $m$ vectors (correlation) is the covariance matrix. For our data set $\boldsymbol{X}$:
$$\boldsymbol{X}_{m\times n} = \left[ \begin{array}{cccc} \boldsymbol{x_1} \\ \boldsymbol{x_2} \\ … \\ \boldsymbol{x_m} \end{array} \right] = \left[ \begin{array}{cccc} x_{1,1} & x_{1,2} & … & x_{1,n} \\ x_{2,1} & x_{2,2} & … & x_{2,n} \\ … & … & … & … \\ x_{m,1} & x_{m,2} & … & x_{m,n} \end{array} \right]$$
the covariance matrix takes the following form:
$$cov(\boldsymbol{X}) \equiv \frac{1}{n-1} \boldsymbol{X}\boldsymbol{X}^{T}$$ where we multiply $\boldsymbol{X}$ by its transposed version, and the factor $(n-1)^{-1}$ keeps the variance estimate unbiased. (Here we implicitly assume each row of $\boldsymbol{X}$ has had its mean subtracted.) The diagonal elements of $cov(\boldsymbol{X})$ are the variances corresponding to each row of $\boldsymbol{X}$, whereas the off-diagonal terms of $cov(\boldsymbol{X})$ represent the covariances between different rows (prices of the stocks). Please note that the above multiplication assures us that $cov(\boldsymbol{X})$ is a square symmetric matrix of size $m\times m$.
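This definition is easy to verify numerically. A minimal Python/NumPy sketch (my own illustration, not the article's Matlab code; note it mean-centres each row first, which the formula above takes for granted):

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 4, 250                               # 4 "stocks" observed over 250 "days"
X = rng.normal(size=(m, n))
Xc = X - X.mean(axis=1, keepdims=True)      # centre each row
C = Xc @ Xc.T / (n - 1)                     # cov(X) = (1/(n-1)) X X^T

assert C.shape == (m, m)                    # square, m x m
assert np.allclose(C, C.T)                  # symmetric
assert np.allclose(C, np.cov(X))            # agrees with NumPy's estimator
```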

All right, but what does it have in common with our PCA method? PCA looks for a way to optimise the matrix of $cov(\boldsymbol{X})$ by a reduction of redundancy. Sounds a bit enigmatic? I bet! Well, all we need to understand is that PCA wants to ‘force’ all off-diagonal elements of the covariance matrix to be zero (in the best possible way). The guys in the Department of Statistics will tell you the same as: removing redundancy diagonalises $cov(\boldsymbol{X})$. But how, how?!

Let’s come back to our previous notation of $\boldsymbol{PX}=\boldsymbol{Y}$. $\boldsymbol{P}$ transforms $\boldsymbol{X}$ into $\boldsymbol{Y}$. We also marked that:
$$\boldsymbol{P} = [\boldsymbol{p_1},\boldsymbol{p_2},…,\boldsymbol{p_m}]$$ was a new basis we were looking for. PCA assumes that all basis vectors $\boldsymbol{p_k}$ are orthonormal, i.e. $\boldsymbol{p_i}\boldsymbol{p_j}=\delta_{ij}$, and that the directions with the largest variances are the most principal. So, PCA first selects a normalised direction in the $m$-dimensional space along which the variance in $\boldsymbol{X}$ is maximised. That is the first principal component, $\boldsymbol{p_1}$. In the next step, PCA looks for another direction along which the variance is maximised. However, because of the orthonormality condition, it searches only among directions perpendicular to all previously found ones. In consequence, we obtain an orthonormal matrix $\boldsymbol{P}$. Good stuff, but still sounds complicated?

The goal of PCA is to find such a $\boldsymbol{P}$ that for $\boldsymbol{Y}=\boldsymbol{PX}$ the covariance matrix $cov(\boldsymbol{Y})=(n-1)^{-1}\boldsymbol{YY}^T$ is diagonalised.

We can evolve the notation of the covariance matrix as follows:
$$(n-1)cov(\boldsymbol{Y}) = \boldsymbol{YY}^T = \boldsymbol{(PX)(PX)}^T = \boldsymbol{PXX}^T\boldsymbol{P}^T = \boldsymbol{P}(\boldsymbol{XX}^T)\boldsymbol{P}^T = \boldsymbol{PAP}^T$$ where we made a quick substitution of $\boldsymbol{A}=\boldsymbol{XX}^T$. It is easy to prove that $\boldsymbol{A}$ is symmetric. It takes a longer while to find proofs of the following two theorems: (1) a matrix is symmetric if and only if it is orthogonally diagonalisable; (2) a symmetric matrix is diagonalised by a matrix of its orthonormal eigenvectors. Just check your favourite algebra textbook. The second theorem provides us with the right to write:
$$\boldsymbol{A} = \boldsymbol{EDE}^T$$ where $\boldsymbol{D}$ is a diagonal matrix and $\boldsymbol{E}$ is a matrix of eigenvectors of $\boldsymbol{A}$. That brings us to the end of the rainbow.

We select the matrix $\boldsymbol{P}$ to be one where each row $\boldsymbol{p_i}$ is an eigenvector of $\boldsymbol{XX}^T$, therefore
$$\boldsymbol{P} = \boldsymbol{E}^T .$$

Given that, we see that $\boldsymbol{E}=\boldsymbol{P}^T$; thus we find $\boldsymbol{A}=\boldsymbol{EDE}^T = \boldsymbol{P}^T\boldsymbol{DP}$, which leads us to a magnificent relationship between $\boldsymbol{P}$ and the covariance matrix:
$$(n-1)cov(\boldsymbol{Y}) = \boldsymbol{PAP}^T = \boldsymbol{P}(\boldsymbol{P}^T\boldsymbol{DP})\boldsymbol{P}^T = (\boldsymbol{PP}^T)\boldsymbol{D}(\boldsymbol{PP}^T) = (\boldsymbol{PP}^{-1})\boldsymbol{D}(\boldsymbol{PP}^{-1})$$ or
$$cov(\boldsymbol{Y}) = \frac{1}{n-1}\boldsymbol{D},$$ i.e. the choice of $\boldsymbol{P}$ diagonalises $cov(\boldsymbol{Y})$ where silently we also used the matrix algebra theorem saying that the inverse of an orthogonal matrix is its transpose ($\boldsymbol{P^{-1}}=\boldsymbol{P}^T$). Fascinating, right?! Let’s see now how one can use all that complicated machinery in the quest of looking for human emotions among the endless rivers of market numbers bombarding our sensors every day.
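Before moving on, the entire derivation can be checked in a few lines. A Python/NumPy sketch on synthetic data (my own illustration, not the article's Matlab code): choosing $\boldsymbol{P}=\boldsymbol{E}^T$ indeed diagonalises $cov(\boldsymbol{Y})$, with the eigenvalues of $cov(\boldsymbol{X})$ on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 5, 500
X = rng.normal(size=(m, n))
X = X - X.mean(axis=1, keepdims=True)       # mean-centre the rows
C = X @ X.T / (n - 1)                       # cov(X)

w, E = np.linalg.eigh(C)                    # C = E diag(w) E^T, E orthonormal
P = E.T                                     # rows of P are the eigenvectors
Y = P @ X                                   # the re-represented data
CY = Y @ Y.T / (n - 1)                      # cov(Y)

off_diag = CY - np.diag(np.diag(CY))
assert np.abs(off_diag).max() < 1e-10       # cov(Y) is (numerically) diagonal
assert np.allclose(np.diag(CY), w)          # its diagonal holds the eigenvalues
```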

2. Covariances of NASDAQ, Eigenvalues of Anxiety

We will try to build a simple quantitative model for detection of the nervousness in the trading markets using PCA.

By its simplicity I will understand the following model assumption: no matter what the data conceal, the 1st Principal Component (1-PC) of the PCA solution links the complicated relationships among a subset of stocks triggered by a latent factor, attributed by us to a common behaviour of traders (humans and pre-programmed algos). It is a pretty reasonable assumption, much stronger than, for instance, the influence of Saturn's gravity on annual silver price fluctuations. Since PCA does not tell us what its 1-PC means in reality, it is our job to seek meaningful explanations. Therefore, a human factor fits the frame as a trial value very well.

Let's consider the NASDAQ-100 index. It is composed of 100 technology stocks. The most current list can be found here: nasdaq100.lst, downloadable as a text file. As usual, we will perform all calculations in the Matlab environment. Let's start with data collection and pre-processing:

```matlab
% Anxiety Detection Model for Stock Traders
% making use of the Principal Component Analysis (PCA)
% and utilising publicly available Yahoo! stock data
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz

clear all; close all; clc;

% Reading a list of NASDAQ-100 components
nasdaq100=(dataread('file',['nasdaq100.lst'], '%s', 'delimiter', '\n'))';

% Time period we are interested in
d1=datenum('Jan 2 1998');
d2=datenum('Oct 11 2013');

% Check and download the stock data for a requested time period
stocks={};
for i=1:length(nasdaq100)
    try
        % Fetch the Yahoo! adjusted daily close prices between selected
        % days [d1;d2]
        tmp = fetch(yahoo,nasdaq100{i},'Adj Close',d1,d2,'d');
        stocks{i}=tmp;
        disp(i);
    catch err
        % no full history available for requested time period
    end
end
```

where, first, we check whether for a given list of NASDAQ-100 components the full data history (adjusted close prices) is available via the Yahoo! server (please refer to my previous post, Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting, for more information on the connectivity).

The cell array stocks becomes populated with two-dimensional matrices: the time-series corresponding to stock prices (time, price). Since the Yahoo! database does not contain a full history for all stocks of our interest, we may expect different time spans among them. For the purpose of demonstrating the PCA method, we apply additional screening of the downloaded data, i.e. we require the data to span the period defined by the d1 and d2 variables and, additionally, to have the same (maximal available) number of data points (observations, trials). We achieve that by:

```matlab
% Additional screening
d=[]; j=1; data={};
for i=1:length(nasdaq100)
    d=[d; i min(stocks{i}(:,1)) max(stocks{i}(:,1)) size(stocks{i},1)];
end
for i=1:length(nasdaq100)
    if(d(i,2)==d1) && (d(i,3)==d2) && (d(i,4)==max(d(:,4)))
        data{j}=sortrows(stocks{i},1);
        fprintf('%3i %1s\n',i,nasdaq100{i})
        j=j+1;
    end
end
m=length(data);
```

The temporary matrix $d$ holds the index of each stock as read in from the nasdaq100.lst file, the first and last day numbers of available data, and the total number of data points in the time-series, respectively:

```matlab
>> d
d =
      1   729757   735518   3970
      2   729757   735518   3964
      3   729757   735518   3964
      4   729757   735518   3969
     ..       ..       ..     ..
     99   729757   735518   3970
    100   729757   735518   3970
```

Our screening method saves $m=21$ selected stock data sets into the data cell array, corresponding to the following companies from our list:

```
  1 AAPL    7 ALTR    9 AMAT   10 AMGN   20 CERN   21 CHKP   25 COST
 26 CSCO   30 DELL   39 FAST   51 INTC   64 MSFT   65 MU     67 MYL
 74 PCAR   82 SIAL   84 SNDK   88 SYMC   96 WFM    99 XRAY  100 YHOO
```

Okay, some people say that seeing is believing. All right. Let's see how it works. Recall the fact that we demanded our stock data to span the period between 'Jan 2 1998' and 'Oct 11 2013'. We found 21 stocks meeting those criteria. Now, let's assume we pick a random date, say, Jul 2 2007, and we extract for all 21 stocks their price history over the last 90 calendar days. We save their prices (skipping the time columns) into a $Z$ matrix as follows:

```matlab
t=datenum('Jul 2 2007');
Z=[];
for i=1:m
    [r,c,v]=find((data{i}(:,1)<=t) & (data{i}(:,1)>t-90));
    Z=[Z data{i}(r,2)];
end
```

and we plot them all together:

```matlab
plot(Z)
xlim([1 length(Z)]);
ylabel('Stock price (US$)');
xlabel('T-90d');
```

It's easy to deduce that the top line corresponds to Apple, Inc. (AAPL) adjusted close prices. The data-processing step not yet mentioned is that we need to transform our time-series into a comparable form. We can do it by subtracting the average value from each of them and dividing by their standard deviations. Why? For the simple reason of enabling their mutual comparison on an equal footing. We call that step a normalisation or standardisation of the time-series under investigation:

```matlab
[N,M]=size(Z);
X=(Z-repmat(mean(Z),[N 1]))./repmat(std(Z),[N 1]);
```

This represents the matrix $\boldsymbol{X}$ that I discussed in the theoretical part of this post. Note that the dimensions are reversed in Matlab. The normalised time-series can be displayed with:

```matlab
% Display normalised stock prices
plot(X)
xlim([1 length(Z)]);
ylabel('(Stock price-Mean)/StdDev');
xlabel('T-90d');
```

For the given matrix $\boldsymbol{X}$, its covariance matrix,

```matlab
% Calculate the covariance matrix, cov(X)
CovX=cov(X);
imagesc(CovX);
```

as for data spanning 90 calendar days back from Jul 2 2007, is displayed with a colour coding going from the maximal values (most reddish) down to the minimal values (most bluish). The diagonal of the covariance matrix simply tells us that for normalised time-series their variances are equal to 1, as expected. Going one step forward, based on the given covariance matrix, we look for the matrix $\boldsymbol{P}$ whose columns are the corresponding eigenvectors:

```matlab
% Find P
[P,~]=eigs(CovX,5);
imagesc(P);
set(gca,'xticklabel',{1,2,3,4,5},'xtick',[1 2 3 4 5]);
xlabel('Principal Component')
ylabel('Stock');
set(gca,'yticklabel',{'AAPL', 'ALTR', 'AMAT', 'AMGN', 'CERN', ...
    'CHKP', 'COST', 'CSCO', 'DELL', 'FAST', 'INTC', 'MSFT', 'MU', ...
    'MYL', 'PCAR', 'SIAL', 'SNDK', 'SYMC', 'WFM', 'XRAY', 'YHOO'}, ...
    'ytick',[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21]);
```

where we computed the PCA for five principal components in order to illustrate the process. Since the colour coding is the same as in the previous figure, a visual inspection of the 1-PC indicates negative values for at least 16 out of the 21 stock loadings (the components of the first eigenvector). That simply means that over the last 90 days the global dynamics of those stocks were directed south, in favour of traders holding short positions in those stocks.

It is important to note at this very moment that 1-PC does not represent the 'price momentum' itself. That would be too easy. It represents the latent variable responsible for a common behaviour in the stock dynamics, whatever it is. Based on our model assumption (see above), we suspect it may indicate a human factor latent in the trading.

3. Game of Nerves

The last figure communicates an additional message. There is a remarkable coherence of the loadings for 1-PC and pretty random patterns for the remaining four principal components. One may check that, in the case of our data sample, this feature is maintained over many years. That allows us to limit our interest to 1-PC only. It's getting exciting, isn't it? Let's come back to our main code.
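Before we do, a tiny self-contained Python sketch (entirely my own toy data, not the NASDAQ sample) shows why the 1-PC loadings come out so sign-coherent when stocks share a common trend. One caveat worth stating loudly: the overall sign of an eigenvector is arbitrary, so counting 'negative' loadings is only meaningful under a fixed sign convention; here we report the dominant-sign fraction, which is convention-free.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 21, 60                                   # 21 "stocks", 60 "days"
common = -np.cumsum(rng.normal(0.2, 1.0, n))    # one shared downward trend
X = np.array([common + rng.normal(0.0, 0.5, n) for _ in range(m)])

# standardise each row, as in the Matlab code above
Xn = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
C = np.cov(Xn)
w, V = np.linalg.eigh(C)
p1 = V[:, -1]                                   # loadings of the 1st PC

frac_neg = float((p1 < 0).mean())
dominant = max(frac_neg, 1.0 - frac_neg)        # sign-convention-free coherence
```

With a strong common factor, all 21 loadings share one sign and `dominant` comes out at (or very near) 1.0; with pure noise it hovers near 0.5.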
Having now a pretty good grasp of the algebra of PCA at work, we may limit our investigation of 1-PC to any time period of interest, below spanned between the dates defined by the t1 and t2 variables:

```matlab
% Select time period of your interest
t1=datenum('July 1 2006');
t2=datenum('July 1 2010');
results=[];
for t=t1:t2
    tmp=[];
    A=[]; V=[];
    for i=1:m
        [r,c,v]=find((data{i}(:,1)<=t) & (data{i}(:,1)>t-60));
        A=[A data{i}(r,2)];
    end
    [N,M]=size(A);
    X=(A-repmat(mean(A),[N 1]))./repmat(std(A),[N 1]);
    CovX=cov(X);
    [V,D]=eigs(CovX,1);
    % Find all negative loadings (components) of the 1st Principal Component
    [r,c,v]=find(V(:,1)<0);
    % Extract them into a new vector
    neg1PC=V(r,1);
    % Calculate the percentage of negative loadings relative
    % to all values available
    ratio=length(neg1PC)/m;
    % Build a new time-series of 'ratio' change over the required
    % time period (spanned between t1 and t2)
    results=[results; t ratio];
end
```

We build our anxiety-detection model based on the change in the number of negative loadings of the 1st Principal Component (relative to their total number; here equal to 21). As a result, we generate a new time-series tracing this variable over the $[t1;t2]$ time period. We plot the results in one figure, contrasted with the NASDAQ-100 Index, in the following way:

```matlab
% Fetch NASDAQ-100 Index from Yahoo! data-server
nasdaq = fetch(yahoo,'^ndx','Adj Close',t1,t2,'d');
% Plot it
subplot(2,1,1)
plot(nasdaq(:,1),nasdaq(:,2),'color',[0.6 0.6 0.6]);
ylabel('NASDAQ-100 Index');
% Add a plot corresponding to a new time-series we've generated
subplot(2,1,2)
plot(results(:,1),results(:,2),'color',[0.6 0.6 0.6])
% overplot a 30d moving average based on the same data
hold on; plot(results(:,1),moving(results(:,2),30),'b')
```

I use a 30-day moving average (a solid blue line) in order to smooth the results (moving.m).
Please note that in the screening loop above I also replaced the earlier window of 90 days with 60 days. Somehow, it is more reasonable to examine the market dynamics with PCA over the past two months than over longer periods (but it's a matter of taste and needs). Eventually, we construct the core element of the model: namely, we detect nervousness among traders when the percentage of negative 1-PC loadings increases over (at least) five consecutive days:

```matlab
% Model Core
x1=results(:,1);
y1=moving(results(:,2),30);
tmp=[];
% Find moments of time where the percentage of negative 1-PC
% loadings increases over time (minimal requirement of
% five consecutive days); start at i=6 so that y1(i-5) exists
for i=6:length(x1)
    if(y1(i)>y1(i-1))&&(y1(i-1)>y1(i-2))&&(y1(i-2)>y1(i-3))&& ...
      (y1(i-3)>y1(i-4))&&(y1(i-4)>y1(i-5))
        tmp=[tmp; x1(i)];
    end
end
% When found
z=[];
for i=1:length(tmp)
    for j=1:length(nasdaq)
        if(tmp(i)==nasdaq(j,1))
            z=[z; nasdaq(j,1) nasdaq(j,2)];
        end
    end
end
subplot(2,1,1); hold on;
plot(z(:,1),z(:,2),'r.','markersize',7);
```

The results of the model we overplot with red markers on top of the NASDAQ-100 Index. Our simple model takes us into a completely new territory of the unexplored space of latent variables. Firstly, it does not predict the future. The future still (unfortunately) remains unknown. However, what it delivers is a fresh look at the past dynamics of the market. Secondly, it is easy to read from the plot that the results cluster into three subgroups. The first subgroup corresponds to actions in stock trading having further negative consequences (see the events of 2007-2009 and the avalanche of prices). Here, the dynamics over any 60 calendar days had been continued. The second subgroup comprises those periods when anxiety led to negative dynamics among stock traders but, due to other factors (e.g. financial, global, political, etc.), the stocks surged, dragging the Index up.
The third subgroup (less frequent) corresponds to instances of relatively flat changes of the Index, revealing a typical pattern of psychological hesitation about the trading direction. No matter how we might interpret the results, the human factor in trading is evident. Hopefully, the PCA approach captures it. If not, all we are left with is our best friend: a trader's intuition.

Acknowledgments

An article dedicated to Dr. Dariusz Grech of the Physics and Astronomy Department of the University of Wroclaw, Poland, for his superbly important and mind-blowing lectures on linear algebra in the 1998/99 academic year.

## GARCH(p,q) Model and Exit Strategy for Intraday Algorithmic Traders

Forecasting the future has always been part of the untamed human desire to possess. In the enjoyable Hollywood production of Next, Nicolas Cage, playing the character of Frank Cadillac, has the ability to see the future just up to a few minutes ahead. This allows him to take almost immediate actions to avoid risks. Now, just imagine for a moment that you are an (algorithmic) intraday trader. What would you offer for a glimpse of knowing what is going to happen within the following couple of minutes? What sort of risk would you undertake? Is it really possible to deduce the next move on your trading chessboard? Most probably the best answer to this question is: it is partially possible. Why only partially? Well, even with the help of mathematics and statistics, obviously God didn't want us to know the future, putting his fingerprint into our equations and calling it a random variable. Smart, isn't it? Therefore, our goal is to work harder in order to guess what is going to happen next!?

In this post I will shortly describe one of the most popular methods of forecasting future volatility in financial time-series, employing a GARCH model. Next, I will make use of 5-min intraday stock data of close prices to show how to infer a possible stock value in the next 5 minutes, using current levels of volatility in intraday trading.
Ultimately, I will discuss an exit strategy from a trade based on a forecasted worst-case scenario (the stock price is forecasted to exceed the assumed stop-loss level). But first, let's warm up with some cute equations we cannot live without.

Inferring Volatility

Capturing and digesting volatility is somehow like an art that does not attempt to represent external, recognizable reality but seeks to achieve its effect using shapes, forms, colors, and textures. The basic idea we want to describe here is a volatility $\sigma_t$ of a random variable (rv), e.g. an asset price, on day $t$ as estimated at the end of the previous day $t-1$. How to do it in the easiest possible way? It's simple. First, let's assume that the logarithmic rate of change of the asset price between two time steps is:
$$r_{t-1} = \ln\frac{P_{t-1}}{P_{t-2}}$$ which corresponds to the return expressed in percent as $R_{t-1}=100[\exp(r_{t-1})-1]$, and we will be using this transformation throughout the rest of the text. This notation leaves us with a window of opportunity to denote $r_t$ as an innovation to the rate of return, under the condition that we are able, somehow, to deduce, infer, and forecast the future asset price $P_t$.

Using the classical definition of a sample variance, we are allowed to write:
$$\sigma_t^2 = \frac{1}{m-1} \sum_{i=1}^{m} (r_{t-i}-\langle r \rangle)^2$$ which is our forecast of the variance rate in the next time step $t$ based on the past $m$ data points, where $\langle r \rangle=m^{-1}\sum_{i=1}^{m} r_{t-i}$ is the sample mean. Now, if we examine a return series sampled every day, or every hour, or every minute, it is worth noticing that $\langle r \rangle$ is very small compared with the standard deviation of the changes. This observation pushes us a bit further in rewriting the estimation of $\sigma_t$ as:
$$\sigma_t^2 = \frac{1}{m} \sum_{i=1}^{m} r_{t-i}^2$$ where $m-1$ has been replaced with $m$ by adding an extra degree of freedom (equivalent to a maximum likelihood estimate).
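The step from the unbiased estimator to the zero-mean, maximum-likelihood flavour is tiny and easy to check numerically; a short Python/NumPy illustration (the toy return values are my own):

```python
import numpy as np

r = np.array([0.010, -0.020, 0.015, -0.005])  # toy returns; sample mean is ~0
m = len(r)

var_unbiased = np.sum((r - r.mean())**2) / (m - 1)  # the 1/(m-1) estimator
var_ml = np.mean(r**2)                              # the 1/m, zero-mean shortcut
```

When $\langle r \rangle = 0$, the two differ only by the factor $m/(m-1)$; for the long return series used in practice the difference is negligible.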
What is superbly important about this formula is the fact that it gives an equal weight of unity to every value of $r_{t-i}$, as we can always imagine each term multiplied by one. But in practice, we may wish to associate some weights $\alpha_i$ as follows:
$$\sigma_t^2 = \sum_{i=1}^{m} \alpha_i r_{t-i}^2$$ where $\sum_{i=1}^{m} \alpha_i = 1$, which replaces the factor of $m^{-1}$ in the previous formula. If you think for a second about the idea of the $\alpha$'s, it is pretty straightforward to understand that every observation of $r_{t-i}$ makes some significant contribution to the overall value of $\sigma_t^2$. In particular, if we select $\alpha_i<\alpha_j$ for $i>j$, every observation further in the past from the most current time $t-1$ contributes less and less.

In 1982 R. Engle proposed a tiny extension of the discussed formula, finalized in the form of the AutoRegressive Conditional Heteroscedasticity ARCH($m$) model:
$$\sigma_t^2 = \omega + \sum_{i=1}^{m} \alpha_i r_{t-i}^2$$ where $\omega$ is the weighted long-run variance taking its position with a weight of $\gamma$, such that $\omega=\gamma V$, and now $\gamma+\sum_{i=1}^{m} \alpha_i = 1$. What the ARCH model allows is the estimation of future volatility, $\sigma_t$, taking into account only the past $m$ weighted squared rates of return $\alpha_i r_{t-i}^2$ and the additional parameter $\omega$. In practice, we aim at finding the weights $\alpha_i$ and $\gamma$ using the maximum likelihood method for a given return series $\{r_i\}$. This approach, in general, requires approximately $m>3$ in order to describe $\sigma_t^2$ efficiently. So the question emerges: can we do much better? And the answer is: of course.

Four years later, in 1986, a new player entered the ring.
His name was Mr T (Bollerslev) and he literally crushed Engle in the second round with an innovation: the Generalized AutoRegressive Conditional Heteroscedasticity GARCH($p,q$) model:
$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i r_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$$ which derives its $\sigma_t^2$ based on $p$ past observations of $r^2$ and $q$ most recent estimates of the variance rate. The inferred return is then equal to $r_t=\sigma_t\epsilon_t$ where $\epsilon_t\sim N(0,1)$, which leaves us with a rather pale and wry face, as we know what that truly means in practice!

A simplification meeting wide applause among practitioners is delivered by the GARCH(1,1) model:
$$\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2$$ which derives its value based solely on the most recent update of $r$ and $\sigma$. If we think for a short while, GARCH(1,1) should provide us with a good taste of forecasted volatility when the past couple of returns were similar; however, its weakness emerges in moments of sudden jumps (shocks) in the price changes, which cause overestimated volatility predictions. Well, no model is perfect.

Similarly as in the case of the ARCH model, for GARCH(1,1) we may use the maximum likelihood method to find the best estimates of the $\alpha$ and $\beta$ parameters, leading us to a long-run volatility of $[\omega/(1-\alpha-\beta)]^{1/2}$. It is usually achieved in an iterative process by looking for the maximum of the log-likelihood sum:
$$\sum_{i=3}^{N} \left[ -\ln(\sigma_i^2) - \frac{r_i^2}{\sigma_i^2} \right]$$ where $N$ denotes the length of the return series $\{r_j\}$ ($j=2,…,N$) available to us. There are dedicated algorithms for doing that and, as we will see later on, we will make use of one of them in Matlab.
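Before that, the GARCH(1,1) recursion and the long-run volatility formula can be sketched in a few lines of Python (the parameter values of $\omega$, $\alpha$, $\beta$ are assumed for illustration only, not fitted to any data):

```python
import numpy as np

def garch11_filter(r, omega, alpha, beta):
    """Run the GARCH(1,1) variance recursion along a return series,
    starting from the long-run (unconditional) variance."""
    sig2 = np.empty(len(r))
    sig2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(r)):
        sig2[t] = omega + alpha * r[t - 1]**2 + beta * sig2[t - 1]
    return sig2

omega, alpha, beta = 2e-6, 0.08, 0.90
long_run_vol = np.sqrt(omega / (1 - alpha - beta))  # [omega/(1-alpha-beta)]^(1/2)

r = np.zeros(50)                        # a shock-free path...
sig2 = garch11_filter(r, omega, alpha, beta)
```

With these numbers the long-run volatility is 0.01 (1% per step); with zero returns the filtered variance decays geometrically at rate $\beta$ from $\omega/(1-\alpha-\beta)$ towards $\omega/(1-\beta)$.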
For the remaining discussion on the verification procedure of the GARCH model as a tool to explain volatility in return time-series, its pros and cons, and comparisons of GARCH to other ARCH derivatives, I refer you to the immortal quant's bible of John Hull and the more in-depth textbook by the financial time-series role model, Ruey Tsay.

Predicting the Unpredictable

The concept of predicting the next move in an asset price based on the GARCH model appears to be thrilling and exciting. The only worry we may have, as already recognized, is the fact that the forecasted return value is $r_t=\sigma_t\epsilon_t$, with $\epsilon_t$ being an rv drawn from a normal distribution $N(0,1)$. That implies $r_t$ to be an rv such that $r_t\sim N(0,\sigma_t)$. We are allowed to extend this model further to the attractive form of:
$$r_t = \mu + \sigma_t\epsilon_t \ \ \ \sim N(\mu,\sigma_t)$$ where by $\mu$ we will understand a simple mean over the past $k$ data points:
$$\mu = k^{-1} \sum_{i=1}^{k} r_{t-i} \ .$$
Since we rather expect $\mu$ to be very small, its inclusion provides us with an offset in the modelling of the $r_t$ value. Okay, so what does it mean for us? A huge uncertainty in guessing where we are going to end up in the next time step. The greater the value of $\sigma_t$ inferred from the GARCH model, the wider the spread in the possible values that $r_t$ will take. God's fingerprint. End of hopes. Well, not exactly.

Let's look at the bright side of our $N(\mu,\sigma_t)$ distribution. Remember, never give in too quickly! Look for opportunities! Always! So, the distribution has two tails. We know very well what is concealed in its left tail. It's the devil's playground of worst losses! Can a bad thing be a good thing? Yes! But how? It's simple. We can always compute, for example, the 5% quantile $Q$, which corresponds to finding the value of the rv $r_t$ for which there is a 5% or smaller chance that the actual value of $r_t$ will be smaller than or equal to $Q$ (a sort of equivalent of finding VaR).
Having this opportunity, we may wish to design a test statistic as follows:
$$d = Q + r'$$ where $r'$ represents the cumulative return of an open trading position. If you are an intraday (or algorithmic high-frequency) trader and you have a fixed stop-loss limit $s$ set for every trade you enter, a simple verification of the logical condition:
$$d < s \ ?$$ provides you immediately with an attractive model for your decision to close your position in the next time step, based on the GARCH forecast derived at time $t-1$.

All right, let's cut to the chase and see how that works in practice!

5-min Trading with a GARCH Exit Strategy

In order to illustrate the whole theory of the GARCH approach and dancing at the edge of the uncertainty of the future, we analyse the intraday 5-min stock data of Toyota Motor Corporation, traded on the Tokyo Stock Exchange under the ticker TYO:7203. The data file of TYO7203m5.dat contains a historical record of trading in the following form:

```
TYO7203
DATES      TIMES  OPEN HIGH LOW  CLOSE VOLUME
07/8/2010  11:00  3135 3140 3130 3130  50900
07/8/2010  11:05  3130 3135 3130 3130  16100
07/8/2010  11:10  3130 3145 3130 3140  163700
...
01/13/2011 16:50  3535 3540 3535 3535  148000
01/13/2011 16:55  3540 3545 3535 3535  1010900
```

i.e. it is spanned between 8-Jul-2010 11:00 and 13-Jan-2011 16:55.
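Before diving into the Matlab session, the exit rule described above can be condensed into a few lines of Python (every number here — the forecasted $\mu$ and $\sigma_t$, the cumulative return, and the stop-loss — is assumed purely for illustration):

```python
from statistics import NormalDist

mu, sigma_t = 0.0001, 0.004    # assumed one-step forecast: r_t ~ N(mu, sigma_t)
Q = NormalDist(mu, sigma_t).inv_cdf(0.05)  # 5% quantile of the next-step return

r_cum = -0.002                 # assumed cumulative return of the open position
s = -0.005                     # fixed stop-loss per trade (-0.5%)

d = Q + r_cum                  # worst-case cumulative return one step ahead
exit_now = d < s               # close the position if the worst case pierces s
```

Here $Q \approx -0.0065$, so $d \approx -0.0085 < -0.005$: the rule says to exit even though the current drawdown of $-0.2\%$ is still above the stop-loss.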
Using the Matlab language,

```matlab
% Forecasting Volatility, Returns and VaR for Intraday
% and High-Frequency Traders
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz

clear all; close all; clc;

%% ---DATA READING AND PREPROCESSING
fname=['TYO7203m5.dat'];
% construct FTS object for 5-min data
fts=ascii2fts(fname,'T',1,2,[1 2]);
% sampling
dt=5; % minutes
% extract close prices
cp=fts2mat(fts.CLOSE,1);

% plot 5-min close prices for entire data set
figure(1);
plot(cp(:,1)-cp(1,1),cp(:,2),'Color',[0.6 0.6 0.6]);
xlabel('Days since 08-Jul-2010 11:00');
ylabel('TYO:7203 Stock Price (JPY)');
```

let's plot the 5-min close prices for the whole available data set.

Now, a moment of concentration. Let's imagine you are trading this stock based on 5-min close-price data points. It is late morning of 3-Dec-2010; you have just finished your second cup of cappuccino and a ham-and-cheese sandwich when, at 10:55, your trading model sends a signal to open a new long position in TYO:7203 at 11:00.

```matlab
% ---DATES SPECIFICATION
% data begin on/at
date1='08-Jul-2010';
time1='11:00';
% last available data point (used for GARCH model feeding)
dateG='03-Dec-2010';
timeG='12:35';
% a corresponding data index for that time point (idxGARCH)
[idxGARCH,~,~]=find(cp(:,1)==datenum([dateG,' ',timeG]));
% enter (trade) long position on 03-Dec-2010 at 11:00am
dateT='03-Dec-2010';
timeT='11:00';
% a corresponding data index for that time point (idx)
[idx,~,~]=find(cp(:,1)==datenum([dateT,' ',timeT]));
```

You buy $X$ shares at $P_{\rm{open}}=3310$ JPY per share at 11:00 and you wait, observing the asset price every 5 min. The price first goes up, then starts to fall. You look at your watch; it is already 12:35.
Following the trading data processing, which your Matlab does for you every 5 minutes,

```matlab
% ---DATA PREPROCESSING (2)
% extract FTS object spanned between the beginning of data
%  and the last available data point
ftsGARCH=fetch(fts,date1,time1,dateG,timeG,1,'d');
cpG=fts2mat(ftsGARCH.CLOSE,1);            % extract close prices (vector)
retG=log(cpG(2:end,2)./cpG(1:end-1,2));   % extract log returns (vector)

figure(1); hold on;
plot(cp(1:idxGARCH,1)-cp(1,1),cp(1:idxGARCH,2),'b') % a blue line in Figure 2

figure(2);
% plot close prices
x=1:5:size(cpG,1)*5;
x=x-x(idx+1);
plot(x,cpG(:,2),'-ok');
xlim([-20 110]); ylim([3285 3330]);
ylabel('TYO:7203 Stock Price (JPY)');
xlabel('Minutes since 03-Dec-2010 11:00');
% add markers to the plot
hold on; plot(-dt,cpG(idx,2),'o','MarkerEdgeColor','g');
hold on; plot(0,cpG(idx+1,2),'o','MarkerFaceColor','k','MarkerEdgeColor','k');
% mark the -0.5% stop-loss line
xs=0:5:110;
stoploss=(1-0.005)*3315;  % (JPY)
ys=stoploss*ones(1,length(xs));
plot(xs,ys,':r');
```

you re-plot the charts. In the first, as of the data extracted at 3-Dec-2010 12:35, the blue line marks the price history of the stock available so far (please ignore the grey line: it refers to future prices which we of course do not know at 12:35, and is shown only to better illustrate the data-feeding process used next in the GARCH modelling). The second chart presents the same price history with the time axis adjusted to start at 0 when the trade was entered (black filled marker), as suggested by the model 5 min earlier (green marker). Assuming that in our trading we have a fixed stop-loss level of -0.5% per single trade, the red dotted line marks the corresponding stock price of the suggested exit action. As of 12:35, the actual trading price of TYO:7203 is still above the stop-loss. Great! Well, are you sure?
As we have been trading the stock since 11:00, every 5 min we re-estimate the GARCH model on the total available data, updated with each new innovation in the price series. Therefore, as of 12:35, our GARCH modelling in Matlab,

```matlab
% ---ANALYSIS
% GARCH(p,q) parameter estimation
model = garch(3,3) % define model
[fit,VarCov,LogL,Par] = estimate(model,retG)
% extract model parameters
parC=Par.X(1) % omega
parG=Par.X(2) % beta  (GARCH)
parA=Par.X(3) % alpha (ARCH)
% estimate unconditional volatility
gamma=1-parA-parG
VL=parC/gamma;
volL=sqrt(VL)
% redefine the model with the estimated parameters
model=garch('Constant',parC,'GARCH',parG,'ARCH',parA)
% infer 5-min variance based on the GARCH model
sig2=infer(model,retG);  % vector
```

runs,

```
model =

    GARCH(3,3) Conditional Variance Model:
    --------------------------------------
    Distribution: Name = 'Gaussian'
               P: 3
               Q: 3
        Constant: NaN
           GARCH: {NaN NaN NaN} at Lags [1 2 3]
            ARCH: {NaN NaN NaN} at Lags [1 2 3]

____________________________________________________________
   Diagnostic Information

Number of variables: 7

Functions
Objective:  @(X)OBJ.nLogLikeGaussian(X,V,E,Lags,...)
Gradient:   finite-differencing
Hessian:    finite-differencing (or Quasi-Newton)

Constraints
Nonlinear constraints:                     do not exist
Number of linear inequality constraints:   1
Number of linear equality constraints:     0
Number of lower bound constraints:         7
Number of upper bound constraints:         7

Algorithm selected
   sequential quadratic programming
____________________________________________________________
   End diagnostic information

                                             Norm of      First-order
 Iter F-count       f(x)       Feasibility  Steplength      step       optimality
    0       8  -2.545544e+04    0.000e+00                              1.392e+09
    1      60  -2.545544e+04    0.000e+00   8.812e-09    8.813e-07     1.000e+02

Local minimum found that satisfies the constraints.
```
```
Optimization completed because the objective function is
non-decreasing in feasible directions, to within the selected
value of the function tolerance, and constraints are satisfied
to within the selected value of the constraint tolerance.
```

and returns the best estimates of the GARCH model's parameters:

```
   GARCH(1,1) Conditional Variance Model:
   ----------------------------------------
   Conditional Probability Distribution: Gaussian

                                  Standard          t
    Parameter       Value           Error       Statistic
   -----------   -----------    ------------   -----------
     Constant    2.39895e-07     6.87508e-08      3.48934
     GARCH{1}            0.9       0.0224136      40.1542
      ARCH{1}           0.05       0.0037863      13.2055

fit =

    GARCH(1,1) Conditional Variance Model:
    --------------------------------------
    Distribution: Name = 'Gaussian'
               P: 1
               Q: 1
        Constant: 2.39895e-07
           GARCH: {0.9} at Lags [1]
            ARCH: {0.05} at Lags [1]
```

defining the GARCH(1,1) model of TYO:7203 on 3-Dec-2010 12:35 as follows: $$\sigma_t^2 = 2.39895\times 10^{-7} + 0.05r_{t-1}^2 + 0.9\sigma_{t-1}^2 \ \ ,$$ with the long-run volatility $\sqrt{V_L}=0.0022$ ($0.2\%$) at $\gamma=1-\alpha-\beta=0.05$. Please note that, thanks to a computational convenience, we allowed ourselves to define the raw template of the model as GARCH(3,3); it has been adjusted by Matlab's solver down to the resulting GARCH(1,1).
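The recursion hidden behind Matlab's `infer` call is worth spelling out. Here is a minimal stand-alone sketch in Python (not the article's code) of the GARCH(1,1) variance recursion, seeded with the long-run variance and using the fitted parameters quoted above; the return series fed to it is made up for illustration.

```python
import math

# GARCH(1,1): sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
def garch11_infer(returns, omega, alpha, beta):
    sigma2 = [omega / (1.0 - alpha - beta)]   # seed with the long-run variance
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

omega, alpha, beta = 2.39895e-7, 0.05, 0.9    # fitted values from the text
VL = omega / (1.0 - alpha - beta)             # long-run (unconditional) variance
print(round(math.sqrt(VL), 4))                # 0.0022, matching volL in the text
```

The long-run volatility printed here is exactly the `volL` number the Matlab session reports.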
Based on the model and the data available till $t-1=$ 12:35, we get a forecast of the $\sigma_t$ value at $t=$ 12:40:

```matlab
% forecast variance 1 step (5 min) ahead, sigma^2_t
sig2_t=forecast(model,1,'Y0',retG,'V0',sig2);  % scalar
sig_t=sqrt(sig2_t)                             % scalar
```

$$\sigma_t=0.002 \ (0.2\%) \ .$$ Plotting the return series for our trade, at 12:35 we have:

```matlab
% update a plot of 5-min returns
figure(3);
x=1:length(retG); x=x-idx; x=x*5;
plot(x,100*(exp(retG)-1),'ko-');
xlim([-20 110]); ylim([-0.8 0.8]);
xlabel('Minutes since 03-Dec-2010 11:00');
ylabel('R_t [%]');
```

Estimating the mean value of the return series over the last 12 data points (60 min),

```matlab
% estimate mean return over the passing hour (dt*12=60 min)
mu=mean(retG(end-12:end))
```

we get $\mu=-2.324\times 10^{-4}$ (log values), which allows us to define the model for the return at $t=$ 12:40 as: $$R_t = \mu + \sigma_t\epsilon_t = -0.02324\% + 0.2\%\,\epsilon_t \ , \ \ \ \epsilon_t\sim N(0,1)\ .$$ The beauty of God's fingerprint denoted by $\epsilon$ we can understand better by running a simple Monte Carlo simulation, drawing 1000 rvs of $r_t=\mu+\sigma_t\epsilon$ which illustrate 1000 possible values of the stock return as predicted by the GARCH model!

```matlab
% simulate 1000 rvs ~ N(0,sig_t)
r_t=sig_t*randn(1000,1);
```

One of them could be the one realised at 12:40, but we don't know which one! However, as discussed in Section 2 of this article, we find the 5% quantile $Q$ (using norminv with the estimated $\mu$ and $\sigma_t$) and mark this value by a red filled circle in the plot:

```matlab
tmp=[(length(retG)-idx)*5+5*ones(length(r_t),1) 100*(exp(r_t)-1)];
hold on; plot(tmp(:,1),tmp(:,2),'x');
Q=norminv(0.05,mu,sig_t);
Q=100*(exp(Q)-1);
hold on; plot((length(retG)-idx)*5+5,Q,'or','MarkerFaceColor','r');
```

Including the 1000 possible realisations of $r_t$ (blue markers), the updated Figure 4 now looks as described, where we find $Q=-0.3447\%$.
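The quantile step translates almost one-to-one into other languages. As an aside to the Matlab above, here is a Python sketch of the same calculation (the variable names are mine): draw $r_t=\mu+\sigma_t\epsilon$, then compare the 5% sample quantile with the exact `norminv`-style value.

```python
import math
import random
from statistics import NormalDist

mu, sig_t = -2.324e-4, 0.002                    # log-return values from the text

random.seed(7)
draws = sorted(mu + sig_t * random.gauss(0.0, 1.0) for _ in range(1000))
Q_mc = draws[49]                                # ~5% empirical quantile
Q_exact = NormalDist(mu, sig_t).inv_cdf(0.05)   # Matlab: norminv(0.05,mu,sig_t)

Q_pct = 100.0 * (math.exp(Q_exact) - 1.0)       # back to a simple return, in %
print(round(Q_pct, 4))                          # close to the simulated -0.3447%
```

The small gap between the exact quantile and the article's $-0.3447\%$ is just Monte Carlo noise from the finite 1000-draw sample.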
The current P&L for the open long position in TYO:7203,

```matlab
% current P/L for the trade
P_open=3315; % JPY (3-Dec-2010 11:00)
cumRT=100*(cpG(end,2)/P_open-1) % = r'
```

returns $r'=-0.3017\%$. Employing the proposed test statistic $d$,

```matlab
% test statistic (percent)
d=cumRT+Q
```

we get $d=r'+Q = -0.6464\%$, which, for the first time since opening the position at 11:00, exceeds our stop-loss of $s=-0.5\%$. Therefore, based on the GARCH(1,1) forecast and the logical condition $d\lt s$ being true, our risk-management engine sends out, at 12:35, an order to close the position at 12:40. The forecasted closing price,

```matlab
% forecasted stock price under the test (in JPY)
f=P_open*(1+d/100)
```

is estimated to be 3293.6 JPY, i.e. a -0.6464% loss. At 12:40 the order is executed at 3305 JPY per share, i.e. with a realised loss of -0.3017%. The very last question you ask yourself is: using the above GARCH exit strategy, was luck in my favour or not? In other words, did the algorithm take the right decision? We can find out only by letting time flow. As you come back from a super-quick lunch break, you sit down, look at the recent chart of TYO:7203 at 12:55, and you feel relief, as the price went further down, making the algo's decision the right one. For reference, the red filled marker denotes the price at which we closed our long position with a loss of -0.3017% at 12:40, and the open red circle marks the forecasted price under the $d$-statistic as derived at 12:35, added for the completeness of the grand picture and the idea standing behind this post.

## Black Swan and Extreme Loss Modeling

When I read Nassim Nicholas Taleb's book The Black Swan, my mind was captured by the beauty of extremely rare events and, concurrently, devastated by the message the book sent: the non-computability of the probability of consequential rare events using scientific methods (owed to the very nature of small probabilities).
I rushed to the local library to find out what had been written on the subject. Surprisingly, I discovered the book by Embrechts, Klüppelberg & Mikosch on Modelling Extremal Events for Insurance and Finance, which appeared to me very inaccessible: loaded with heavy mathematical theorems and proofs, and a negligible number of practical examples. I left it on the shelf to gather dust for a long while, until last week, when a fresh drive to decompose the problem came back to me. In this article I will try to take you on a short but efficient journey through a part of classical extreme value theory, namely the fluctuations of maxima, and complete the story with easy-to-follow procedures for simulating the occurrences of extremely rare losses in financial return series. Having this experience, I will briefly discuss how the resulting model can be blended into future stock returns.

1. The Theory of Fluctuations of Maxima

Let's imagine we have rich historical data (a time series) of returns for a specific financial asset or portfolio of assets. A good and easy example is the daily rate of return, $R_t$, for a stock traded e.g. at the NASDAQ Stock Market, $$R_t = \frac{P_t}{P_{t-1}} - 1 \ ,$$ where $P_t$ and $P_{t-1}$ denote the stock price on day $t$ and $t-1$, respectively. The longer the time coverage, the more valuable information can be extracted. Given the time series of daily stock returns, $\{R_i\}\ (i=1,…,N)$, we can create a histogram, i.e. the distribution of returns. By a rare event or, more precisely here, a rare loss, we will refer to the returns placed in the far left tail of the distribution. As an assumption, we also agree that $R_1,R_2,…$ is a sequence of iid non-degenerate rvs (random variables) with a common df $F$ (distribution function).
We define the fluctuations of the sample maxima as: $$M_1 = R_1, \ \ \ M_n = \max(R_1,…,R_n) \ \mbox{for}\ \ n\ge 2 \ .$$ That simply says that for any time series $\{R_i\}$ there is one maximum corresponding to the rv with the most extreme value. Since the main line of this post is the investigation of maximal losses in return time series, we are entitled to think about negative values (losses) in terms of maxima (and therefore conduct the theoretical analysis) thanks to the identity: $$\min(R_1,…,R_n) = -\max(-R_1,…,-R_n) \ .$$ The distribution function of the maximum $M_n$ is given by: $$P(M_n\le x) = P(R_1\le x, …, R_n\le x) = P(R_1\le x)\cdots P(R_n\le x) = F^n(x)$$ for $x\in\Re$ and $n\in\mbox{N}$. What extreme value theory first 'investigates' are the limit laws for the maxima $M_n$. An important question emerges here: is there any distribution which satisfies, for all $n\ge 2$, the identity in law $$\max(R_1,…,R_n) \stackrel{d}{=} c_nR + d_n$$ for appropriate constants $c_n>0$ and $d_n\in\Re$; or, simply speaking, which classes of distributions $F$ are closed under maxima? The theory next defines the max-stable distribution: a random variable $R$ is called max-stable if it satisfies the foregoing relation for iid $R_1,…,R_n$. If we assume that $\{R_i\}$ is a sequence of iid max-stable rvs, then: $$R \stackrel{d}{=} c_n^{-1}(M_n-d_n)$$ and one can say that every max-stable distribution is a limit distribution for maxima of iid rvs. That brings us to the fundamental Fisher–Tippett theorem: if there exist constants $c_n>0$ and $d_n\in\Re$ such that: $$c_n^{-1}(M_n-d_n) \stackrel{d}{\rightarrow} H, \ \ n\rightarrow\infty\ ,$$ then $H$ must be of the type of one of the three so-called standard extreme value distributions, namely: Fréchet, Weibull, and Gumbel.
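The identity $P(M_n\le x)=F^n(x)$ is easy to verify numerically. As a quick aside in Python (the article's own code is Matlab), the snippet below checks it for iid uniforms on $[0,1]$, where $F(x)=x$, so the maximum of $n$ draws should satisfy $P(M_n\le x)=x^n$:

```python
import random

# Monte Carlo check of P(M_n <= x) = F^n(x) for iid U(0,1), where F(x) = x.
random.seed(1)
n, x, trials = 5, 0.8, 200_000
hits = sum(max(random.random() for _ in range(n)) <= x for _ in range(trials))
empirical = hits / trials
theoretical = x ** n                          # 0.8^5 = 0.32768
print(abs(empirical - theoretical) < 0.01)    # True: the identity holds
```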
In this post we will consider only the Gumbel distribution $G$, with the corresponding probability density function (pdf) $g$ given by: $$G(z;\ a,b) = e^{-e^{-z}} \ \ \mbox{for}\ \ z=\frac{x-b}{a}, \ x\in\Re \ ,$$ and $$g(z;\ a,b) = a^{-1} e^{-z}e^{-e^{-z}} \ ,$$ where $b$ and $a$ are the location parameter and the scale parameter, respectively. Having defined the extreme value distribution and being now equipped with a better understanding of the theory, we are ready for a test drive over the daily roads of profits and losses in the trading markets. This is the moment which separates men from boys.

2. Gumbel Extreme Value Distribution for the S&P500 Universe

As usual, we start with the entrée. Our goal is to find the empirical distribution of maxima (i.e. maximal daily losses) for all stocks belonging to the S&P500 universe between 3-Jan-1984 and 8-Mar-2011. There were $K=954$ stocks traded within this period and their data can be downloaded here as a sp500u.zip file (23.8 MB). The full list of stock names is provided in the sp500u.lst file.
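Before diving into the data, the Gumbel cdf and pdf above can be coded directly. A minimal Python sketch (function names are mine; the pdf carries the usual $1/a$ scale normalisation):

```python
import math

# Gumbel cdf G and pdf g with location b and scale a, as defined in the text.
def gumbel_cdf(x, b=0.0, a=1.0):
    z = (x - b) / a
    return math.exp(-math.exp(-z))

def gumbel_pdf(x, b=0.0, a=1.0):
    z = (x - b) / a
    return math.exp(-z - math.exp(-z)) / a

print(round(gumbel_cdf(0.0), 4))   # G(0) = e^{-1} = 0.3679
```

Note that the mode of the standard Gumbel pdf sits exactly at the location parameter, which is why the fitted location below will read off as the "expected" extreme loss.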
Therefore, performing the data processing in Matlab, first we compute a vector of daily returns for each stock, and next find the corresponding minimal value $M_n$, where $n$ stands for the length of each return vector:

```matlab
% Black Swan and Extreme Loss Modeling
%  using Gumbel distribution and S&P500 universe
%
% (c) 2013 QuantAtRisk, by Pawel Lachowicz

clear all; close all; clc; tic;

%% DATA READING AND PREPROCESSING

% read a list of stock names
StockNames=dataread('file',['sp500u.lst'],'%s','delimiter','\n');
K=length(StockNames);  % the number of stocks in the universe
% path to data files
path=['./SP500u'];

fprintf('data reading and preprocessing..\n');
for si=1:K
    % --stock name
    stock=StockNames{si};
    fprintf('%4.0f %7s\n',si,stock);
    % --load data
    n=[path,'/',stock,'.dat'];
    % check for NULL and change to NaN (using 'sed' command
    %  in Unix/Linux/MacOS environment)
    cmd=['sed -i ''s/NULL/NaN/g''',' ',n];
    [status,result]=system(cmd);
    % construct FTS object for daily data
    FTS=ascii2fts(n,1,2);
    % fill any missing values denoted by NaNs
    FTS=fillts(FTS);
    % extract the close price of the stock
    cp=fts2mat(FTS.CLOSE,0);
    % calculate a vector with daily stock returns and store it
    %  in the cell array
    R{si}=cp(2:end)./cp(1:end-1)-1;
end

%% ANALYSIS

% find the minimum daily return value for each stock
Rmin=[];
for si=1:K
    Mn=min(R{si},[],1);
    Rmin=[Rmin; Mn];
end
```

Having that ready, we fit the data with the Gumbel function, which (as we believe) describes the distribution of maximal losses in the S&P500 universe best:

```matlab
% fit the empirical distribution with the Gumbel distribution and
%  estimate the location, a, and scale, b, parameters
[par,parci]=evfit(Rmin);
a=par(1);
b=par(2);

% plot the distribution
x=-1:0.01:1;
hist(Rmin,-1:0.01:0);
h=findobj(gca,'Type','patch');
set(h,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.6 0.6 0.6]);
h=findobj(gca,'Type','box');
set(h,'Color','k');

% add a plot of the Gumbel pdf
pdf1=evpdf(x,a,b);
y=0.01*length(Rmin)*pdf1;
line(x,y,'color','r'); box on;
xlabel('R_{min}');
ylabel(['f(R_{min}|a,b)']);
text(-1,140,['a = ',num2str(a,3)]);
text(-1,130,['b = ',num2str(b,3)]);
xlim([-1 0]);
```

The maximum likelihood estimates of the parameters $a$ and $b$, and the corresponding 95% confidence intervals, can be found as follows:

```
>> [par,parci]=evfit(Rmin)

par =
   -0.2265    0.1135

parci =
   -0.2340    0.1076
   -0.2190    0.1197
```

That brings us to a visual representation of our analysis. This is a very important result, communicating that the expected value of the extreme daily loss is equal to about -22.6%. However, the left tail of the fitted Gumbel distribution extends far, up to nearly -98%, although the probability of the occurrence of such a massive daily loss is rather low. On the other hand, the expected value of -22.6% is surprisingly close to the down-movement of the markets on Oct 19, 1987, known as Black Monday, when the Dow Jones Industrial Average (DJIA) dropped by 508 points to 1738.74, i.e. by 22.61%!

3. Blending the Extreme Loss Model with Daily Returns of a Stock

You probably wonder how we can include the results coming from the Gumbel modelling in the prediction of rare losses in the future daily returns of a particular stock. This can be done pretty straightforwardly by combining the best fitted model (pdf) for extreme losses with the stock's own pdf. To do it properly we need to employ the concept of mixture distributions. Michael B. Miller, in his book Mathematics and Statistics for Financial Risk Management, provides a clear exposition of this procedure. In our case, the mixture density function $f(x)$ can be written as: $$f(x) = w_1 g(x) + (1-w_1) n(x)$$ where $g(x)$ is the Gumbel pdf, $n(x)$ represents the fitted stock pdf, and $w_1$ marks the weight (influence) of $g(x)$ in the resulting overall pdf.
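The mixture formula is worth a quick self-contained check. Below is a Python sketch of $f(x)=w_1 g(x)+(1-w_1)n(x)$, using the Gumbel parameters fitted above (location $-0.2265$, scale $0.1135$) and the normal parameters quoted later for AAPL ($\mu=0.0012$, $\sigma=0.0308$); the function names and the grid are my own illustration, and the Gumbel orientation (maxima vs minima convention) is simplified here.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gumbel_pdf(x, loc, scale):
    z = (x - loc) / scale
    return math.exp(-z - math.exp(-z)) / scale

def mixture_pdf(x, w1, loc=-0.2265, scale=0.1135, mu=0.0012, sigma=0.0308):
    # f(x) = w1 * g(x) + (1 - w1) * n(x)
    return w1 * gumbel_pdf(x, loc, scale) + (1.0 - w1) * normal_pdf(x, mu, sigma)

# sanity check: the mixture still integrates to ~1 over a wide grid
step = 0.001
area = sum(mixture_pdf(-1.0 + i * step, 0.01) * step for i in range(2001))
print(round(area, 2))   # 1.0
```

Because both components are proper densities, any $w_1\in[0,1]$ keeps the mixture normalised; $w_1$ only redistributes mass toward the fat Gumbel left tail.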
In order to illustrate this process, let's select one stock from our S&P500 universe, say Apple Inc. (NASDAQ: AAPL), and fit its daily returns with a normal distribution:

```matlab
% AAPL daily returns (3-Jan-1984 to 11-Mar-2011)
rs=R{18};

figure(2);
hist(rs,50);
h=findobj(gca,'Type','patch');
set(h,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.6 0.6 0.6]);
h=findobj(gca,'Type','box');
set(h,'Color','k');

% fit the normal distribution and plot the fit
[muhat,sigmahat]=normfit(rs)
x=-1:0.01:1;
pdf2=normpdf(x,muhat,sigmahat);
y=0.01*length(rs)*pdf2;
hold on; line(x,y,'color','r');
xlim([-0.2 0.2]); ylim([0 2100]);
xlabel('R');
```

where the red line represents the fit with a mean of $\mu=0.0012$ and a standard deviation of $\sigma=0.0308$. We can obtain the mixture distribution $f(x)$ by executing a few more lines of code:

```matlab
% Mixture Distribution Plot
figure(3);
w1=0.001;  % enter your favourite value, e.g. 0.001
w2=1-w1;
pdfmix=w1*(pdf1*0.01)+w2*(pdf2*0.01);  % note: sum(pdfmix)=1 as expected
x=-1:0.01:1;
plot(x,pdfmix);
xlim([-0.6 0.6]);
```

It is important to note that our modelling is based on the $w_1$ parameter. It can be understood intuitively as follows. Let's say we choose $w_1=0.01$. That would mean the Gumbel pdf contributes to the overall pdf at the 1% level. In the following section we will see that if a random variable is drawn from the distribution given by $f(x)$, then $w_1=0.01$ simply means (not exactly, but to a sufficient approximation) that there is a 99% chance of drawing this variable from $n(x)$ and only a 1% chance of drawing it from $g(x)$. The dependence of $f(x)$ on $w_1$ is illustrated in the next figure. It is clearly visible that a selection of $w_1>0.01$ would be a significant contributor to the left tail, making it fat. This is not what is observed in the empirical distribution of daily returns for AAPL (and, in general, for the majority of stocks); therefore we rather expect $w_1$ to be much, much smaller than 1%.

4. Drawing Random Variables from the Mixture Distribution

A short break between the entrée and the main course we fill with a sip of red wine. Having a discrete form of $f(x)$, we would like to be able to draw a random variable from this distribution. Again, this is easy. Following a general recipe, for instance the one given in Chapter 12.2.2 of Philippe Jorion's book Value at Risk: The New Benchmark for Managing Financial Risk, we use the concept of the inverse transform method. In the first step we take the output (a random variable) of a pseudo-random generator drawing from the uniform distribution $U(x)$. This rv is always between 0 and 1 and, in the last step, is projected onto the cumulative distribution of our interest, $F(x)$, which in our case corresponds to the cumulative distribution of the $f(x)$ pdf. Finally, we read out the corresponding value on the x-axis: a rv drawn from the $f(x)$ pdf. Philippe illustrates the procedure more intuitively. This method works smoothly when we know the analytical form of $F(x)$. However, if that is not on the menu, we need a couple of technical skills. First, we calculate $F(x)$ based on $f(x)$. Next, we set a very fine grid for the $x$ domain and perform interpolation between the given data points of $F(x)$.

```matlab
% find the cumulative pdf, F(x)
figure(4);
s=0; x=-1; F=[];
for i=1:length(pdfmix);
    s=s+pdfmix(i);
    F=[F; x s];
    x=x+0.01;
end
plot(F(:,1),F(:,2),'k')
xlim([-1 1]); ylim([-0.1 1.1]);

% perform interpolation of the cumulative pdf using a very fine grid
xi=(-1:0.0000001:1);
yi=interp1(F(:,1),F(:,2),xi,'linear');  % use linear interpolation method
hold on; plot(xi,yi);
```

The second sort of difficulty lies in finding a good match between the rv drawn from the uniform distribution and the approximated value of $F(x)$. That is why a very fine grid is required, supplemented with some matching techniques.
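The grid-matching loop can also be replaced by a direct interpolated inversion. Here is a Python stand-in (not the article's code): tabulate $F$ on a grid, draw $u\sim U(0,1)$, and invert $F$ by binary search plus linear interpolation. The exponential cdf is used as a toy $F$ so that the answer can be checked against the exact inverse $-\ln(1-u)$.

```python
import math
from bisect import bisect_left

def sample_from_cdf(xs, Fs, u):
    """xs, Fs: tabulated cdf (Fs non-decreasing); u in (0,1)."""
    i = bisect_left(Fs, u)            # first grid point with F >= u
    if i == 0:
        return xs[0]
    x0, x1, F0, F1 = xs[i - 1], xs[i], Fs[i - 1], Fs[i]
    return x0 + (x1 - x0) * (u - F0) / (F1 - F0)   # linear interpolation

xs = [i * 0.001 for i in range(10001)]    # grid on [0, 10]
Fs = [1.0 - math.exp(-x) for x in xs]     # toy cdf: Exponential(1)
x_med = sample_from_cdf(xs, Fs, 0.5)
print(round(x_med, 3))                    # ln 2 = 0.693
```

In practice one would feed the tabulated $F$ of the mixture pdf into `sample_from_cdf` and call it once per uniform draw; the bisection makes each draw $O(\log M)$ in the grid size rather than a linear scan.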
The following code that I wrote deals with this problem pretty efficiently:

```matlab
% draw a random variable from the f(x) pdf: xi(row)
tF2=round((round(100000*yi')/100000)*100000);
RV=[];
for k=1:(252*40)
    notok=false;
    while(~notok)
        tU=round((round(100000*rand)/100000)*100000);
        [r,c,v]=find(tF2==tU);
        if(~isempty(r))
            notok=true;
        end
    end
    if(length(r)>1)
        rv=round(2+(length(r)-2)*rand);
        row=r(rv);
    else
        row=r(1);
    end
    % therefore, xi(row) is a number representing a rv
    %  drawn from f(x) pdf; we store 252*40 of those
    %  new rvs in the following matrix:
    RV=[RV; xi(row) yi(row)];
end

% mark all corresponding rvs on the cumulative pdf
hold on; plot(RV(:,1),RV(:,2),'rx');
```

Finally, as the main course, we obtain and verify the distribution of a large number of new rvs drawn from the $f(x)$ pdf. It is crucial to check whether our generating algorithm provides a uniform coverage across the entire $F(x)$ plot. In order to get more reliable (statistically speaking) results, we generate 10080 rvs, which correspond to simulated 1-day stock returns for 252 trading days times 40 years.

5. Black Swan Detection

A -22% collapse in the markets on Oct 19, 1987 served as the day when the name of the Black Swan event took its birth, or at least was reinforced, in the financial community. Are black swans extremely rare? It depends. If you live, for example, in Perth, Western Australia, you can see a lot of them wandering around. So what defines an extremely rare loss in the sense of a financial event? Let's assume, by definition, that by a Black Swan event we will understand a daily loss of 20% or more. If so, using the procedure described in this post, we are tempted to pass from the main course to dessert. Our modelling concentrates on finding the most proper contribution of $w_1g(x)$ to the resulting $f(x)$ pdf.
As an outcome of a few runs of Monte Carlo simulations with different values of $w_1$, we find that for $w_1=[0.0010, 0.0005, 0.0001]$ we detect in the simulations, respectively, 9, 5, and 2 events (rvs) displaying a one-day loss of 20% or more. Therefore, the simulated daily returns for AAPL, assuming $w_1=0.0001$, generate two Black Swan events in the distribution, i.e. one event per 5040 trading days, or one per 20 years. That result agrees quite well with what has been observed so far, i.e. including Black Monday in 1987 and the Flash Crash during intraday trading on May 6, 2010 for some of the stocks.

Acknowledgements

I am grateful to Peter Urbani from New Zealand for directing my attention towards the Gumbel distribution for modelling very rare events.

## VaR and Expected Shortfall vs. Black Swan

It is one of the most fundamental approaches to measuring risk, but it is truly worth revising its calculation: Value-at-Risk (VaR). A magical number quoted at the end of the day by banks' financial risk managers, portfolio risk managers, and risk managers simply concerned that the expected risk threshold will not, hopefully, be exceeded on the next day. What is so magical about it? According to the definition, given a specific time horizon and taking into account the last $N$ days of our trading activity, we can always calculate a number which provides a simple answer to the question: if something really unexpected happened on the next day, what would be the loss margin we could suffer for sure? Well, welcome to the world of VaR! More formally, given a daily (or weekly, monthly, etc.) distribution of returns (of, for example, a single stock, a portfolio of assets, etc.), we are interested in finding the value $VaR_\alpha$ for, say, $\alpha=0.05$ (five percent), which tells us that there is a 5% chance that the value of $VaR_{0.05}$ will be exceeded in trading on the next day.
This value is located in the left tail of the distribution and, by convention, is quoted as a positive number. Therefore, we define $VaR_\alpha$ by: $$P[X\le -VaR_\alpha] = \alpha \ ,$$ where $X$ is a random variable which can, as in our assumed case, represent the 1-day return on a traded asset or a portfolio of assets. Beautiful! We have the risk threshold provided as a number. But in practice, how should we calculate $VaR_\alpha$? Here some of the shortcomings of VaR come into the spotlight. VaR depends on the time frame of the daily returns we examine: $VaR_\alpha$ as a number will differ if you include the last 252 trading days in the estimation compared with the last 504 trading days (2 years). Intuitively, the more data (more days, years, etc.) we have, the better the estimation of $VaR_\alpha$, right? Yes and no. Keep in mind that the dynamics of the trading market change continuously. In the early 80s there was no high-frequency trading (HFT). It came to dominate trading traffic after the year 2000, and now HFT plays a key role (an influential factor) in making an impact on prices. So it would be a bit unfair to compare $VaR_\alpha$ estimated in the 80s with what is going on right now. In its parametric version, $VaR_\alpha$ also assumes that the distribution of returns is normal: not the most beautiful assumption, but the most straightforward one for computing $VaR_\alpha$. Should we use $VaR_\alpha$? Yes, but with caution. Okay then, having $VaR_\alpha$ calculated, we know how far the loss could reach at the $1-\alpha$ confidence level. The next super-important question in risk management, one every risk manager should ask or at least be interested in reporting, is: if the $VaR_\alpha$ event occurs on the next day, what is the expected loss we might suffer (say, in dollars)? $VaR_\alpha$ is the threshold. Now we are interested in the expected value of the loss given the exceedance.
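The historical (empirical) route to $VaR_\alpha$ amounts to reading off the $\alpha$-quantile of the sorted return sample. A minimal Python sketch (the function name, the crude quantile index, and the return sample are all my own illustration, not the article's data):

```python
# Historical VaR: the alpha-quantile of the return sample, reported as a
# positive number per the convention above.
def historical_var(returns, alpha=0.05):
    s = sorted(returns)
    k = max(int(alpha * len(s)) - 1, 0)   # crude index of the alpha-quantile
    return -s[k]

rets = [-0.06, -0.03, -0.01, 0.0, 0.002, 0.005, 0.01, 0.012, 0.02, 0.03,
        -0.02, 0.004, 0.006, -0.004, 0.008, 0.001, -0.002, 0.003, 0.007, 0.015]
print(historical_var(rets, 0.05))   # 0.06: the worst of these 20 returns
```

With only 20 observations the 5% quantile is simply the worst return, which is exactly the small-sample fragility of historical VaR the paragraph above warns about.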
This is the definition of the expected shortfall, based on the concept of conditional probability: $$ES = E[X\ |\ X\le -VaR_\alpha] \ .$$ In general, if our daily distribution of returns can be described by a function $f(x)$ representing its probability density function (pdf), then: $$ES = \frac{1}{\alpha} \int_{-\infty}^{-VaR_\alpha} xf(x)\,dx \ .$$ Given any data set of returns, $R_t\ (t=1,…,N)$, calculated from our price history, $$R_t = \frac{P_{t+1}}{P_t} - 1 \ ,$$ both numbers, $VaR_\alpha$ and $ES$, can in fact be calculated in two ways. The first method is based on the empirical distribution, i.e. it uses the data as given: with $H$ representing the normalised histogram of $R_t$ (i.e. its integral is equal to 1) over $M$ bins with edges $h_i$, we take the largest index $i^{VaR}$ for which $$\sum_{i=1}^{i^{VaR}} H_i(h_{i+1}-h_i) \le \alpha \ ,$$ and set $VaR_\alpha = -h_{i^{VaR}}$. Similarly for $ES$ we get: $$ES = \frac{1}{\alpha}\sum_{i=1}^{i^{VaR}} h_iH_i(h_{i+1}-h_i) \ .$$ The second method is based on integration, given the best fit to the histogram of $R_t$ using a normal distribution for $f(x)$. As we will see in the practical example below, the two approaches return different values, and extra caution should be taken when reporting the final risk measures.

Case Study: IBM Daily Returns in 1987

The year 1987 went down in history as one of the most infamous in stock trading. Why? Simply because of the so-called Black Monday on Oct 19, when the markets denoted over 20% losses in a single trading day. Let's analyse the daily returns and their distribution for IBM stock, as seen by a risk manager after the close of the market on Thu, Dec 31, 1987. The risk manager is interested in the estimation of the 1-day 95% VaR threshold, i.e. $VaR_{0.05}$, and the expected shortfall if the exceedance event were to occur on the next trading day (Jan 4, 1988). Therefore, let's assume that our portfolio is composed solely of IBM stock and we have a capital allocation of $C$ dollars in it as of Dec 31, 1987.
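For the second (parametric) route there is in fact a closed form: under a normal fit $N(\mu,\sigma)$, the tail expectation is $E[X\,|\,X\le q_\alpha]=\mu-\sigma\,\varphi(z_\alpha)/\alpha$, a standard identity that can replace numerical integration over the fitted pdf. A Python sketch of it (an aside to the article's Matlab, with my own function name):

```python
import math
from statistics import NormalDist

# Closed-form expected shortfall under a normal fit N(mu, sigma):
#   ES_alpha = mu - sigma * phi(z_alpha) / alpha,  z_alpha = Phi^{-1}(alpha)
def normal_es(mu, sigma, alpha=0.05):
    z = NormalDist().inv_cdf(alpha)                       # e.g. -1.645 for 5%
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return mu - sigma * phi / alpha

print(round(normal_es(0.0, 1.0, 0.05), 3))   # -2.063 (vs the -1.645 quantile)
```

As expected, the ES always sits further out in the tail than the corresponding VaR quantile.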
Using a record of historical trading, IBM.dat, of the form:

```
IBM
DATES        OPEN      HIGH      LOW       CLOSE       VOLUME
03-Jan-1984  0.000000  0.000000  0.000000  30.437500   0.000000
04-Jan-1984  0.000000  0.000000  0.000000  30.968750   0.000000
05-Jan-1984  0.000000  0.000000  0.000000  31.062500   0.000000
06-Jan-1984  0.000000  0.000000  0.000000  30.875000   0.000000
...
04-Mar-2011  0.000000  0.000000  0.000000  161.830000  0.000000
07-Mar-2011  0.000000  0.000000  0.000000  159.930000  0.000000
```

containing the close price of IBM stock, in Matlab we extract the data for the year 1987 and construct a vector of daily returns $R_t$ on the stock:

```matlab
% Reading in the data
fn=['IBM.dat'];
ftsD=ascii2fts(fn,1,2);
range=[datenum('02-Jan-1987'),datenum('31-Dec-1987')];
drr=[datestr(range(1)),'::',datestr(range(2))];
fts=ftsD(drr);
data=fts2mat(fts,1);
% R_t vector
rdata=data(2:end,5)./data(1:end-1,5)-1;

% Plotting
figure(1);
subplot(2,1,1);
plot(data(:,1)-datenum('2-Jan-1987'),data(:,5));
xlim([0 365]); ylabel('IBM Close Price ($)');
subplot(2,1,2);
plot(data(2:end,1)-datenum('2-Jan-1987'),rdata);
xlim([0 365]); ylabel('R_t');
xlabel('t (days)');
```

and present the data graphically:

Having our data, let's find $VaR_{0.05}$ and the corresponding $ES$,

```matlab
% Construct the normalized histogram
[H,h]=hist(rdata,100);
hist(rdata,100);
sum=0;
for i=1:length(H)-1
    sum=sum+(H(i)*(h(i+1)-h(i)));
end
figure(2);
H=H/sum;
bar(h,H,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.7 0.7 0.7]);

% Calculate VaR 95 based on the empirical distribution
sum=0; i=1;
while (i<length(H))
    sum=sum+(H(i)*(h(i+1)-h(i)));
    if(sum>=0.05)
        break;
    end
    i=i+1;
end
VaR=h(i)
hold on; plot(h(i),0,'ro');  % mark VaR in the plot

% Fit the normal distribution to the R_t data
[muhat,sigmahat] = normfit([rdata]');
Y = normpdf(h,muhat(1),sigmahat(1));
hold on; plot(h,Y,'b-');  % and plot the fit using a blue line

% Find VaR 95% for the fitted N(muhat,sigmahat)
sum=0; i=1;
while (i<length(h))
    sum=sum+(Y(i)*(h(i+1)-h(i)));
    if(sum>=0.05)
        break;
    end
    i=i+1;
end
VaR_fit=h(i)
hold on; plot(h(i),0,'ro','markerfacecolor','r');  % mark VaR_fit
ivar=i;

% Calculate the Expected Shortfall
%  based on the fitted normal distribution
sum=0; i=1;
while (i<=ivar)
    sum=sum+(h(i)*Y(i)*(h(i+1)-h(i)));
    i=i+1;
end
ES_fit=sum/0.05
hold on; plot(ES_fit,0,'ko','markerfacecolor','k');

% Add numbers to the plot
text(-0.23,25,['1-Day VaR and Expected Shortfall']);
text(-0.23,23.5,['as for 31/12/1987']);
text(-0.23,20,['VaR = ',num2str(VaR*-1,3),' (',num2str(VaR*-100,2),'%)']);
text(-0.23,18,['ES  = ',num2str(ES_fit*-1,3),' (',num2str(ES_fit*-100,2),'%)']);
text(-0.23,13,['\mu = ',num2str(muhat,2)]);
text(-0.23,11,['\sigma = ',num2str(sigmahat,2)]);
xlabel('R_t');
```

marking all the numbers in the resulting common plot:

In the plot, filled red and black markers correspond to $VaR_{0.05}$ and $ES$, respectively, and are calculated using the fitted normal distribution. Both values have been displayed in the plot as well. In addition to that, the plot contains a red open circle marker denoting the value of $VaR_{0.05}$ obtained from the empirical histogram discrete integration. Please note how the choice of the method influences the value of VaR.

The derived numbers tell the risk manager that in 95% of cases one should feel confident that the value of $VaR_{0.05}$ will not be exceeded in trading of IBM on Jan 4, 1988 (the next trading day). $C\times VaR_{0.05} = 0.028C$ would provide us with the loss amount (in dollars) if such an undesired event occurred. And if that were indeed the case, the expected loss (in dollars) we would potentially suffer is $C\times ES = 0.058C$.

Now it is easy to understand why the Black Swan event of Oct 19, 1987 falls into the category of the least likely moves in the markets. It places itself in the far left tail of both the empirical and fitted distributions, denoting an impressive loss of 23.52% for IBM on that day. If we assume that the daily price changes are normally distributed, then the probability of a drop of at least 22 standard deviations is
$$P(Z\le -22) = \Phi(-22) = 1.4 \times 10^{-107} \ .$$ This is an astronomically rare event!
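That tail probability can be reproduced with the Python standard library, using the identity $\Phi(z)=\mbox{erfc}(-z/\sqrt{2})/2$ (a quick sketch, independent of the Matlab code above):

```python
import math

# P(Z <= -22) for a standard normal variable, via the
# complementary error function: Phi(z) = erfc(-z / sqrt(2)) / 2
p = math.erfc(22 / math.sqrt(2)) / 2
print(p)  # about 1.4e-107
```

`math.erfc` stays accurate for large arguments where a naive `1 - cdf` would underflow to zero.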

They say that lightning does not strike twice, but if a 1-day drop in the market of the same magnitude had occurred again on Jan 4, 1988, our long position in IBM would cause us to suffer not a loss of $0.058C$ dollars but approximately $0.23C$. Ahhhuc!

## Coskewness and Cokurtosis Computation for Portfolio Managers

Suppose you are responsible for the management of a current company’s portfolio $P$ consisting of $N$ securities (e.g. equities) traded actively at NASDAQ. As a portfolio manager you probably are interested in portfolio performance, new asset allocation, and risk control. Therefore, your basic set of statistics should include portfolio expected return, $\mu_P$, and portfolio standard deviation, $\sigma_P$, as two key numbers you want to calculate every time you evaluate your portfolio performance.

Coskewness, $s_P$, and cokurtosis, $k_P$, are two additional multivariate higher moments (co-moments) important in the asset allocation process and portfolio management. Similarly to the fundamental meaning of a sample's skewness and kurtosis, the co-skewness and co-kurtosis provide a portfolio manager with the ability to test the same portfolio under different compositions, in order to facilitate the changes required to be introduced (e.g. an elimination of some allocations in badly performing securities causing a more pronounced negative coskewness).

In this post I will provide the basic matrix algebra covering the computational solution for the four portfolio statistics, with a Matlab code example as a practical implementation of the mathematics in action.

Expected Return on Portfolio

Let $P$ be our portfolio containing $N$ securities $K_i$ such that $P=\{K_1,K_2,…,K_N\}$ and $i=1,…,N$. Each security has a trading history over the last $T$ days (or months, or years), from which we derive its $T-1$ rates of return $r_t$ for $t=1,…,T-1$. Therefore, let us denote by $K_i$ a column vector of dimensions $(T-1,1)$ such that:
$$K_i = [r_1,r_2,…,r_{T-1}]^T\ ,$$
and portfolio matrix, $P$, containing all securities as:
$$P = [K_1\ |\ K_2\ |\ …\ |\ K_N ] = \left[ \begin{array}{cccc} r_{1,1} & r_{1,2} & … & r_{1,N} \\ r_{2,1} & r_{2,2} & … & r_{2,N} \\ … & … & … & … \\ r_{T-1,1} & r_{T-1,2} & … & r_{T-1,N} \end{array} \right] \ .$$ Our portfolio can be described in term of their weights:
$$w_i = \frac{y_iS_i(0)}{V(0)}$$ where the numerator is the product of the number of shares and the stock purchase price, in ratio to the amount initially invested in the portfolio. Writing the weights as a one-row matrix we have:
$$w = [\ w_1\ \ w_2\ \ … \ \ w_N\ ]$$ where for the obvious reason the sum of the individual weights is one, i.e. $uw^T=1$ with $u=[\ 1\ \ 1\ \ …\ \ 1\ ]$. For each security we can calculate its expected return $\mu_i=E(K_i)$ and form a vector:
$$m = [\ \mu_1\ \ \mu_2\ \ … \ \ \mu_N\ ] \ .$$
Finally, the expected return on portfolio $P$ is given as:
$$\mu_P = mw^T \ ,$$ which follows from:
$$\mu_P = E(K_P) = E\left(\sum_{i=1}^{N} w_iK_i \right) = \sum_{i=1}^{N} w_i\mu_i = mw^T .$$

Portfolio Variance

By evaluation of variance, we are able to show that the expected portfolio variance can be expressed as:
$$\sigma_P^2 = w M_2 w^T$$
where $M_2$ is the $(N,N)$ covariance matrix with the individual covariances denoted by $c_{ij}=Cov(K_i,K_j)$. At this point, each element of $M_2$ could be calculated as $c_{ij}=\rho_{ij}\sigma_i\sigma_j$, i.e. with a knowledge of the coefficient of correlation, $\rho_{ij}$, between securities $i$ and $j$, and the corresponding standard deviations calculated from the history provided by the $P$ matrix.

Just as an educational remark, let me remind you that the elements $c_{ij}$ can be obtained in the following way:
$$c_{ij} = \frac{1}{(T-1)-1} \sum_{t=1}^{T-1} (K_{t,i}-\mu_i)(K_{t,j}-\mu_j) \ ,$$ therefore
$$M_2 = \left[ \begin{array}{cccc} c_{11} & c_{12} & … & c_{1N} \\ c_{21} & c_{22} & … & c_{2N} \\ … & … & … & … \\ c_{N1} & c_{N2} & … & c_{NN} \end{array} \right] .$$

Portfolio Coskewness

It can be calculated as the product:
$$s_P = w M_3 (w^T\otimes w^T)$$
where the symbol of $\otimes$ denotes the Kronecker product between $w$ and $w$, and
$$M_3 = E[(r-\mu)(r-\mu)^T\otimes(r-\mu)^T] = \{a_{ijk}\} \ , \\ a_{ijk} = E[(r_i-\mu_i)(r_j-\mu_j)(r_k-\mu_k)] \ \ \mbox{for}\ \ i,j,k=1,…,N.$$
The co-skewness matrix $M_3$ of dimensions $(N,N^2)$ can be represented by $N$ matrixes $A_{ijk}$ of dimensions $(N,N)$ such that:
$$M_3 = [\ A_{1jk}\ A_{2jk}\ …\ A_{Njk}\ ]$$
where $j,k=1,…,N$, and so does the index $i$. The individual elements of the matrix can be obtained as:
$$a_{ijk} = \frac{1}{T-1} \sum_{t=1}^{T-1} (K_{t,i}-\mu_i)(K_{t,j}-\mu_j)(K_{t,k}-\mu_k) \ .$$

Portfolio Cokurtosis

Cokurtosis can be obtained by calculation of the following products:
$$k_P = w M_4 (w^T\otimes w^T\otimes w^T)$$ where
$$M_4 = E[(r-\mu)(r-\mu)^T\otimes(r-\mu)^T\otimes(r-\mu)^T] = \{b_{ijkl}\} \ , \\ b_{ijkl} = E[(r_i-\mu_i)(r_j-\mu_j)(r_k-\mu_k)(r_l-\mu_l)] \ \ \mbox{for}\ \ i,j,k,l=1,…,N.$$
The co-kurtosis matrix $M_4$ of dimensions $(N,N^3)$ can be represented by $N^2$ matrixes $B_{ijkl}$ of dimensions $(N,N)$ such that:
$$M_4 = [\ B_{11kl}\ B_{12kl}\ …\ B_{1Nkl}\ |\ B_{21kl}\ B_{22kl}\ …\ B_{2Nkl}\ |\ …\ |\ B_{N1kl}\ B_{N2kl}\ …\ B_{NNkl}\ ]$$
where $k,l=1,…,N$. The individual elements of the matrix can be obtained as:
$$b_{ijkl} = \frac{1}{T-1} \sum_{t=1}^{T-1} (K_{t,i}-\mu_i)(K_{t,j}-\mu_j)(K_{t,k}-\mu_k)(K_{t,l}-\mu_l) \ .$$
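For readers who prefer Python, the four statistics above can be sketched compactly in NumPy (my own vectorized rendering; the function name `comoment_stats` is mine, and the normalizations follow the formulas above: $1/((T-1)-1)$ for $M_2$, $1/(T-1)$ for $M_3$ and $M_4$):

```python
import numpy as np

def comoment_stats(P, w):
    """Sketch of mu_P, sigma_P, s_P, k_P for a (T-1, N) return matrix P
    and a weight vector w, following the matrix formulas in the text."""
    T1, N = P.shape                  # T1 = T-1 observations, N assets
    d = P - P.mean(axis=0)           # demeaned returns K_{t,i} - mu_i
    M2 = d.T @ d / (T1 - 1)          # covariance matrix, (N, N)
    M3 = np.einsum('ti,tj,tk->ijk', d, d, d).reshape(N, N**2) / T1
    M4 = np.einsum('ti,tj,tk,tl->ijkl', d, d, d, d).reshape(N, N**3) / T1
    muP = P.mean(axis=0) @ w
    sigP = np.sqrt(w @ M2 @ w)
    sP = w @ M3 @ np.kron(w, w)              # w M3 (w ⊗ w)
    kP = w @ M4 @ np.kron(np.kron(w, w), w)  # w M4 (w ⊗ w ⊗ w)
    return muP, sigP, sP, kP
```

For a single asset with $w=[1]$ the function reduces to the ordinary central moments of that asset's return series, which gives an easy sanity check.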

Example

Below I present a simple Matlab code which computes the four key portfolio statistics discussed above, based on $N=3$ securities traded within a given $P$. The choice of portfolio content is not limited solely to equities: $P$ can contain any asset class under management, or even a mixture of them. Since the expected portfolio return and risk, $\mu_P$ and $\sigma_P$, respectively, are assumed to be expressed in percent, their transformation into a physical unit of money can be obtained straightforwardly.

```matlab
% Program demonstrates the calculation of portfolio four key statistics:
% the expected portfolio return (muP), portfolio standard deviation (stdP),
% portfolio co-skewness (sP), and portfolio co-kurtosis (kP).
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
%
% Assumptions: * portfolio comprises 252 trading days, i.e. T=252-1 values
%                of daily returns
%              * daily returns are simulated from the normal distribution
%                with zero mean and unit standard deviation (in practice
%                one can use real market data)
%              * N=3, number of securities in portfolio P
%
% Remark: the code can be easily modified for the need of any N securities
%         considered in your portfolio P; it is not difficult to note that
%         the calculation of all four statistics can be performed by risk
%         managers who manage the portfolio P of portfolios P_i such that
%         P=[P_1 P_2 ... P_N]

clear all; clc; close all;

T=252;

% Simulated returns for 3 securities X,Y,Z expressed in [%]
X=random('normal',0,1,T-1,1);
Y=random('normal',0,1,T-1,1);
Z=random('normal',0,1,T-1,1);

% Portfolio matrix
P=[X Y Z];

% Portfolio weights for all securities (sum=1)
w=[0.2 0.5 0.3];

% m-vector containing the expected returns of securities
m=mean([P(:,1) P(:,2) P(:,3)]);

% Computation of M2, the covariance matrix
S=[];
for i=1:size(P,2)
    for j=1:size(P,2)
        u=0;
        for t=1:T-1
            u=u+((P(t,i)-mean(P(:,i)))*(P(t,j)-mean(P(:,j))));
        end
        S(i,j)=u/((T-1)-1);
    end
end
M2=S;

% Computation of M3, the co-skewness matrix
M3=[];
for i=1:size(P,2)
    S=[];
    for j=1:size(P,2)
        for k=1:size(P,2)
            u=0;
            for t=1:T-1
                u=u+((P(t,i)-mean(P(:,i)))*(P(t,j)-mean(P(:,j))) ...
                    *(P(t,k)-mean(P(:,k))));
            end
            S(j,k)=u/(T-1);
        end
    end
    M3=[M3 S];
end

% Computation of M4, the co-kurtosis matrix
M4=[];
for i=1:size(P,2)
    for j=1:size(P,2)
        S=[];
        for k=1:size(P,2)
            for l=1:size(P,2)
                u=0;
                for t=1:T-1
                    u=u+((P(t,i)-mean(P(:,i)))*(P(t,j)-mean(P(:,j)))* ...
                        (P(t,k)-mean(P(:,k)))*(P(t,l)-mean(P(:,l))));
                end
                S(k,l)=u/(T-1);
            end
        end
        M4=[M4 S];
    end
end

% Results
muP=m*w'                      % expected return on a portfolio [%]
stdP=sqrt(w*M2*w')            % portfolio standard deviation [%]
sP=w*M3*kron(w',w')           % portfolio coskewness
kP=w*M4*kron(kron(w',w'),w')  % portfolio cokurtosis
```

## Dutch Book: Making a Riskless Profit

If you think there is no way to make a riskless profit, think again. This concept is known as a Dutch Book. First, let me recall how probability is stated as odds. Given a probability $P(E)$ of an event $E$, the odds for $E$ are equal to
$$\mbox{odds for}\ E = P(E)/[1-P(E)] \ .$$ Given odds for $E$ of a:b, the implied probability of $E$ is $a/(a+b)$. Conversely, the odds against $E$ are $$\mbox{odds against}\ E = [1-P(E)]/P(E) \ .$$ Thus, given odds against $E$ of a:b, the implied probability of $E$ is $b/(a+b)$.

Now, suppose John places a $\$100$ bet on $X$ at odds of 10:1 against $X$, and later he is able to place a $\$600$ bet against $X$ at odds of 1:1. Whatever the outcome of $X$, John makes a riskless profit: $\$400$ if $X$ occurs or $\$500$ if $X$ does not occur, because the implied probabilities are inconsistent.

John is said to have made a Dutch Book in $X$.
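The arithmetic of John's two bets can be traced in a few lines of Python (a sketch; the helper name `implied_prob_from_odds_against` is mine):

```python
def implied_prob_from_odds_against(a, b):
    # odds of a:b against an event imply P(event) = b / (a + b)
    return b / (a + b)

# John's two bets from the example above:
# $100 on X at 10:1 against X -> wins $1000 if X occurs, loses $100 otherwise
# $600 against X at 1:1       -> loses $600 if X occurs, wins $600 otherwise
profit_if_X = 10 * 100 - 600
profit_if_not_X = -100 + 600
print(profit_if_X, profit_if_not_X)  # 400 500: a profit either way
```

The two bets taken together imply probabilities of $1/11$ and $1/2$ for the same event, and that inconsistency is exactly what the Dutch Book exploits.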

## Measure of Risk of any Security in Equilibrium

The standard Capital Asset Pricing Model (CAPM) is an equilibrium model whose construction allows us to determine the relevant measure of risk for any asset and the relationship between expected return and risk when markets are in equilibrium. Within CAPM the expected return on security $i$ is given by:
$$\bar{R}_i = R_F + \beta_i(\bar{R}_M-R_F)$$ where $\beta_i$ is argued to be the correct measure of a security's risk in very well-diversified portfolios. For such portfolios the non-systematic risk tends to go to zero and the remainder, the systematic risk, is measured by $\beta_i$. For the Market Portfolio the Beta is equal to one. Therefore, the aforementioned equation defines the security market line.

It is possible to go one step further and write the same CAPM formula as follows:
$$\bar{R}_i = R_F + \left( \frac{\bar{R}_M-R_F}{\sigma_M} \right) \frac{\sigma_{iM}}{\sigma_M}$$ which keeps the linear relationship for the expected return, but in $\sigma_{iM}/\sigma_M$ space. Recall that the term in parentheses is the market price of risk at the riskless rate of interest $R_F$. $\sigma_{iM}$ denotes the measure of the risk of security $i$, but when written as $\sigma_{iM}/\sigma_M$ it gains a more intuitive interpretation: the measure of how the risk of a security affects the risk of the market portfolio.

In other words, $\sigma_{iM}/\sigma_M$ is the measure of risk of any security in equilibrium and, as we will show below, it is equal to:
$$\frac{\sigma_{iM}}{\sigma_M} = \frac{X_i\sigma_i^2 + \sum_{j=1,\ j\ne i}^{N} X_j \sigma_{ij}} {\sigma_M} \ .$$ We may get that by performing a calculation of the first derivative of the standard deviation of the market portfolio, $\sigma_M$, i.e. for
$$\sigma_M^2 = \sum_{i=1}^{N} X_i^2\sigma_i^2 + \sum_{i=1}^N \sum_{j=1, j\ne i}^{N} X_iX_j \sigma_{ij}$$ the risk of a security is the change in the risk of the market portfolio as the holdings of that security are varied:
$$\frac{d\sigma_M}{dX_i} = \frac{d}{dX_i} \left[ \sum_{i=1}^{N} X_i^2\sigma_i^2 + \sum_{i=1}^N \sum_{j=1, j\ne i}^{N} X_iX_j\sigma_{ij} \right]^{1/2}$$
$$= \frac{1}{2} \left[ 2X_i\sigma_i^2 + 2 \sum_{j=1, j\ne i}^{N} X_j\sigma_{ij} \right] \left[ \sum_{i=1}^{N} X_i^2\sigma_i^2 + \sum_{i=1}^N \sum_{j=1, j\ne i}^{N} X_iX_j\sigma_{ij} \right]^{-1/2}$$
$$= \frac{X_i\sigma_i^2 + \sum_{j=1, j\ne i}^{N} X_j \sigma_{ij} } {\sigma_M} = \frac{\sigma_{iM}}{\sigma_M} \ .$$ Above, $\sigma_{ij}$ denotes the covariance between any securities $i$ and $j$ in the market portfolio, and $X_i$ is the share of capital invested in security $i$.
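The derivative relation $d\sigma_M/dX_i = \sigma_{iM}/\sigma_M$ can also be checked numerically; below is a short Python sketch (the randomly generated covariance matrix and equal weights are my own illustration, not market data) comparing a finite-difference derivative of $\sigma_M$ with $\sigma_{iM}/\sigma_M$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
C = A @ A.T                   # a positive-definite covariance matrix
X = np.full(5, 1 / 5)         # equal capital shares X_i

sigma_M = np.sqrt(X @ C @ X)  # market portfolio volatility
sigma_iM = C @ X              # sigma_iM = X_i sigma_i^2 + sum_{j!=i} X_j sigma_ij

# finite-difference derivative of sigma_M with respect to X_0
h = 1e-7
Xh = X.copy()
Xh[0] += h
fd = (np.sqrt(Xh @ C @ Xh) - sigma_M) / h
print(fd, sigma_iM[0] / sigma_M)  # the two numbers agree
```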

Interestingly, the only remaining question is: how do we know the best choice of the market portfolio? I will try to address that topic in a separate post covering the research one may conduct around it. For now, one may assume that the S&P 500 or S&P 900 universe of stocks would serve as a quite good example. If so, that would also mean that in order to find $\sigma_{iM}/\sigma_M$ for any security we invested our money in, we need to find $\sigma_M$. Is it hard? Well, if we assume the market to be the S&P 500 then for any $i$ its $X_i=1/500$. Now, all we need to care about is an idea of the security's variance $\sigma_i^2$ and the measure of covariance, i.e. how two securities co-move in our market portfolio. A clever reader would ask immediately: but what about the time-window? And that's where the devil conceals itself.

## Final Frontier of Efficient Frontier

Have you ever considered why, in the risk-return space, the efficient frontier takes a shape which resembles some of the Bézier curves? And why does everybody suggest that if you approach the efficient frontier from the right-hand side, your selection of a new investment could be the best choice?

Well, as for the latter question the answer seems to be easy. First, you plot all your stocks, or combinations of stocks forming a number of various portfolios (i.e. you derive the expected return and volatility of a given stock or portfolio, respectively; check out my article on 2-Asset Portfolio Construction for some fundamentals in that field), in the expected risk-return space. Your result can be similar to the one that follows:

where the red dots correspond to specific positions of securities which you actually consider for a future investment. The data in the plot are annualized, which simply means we are looking at the potential return and risk (volatility) in a 1-year time horizon only. Secondly, from the plot you can read that UTX, based on historical data, offers the highest potential return but at quite high risk. If you are not a risk-taker, the plot immediately suggests choosing MMM or JNJ, but at a lower expected return. So far so good.

The blue line represents the so-called efficient frontier in the risk-return space. The investor wants to choose a portfolio on the efficient frontier because, for each volatility, there is a portfolio on the efficient frontier that has a higher rate of return than any other portfolio with the same volatility. In addition, by holding a portfolio on the efficient frontier the investor benefits from diversification (see also Riskless Diversification). That brings us to an attempt at answering the very first question we posed at the beginning: about the shape, therefore about:

THE MATHEMATICS OF EFFICIENT FRONTIER

As we will show below, the efficient frontier is a solution to the problem of efficient selection between two securities. It is sufficient to conduct the basic calculation based on two assets considered together. For a higher number of potential securities or portfolios of securities (as presented in the plot above), finding the efficient frontier line is subject to more advanced optimization.

Let's start with the simple case of two securities, A and B, respectively. In our consideration we will omit the scenario where short sales are allowed. The expected portfolio return would be:
$$\bar{R_P} = x_A\bar{R_A} + x_B\bar{R_B}$$
where $x_A+x_B=1$ represents the sum of shares (weights); $x_B=1-x_A$ simply means that our whole initial capital has been fully invested in both securities, with a fraction $x_A$ of our money first put into security A (say, $x_A=0.75$ or 75%). By rewriting the above we get:
$$\bar{R_P} = x_A\bar{R_A} + (1-x_A)\bar{R_B} .$$
The volatility (standard deviation) of that portfolio is therefore equal:
$$\sigma_P = [x_A^2\sigma_A^2 + (1-x_A)^2\sigma_B^2 + 2\sigma_A\sigma_B\rho_{AB}x_A(1-x_A)]^{1/2}$$
where by $\rho_{AB}$ we denote the linear correlation between the expected rates of return of both securities, also possible to denote as $corr(\bar{R_A},\bar{R_B})$. Please also note that the third term can be jotted down as:
$$2\sigma_A\sigma_B\rho_{AB}x_A(1-x_A) = 2x_A(1-x_A)cov(\bar{R_A},\bar{R_B})$$
where the relation between covariance and correlation holds as follows:
$$cov(\bar{R_A},\bar{R_B}) = \sigma_A\sigma_B corr(\bar{R_A},\bar{R_B}) .$$
The correlation can be a free parameter if we are given, from the historical data of securities A and B, the following parameters: $\sigma_A, \sigma_B$ and $\bar{R_A}, \bar{R_B}$. It is important to keep that in mind!

Below, we will consider four cases leading us to the final mathematical curvature of the efficient frontier.

CASE 1: Perfect Positive Correlation ($\rho_{AB}=+1$)

Considering two assets only, A and B, respectively, the perfect positive correlation of $\rho_{AB}=+1$ leads us to the portfolio volatility:
$$\sigma_P = [x_A^2\sigma_A^2 + (1-x_A)^2\sigma_B^2 + 2\sigma_A\sigma_B\times 1 \times x_A(1-x_A)]^{1/2}$$
where we recognize $a^2+b^2+2ab=(a+b)^2$, which allows us to rewrite it as follows:
$$\sigma_P = [x_A\sigma_A + (1-x_A)\sigma_B] \ \ \mbox{for}\ \ \rho_{AB}=+1 .$$
Solving for $x_A$ we get:
$$x_A = \frac{\sigma_P-\sigma_B}{\sigma_A-\sigma_B} \ , \ \ \mbox{i.e.}\ \ x_A\ \mbox{is a linear function of}\ \sigma_P .$$
Substituting that into the formula for the expected portfolio return of 2-assets, we arrive at:
$$\bar{R_P} = x_A\bar{R_A} + (1-x_A)\bar{R_B} \\ = \frac{\sigma_P-\sigma_B}{\sigma_A-\sigma_B}\bar{R_A} + \left(1-\frac{\sigma_P-\sigma_B}{\sigma_A-\sigma_B}\right)\bar{R_B} \\ … \\ = \sigma_P\left(\frac{\bar{R_A}-\bar{R_B}}{\sigma_A-\sigma_B}\right) - \sigma_B\left(\frac{\bar{R_A}-\bar{R_B}}{\sigma_A-\sigma_B}\right)+\bar{R_B} .$$
Therefore, the solution for that case can be summarized by the linear relation holding between the expected portfolio return and the portfolio volatility if the portfolio is constructed based on two assets perfectly positively correlated:
$$\bar{R_P}(\sigma_P)\ \mbox{is linear in}\ \sigma_P .$$

CASE 2: Perfect Negative Correlation ($\rho_{AB}=-1$)

In this case, we can denote the portfolio volatility as:
$$\sigma_P = [x_A^2\sigma_A^2 + (1-x_A)^2\sigma_B^2 + 2\sigma_A\sigma_B\times (-1) \times x_A(1-x_A)]^{1/2}$$
where, similarly to Case 1, we recognize $a^2+b^2-2ab=(a-b)^2$, which leads us to two possible solutions, namely:
$$\sigma_P = x_A\sigma_A - (1-x_A)\sigma_B \ \ \ \ \ \ \ \mbox{or} \\ \sigma_P = -x_A\sigma_A + (1-x_A)\sigma_B \ .$$
Since $\sigma_P$ must always be non-negative, one of the two solutions holds at any time.

Please also note that $\sigma_P$(Case 2) is never greater than $\sigma_P$(Case 1) for any $x_A\in\langle 0,1\rangle$. That allows us to formulate a very important note: the risk of the portfolio with $\rho_{AB}=-1$ is smaller than for the portfolio with $\rho_{AB}=+1$. Interesting, isn't it? Hold on and check this out!

It is possible to obtain a portfolio with zero risk by setting $\sigma_P$(Case 2) to zero and solving for $x_A$ (i.e. what fraction of our money we need to invest in asset A to minimize the risk) as follows:
$$x_A\sigma_A - (1-x_A)\sigma_B = 0 \\ x_A\sigma_A - \sigma_B + x_A\sigma_B = 0 \\ x_A(\sigma_A+\sigma_B) = \sigma_B$$
which provides us with the solution:
$$x_A = \frac{\sigma_B}{\sigma_A+\sigma_B} .$$
It simply means that for the so-derived value of $x_A$ the portfolio reaches its minimal risk, i.e. $\sigma_P=0$.
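A quick numerical illustration in Python (the volatilities of 30% and 60% are assumed for illustration only):

```python
import math

sigma_A, sigma_B = 0.30, 0.60          # assumed volatilities of A and B
x_A = sigma_B / (sigma_A + sigma_B)    # zero-risk weight = 2/3

rho = -1.0                             # perfect negative correlation
var_P = (x_A * sigma_A) ** 2 + ((1 - x_A) * sigma_B) ** 2 \
        + 2 * sigma_A * sigma_B * rho * x_A * (1 - x_A)
print(x_A, math.sqrt(max(var_P, 0.0)))  # sigma_P vanishes
```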

CASE 3: No Correlation between returns on the assets ($\rho_{AB}=0$)

In this case the cross-term vanishes, leaving us with the simplified form of the portfolio volatility:
$$\sigma_P = [x_A^2\sigma_A^2 + (1-x_A)^2\sigma_B^2 ]^{1/2} .$$
The graphical presentation is as follows:

The point marked by a dot is peculiar: this portfolio has minimum risk. It is possible to derive for what value of $x_A$ this is achievable. In order to do it, for the general form of $\sigma_P$,
$$\sigma_P = [x_A^2\sigma_A^2 + (1-x_A)^2\sigma_B^2 + 2x_A(1-x_A)\sigma_A\sigma_B\rho_{AB}]^{1/2} ,$$
we find $x_A$ by calculating the first derivative,
$$\frac{d{\sigma_P}}{d{x_A}} = \left(\frac{1}{2}\right) \frac{2x_A\sigma_A^2-2\sigma_B^2+2x_A\sigma_B^2+2\sigma_A\sigma_B\rho_{AB}-4x_A\sigma_A\sigma_B\rho_{AB}} {[x_A^2\sigma_A^2+(1-x_A)^2\sigma_B^2+2x_A(1-x_A)\sigma_A\sigma_B\rho_{AB}]^{1/2}} \\ = \left(\frac{1}{2}\right) \frac{2x_A\sigma_A^2-2\sigma_B^2+2x_A\sigma_B^2+2\sigma_A\sigma_B\rho_{AB}-4x_A\sigma_A\sigma_B\rho_{AB}}{\sigma_P} ,$$
and by setting it to zero:
$$\frac{d{\sigma_P}}{d{x_A}} = 0$$
we get:
$$x_A = \frac{\sigma_B^2-\sigma_A\sigma_B\rho_{AB}}{\sigma_A^2+\sigma_B^2-2\sigma_A\sigma_B\rho_{AB}}$$
which in the case of $\rho_{AB}=0$ reduces to:
$$x_A = \frac{\sigma_B^2}{\sigma_A^2+\sigma_B^2} .$$
This solution is also known as the minimum variance portfolio when two assets are combined in a portfolio.
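The minimum-variance weight can be verified numerically; a Python sketch with the assumed inputs $\sigma_A=30\%$, $\sigma_B=60\%$, $\rho_{AB}=0$ (for which the formula gives $x_A = 0.36/0.45 = 0.8$):

```python
import numpy as np

sigma_A, sigma_B, rho = 0.30, 0.60, 0.0   # assumed Case 3 inputs
x_star = (sigma_B**2 - rho * sigma_A * sigma_B) / \
         (sigma_A**2 + sigma_B**2 - 2 * rho * sigma_A * sigma_B)

def sig_P(x):
    # general 2-asset portfolio volatility as a function of x_A
    return np.sqrt(x**2 * sigma_A**2 + (1 - x)**2 * sigma_B**2
                   + 2 * x * (1 - x) * sigma_A * sigma_B * rho)

grid = np.linspace(0.0, 1.0, 10001)
x_grid = grid[np.argmin(sig_P(grid))]      # brute-force minimum on a grid
print(x_star, x_grid)                      # both equal 0.8
```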

CASE 4: All other correlations between two assets (e.g. $\rho_{AB}=0.4$)

Given $\sigma_A$ and $\sigma_B$ at $\rho_{AB}$ not equal to $0$, $-1$ or $1$, one can find that the formula for the volatility of a 2-asset portfolio takes, in general, the following form:
$$\sigma_P = \sqrt{c_1x_A^2 + c_2x_A + c_3}$$
where $c_1$, $c_2$ and $c_3$ denote some coefficients, and it can be presented in graphical form as follows:

There is a particular point of interest here, and the shape of the curve is concave. Since the investor always prefers the asset which offers a bigger rate of return (say, $\bar{R}_A\gt\bar{R}_B$) at lower risk ($\sigma_B\lt\sigma_A$), the best combinations of portfolios we get each time by following these rules are called the efficient frontier, in general represented by the Case 4 line.

## 2-Asset Portfolio Construction

When you start your journey in the world of finance and money investment, sooner or later you hit the concept of a portfolio. A portfolio is simply a term used to briefly describe a set of assets you hold, or plan to hold, in order to make a profit in a given time horizon.

If you are a fan of stocks, your desire is to purchase some shares of the company you like or anticipate will bring you a profit. Make a note here. I said: the company. Single, not plural. If so, having some initial capital set aside for that particular choice of yours, say $C=\$100,000$, you decided to buy IBM stock at $\$100$ per share. Great! You own a fraction of the IBM company. Regardless of whether you are a beginner in investing/trading or more advanced in that field, your purchase of IBM shares is now called a long position, i.e. you expect the value of the stock to go up. We will skip the scenario of short-selling (i.e. making a profit when the price of the stock goes down) for the simplicity of this article. All right, you waited a couple of days and, luckily, two wonderful events took place in the meanwhile. Firstly, the IBM stock is no longer worth $\$100$ (the demand for the stock was so high that there were more buyers wanting to purchase a piece of IBM, and their buying demand pushed the stock price higher), and now IBM is worth $\$120$ per share. And guess what?! You can't believe it! You're smiling just because now you can sell your part (share) of the IBM company to someone else (a buyer who is now willing to pay $\$120$ for the same share of IBM, i.e. $\$20$ more!). So, you do it! Secondly, in the meanwhile, IBM decided that a part of its net income would be distributed (given to, divided among) all shareholders in the form of dividends, whose value had been announced by IBM to be $\$5$ per share held. As a consequence of these two events, your net return occurred to be equal:
$$R = \frac{S_t+D_t}{S_{t-1}} - 1 = \frac{\$120+\$5}{\$100} - 1 = 0.25 \ \ \mbox{or} \ \ 25\%$$ which in terms of your capital gain simply means you became richer by an extra $\$25,000$ according to
$$\mbox{Capital updated} \ = C(1+R) = \$100,000\times(1+0.25) = \$125,000 .$$

To be completely honest, this situation would take place under one condition regarding the market's operations of buying and selling the IBM stock. You may ask: what's that? It is the assumption, commonly referred to as perfect financial market(s): no taxes, no transaction costs, no costs of writing and enforcing contracts, no restrictions on investment in securities, no differences in information across investors, and investors take prices as given because they are too small to affect them by their actions of buying/selling (i.e. there exists good liquidity of the stock in the market). Oh, it's a rough assumption! It is, indeed. But it has been made with the clear purpose of simplifying the basic understanding of what you can gain/lose in the process of trading. For a curious reader, I refer to my article Number of Shares for Limit Orders, which provides a broader feeling of how extra trading fees influence the initial capital, $C$, that you wish to invest in a single stock.

Let's now consider that you want to split your initial capital $C$ between two choices: IBM and NVDA (NVIDIA Corporation). From some resources you have found out that the latter stock is more volatile than IBM (i.e. its price jumps more rapidly; it's more risky), therefore you decided to invest:

1. $\$75,000$ in IBM (0.75 share of the portfolio),
2. $\$25,000$ in NVDA (0.25 share of the portfolio).

Now, also having the information that the expected return over a 1-year horizon equals 13% and 26% for IBM and NVDA, respectively, and that their corresponding volatilities are 30% and 60%, we may first calculate the fundamental portfolio measures based on these two assets (stocks) we wish to hold (invest in and hold).

The Expected Portfolio ($P$) Return
$$E(R_P) = \sum_{i=1}^{N} w_iE(R_i)$$
which in our case study returns the following expectation for two assets ($N=2$):
$$E(R_P) = 0.75\times 0.13+0.25\times 0.26 = 0.1625 \ \ \mbox{or} \ \ 16.25\% .$$

The Volatility of the Return of the Portfolio

Before reaching the exact formula for the volatility of the expected return of our 2-asset portfolio, we need to consider the basic mathematics standing behind it. Firstly, have in mind that for any random variable $R_i$ (here, the expected return of asset $i$) and weight $w_i$,
$$Var(w_iR_i) = w_i^2 Var(R_i) .$$
Secondly, keep in mind the high-school formula $(a+b)^2=a^2+b^2+2ab$. Got it? Great! Let's move forward with the variance of the sum of two variables, say $a$ and $b$:
$$Var(a+b) = E[a+b-E(a+b)]^2 \\ = E[a+b-E(a)-E(b)]^2 \\ = E[a-E(a)+b-E(b)]^2 \\ = E[a-E(a)]^2 + E[b-E(b)]^2 + 2E\left\{[a-E(a)][b-E(b)]\right\} .$$
Where does it take us? Well, to the conclusion of the day, i.e.:
$$= Var(a) + Var(b) + 2Cov(a,b)$$
where $Cov(a,b)$ is the covariance which measures how $a$ and $b$ move together. At this point, we express the covariance through the correlation between the two variables $a$ and $b$ as:
$$Cov(a,b) = Vol(a)\times Vol(b)\times Corr(a,b)$$
which, for a variable perfectly correlated with itself (i.e. $Corr(a,a)=1$), returns:
$$Cov(a,a)=Vol(a)\times Vol(a)\times 1 = Var(a) .$$
Keeping all foregoing mathematics in mind, we may define the variance of the portfolio return as follows:
$$Var(P) = Var\left(\sum_{i=1}^{N} w_iR_i\right) = \sum_{i=1}^{N} w_i^2Var(R_i)+\sum_{i=1}^{N}\sum_{j=1,\ j\ne i}^{N} w_iw_jCov(R_i,R_j) \ ,$$
where again,
$$Cov(R_i,R_j) = Vol(R_i)Vol(R_j)Corr(R_i,R_j) ,$$
which describes the relation between the covariance of the expected returns of assets $i$ and $j$, respectively, and their corresponding volatilities and mutual (linear) correlation (factor), the latter lying within the $[-1,1]$ interval. The formula for $Var(P)$ is extremely important to memorize as it provides the basic information on whether a combination of two assets is less or more risky than putting all our money into one basket.

Therefore, 2-asset portfolio construction constitutes the cornerstone for a basic risk management approach for any investments which require, in general, a division of the investor’s initial capital $C$ into $N$ number of assets (investments) he or she wishes to take the positions in.

Coming back to our case study of IBM and NVDA investment opportunities combined into the portfolio of two assets, we find that:
$$Var(P) = (0.75)^2(0.30)^2 + (0.25)^2(0.60)^2 + 2(0.75)(0.25)(0.30)(0.60)\times \\ Corr(\mbox{IBM},\mbox{NVDA}) = 0.11$$
at assumed, ad hoc, correlation of 0.5 between these two securities. Since,
$$Vol(P)=\sqrt{Var(P)} ,$$
eventually we find that our potential investment in both the IBM and NVDA companies, the 2-asset portfolio, given their expected rates of return at the derived volatility levels for a 1-year time-frame, is:
$$E(P) \ \ \mbox{at}\ \ Vol(P) = 16.25\% \ \ \mbox{at}\ \ 33.17\% .$$
It turns out to be more rewarding ($16.25\%$) than investing solely in IBM (expected return of $13\%$) at a slightly higher risk (about a $3\%$ difference).
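The two numbers can be reproduced in a few lines of Python (note that keeping full precision gives $Vol(P)\approx 32.7\%$; the $33.17\%$ quoted above follows from first rounding $Var(P)$ to $0.11$):

```python
import math

w = (0.75, 0.25)      # weights of IBM and NVDA
mu = (0.13, 0.26)     # expected 1-year returns
vol = (0.30, 0.60)    # volatilities
rho = 0.5             # correlation assumed ad hoc in the text

mu_P = w[0] * mu[0] + w[1] * mu[1]
var_P = (w[0] * vol[0])**2 + (w[1] * vol[1])**2 \
        + 2 * w[0] * w[1] * vol[0] * vol[1] * rho
print(mu_P, math.sqrt(var_P))  # 0.1625 and about 0.327
```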

We will come back to the concept of 2-asset portfolio during the discussion of risk-reward investigation for N-asset investment case.

## Performance-related Risk Measures

Enterprise Risk Management (ERM) can be described as a discipline by which an organization in any industry assesses, controls, exploits, finances, and monitors risks from all sources for the purpose of increasing the organization's short- and long-term value to its stakeholders. It is a conceptual framework and, when adopted by a company, it provides a set of tools to, inter alia, describe and quantify a risk profile. In general, most of the measures common in the practice of ERM fall into two categories: (a) solvency-related measures, and (b) performance-related measures. From a quantitative viewpoint the latter refer to the volatility of the organization's performance on a going-concern basis.

Performance-related risk measures provide us with a good opportunity to quickly review the fundamental definitions of the tools which concentrate on the mid-region of the probability distribution, i.e. the region near the mean, relevant for the determination of volatility around expected results:

Volatility (standard deviation), Variance, Mean
$$Vol(x) = \sqrt{Var(x)} = \left[\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N}\right]^{0.5}$$

Shortfall Risk
$$SFR = \frac{1}{N} \sum_{i=1}^{N} 1_{[x_i\lt T]} \times 100\%$$
where $T$ is the target value for the financial variable $x$. The Shortfall Risk measure reflects an improvement over the $Vol(x)$ measure by taking into account the fact that most people are risk averse, i.e. they are more concerned with unfavorable deviations than with favorable ones. Therefore, $SFR$ can be understood as the probability that the financial variable $x_i$ falls below a specified target level $T$ (if true, the indicator $1_{[x_i\lt T]}$ above takes the value of 1).

Value at Risk (VaR)
In VaR-type measures, the equation is reversed: the shortfall risk is specified first, and the corresponding value at risk ($T$) is solved for.

Downside Volatility (or Downside Standard Deviation)
$$DVol(x) = \left[\frac{\sum_{i=1}^{N}\left\{\min[0,(x_i-T)]\right\}^2}{N}\right]^{0.5}$$
where again $T$ is the target value for the financial variable $x$. Downside volatility focuses not only on the probability of an unfavorable deviation in a financial variable (as SFR does) but also on the extent to which it is unfavorable. It is usually interpreted as the extent to which the financial variable could deviate below a specified target level.

Below Target Risk
$$BTR = \frac{\sum_{i=1}^{N}\min[0,(x_i-T)]}{N}$$
takes its origin from the definition of the downside volatility but the argument is not squared, and there is no square root taken of the sum.
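The measures above are straightforward to compute; here is a minimal sketch in Python using NumPy, where the simulated return sample, the target $T=0$, and the 5% shortfall level used for the VaR-style inversion are illustrative assumptions rather than values from the text:

```python
import numpy as np

def vol(x):
    # Volatility: (population) standard deviation around the mean
    return np.sqrt(np.mean((x - x.mean()) ** 2))

def sfr(x, T):
    # Shortfall Risk: percentage of observations falling below target T
    return np.mean(x < T) * 100.0

def dvol(x, T):
    # Downside Volatility: root-mean-square of deviations below target T
    return np.sqrt(np.mean(np.minimum(0.0, x - T) ** 2))

def btr(x, T):
    # Below Target Risk: mean shortfall below target T (non-positive)
    return np.mean(np.minimum(0.0, x - T))

rng = np.random.default_rng(1)
x = rng.normal(loc=0.05, scale=0.10, size=100_000)  # simulated annual returns
T = 0.0                                             # target: no loss

# VaR-type inversion: fix the shortfall probability first (here 5%) and
# solve for the corresponding target level T as an empirical quantile
var5 = np.quantile(x, 0.05)

print(vol(x), sfr(x, T), dvol(x, T), btr(x, T), var5)
```

Note how $BTR$ comes out non-positive by construction, while $SFR$ is a pure probability expressed in percent.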

## Riskless Diversification

Diversification is viewed by many as an attractive tool for limiting the risk of a portfolio currently held by an investor. Working through the numbers, one can show that if one holds $N$ completely uncorrelated securities, the volatility of such a portfolio, $Vol(P)$, decreases with an increasing number of securities $N$ as follows:
$$Vol(P) = \left[\sum_{i=1}^{N} \left(\frac{1}{N}\right)^2 Var(y_i) \right]^{0.5} = \frac{Vol(y)}{\sqrt{N}} .$$
What remains unspoken, but is important to keep in mind, is that all securities $i$, where $i=1,…,N$, are assumed to have the same volatility and the same expected return. Therefore, by holding sufficiently many uncorrelated securities, one can (in theory) eliminate the portfolio volatility.
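This $1/\sqrt{N}$ scaling is easy to verify numerically; a short Monte Carlo sketch, where the common volatility level, the sample size, and the equal $1/N$ weighting are my illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = 0.20        # common volatility of every security
steps = 200_000     # number of simulated return observations

for N in (1, 4, 16, 64):
    # N uncorrelated securities with identical volatility and expected return
    y = rng.normal(0.0, sigma, size=(steps, N))
    p = y.mean(axis=1)   # equally weighted (1/N) portfolio return
    print(f"N={N:3d}  Vol(P)={p.std():.4f}  sigma/sqrt(N)={sigma/np.sqrt(N):.4f}")
```

Each printed pair should agree to a few decimal places, confirming that the portfolio volatility shrinks like $\sigma/\sqrt{N}$.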

That is the theory. The investor’s reality is usually far from this ideal situation, as portfolios typically contain a number of investments that are mutually dependent, e.g. through their common response to market movements or sentiment. The remainder of risk that cannot be eliminated by the diversification process is known as systematic risk.

## Probability of Financial Ruin

Very often, in the modeling of rare events in the finance and insurance industries, analysts are interested in both the probability of events and their financial consequences. This revolves around the notion of the probability of ruin, i.e. the chance that one of two dealing parties becomes insolvent (loses all its assets).

The estimation of the chances of going broke can be understood by considering the following gambler’s problem. Two parties, X and Y, start the game with $\$x$ and $\$y$, respectively. In each round, if X wins (its probability of winning is $p$), Y pays $\$1$ to X. The game ends when one of the parties amasses all the money, $\$(x+y)$.

Let’s denote by $u_{n+1}$ the probability that X eventually wins the whole game while currently holding $\$(n+1)$. If X wins the next round (with probability $p$), it will hold $\$(n+2)$ and win the game with probability $u_{n+2}$; if Y wins (with probability $q=1-p$), X will hold $\$n$ and win with probability $u_n$:
$$u_{n+1} = pu_{n+2} + qu_n \ \ \mbox{for}\ \ 0 < n+1 < x+y.$$
But because $p+q=1$ implies $u_{n+1} = pu_{n+1} + qu_{n+1}$, we get:
$$(u_{n+2}-u_{n+1}) = \frac{q}{p}(u_{n+1}-u_n).$$
Noting the boundary condition $u_0=0$ (X has lost everything) and applying the recurrence relation, we may write:
$$(u_2-u_1) = \frac{q}{p}(u_1-u_0) = \frac{q}{p}(u_1) \\ (u_3-u_2) = \frac{q}{p}(u_2-u_1) = \left(\frac{q}{p}\right)^2 (u_1) \\ ... \\ (u_i-u_{i-1}) = \left(\frac{q}{p}\right)^{i-1}(u_1) .$$
Now, if we sum both sides (the left-hand side telescopes), we find:
$$(u_i-u_1) = \left[\sum_{j=1}^{i-1} \left(\frac{q}{p}\right)^{j}\right](u_1), \ \ \ \mbox{or} \\ u_i = \left[1+\frac{q}{p}+\left(\frac{q}{p}\right)^2+ ... + \left(\frac{q}{p}\right)^{i-1} \right](u_1).$$
Doing the maths, one can prove that:
$$\left(1-\frac{q}{p}\right) \left[1+\sum_{j=1}^{i-1} \left(\frac{q}{p}\right)^{j} \right] = 1-\left(\frac{q}{p}\right)^i,$$
therefore for $p\neq q$ we have:
$$u_i = \left[ \frac{1-\left(\frac{q}{p}\right)^i}{1-\left(\frac{q}{p}\right)} \right] u_1 .$$
Using the second boundary condition, $u_{x+y}=1$ (X holds all the money), we evaluate:
$$u_1 = \left[ \frac{1-\left(\frac{q}{p}\right)}{1-\left(\frac{q}{p}\right)^{x+y}} \right],$$
which, generalized for $i$, gives:
$$u_i = \left[ \frac{1-\left(\frac{q}{p}\right)^i}{1-\left(\frac{q}{p}\right)^{x+y}} \right],$$
and therefore:
$$u_x = \left[ \frac{1-\left(\frac{q}{p}\right)^x}{1-\left(\frac{q}{p}\right)^{x+y}}\right] .$$
For the case of $p=q=\frac{1}{2}$ we have:
$$u_x = \frac{x}{x+y} .$$

So, what does this $u_x$ mean for us? Let’s illustrate it using a real-life example.

## Learn to Count Cards Before Going to Las Vegas

If you start playing a poker-like game with $y=\$1000$ in your pocket, and the casino’s total cash that you may win equals $x=\$50,000,000$, but your winning rate depends solely on your luck (say, your $q=1-p=0.49$, i.e. only a bit lower than the bank’s odds of winning), then your chances of going broke are $u_x \approx 1$. However, if you can count cards and do it consistently over and over again (so that now your $q=0.51$), then $u_x \approx 4 \times 10^{-18}$, which simply means that you have a huge potential to become very rich overnight using your skills and mathematics.
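The closed-form $u_x$ can be coded directly. A small sketch follows; the function name is my own, and so is the algebraic rewrite of the formula for $q/p>1$ (multiplying numerator and denominator by $(p/q)^{x+y}$), which is needed purely to avoid floating-point overflow and catastrophic cancellation at casino-sized stakes:

```python
import math

def win_prob(x, y, p):
    """Gambler's ruin: probability that the party holding x dollars
    ends up with all (x+y), winning each round with probability p."""
    q = 1.0 - p
    if math.isclose(p, q):
        return x / (x + y)          # fair game: u_x = x/(x+y)
    r = q / p
    if r < 1.0:                     # favourable odds: evaluate u_x directly
        return (1.0 - r ** x) / (1.0 - r ** (x + y))
    # unfavourable odds (r > 1): multiply numerator and denominator
    # by (1/r)^(x+y) so no huge powers of r are ever formed
    s = 1.0 / r
    return (s ** (x + y) - s ** y) / (s ** (x + y) - 1.0)

x_casino, y_player = 50_000_000, 1_000
# pure luck: your q = 0.49, i.e. the casino's p = 0.51 -> you go broke
print(win_prob(x_casino, y_player, 0.51))
# card counter: your q = 0.51, i.e. the casino's p = 0.49 -> u_x ~ 4e-18
print(win_prob(x_casino, y_player, 0.49))
```

Here $u_x$ is always computed from the casino’s perspective (the party holding $x$), so it reads directly as the player’s probability of ruin.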