Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

## Probability of Black Swan Events at NYSE

Featured by a Russian website Financial One (Apr 26, 2016)

The prediction of extreme rare events (EREs) in the financial markets remains one of the toughest problems, firstly because of the very limited knowledge we have of their distribution and of the underlying correlations across the markets. Literally, we walk in the dark, hoping it won’t happen today, not to the stocks we hold in our portfolios. But is that darkness really so dark?

In this post we will use textbook knowledge of classical statistics in order to analyze the universe of 2500+ stocks traded at the New York Stock Exchange (NYSE) that experienced the most devastating daily loss in their history (1 data point per stock). Based on that, we will try to shed new light on the most mind-boggling questions: when will the next ERE happen, how long do we need to wait for it, and if it hits, what will its magnitude “of devastation” be?

1. Extreme Rare Events of NYSE

An ERE meeting certain criteria has its own name in the financial world: a black swan. We will define it in the next Section. Historically speaking, at some point, people were convinced that only white swans existed until someone noticed a population of black swans in Western Australia. The impossible became possible.

We observe the same in the stock markets. Black Monday on Oct 19, 1987 is no doubt the greatest historical event where the markets suffered 20%+ losses. The event was so rare that no risk analyst could predict it and/or protect against it. History repeated itself in 2000 when the Internet bubble caused many IT stocks to lose heavily over a period of a few weeks. The events of 9/11 in New York forced the closure of the US stock exchange for 6 days, and on Sep 17, as a consequence of fear, about 25 stocks at NYSE lost more than 15% in one day. Uncertainty cannot be avoided but it can be minimized. Can it? The financial crisis of 2007-2008 triggered a significant number of black swans to fly again and fly high…

Did we learn enough from that? Do we protect more efficiently against black swans after 2009 market recovery? Will you be surprised if I tell you “no”?!

Let’s conduct a data analysis of the most extreme EREs ever spotted at NYSE among all its stocks. To do that, we will use the Python language and the Yahoo! Finance database. The data quality does not need to be superbly accurate and we will download only adjusted-close prices with the corresponding dates. A list of companies traded at NYSE (inter alia their tickers) can be extracted from a .xls file available for download at http://www.nasdaq.com/screening/company-list.aspx. It contains 3226 tickers, while, as we will see below, Yahoo! Finance recognizes only 2716 of them. Since 2716 is a huge data set meeting a lot of requirements for statistical significance, I’m more than happy to go along with it.

In the process of stock data download, we look for information on the most extreme daily loss per stock and collect them all in a pandas DataFrame, dfblack. It will contain a Date, the magnitude of the ERE (Return), and Delta0. The latter we calculate on the spot as the number of business days between some initial point in time (here, 1900-01-01) and the day of the ERE occurrence. The processing time may take up to 45 min depending on your Internet speed. Be patient, it will be worth waiting that long. Using HDF5 we save the dfblack DataFrame to disk.

```python
# Probability of Black Swan Events at NYSE
# (c) 2016 by Pawel Lachowicz, QuantAtRisk.com

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import datetime as dt
from scipy.stats import gumbel_l, t, norm, expon, poisson
from math import factorial

# Constants
c = (.7, .7, .7)  # grey color

col = ['Date', 'Return', 'Delta0']
dfblack = pd.DataFrame(columns=col)  # create an empty DataFrame

i = n = 0
for lstline in open("nyseall.lst", 'r').readlines():
    ticker = lstline[:-1]
    i += 1
    print(i, ticker)
    try:
        ok = True
        data = web.DataReader(str(ticker), data_source='yahoo',
                              start='1900-01-01', end='2016-04-14')
    except:
        ok = False
    if(ok):
        n += 1
        data['Return'] = data['Adj Close'] / data['Adj Close'].shift(1) - 1
        data = data[1:]  # skip first row

        df = data['Return'].copy()
        dfs = df.sort_values()
        start = dt.date(1900, 1, 1)
        end = dt.date(dfs.index[0].year, dfs.index[0].month, dfs.index[0].day)
        delta0 = np.busday_count(start, end)
        tmin = end
        rmin = dfs[0]
        dfblack.loc[len(dfblack)] = [tmin, rmin, delta0]  # add new row to dfblack
    else:
        print(' not found')

dfblack.set_index(dfblack.Date, inplace=True)  # set index by Date
dfblack.sort_index(inplace=True)               # sort DataFrame by Date
del dfblack['Date']                            # remove Date column

print("No. of stocks with valid data: %g" % n)

# saving to file
store = pd.HDFStore('dfblack.h5')
store['dfblack'] = dfblack
store.close()

# reading from file
store = pd.HDFStore('dfblack.h5')
dfblack = pd.read_hdf('dfblack.h5', 'dfblack')
store.close()

dfblack0 = dfblack.copy()  # a backup copy
```

where nyseall.lst is a plain text file listing all NYSE tickers:

```
DDD
MMM
WBAI
WUBA
AHC
...
```

The first instinct of a savvy risk analyst would be to plot the distribution of the magnitudes of EREs as extracted for the NYSE universe. We achieve that by:

```python
plt.figure(num=1, figsize=(9, 5))
plt.hist(dfblack.Return, bins=50, color='grey')  # a histogram
plt.xlabel('Daily loss', fontsize=12)
plt.xticks(fontsize=12)

plt.savefig('fig01.png', format='png')
```

which reveals:

Peaked around -18% with a long left tail of losses reaching nearly -98%. Welcome to the devil’s playground! But, that’s just the beginning. The information on dates corresponding to the occurrence of those EREs allows us to build a 2D picture:

```python
# Black Monday data sample
df_bm = dfblack[dfblack.index == dt.date(1987, 10, 19)]
# Reopening of NYSE after 9/11 data sample
df_wtc = dfblack[dfblack.index == dt.date(2001, 9, 17)]
# Financial Crisis of 2008 data sample
df_cre = dfblack[(dfblack.index >= dt.date(2008, 9, 19)) &
                 (dfblack.index <= dt.date(2009, 3, 6))]
# Drawdown of 2015/2016 data sample
df_16 = dfblack[(dfblack.index >= dt.date(2015, 12, 1)) &
                (dfblack.index <= dt.date(2016, 2, 11))]

plt.figure(figsize=(13, 6))
plt.plot(dfblack.Return*100, '.k')
plt.plot(df_bm.Return*100, '.r')
plt.plot(df_wtc.Return*100, '.b')
plt.plot(df_cre.Return*100, '.m')
plt.plot(df_16.Return*100, '.y')
plt.xlabel('Time Interval [%s to %s]' % (dfblack.index[0], dfblack.index[-1]),
           fontsize=12)
plt.ylabel('Magnitude [%]', fontsize=12)
plt.title('Time Distribution of the Most Extreme Daily Losses of %g NYSE Stocks '
          '(1 point per stock)' % (len(dfblack)), fontsize=13)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.xlim([dfblack.index[0], dt.date(2016, 4, 14)])

plt.savefig('fig02.png', format='png')
```

delivering:

Keeping in mind that every stock (out of 2716) is represented by only 1 data point, the figure reveals a dramatic state of play: a significantly lower number of EREs before 1995 and a progressive increase after that, with a surprisingly huge number in the past 5 years! As discussed in my previous post, Detecting Human Fear in Electronic Trading, the surge in EREs could be explained by the domination of algorithmic trading over manual trades. It may also suggest that the algorithmic risk management for many stocks is either not as good as we want it to be or very sensitive to the global stock buying/selling pressure across the markets (the butterfly effect). On the other side, the number of new stocks being introduced to NYSE may cause them “to fight for life”, and those most vulnerable, short-lived, of unestablished position, reputation, financial history, or least liquidity are more likely to suffer an ERE. Regardless of which cause dominates, there are more and more “sudden drops in the altitude” now than ever. I find it alarming.

For clarity, in the above figure, the historical events of Black Monday 1987-10-19 (89 points), the reopening of NYSE after 9/11 (25 points), the Financial Crisis of 2007-2008 (858 points), and the drawdown of 2015/2016 (256 points!) have been marked in red, blue, purple, and dark yellow, respectively.

2. Black Swans of NYSE

The initial distribution of Extreme Rare Events of NYSE (Figure 1) provides us with an opportunity to define a black swan event. There is a statistical theory, Extreme Value Theory (EVT), delivering an appealing explanation of the behaviour of rare events. We have discussed it in detail in the Extreme VaR for Portfolio Managers and Black Swan and Extreme Loss Modeling articles. Thanks to that, we gain the knowledge and tools needed to describe what we observe. As previously, in this post we will only be considering the Gumbel distribution $G$ with the corresponding probability density function (pdf) $g$ given as:
$$G(z;\ a,b) = e^{-e^{-z}} \ \ \mbox{for}\ \ z=\frac{x-a}{b}, \ x\in\Re$$ and
$$g(z;\ a,b) = b^{-1} e^{-z}e^{-e^{-z}} \ ,$$ where $a$ and $b$ are the location parameter and scale parameter, respectively. You can convince yourself that fitting the $g(z;\ a,b)$ function truly works in practice:

```python
# fitting
locG, scaleG = gumbel_l.fit(dfblack.Return)  # location, scale parameters

dx = 0.0001
x = [dx*i for i in range(-int(1/dx), int(1/dx)+1)]
x2 = x.copy()

plt.figure(num=3, figsize=(9, 4))
plt.hist(dfblack.Return, bins=50, normed=True, color=c,
         label="NYSE Data of 2716 EREs")
pdf1 = gumbel_l.pdf(x2, loc=locG, scale=scaleG)
y = pdf1.copy()
plt.plot(x2, y, 'r', linewidth=2, label="Gumbel PDF fit")
plt.xlim([-1, 0])
plt.xlabel('Daily loss [%]', fontsize=12)
plt.xticks(fontsize=12)
plt.legend(loc=2)
plt.savefig('fig03.png', format='png')
```

By deriving:

```python
print(locG, scaleG, locG-scaleG, locG+scaleG)

# Pr(Loss < locG-scaleG)
Pr_bs = gumbel_l.cdf(locG-scaleG, loc=locG, scale=scaleG)
print(Pr_bs)
```

we see that

```
-0.151912998711 0.0931852781844 -0.245098276896 -0.058727720527
0.307799372445
```

i.e. the Gumbel pdf’s location parameter (peak) is at -15.2% with a scale of 9.3%. Using an analogy to the Normal distribution, we define two brackets, namely:
$$a-b \ \ \mbox{and} \ \ a+b$$ to be -24.5% and -5.9%, respectively. Given that, we define the black swan event as a daily loss of:
$$L < a-b$$ magnitude. Based on the NYSE data sample we find that 30.8% of EREs (i.e. those with losses exceeding 24.5%) meet this criterion. We re-plot their time-magnitude history as follows:

```python
# skip data after 2015-04-10 (the last 252 business days)
dfb = dfblack0[dfblack0.index < dt.date(2015, 4, 10)]
# re-fit Gumbel (to remain politically correct in-sample ;)
locG, scaleG = gumbel_l.fit(dfb.Return)
print(locG, locG-scaleG)  # -0.16493276305 -0.257085820659

# extract black swans from EREs data set
dfb = dfb[dfb.Return < locG-scaleG]
dfb0 = dfb.copy()

plt.figure(figsize=(13, 6))
plt.plot(dfb.Return*100, '.r')
plt.xlabel('Time Interval [%s to %s]' % (dfb.index[0], dfb.index[-1]), fontsize=14)
plt.ylabel('Black Swan Magnitude [%]', fontsize=14)
plt.title('Time Distribution of Black Swans of %g NYSE Stocks (L < %.1f%%)' %
          (len(dfb), (locG-scaleG)*100), fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.ylim([-100, 0])
plt.xlim([dfb.index[0], dt.date(2016, 4, 14)])
plt.savefig('fig04.png', format='png')
```

where we deliberately skipped the last 252+ business days to allow for some backtesting later on (out-of-sample). That reduces the number of black swans from 807 down to 627 (in-sample).
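As a quick cross-check of the 30.8% figure quoted above, we can compare the Gumbel tail probability with the plain empirical fraction of EREs falling below the $a-b$ threshold. A minimal sketch, reusing the dfblack DataFrame from Section 1; the variable names with the `_full` suffix are illustrative and simply restate the full-sample fit printed earlier, so as not to clobber the in-sample locG and scaleG just re-fitted above:

```python
# A sketch of a sanity check: empirical fraction of EREs below a-b versus
# the fitted Gumbel tail probability, using the full-sample fit quoted above.
from scipy.stats import gumbel_l

locG_full, scaleG_full = -0.1519, 0.0932           # full-sample fit (Section 2)
threshold = locG_full - scaleG_full                # ~ -0.245
frac = (dfblack.Return < threshold).mean()         # empirical fraction of black swans
print('empirical %.3f vs Gumbel cdf %.3f' %
      (frac, gumbel_l.cdf(threshold, loc=locG_full, scale=scaleG_full)))
```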

3. Modelling the Probability of an Event Based on Observed Occurrences

Let’s be honest. It’s nearly impossible to predict in advance the exact day when the next black swan will appear, for which stocks, or whether it will be isolated or trigger other stocks to fall on the same day. Therefore, what follows from now on you should consider as common sense in the quantitative modelling of black swans, backed by our knowledge of statistics. And yes, despite the opinions of Taleb (2010) in his book The Black Swan: The Impact of the Highly Improbable, we might derive some risk uncertainties and become more aware of the impact if the worst strikes.

The classical guideline we find in Vose’s book Quantitative Risk Analysis: A Guide to Monte Carlo Simulation Modelling (1996). David delivers not only solid explanations of many statistical distributions but also excellent hints on their practical usage for risk analysts. Since I discovered this book in Singapore in 2010, it has been my favourite reference manual of all time.

The modelling of the probability of an event based on observed occurrences may follow two distinct processes, analogous to the difference between discrete and continuous distributions: (1) an event that may occur only among a set of specific discrete opportunities, and (2) an event that may occur over a continuum of opportunity.

Considering black swan events we are not limited to a finite number of instances; therefore we will adopt continuous exposure probability modelling. A continuous exposure process is characterized by the mean interval between events (MIBE), $\beta$. The underlying assumption is that the uncertainty displays the properties of a Poisson process, i.e. the probability of an event is independent of however many events have occurred in the past or how recently. And this is our starting point towards a better understanding of NYSE black swans.

3.1. Mean Interval Between Black Swan Events (MIBE)

The MIBE is the average interval between $n$ observed occurrences of a black swan event. Its true value can be estimated from the observed occurrences using central limit theorem:
$$\mbox{MIBE}\ \beta = \mbox{Normal} \left( \bar{t}, \frac{\sigma}{\sqrt{n-1}} \right)$$ where $\bar{t}$ is the average of the $n-1$ observed intervals $t_i$ between the $n$ observed contiguous black swan events and $\sigma$ is the standard deviation of the $t_i$ intervals. The larger the value of $n$, the greater our confidence in knowing its true value.

First, we estimate $\bar{t}$ from the data. As an extra check, using NumPy’s histogram function we confirm that the plain mean is the same as the expected value, $E(\Delta t)$, derived from the histogram,
$$E(\Delta t) = \sum_{i} p_i x_i \ ,$$ i.e. $\bar{t} = E(\Delta t)$ where there is $p_i$ probability of observing $x_i$. In Python we execute that check as follows:

```python
deltat = np.diff(dfb.Delta0)  # time differences between points

print('tbar = %.2f [days]' % np.mean(deltat))
print('min, max = %.2f, %.2f [days]' % (np.min(deltat), np.max(deltat)))
print()

# Confirm the mean using histogram data (mean = the expected value)
y, x = np.histogram(deltat, bins=np.arange(np.ceil(np.max(deltat)+2)),
                    density=True)
hmean = np.sum(x[:-1]*y)
print('E(dt) = %.2f [days]' % hmean)

# plot the histogram
plt.figure(figsize=(9, 5))
plt.bar(x[:-1], y, width=1, color=c)
plt.xlim([0, 20])
plt.title('Time Differences between Black Swan Events', fontsize=12)
plt.xlabel('Days', fontsize=12)
plt.savefig('fig05.png', format='png')
```

which returns

```
tbar = 17.38 [days]
min, max = 0.00, 950.00 [days]

E(dt) = 17.38 [days]
```

It is worth noticing that the largest recorded separation between historical black swans at NYSE was 950 days, i.e. 2.6 years. By plotting the distribution of time differences, we get:

which can be read out more clearly if we type, e.g.:

```python
# the first 5 data points from the histogram
print(y[0:5])
print(x[0:5])
```

returning

```
[ 0.30990415  0.12300319  0.08466454  0.04472843  0.03354633]
[ 0.  1.  2.  3.  4.]
```

which means that our data suggest a 31.0% probability of observing black swan events on the same day, 12.3% with a 1-day gap, 8.5% with a 2-day separation, and so on.

Since MIBE $\beta$ is not given by a single number, the recipe we have for its estimation (see the formula above) allows us only to model the Mean Interval Between Black Swan Events, as follows. First, we calculate the corresponding standard deviation:

```python
ne = len(dfb)            # 627, the number of black swans in the data sample
tbar = np.mean(deltat)
sig = np.std(deltat, ddof=1)   # standard deviation

print(tbar, sig)
print(tbar, sig/np.sqrt(ne-1))
```
```
15.8912 57.5940828519
15.8912 2.30192251209
```

and next

```python
ne = len(dfb)  # 627, a number of black swans in the data sample

betaMIBE_sample = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=10000)

plt.figure(figsize=(9, 5))
plt.hist(betaMIBE_sample, bins=25, color=c)
plt.xlabel("MIBE beta [days]", fontsize=12)
plt.title("Mean Interval Between Black Swan Events", fontsize=12)
plt.savefig('fig06.png', format='png')
```

3.2. Time Till Next Black Swan Event

Once we have estimated the MIBE $\beta$, it is possible to use its value in order to estimate the time till the next black swan event. It obeys
$$\mbox{TTNE} = \mbox{Expon}(\beta)$$ i.e. the exponential distribution given by the probability density function of:
$$f(x; \beta) = \lambda \exp(-\lambda x) = \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right) \ .$$ In Python we model TTNE in the following way:

```python
betaMIBE_one = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=1)

# The time till next event = Expon(beta)
ttne = expon.rvs(loc=0, scale=betaMIBE_one, size=100000)

y, x = np.histogram(ttne, bins=np.arange(int(np.max(ttne))), density=True)

plt.figure(figsize=(9, 5))
plt.bar(x[:-1], y, width=1, color=c, edgecolor=c)
plt.xlim([0, 120])
plt.title("Time Till Next Black Swan Event", fontsize=12)
plt.xlabel("Days", fontsize=12)
plt.savefig('fig07.png', format='png')
```

where by checking

```python
print(y[0:5])
print(x[0:5])
```

we get

```
[ 0.05198052  0.0498905   0.04677047  0.04332043]
[1 2 3 4]
```

id est, there is a 5.20% probability that the next black swan event at NYSE will occur on the next day, 4.99% in 2 days, 4.68% in 3 days, etc.
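As a hedged analytical cross-check of those simulated numbers: for an exponential waiting time the probability of the next event arriving between day $k$ and day $k+1$ is $e^{-k/\beta} - e^{-(k+1)/\beta}$. A small sketch using the sample mean interval of about 17.4 days (the simulation above used a single random draw of $\beta$, hence the slight differences):

```python
# Analytical bin probabilities of Expon(beta) for beta ~ 17.4 days (a sketch;
# the simulated values above used one random draw of beta, so they differ a bit).
import numpy as np

beta = 17.38
for k in range(1, 5):
    print(k, np.exp(-k/beta) - np.exp(-(k+1)/beta))   # ~0.053, 0.050, 0.047, 0.044
```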

3.3. The Expected Number of Black Swans per Business Day

The distribution of the number of black swan events to occur at NYSE per unit time (here, per business day) can be modelled using Poisson distribution:
$$\mbox{Poisson}(\lambda) \ \ \mbox{where} \ \ \lambda = \frac{1}{\beta} \ .$$ The Poisson distribution models the number of occurrences of an event in a time $T$ when the time between successive events follows a Poisson process. If $\beta$ is the mean time between events, as used by the Exponential distribution, then $\lambda = T/\beta$. We will use it in the “out-of-sample” backtesting in Section 4.

Let’s turn words into action by executing the following Python code:

```python
# generate a set of Poisson rvs
nep = poisson.rvs(1/betaMIBE_one, size=100000)

y, x = np.histogram(nep, bins=np.arange(np.max(nep)+1), density=True)

plt.figure(figsize=(9, 5))
plt.title("The Expected Number of Black Swans per Business Day", fontsize=12)
plt.xlabel("N", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.xticks(x)
plt.xlim([-0.1, 2.1])
plt.bar(x[:-1], y, width=0.05, color='r', edgecolor='r', align='center')
plt.savefig('fig08.png', format='png')
```

The derived probabilities:

```python
print(y)
print(x[:-1])
```

```
[ 0.95465  0.04419  0.00116]
[0 1 2]
```

inform us that there is a 95.5% chance that no black swan event will occur per business day, 4.4% that we will record only one, and 0.12% that we may find 2 out of 2500+ NYSE stocks that will suffer a loss greater than 25.7% on the next business day (unit time).
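For comparison, the same probabilities can be evaluated analytically from the Poisson pmf. A small sketch using the point estimate $\beta = \bar{t} = 17.38$ days; the simulation above used a single random draw of $\beta$ from the MIBE distribution, which is why its numbers differ slightly:

```python
# Analytical Poisson probabilities for lambda = 1/17.38 (a sketch; compare
# with the simulated values above, which used a random draw of beta).
from scipy.stats import poisson

lam = 1/17.38
print(poisson.pmf([0, 1, 2], lam))   # ~[0.944, 0.054, 0.0016]
```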

3.4. Probability of the Occurrence of Several Events in an Interval

The Poisson$(1/\beta)$ distribution calculates the distribution of the number of events that will occur in a single unit interval. The probability of exactly $k$ black swan events in such an interval is given by:
$$\mbox{Pr}[X = k] = \frac{\lambda^k}{k!} \exp(-\lambda) = \frac{1}{\beta^k k!} \exp\left(-\frac{1}{\beta}\right)$$
with $\beta$ rescaled to reflect the period of exposure, e.g.:
$$\beta_{yr} = \beta/252$$ where 252 stands for the number of business days in a calendar year. Therefore, the number of black swan events in the next business year can be modelled by:
$$N = \mbox{Poisson}(\lambda) = \mbox{Poisson}(1/\beta_{yr}) = \mbox{Poisson}(252/\mbox{Normal}(\bar{t}, \sigma/\sqrt{n-1}))$$ which we can achieve in Python, coding:

```python
inNdays = 252

N = poisson.rvs(inNdays/norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1)), size=100000)
exN = np.round(np.mean(N))
stdN = np.round(np.std(N, ddof=1))

y, x = np.histogram(N, bins=np.arange(int(np.max(N))), density=True)

_ = plt.figure(figsize=(9, 5))
_ = plt.title("The Expected Number of Black Swans in next %g Business Days" % inNdays,
              fontsize=12)
_ = plt.xlabel("N", fontsize=12)
_ = plt.ylabel("Probability", fontsize=12)
_ = plt.bar(x[:-1], y, width=1, color=c)
plt.savefig('fig09.png', format='png')

# probability that <N> events will occur in the next 'inNdays' days
print('E(N) = %g' % exN)
print('stdN = %g' % stdN)

tmp = [(i, j) for i, j in zip(x, y) if i == exN]
print('Pr(X=%g) = %.4f' % (exN, tmp[0][1]))

tmp = [(i, j) for i, j in zip(x, y) if i == 0]
print('Pr(X=%g) = %.4f' % (0, tmp[0][1]))
```

Our Monte Carlo simulation may return, for instance:

where the calculated measures are:

```
E(N) = 16
stdN = 4
Pr(X=16) = 0.0992
Pr(X=0) = 0.0000
```

The interpretation is straightforward. On average, we expect to record $16\pm 4$ black swan events in the next 252 business days among 2500+ stocks traded at NYSE, as of Apr 10, 2015 COB. The probability of observing exactly 16 black swans is 9.92%. It is natural, based on our data analysis, that the resultant probability of the “extreme luck” of not having any black swan at NYSE, Pr$(X=0)$, in the following trading year is zero. But, miracles happen!

The 9.92% is derived from a single run of the simulation. Using the analytical formula for Pr$(X=N)$ given above, we compute the uncertainty of
$$\mbox{Pr}[X = (N=16)] = \frac{1}{(\beta_{yr})^{16} 16!} \exp\left(-\frac{1}{\beta_{yr}}\right) \ ,$$

```python
Pr = []
for nsim in range(100000):
    betayr = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=1) / inNdays
    p = 1/(betayr**exN * factorial(int(exN))) * np.exp(-1/betayr)
    Pr.append(p)

print('Pr[X = E(N) = %g] = %.2f +- %.2f' % (exN, np.mean(Pr), np.std(Pr, ddof=1)))
```

that yields

Pr[X = E(N) = 16] = 0.08 +- 0.02

and stays in agreement with our result.

4. Prediction and Out-of-Sample Backtesting

All right, time to verify the theoretical statistics in practice. Our goal is to analyze the sample of black swan data skipped so far but available to us. This procedure is usually known as “out-of-sample” backtesting, i.e. we know “the future” as it has already happened, but we pretend it is unknown.

In the first step, we look for a suitable time interval of exactly 252 business days “in the future”. It turns out to be:

```python
print(np.busday_count(dt.date(2015, 4, 10), dt.date(2016, 3, 29)))  # 252
```

Next, we extract the out-of-sample (oos) DataFrame and illustrate it in a similar fashion as previously for the in-sample data:

```python
oos = dfblack0[(dfblack0.index >= dt.date(2015, 4, 10)) &
               (dfblack0.index <= dt.date(2016, 3, 29))]

# extract black swans
oos = oos[oos.Return < locG-scaleG]

plt.figure(figsize=(13, 6))
plt.plot(oos.Return*100, '.b')
plt.xlabel('Time Interval [%s to %s]' % (dfblack.index[0], oos.index[-1]), fontsize=14)
plt.ylabel('Black Swan Magnitude [%]', fontsize=14)
plt.title('Time Distribution of Out-of-Sample Black Swans of %g NYSE Stocks '
          '(L < %.1f%%)' % (len(oos), (locG-scaleG)*100), fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.ylim([-100, 0])
plt.xlim([dfblack.index[0], oos.index[-1]])
plt.savefig('fig10.png', format='png')
```

Based on that, we compare the MIBE as predicted with what the next 252 business days reveal:

```python
deltat_oos = np.diff(oos.Delta0)  # time differences between points

tbar_oos = np.mean(deltat_oos)
print('MIBE (predicted) = %.2f +- %.2f [days]' % (tbar, sig/np.sqrt(ne-1)))
print('MIBE out-of-sample = %.2f [days]' % tbar_oos)
print()

y, x = np.histogram(deltat_oos, bins=np.arange(np.ceil(np.max(deltat)+2)),
                    density=True)

print(y[0:5])  # [ 0.3814433   0.20618557  0.17525773  0.06185567  0.05154639]
print(x[0:5])  # [ 0.  1.  2.  3.  4.]

# Predicted probabilities were:
# [ 0.30990415  0.12300319  0.08466454  0.04472843  0.03354633]
```

The outcome is surprising, i.e.

```
MIBE (predicted) = 17.38 +- 2.30 [days]
MIBE out-of-sample = 2.32 [days]
```

clearly indicating that the Apr 2015–Apr 2016 period was nearly 7.5 times more “active” than the “all-time” NYSE black swan data would predict. Have I already said “it’s alarming”?! Well, at least scary enough for all who trade on a daily basis holding portfolios rich in NYSE assets. The probability of having black swans on the same day increased from 31% to 38%. A new, pure definition of sophisticated gambling.

A verification of the time till next black swan event requires a gentle modification of oos DataFrame. Namely, we need to eliminate all data points with the same Delta0‘s:

```python
oos2 = oos.Delta0.drop_duplicates(inplace=False)
tdiff = np.diff(oos2)

y, x = np.histogram(tdiff, bins=np.arange(int(np.max(tdiff))), density=True)

_ = plt.figure(figsize=(9, 5))
_ = plt.bar(x[:-1], y, width=1, color=c, edgecolor=c)
_ = plt.xlim([1, 30])
_ = plt.title("Time Till Next Black Swan Event (Out-of-Sample)", fontsize=12)
_ = plt.xlabel("Days", fontsize=12)
_ = plt.ylabel("Probability", fontsize=12)
plt.savefig('fig11.png', format='png')

print(y[1:5])
print(x[1:5])
```

Again, from our modelling of the time till next event, Expon($\beta$), we expected about a 5% chance of waiting a day, two, or three for the next black swan event. The out-of-sample data suggest,

```
[ 0.33898305  0.28813559  0.10169492  0.08474576]
[1 2 3 4]
```

that we are 6.3 times more likely to record a black swan event at NYSE on the next business day. Terrifying.

The calculation of the expected number of black swans per business day in the out-of-sample data I’m leaving with you as homework in Python programming. Our earlier prediction of those events in the next 252 business days returned 16$\pm$4. As we have already seen, in the considered time interval after Apr 10, 2015 we recorded 98 swans, which is $6.1^{+2.1}_{-1.2}$ times more than forecasted based on the broad sample.
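If you would like a starting point for that homework, here is a minimal (and by no means unique) sketch: it simply plugs the out-of-sample mean interval, tbar_oos, computed above into the same Poisson recipe used in Section 3.3.

```python
# A starting-point sketch for the homework: the expected number of black
# swans per business day, out-of-sample, with lambda = 1/tbar_oos.
from scipy.stats import poisson
import numpy as np

lam_oos = 1.0/tbar_oos            # ~1/2.32 black swans per business day
k = np.arange(4)
print(poisson.pmf(k, lam_oos))    # Pr of recording 0, 1, 2, 3 swans per day
```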

In the next post of this series, we will try to develop a methodology for a quantitative derivation of the probability of a black swan for an $N$-asset portfolio.

Reflection

Investors invest in stocks in anticipation of earning a profit. As we have witnessed above, the birds have become a serious threat to flying high, with a significant increase in this investment air-traffic. Your plane (stock) can be grounded and your airline (portfolio) can suffer huge losses. Hoping is not a solution. Better ways of protecting against the unavoidable should be implemented for investments while your money is in the game.

But what should you do when a black swan strikes? Denzel Washington once said: “If you pray for rain, you gotta deal with the mud too.”

## Modern Time Analysis of Black Swans

I decided to take the data analysis on Black Swan and Extreme Loss Modeling to the next level and examine the time distribution of extreme losses across the entire S&P 500 universe of traded stocks. Previously, we were interested, first, in finding the maximum loss among all trading days for a given stock in a specified time interval (the stock’s life-time on the trading floor) and, second, in plotting the distribution of those extreme losses (found to be well fitted with the Gumbel distribution). The investigation of when all those losses occurred in time can provide us with an extra vantage point, and if we are lucky researchers we may discover some new information on the population of Black Swans.

An excellent approach to data analysis is through time analysis. It is one of many, many data analytic techniques available, but it is closest to my heart and I would like to provide you with a sophisticated taste of easy-to-digest mathematics applied to financial time-series analysis. So what is data analysis about in our case? Can we code it, or be tempted by ready-to-use dedicated data analytic software? Both ways are accessible, but let’s do it in a more educational way, taking a computer programming class in Matlab.

By time analysis we mean the study of the behaviour of a given quantity in time, e.g. the price of an asset traded on the exchange market. It is pretty straightforward that our eye looks for patterns and tries to classify them somehow. If we observe a repeating structure, or sequence, or shape $-$ that may help us to code its recognition within our trading platforms. The quest for the most fundamental pattern of hidden volatility in the data has an obvious answer: a sine wave. Fairly painless to understand, covered by maths teachers in high schools, an ‘up and down’ approach, good moods versus bad moods, bear markets followed by bull markets. Sinusoidal behaviour is everywhere around us. Therefore it is so important to know how to find it!

In quantitative finance and risk modeling, a periodicity or cyclicality constitutes an attractive and simple model of volatility. To be able to figure out what spectrum of characteristic frequencies our given data set conceals, we need a proper tool. In time-series analysis this tool is known as the periodogram. However, before starting to use it, it is essential to understand the theoretical background.

1. The Method of Period Detection

1.1. General Formulations and Model Orthogonality

In general, we learn from experiments by fitting data, $x$, with a model, $x_{\|}$. The data contain $n$ measurements, and the model, $n_{\|}$ free parameters. The consistency of the data with the model is measured by a function, $\Theta$, called a statistic. A given model, $x_{\|}$, using a given statistic (e.g., $\chi^2$), yields its particular value, $\Theta_1$. Various methods used in the analysis of time series differ both in their choice of the model and the statistic; hence they are difficult to compare directly. To enable such a comparison and for determining the significance of results, $\Theta$ is converted into the false alarm probability, $P_1$. This is done by considering a hypothetical situation, $H_1$, in which $x$ is pure white noise. Then each pair $(x_{\|},\Theta)$ corresponds to a certain cumulative probability distribution of $\Theta$, namely $P(n_{\|},n;\Theta)$, with $P_1$ being the tail probability that under the hypothesis $H_1$ the experiment yields $\Theta>\Theta_1$, i.e., $P_1(\Theta>\Theta_1) = 1-P(n_{\|},n;\Theta_1)$.

Up to here, we have just outlined the classical Neyman-Pearson procedure of statistics. The specific method for analysis of time series used here differs from those commonly encountered in astronomy only in the choices of $x_{\|}$ and $\Theta$. Then, our accounting for variance, correlation and multiple frequencies in calculating $P$ is dictated by the laws of statistics. The probabilities derived by us from the data are the false alarm probabilities. However, we also call them below just probabilities or significance levels.

We note that the Fourier harmonics are not orthogonal in terms of the scalar product with weights at unevenly distributed observations. Certain statistical procedures employing classical probability distributions hold for orthogonal models only and fail in other cases. To avoid that, a popular variant of the power spectrum, the Lomb-Scargle periodogram (Lomb 1976; Scargle 1982; hereafter LS), $P_{\rm LS}(\nu)$, relies on a special choice of phase such that the sine and cosine functions become orthogonal:
$$P_{\rm LS}(\nu) = A_{\rm LS} \left|\hat{x}_{\rm LS}(\nu) \right|^2 .$$ The square of the Fourier amplitude, $\hat{x}(\nu)$, takes the form:
$$\begin{eqnarray} \left|\hat{x}_{\rm LS}(\nu)\right|^2 & = \left[ \sum_{k=1}^{n} (x_k-\bar{x}) \cos (2\pi\nu (t_k-\tau)) \right]^2 + \nonumber \\ & \left[ \sum_{k=1}^{n} (x_k-\bar{x}) \sin (2\pi\nu (t_k-\tau)) \right]^2 . \end{eqnarray}$$ The phase $\tau$ is defined as:
$$\tan(4\pi\nu\tau) = \frac{\sum_{k=1}^{n} \sin(4\pi\nu t_k)} {\sum_{k=1}^{n} \cos(4\pi\nu t_k)} \ ,$$ where, as usual, we consider a time-series $\{x_i\}\ (i=1,…,n)$; $\bar{x}$ denotes the subtracted mean value, and the discrete Fourier transform takes our signal from the time to the frequency domain. Originally, the normalization of the LS periodogram was proposed as $A_{\rm LS}=1/(2\sigma^2)$ in order to account for normalization to the level of white noise, but different variants are available as well.
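For readers who prefer to experiment outside Matlab, the classical LS periodogram defined above can be sketched in a few lines of Python using SciPy. This is only a toy illustration on synthetic data (the time vector, signal, period of 50 days, and frequency grid below are all made up); note that scipy.signal.lombscargle expects angular frequencies, and the $A_{\rm LS}=1/(2\sigma^2)$ normalization is applied by hand:

```python
# A toy sketch of the classical Lomb-Scargle periodogram (not the article's
# Matlab code): synthetic sine plus noise, normalized by 2*sigma^2.
import numpy as np
from scipy.signal import lombscargle

t = np.arange(0.0, 1000.0)                                # evenly sampled time [d]
y = np.sin(2*np.pi*t/50.0) + 0.5*np.random.randn(len(t))  # P = 50 d plus noise

nu = np.linspace(0.001, 0.5, 2000)                        # ordinary frequencies [1/d]
pls = lombscargle(t, y - y.mean(), 2*np.pi*nu) / (2*y.var())
print('peak at nu = %.4f [1/d]' % nu[np.argmax(pls)])     # should be close to 0.02
```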

In practice, some simplifications of the full version of the LS periodogram are applied when we are interested in understanding the power density spectrum of the $|\hat{x}(\nu)|^2$ distribution. The following Matlab function allows us to obtain a modified LS solution for a time-series $[t,y]$:

```matlab
% Function calculates the power density spectrum (periodogram) for
% a time-series [t,y] based on the discrete Fourier transform. The
% series is expected to be evenly distributed, but gaps are allowed.
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
%
% Input:
%   t    : time vector [1xn]
%   y    : time-series [1xn]
%   dt   : average time-series sampling time
% Output:
%   freq : frequency vector [1xm]
%   pds  : power spectral density [1xm]

function [freq,pds]=qarPDS(t,y,dt);
    n=length(t);              % number of data points
    X=fft(y,n);
    Pyy=X.*conj(X);
    freq=[1:n/2]*(1/n/dt);    % in physical units
    pds=(Pyy(2:(floor(n/2)+1)));
    pds=pds';                 % pds
end
```

We will use it later on for the time analysis of our S&P 500 data of extreme losses across the whole universe of traded stocks. The function computes the periodogram for any time-series, so don’t be worried too much about the initial data units.

Now, back to the first base. We may extend the approach used within LS method by employing Szego orthogonal trigonometric polynomials as model functions. A series of $n_{\|}=2N+1$ polynomials corresponds to the orthogonal combinations of the $N$ lowest Fourier harmonics (Schwarzenberg-Czerny 1996). Orthogonal series are optimal from the statistical point of view because, by virtue of the Fisher lemma (Fisz 1963; Schwarzenberg-Czerny 1998), they guarantee the minimum variance of the fit residuals for a given model complexity (given by $n_{\|}$). Szego polynomials are also convenient in computations since the least-square solution may be obtained using recurrence and orthogonal projections, resulting in high computational efficiency, with the number of steps $\propto N$ instead of $N^3$ for $N$ harmonics.

1.2. Variance, the AoV statistics, and model complexity

The LS method employs the sine as a model, and the quadratic norm, $$\Theta_{\chi^2}=\|x-x_{\|}\|^2 ,$$ as the statistic. The corresponding probability distribution is $\chi^2$ with 2 degrees of freedom. Prior to use of the $\chi^2$ distribution, $\Theta_{\chi^2}$ has to be divided by the signal variance, $V$. However, $V$ is usually not known and has to be estimated from the data themselves. Then neither $\Theta_{\chi^2}$ nor the variance estimate is independent, and their ratio does not follow the $\chi^2$ distribution, an effect which has to be accounted for. A simple way to do it is to apply the Fisher Analysis of Variance (AoV) statistic, $$\Theta\equiv (n-n_{\|}) \|x_{\|}\|^2/ (n_{\|}\|x - x_{\|}\|^2) .$$ Hence we call our method, involving the Szego polynomial model and the AoV statistic, the multi-harmonic analysis of variance or mhAoV periodogram (Schwarzenberg-Czerny 1996). The probability distribution is then the Fisher-Snedecor distribution, $F$, rather than $\chi^2$, and $P_1= 1-F(n_{\|},n_{\perp};\Theta)$ where $n_{\perp}=n-n_{\|}$. With everything else fixed, replacing $\chi^2$ with $F$ for $n=100$ yields an increase from $P_1(\chi^2)=0.001$ to $P_1(F)=0.01$. Thus, accounting for the unknown variance makes the mhAoV detection less significant, but more trustworthy. In this work $n$ is usually larger, for which $P_1(F)/P_1(\chi^2)$ reduces to a factor of several.

Apart from the choice of the statistic, our method for $N=1$ differs from the LS one in that the average value is subtracted in the latter (thus yielding $n_\|=2$) whereas a constant term is fitted in the former (which can often be of significant advantage, see Foster 1995). If the periodic modulation in the data differs significantly from a sinusoid (e.g., due to dips, eclipses, etc.), then our $N>1$ models account for that more complex shape and perform considerably better than the LS one.

1.3. Multiple trials

Probability can be assigned to a period found in data according to one of two statistical hypotheses. Namely, (i) one knows in advance the trial frequency, $\nu_0$ (from other data), and would like to check whether it is also present in a given data set or (ii) one searches a whole range, $\Delta\nu$, of $N_{\rm eff}$ frequencies and finds the frequency, $\nu$, corresponding to the most significant modulation. The two cases correspond to the probabilities $P_1$ and $P_{N_{\rm eff}}$ to win in a lottery after 1 and $N_{\rm eff}$ trials, respectively, i.e., they represent the false alarm probabilities in single and multiple experiments, respectively. They are related by
$$P_{N_{\rm eff}}= 1-(1-P_1)^{N_{\rm eff}} .$$ Note that the hypothesis (ii) and the probability $P_{N_{\rm eff}}$ must always be employed in order to claim any new frequency in the object under study. The hypothesis (i) is rarely used. However, since $P_1\lt P_{N_{\rm eff}}$, it is the more sensitive one. For this reason, we advocate its use in situations where the modulation frequency is already known, and we aim at checking for its manifestation in the same object but in a new band, new data set, etc. We stress that we do not use the hypothesis (i) to claim any new frequency.
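A quick numerical illustration of how severe this correction can be: a peak that looks impressive in a single trial becomes unremarkable after scanning many independent frequencies, e.g.
$$P_1 = 0.01,\ \ N_{\rm eff}=100 \ \ \Rightarrow\ \ P_{N_{\rm eff}} = 1-(1-0.01)^{100} \approx 0.63 \ .$$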

An obstacle hampering use of the (ii) hypothesis is that no analytical method is known to calculate $N_{\rm eff}$. The number $N_{\rm eff}$ corresponds to independent trials, whereas values of periodograms at many frequencies are correlated because of the finite width of the peaks, $\delta\nu$, and because of aliasing. As no analytical method is known to determine $N_{\rm eff}$, Monte Carlo simulations have been used (e.g., Paltani 2004). Here, we use a simple conservative estimate, $N_{\rm eff}= \min(\Delta\nu/\delta\nu, N_{\rm calc},n)$, where $N_{\rm calc}$ is the number of the values at which the periodogram is calculated. The estimate is conservative in the sense that it corresponds to the upper limit on $P_{N_{\rm eff}}$, and thus the minimum significance of detection. This effect applies to all methods of period search (Horne & Baliunas 1986). In general, it may reduce significance of a new frequency detection for large $N_{\rm eff}$ as $P_{N_{\rm eff}}\gg P_1$. In practice, it underscores the role of any prior knowledge, in a way similar to the Bayesian statistics: with any prior knowledge of the given frequency we are able to use the hypothesis (i) to claim the detection with large significance (small $P_1$).

1.4. Correlation length

The $P_1$, and other common probability distributions used to set the detection criteria, are derived under the assumption of the noise being statistically
independent. Often this is not the case, as seen, e.g., in light curves of cataclysmic variables (CVs). The correlated noise, often termed red noise, obeys different probability distribution than the standard $P_1$, and hence may have a profound effect. For example, noise with a Gaussian autocorrelation function (ACF) correlated over a time interval, $\delta t$, yields a power spectrum with the Gaussian shape centered at $\nu=0$ and the width $\delta\nu=1/\delta t$. It may be demonstrated that the net effect of the correlation on $P_1$ in analysis of low frequency processes is to decimate the number of independent observations by a factor $n_{\rm corr}$, the average number of observations in the correlation interval $\delta t$ (Schwarzenberg-Czerny 1991). Effectively, one should use $n_{\perp}/n_{\rm corr}$ and $\Theta/n_{\rm corr}$ instead of $n_{\perp}$ and $\Theta$ in calculating $P_1$. This result holds generally, for both least squares and maximum likelihood analyses of time series.

For independent observations, $m=2$ consecutive residuals have the same sign on average (e.g., Fisz 1963). Thus, counting the average length, $m$, of series of residuals of the same sign provides an estimate of the number of consecutive observations being correlated, $n_{\rm corr}$. Note that $m=n/l$ where $l$ is the number of such series (both positive and negative). For correlated observations, the average length of series with the same sign is $m=2n_{\rm corr}$, which allows us to calculate $n_{\rm corr}$.
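A small Python sketch of that run-counting recipe (the helper name is mine, not from the article): for independent noise the average same-sign run length is about 2, so the estimate returns $n_{\rm corr}\approx 1$.

```python
# Estimate n_corr from the average length of same-sign runs of residuals,
# using m = n/l = 2*n_corr (a sketch; the helper name is illustrative).
import numpy as np

def estimate_ncorr(residuals):
    s = np.sign(residuals)
    l = 1 + np.sum(s[1:] != s[:-1])   # number of same-sign runs
    m = len(residuals) / l            # average run length
    return m / 2.0

print(estimate_ncorr(np.random.randn(1000)))   # ~1 for independent noise
```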

Let $\Theta$ denote the Fisher-Snedecor statistics from the mhAoV periodogram (i.e. from Fourier series fit) computed for $n_{\|}=2N+1$ parameters, $n$ observations and $n_{\perp}=n-n_{\|}$ degrees of freedom. To account for $n_{\rm corr}$, we calculate $P_1$ as follows,

$$P_1 = 1 - F\left(n_{\|}, \frac{n_{\perp}}{n_{\rm corr}}; \frac{\Theta}{n_{\rm corr}}\right) = I_{z}\left(\frac{n_{\perp}}{2n_{\rm corr}}, \frac{n_{\|}}{2}\right) ,$$

where $z= n_{\perp}/(n_{\perp}+n_{\|}\Theta)$ and $I_z(a,b)$ is the incomplete (regularized) beta function (Abramowitz & Stegun 1971), see Schwarzenberg-Czerny 1998 and references therein. In the popular Mathematica (Wolfram 1996) that function is called BetaRegularized. In Matlab, the following function does the calculations for us:

```matlab
% Function computes the mhAoV periodogram peak significance
% Usage: [P1,Pneff]=pneff(n,nh,ncorr,neff,theta)

function [P1,Pneff]=pneff(n,nh,ncorr,neff,theta);
    npar=2.0*nh+1;
    nper=n-npar;
    z=nper/(nper+npar*theta);
    a=nper/(2.0*ncorr);
    b=npar/2.0;
    P1=betainc(z,a,b);
    Pneff=1.0-(1.0-P1)^neff;
end
```

In the following section we will apply both approaches, the modified Lomb-Scargle (LS) and the multi-harmonic AoV periodograms, to financial data, and we will discuss the potential consequences coming from the time analysis of the largest daily losses for stocks traded publicly within the S&P 500 universe. So buckle up, as we are ready for take-off!

2. The Annual Migration Routes of Black Swans

A theory can be both beautiful and exhausting. So let’s do some work to capture the beauty. Our goal is to re-analyze the data of extreme losses extracted by us previously within the Black Swan and Extreme Loss Modeling article. First, we extract the value of the maximum loss for each stock and store them in a Matlab matrix as follows:

```matlab
% Modern Time Analysis of Black Swans among
% Traded Stocks in S&P 500
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz

clear all; close all; clc;
tic;

%% DATA READING AND VISUALIZATION

% read a list of stock names
StockNames=dataread('file',['sp500u.lst'],'%s','delimiter', '\n');
K=length(StockNames);   % the number of stocks in the universe
% path to data files
path=['./SP500_1984_2011'];

data=[];
fprintf('data reading and preprocessing..\n');
for si=1:K
    % --stock name
    stock=StockNames{si};
    fprintf('%4.0f %7s\n',si,stock);
    % --load data
    n=[path,'/',stock,'.dat'];
    % check for NULL and change to NaN (using 'sed' command
    % in Unix/Linux/MacOS environment)
    cmd=['sed -i ''s/NULL/NaN/g''',' ',n];
    [status,result]=system(cmd);
    % construct FTS object for daily data
    FTS=ascii2fts(n,1,2);
    % fill any missing values denoted by NaNs
    FTS=fillts(FTS);
    % extract the close price of the stock
    cp=fts2mat(FTS.CLOSE,0);
    dd=fts2mat(FTS.CLOSE,1);
    % extract Matlab matrix containing value of maximum
    % loss per stock and corresponding day
    rtmp=cp(2:end)./cp(1:end-1)-1;   % daily returns
    dtmp=dd(2:end,1);                % time vector
    tmp{si}=[dtmp rtmp];
    [tmp1,tmp2]=min(tmp{si}(:,2));   % maximum loss
    data=[data; dtmp(tmp2) tmp1];    % [time of maximum loss, loss value]
end

data=sortrows(data,1);   % sort data according to time of loss occurrence
```

where, again, the required data files can be downloaded here as sp500u.zip (23.8 MB) and sp500u.lst, respectively.

The visualization of collected data provides us with a new dimension on time distribution of the maximum losses (Black Swans events) across the S&P 500 universe as traded between 3-Jan-1984 and 8-Mar-2011:

```matlab
plot(data(:,1),data(:,2)*100,'.-','color',[0.7 0.7 0.7])
xlim([min(data(:,1)) max(data(:,1))]);
xlabel('Time (days)');
ylabel('R_{\rm min} (%)');
```

All individual stock maximum losses have been depicted with dot markers. As we found within Gumbel distribution analysis, the expected value was -22.6% with a heavy tail extending up to nearly complete losses of $\sim$98%.

Changing the way we look at our data, we connect the dots and think about the data as a new time-series $x_i\ (i=1,…,n=954)$. From this standpoint we can continue our analysis in various directions. Let’s have a look at one case in particular: annual average maximum losses as a function of time. Why? Such an approach has been suggested as interesting by McNeil, Frey, and Embrechts in their book Quantitative Risk Management, section 7.1.4, making use of the block maxima method in order to find return levels for stress losses. We turn this idea into practice by rebinning our time-series $\{x_i\}$ with a new time step of 252 (trading) days, utilizing the code published within my past post on Rebinning of Financial Time-Series, as follows:

```matlab
[rx,ry]=rebin(data(:,1),data(:,2),[],252);
hold on; plot(rx,ry*100,'or');
```

and, allowing ourselves a bit of data profanity, we gently interpolate between the points:

```matlab
xi=rx(1):0.01:rx(end);
rdatai=interp1(rx,ry,xi,'pchip');
rdatai=[xi; rdatai]';
hold on; plot(rdatai(:,1),rdatai(:,2)*100,'r-');
```

resulting in:

Next, based on non-interpolated data we compute the Fourier power spectrum (a modified LS periodogram) as follows:

```matlab
% Periodogram
[freq,pds]=qarPDS(rx,ry,252);

figure(2);
plot(freq,pds);
xlabel('Frequency [1/d]');
ylabel('Power Spectrum');
```

which returns:

It is obvious that the periodogram is calculated based on a fixed frequency grid with a frequency step of $\Delta\nu = 1/T = 0.000104$ [1/d]. The peak of highest power corresponds to the sine modulation detected in the time-series, whose period is equal to $1/0.001462$ or 684 days. The maximal allowed frequency is the Nyquist frequency of $1/(2\Delta t)$ or 0.00198 [1/d]. Honestly, the plot is terrible. To improve its quality it is allowed in the spectral analysis of time-series to apply over-sampling in frequency, i.e. to adopt a frequency grid of computations with a step of $\Delta\nu = 1/(kT)$ where $k$ denotes the over-sampling (integer) factor. Why do we need the over-sampling? One of the reasons is: to find the value of the periodicity as accurately as possible.

Let’s see how the mhAoV periodogram copes with this task in practice. The source codes for mhAoV can be downloaded directly from Alex’s webpage (available in Fortran 95 and Python), though I still make use of our old version, an executable run directly in the Unix/Linux environment: aov. Let’s first store the rebinned data (with the 252 d step) in an external file, rdata.dat:

```matlab
rdata=[rx ry];
fn=['./rdata.dat'];
fid=fopen(fn,'wt');
fprintf(fid,'%f %f\n',rdata');
fclose(fid);
```

and, next, let’s compute aov periodogram:

```
./aov -f=0.000104,0.002 -nh=1 -fos=5 rdata.dat

mhAov periodogram, by Alex Schwarzenberg-Czerny ver. 27.9.2006
updated by Pawel Lachowicz

datname=rdata.dat trfname=rdata.trf maxname=rdata.max
nobs=38 method=ORTAOV nh=1 nbf=20 fos=5.00
frstart=0.0001040 frend=0.0019809 frstep=0.0000107 nfr=176

frequency   period         theta     quality
0.0014747   678.1132747    5.53729   0.743
0.0013152   760.3542866    1.39906   0.146
0.0016301   613.4435376    1.37416   0.138
0.0003351   2984.5922602   1.30742   0.116
0.0001733   5771.4262041   1.22450   0.088
0.0011538   866.7094426    1.12090   0.050
```

i.e. employing the model which contains a single sinusoid (nh=1) and adopting over-sampling in frequency with $k=5$ (fos). It turns out that the highest value of the mhAoV statistic, $\Theta=5.537$, corresponds to a period of 678 days.

Fitting the annual data with the model defined as:
$$f(t) = c + A\sin(2\pi t/P - g\pi)$$ we find, for $P_1=678$ d, the amplitude $A_1=0.12$ and phase $g_1=0.79$, and we over-plot the best fit as follows:

This model is not perfect but delivers a good taste of the concealed periodic pattern, following a sinusoidal model for about 60% of the time between 1984 and 2011. This is an interesting result, though the computation of:

```
>> [P1,Pneff]=pneff(38,1,1.3,7.1,5.53729)

P1 =
    0.0138

Pneff =
    0.0941
```

indicates a rather marginal significance for this periodicity (a false alarm probability $P_{N_{\rm eff}}$ of about 9%). This can be understood as a poor fit of the model to the complicated and variable shape of the annual changes in maximal losses for different traded stocks.

Some improvement of the model could be achieved by including a variation of the amplitude as a function of time, i.e. $A_1(t)$. This can be practically extracted from wavelet analysis via computation of the continuous wavelet transform. If this is a subject of your interest, check a draft of this approach in the paper of Czerny et alii (2010) that I co-authored.

3. Conclusions

Was Joseph Fourier a birdwatcher? We don’t know. But his approach to time-series analysis allowed us to check whether any periodic patterns in the annual migration (occurrences) of Black Swan events exist. With rather marginal significance we found a cyclical trend repeating every 678 days. Will that allow us to forecast the future and the next density of massive losses incurred by individual stocks? Well, now, equipped with the power tools of a modern approach to time analysis, we can always come back and verify our hypotheses.

References

The theory on the methods of period detection is based on the publication of Lachowicz et alii (2006). For deeper references to source readings mentioned in the text, check the reference section inside the aforementioned publication.

## Black Swan and Extreme Loss Modeling

When I read Nassim Nicholas Taleb’s book Black Swan my mind was captured by the beauty of extremely rare events and, concurrently, devastated by the message the book sent: the non-computability of the probability of consequential rare events using scientific methods (owing to the very nature of small probabilities). I rushed to the local library to find out what had been written on the subject. Surprisingly, I discovered the book of Embrechts, Klüppelberg & Mikosch on Modelling Extremal Events for Insurance and Finance, which appeared to me very inaccessible, loaded with heavy mathematical theorems and proofs, and with a negligible number of practical examples. I left it on the shelf to dust for a longer while, until last week when a fresh drive to decompose the problem came back to me.

In this article I will try to take you on a short but efficient journey through a part of classical extreme value theory, namely fluctuations of maxima, and fulfil the story with easy-to-follow procedures on how one may run simulations of the occurrences of extreme rare losses in financial return series. Having this experience, I will discuss briefly the inclusion of the resulting modeling into future stock returns.

1. The Theory of Fluctuations of Maxima

Let’s imagine we have rich historical data (time-series) of returns for a specific financial asset or portfolio of assets. A good and easy example is the daily rate of return, $R_t$, for a stock traded e.g. at the NASDAQ Stock Market,
$$R_t = \frac{P_t}{P_{t-1}} - 1 \ ,$$ where $P_t$ and $P_{t-1}$ denote the stock price on day $t$ and $t-1$, respectively. The longer the time coverage, the more valuable the information that can be extracted. Given the time-series of daily stock returns, $\{R_i\}\ (i=1,…,N)$, we can create a histogram, i.e. the distribution of returns. By a rare event or, more precisely here, a rare loss we will refer to the returns placed in the far left tail of the distribution. As an assumption we also agree that $R_1,R_2,…$ is a sequence of iid non-degenerate rvs (random variables) with a common df (distribution function) $F$. We define the fluctuations of the sample maxima as:
$$M_1 = R_1, \ \ \ M_n = \max(R_1,…,R_n) \ \mbox{for}\ \ n\ge 2 \ .$$ That simply says that for any time-series $\{R_i\}$, there is one maximum corresponding to the rv (random variable) with the most extreme value. Since the main line of this post is the investigation of maximum losses in return time-series, we are eligible to think about negative value (losses) in terms of maxima (therefore conduct the theoretical understanding) thanks to the identity:
$$\min(R_1,…,R_n) = -\max(-R_1,…,-R_n) \ .$$ The distribution function of maximum $M_n$ is given as:
$$P(M_n\le x) = P(R_1\le x, …, R_n\le x) = P(R_1\le x)\cdots P(R_n\le x) = F^n(x)$$ for $x\in\Re$ and $n\in\mbox{N}$.
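To get a feeling for how quickly $F^n(x)$ concentrates probability in the tail, consider a toy example (my own, purely illustrative numbers) with a standard normal $F$ and one trading year of observations:
$$P(M_{252}\le 2) = \Phi(2)^{252} \approx 0.9772^{252} \approx 0.003 \ ,$$
i.e. a level that a single daily observation exceeds only about 2% of the time is almost certain to be exceeded at least once over 252 days.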

What the extreme value theory first ‘investigates’ are the limit laws for the maxima $M_n$. The important question here emerges: is there somewhere out there any distribution which satisfies for all $n\ge 2$ the identity in law
$$\max(R_1,…,R_n) = c_nR + d_n$$ for appropriate constants $c_n>0$ and $d_n\in\Re$, or simply speaking, which classes of distributions $F$ are closed under maxima? The theory next defines the max-stable distribution: a random variable $R$ is called max-stable if it satisfies the aforegoing relation for iid $R_1,…,R_n$. If we assume that $\{R_i\}$ is a sequence of iid max-stable rvs then:
$$R = c_n^{-1}(M_n-d_n)$$ and one can say that every max-stable distribution is a limit distribution for maxima of iid rvs. That brings us to the fundamental Fisher-Tippett theorem, saying that if there exist constants $c_n>0$ and $d_n\in\Re$ such that:
$$c_n^{-1}(M_n-d_n) \rightarrow H, \ \ n\rightarrow\infty\ ,$$ then $H$ must be of the type of one of the three so-called standard extreme value distributions, namely: Fréchet, Weibull, and Gumbel. In this post we will be only considering the Gumbel distribution $G$ of the corresponding probability density function (pdf) $g$ given as:
$$G(z;\ a,b) = e^{-e^{-z}} \ \ \mbox{for}\ \ z=\frac{x-a}{b}, \ x\in\Re$$ and
$$g(z;\ a,b) = b^{-1} e^{-z}e^{-e^{-z}} \ ,$$ where $a$ and $b$ are the location parameter and scale parameter, respectively. Having defined the extreme value distribution and being now equipped with a better understanding of the theory, we are ready for a test drive over the daily roads of profits and losses in the trading markets. This is the moment which separates men from boys.

2. Gumbel Extreme Value Distribution for S&P500 Universe

As usual, we start with an entrée. Our goal is to find the empirical distribution of maxima (i.e. maximal daily losses) for all stocks belonging to the S&P 500 universe between 3-Jan-1984 and 8-Mar-2011. There were $K=954$ stocks traded within this period and their data can be downloaded here as the sp500u.zip file (23.8 MB). The full list of stocks’ names is provided in the sp500u.lst file. Therefore, performing the data processing in Matlab, first we need to compute a vector storing daily returns for each stock, and next find the corresponding minimal value $M_n$ where $n$ stands for the length of each return vector:

```matlab
% Black Swan and Extreme Loss Modeling
% using Gumbel distribution and S&P500 universe
%
% (c) 2013 QuantAtRisk, by Pawel Lachowicz

clear all; close all; clc;
tic;

%% DATA READING AND PREPROCESSING

% read a list of stock names
StockNames=dataread('file',['sp500u.lst'],'%s','delimiter', '\n');
K=length(StockNames);   % the number of stocks in the universe
% path to data files
path=['./SP500u'];

fprintf('data reading and preprocessing..\n');
for si=1:K
    % --stock name
    stock=StockNames{si};
    fprintf('%4.0f %7s\n',si,stock);
    % --load data
    n=[path,'/',stock,'.dat'];
    % check for NULL and change to NaN (using 'sed' command
    % in Unix/Linux/MacOS environment)
    cmd=['sed -i ''s/NULL/NaN/g''',' ',n];
    [status,result]=system(cmd);
    % construct FTS object for daily data
    FTS=ascii2fts(n,1,2);
    % fill any missing values denoted by NaNs
    FTS=fillts(FTS);
    % extract the close price of the stock
    cp=fts2mat(FTS.CLOSE,0);
    % calculate a vector with daily stock returns and store it in
    % the cell array
    R{si}=cp(2:end)./cp(1:end-1)-1;
end

%% ANALYSIS

% find the minimum daily return value for each stock
Rmin=[];
for si=1:K
    Mn=min(R{si},[],1);
    Rmin=[Rmin; Mn];
end
```

Having that ready, we fit the data with the Gumbel function which (as we believe) would describe the distribution of maximal losses in the S&P500 universe best:

```matlab
% fit the empirical distribution with the Gumbel distribution and
% estimate the location, a, and scale, b, parameters
[par,parci]=evfit(Rmin);
a=par(1);
b=par(2);

% plot the distribution
x=-1:0.01:1;
hist(Rmin,-1:0.01:0);
h=findobj(gca,'Type','patch');
set(h,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.6 0.6 0.6]);
h=findobj(gca,'Type','box');
set(h,'Color','k');

% add a plot of the Gumbel pdf
pdf1=evpdf(x,a,b);
y=0.01*length(Rmin)*pdf1;
line(x,y,'color','r');
box on;
xlabel('R_{min}');
ylabel(['f(R_{min}|a,b)']);
text(-1,140,['a = ',num2str(par(1),3)]);
text(-1,130,['b = ',num2str(par(2),3)]);
xlim([-1 0]);
```

The maximum likelihood estimates of the parameters $a$ and $b$, together with the corresponding 95% confidence intervals, can be found as follows:

>> [par,parci]=evfit(Rmin)

par =
   -0.2265    0.1135

parci =
   -0.2340    0.1076
   -0.2190    0.1197
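For reference, an approximate Python counterpart of the evfit call is sketched below. SciPy's gumbel_l is the minimum-type extreme value distribution, so its maximum likelihood estimates of location and scale should correspond to Matlab's $(a, b)$ for the same data; the file name rmin.txt is a hypothetical placeholder for the per-stock minima collected above:

```python
# An approximate Python counterpart (assumed data layout) of the evfit call:
# gumbel_l is the minimum-type extreme value distribution in SciPy.
import numpy as np
from scipy.stats import gumbel_l

Rmin = np.loadtxt('rmin.txt')       # hypothetical file: one worst daily return per stock

a_hat, b_hat = gumbel_l.fit(Rmin)   # maximum likelihood estimates of location and scale
print("a = %.4f, b = %.4f" % (a_hat, b_hat))
```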

That brings us to a visual representation of our analysis:

This is a very important result, telling us that the expected value of the extreme daily loss is equal to about -22.6%. However, the left tail of the fitted Gumbel distribution extends as far as nearly -98%, although the probability of the occurrence of such a massive daily loss is rather low.

On the other hand, the expected value of -22.6% is surprisingly close to the down-movement of the markets on Oct 19, 1987, known as Black Monday, when the Dow Jones Industrial Average (DJIA) dropped by 508 points to 1738.74, i.e. by 22.61%!

3. Blending Extreme Loss Model with Daily Returns of a Stock

You may wonder how we can use the results of the Gumbel modeling to account for rare losses in the future daily returns of a particular stock. This can be done quite straightforwardly by combining the best fitted model (pdf) for extreme losses with the stock’s own pdf. To do it properly we need to employ the concept of mixture distributions. Michael B. Miller in his book Mathematics and Statistics for Financial Risk Management provides a clear exposition of this procedure. In our case, the mixture density function $f(x)$ can be denoted as:
$$f(x) = w_1 g(x) + (1-w_1) n(x)$$ where $g(x)$ is the Gumbel pdf, $n(x)$ represents the fitted stock pdf, and $w_1$ marks the weight (influence) of $g(x)$ in the resulting overall pdf.
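To make the construction concrete, here is a minimal Python sketch of $f(x)$ on a return grid; the Gumbel and normal parameters below are assumptions made purely for illustration (they roughly mimic the fits discussed in this post):

```python
# A minimal sketch of the mixture pdf f(x) = w1*g(x) + (1 - w1)*n(x).
# The Gumbel and normal parameters are assumptions made for illustration only.
import numpy as np
from scipy.stats import gumbel_l, norm

w1 = 0.001                                     # weight of the extreme-loss component
x  = np.arange(-1.0, 1.0, 0.01)                # grid of daily returns

g = gumbel_l.pdf(x, loc=-0.23, scale=0.11)     # extreme-loss pdf (assumed parameters)
n = norm.pdf(x, loc=0.0012, scale=0.0308)      # "ordinary" daily-return pdf (assumed)

f = w1 * g + (1.0 - w1) * n                    # mixture pdf
print((f * 0.01).sum())                        # ~1: the mixture integrates to unity
```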

In order to illustrate this process, let’s select one stock from our S&P500 universe, say Apple Inc. (NASDAQ: AAPL), and fit its daily returns with a normal distribution:

% AAPL daily returns (3-Jan-1984 to 11-Mar-2011)
rs=R{18};
figure(2);
hist(rs,50);
h=findobj(gca,'Type','patch');
set(h,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.6 0.6 0.6]);
h=findobj(gca,'Type','box');
set(h,'Color','k');
% fit the normal distribution and plot the fit
[muhat,sigmahat]=normfit(rs)
x=-1:0.01:1;
pdf2=normpdf(x,muhat,sigmahat);
y=0.01*length(rs)*pdf2;
hold on; line(x,y,'color','r');
xlim([-0.2 0.2]); ylim([0 2100]);
xlabel('R');

where the red line represents the fit with a mean of $\mu=0.0012$ and a standard deviation $\sigma=0.0308$.

We can obtain the mixture distribution $f(x)$ by executing a few more lines of code:

% Mixture Distribution Plot
figure(3);
w1=0.001;   % enter your favorite value, e.g. 0.001
w2=1-w1;
pdfmix=w1*(pdf1*0.01)+w2*(pdf2*0.01);   % note: sum(pdfmix)=1 as expected
x=-1:0.01:1;
plot(x,pdfmix);
xlim([-0.6 0.6]);

It is important to note that our modeling is based on the $w_1$ parameter. It can be understood intuitively as follows. Say we choose $w_1=0.01$. That means the Gumbel pdf contributes 1% to the overall pdf. In the following section we will see that if a random variable is drawn from the distribution given by $f(x)$, then $w_1=0.01$ simply means (not exactly, but to a sufficient approximation) that there is a 99% chance of drawing this variable from $n(x)$ and only a 1% chance of drawing it from $g(x)$. The dependence of $f(x)$ on $w_1$ is illustrated in the next figure:

It is clearly visible that a choice of $w_1>0.01$ would contribute significantly to the left tail, making it fat. This is not what is observed in the empirical distribution of daily returns for AAPL (and, in general, for the majority of stocks); therefore we expect $w_1$ to be much smaller than 1%.
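The interpretation of $w_1$ as a drawing probability can be verified directly by sampling the mixture component-wise. A minimal Python sketch, with assumed parameters:

```python
# A minimal sketch: draw from the mixture component-wise, where w1 is the
# probability of picking the Gumbel (extreme-loss) component for a given day.
# All parameters are assumptions made for illustration only.
import numpy as np
from scipy.stats import gumbel_l, norm

rng = np.random.default_rng(7)
w1, N = 0.01, 100000

from_g = rng.random(N) < w1                    # True with probability w1 (~1% of days)
draws = np.where(from_g,
                 gumbel_l.rvs(loc=-0.23, scale=0.11, size=N, random_state=rng),
                 norm.rvs(loc=0.0012, scale=0.0308, size=N, random_state=rng))

print(from_g.mean())                           # ~0.01: share of draws taken from g(x)
print((draws <= -0.20).mean())                 # share of simulated 20%+ daily losses
```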

4. Drawing Random Variables from Mixture Distribution

A short break between entrée and main course we fill with a sip of red wine. Having the discrete form of $f(x)$, we would like to be able to draw a random variable from this distribution. Again, this is easy. Following a general recipe, for instance the one given in Chapter 12.2.2 of Philippe Jorion’s book Value at Risk: The New Benchmark for Managing Financial Risk, we make use of the concept of the inverse transform method. In the first step we take the output (a random variable) of a pseudo-random generator drawing its rvs from the uniform distribution $U(x)$. This rv is always between 0 and 1, and in the last step it is projected onto the cumulative distribution of our interest, $F(x)$, which in our case corresponds to the cumulative distribution of the $f(x)$ pdf. Finally, we read out the corresponding value on the x-axis: a rv drawn from the $f(x)$ pdf. Philippe illustrates this procedure more intuitively:

This method works smoothly when we know the analytical form of $F(x)$. However, if this is not on the menu, we need a couple of technical tricks. First, we calculate $F(x)$ based on $f(x)$. Next, we set a very fine grid for the $x$ domain and perform interpolation between the given data points of $F(x)$.
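In Python the same idea can be sketched in a few lines: build $F(x)$ by cumulative summation of the discretized pdf and invert it on the grid (here via a searchsorted lookup rather than explicit interpolation, under the same assumed mixture parameters as before):

```python
# A minimal Python sketch of the inverse transform method for a pdf defined
# only on a discrete grid. Mixture parameters are assumed for illustration.
import numpy as np
from scipy.stats import gumbel_l, norm

x  = np.arange(-1.0, 1.0, 0.01)
w1 = 0.001
pdfmix = (w1 * gumbel_l.pdf(x, loc=-0.23, scale=0.11)
          + (1 - w1) * norm.pdf(x, loc=0.0012, scale=0.0308))

F = np.cumsum(pdfmix * 0.01)                   # discrete cumulative distribution F(x)
F /= F[-1]                                     # normalise so that F(x_max) = 1

u  = np.random.default_rng(1).random(252 * 40) # uniform rvs, one per simulated trading day
rv = x[np.searchsorted(F, u)]                  # project u onto F and read off x: rvs from f(x)
print(rv.min(), rv.mean())
```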

% find cumulative pdf, F(x)
figure(4);
s=0; x=-1; F=[];
for i=1:length(pdfmix);
    s=s+pdfmix(i);
    F=[F; x s];
    x=x+0.01;
end
plot(F(:,1),F(:,2),'k')
xlim([-1 1]); ylim([-0.1 1.1]);
% perform interpolation of cumulative pdf using very fine grid
xi=(-1:0.0000001:1);
yi=interp1(F(:,1),F(:,2),xi,'linear');   % use linear interpolation method
hold on; plot(xi,yi);

The second sort of difficulty lies in finding a good match between the rv drawn from the uniform distribution and the approximated value of our $F(x)$. That is why a very fine grid is required, supplemented with some matching techniques. The following code that I wrote deals with this problem pretty efficiently:

% draw a random variable from f(x) pdf: xi(row)
tF2=round((round(100000*yi')/100000)*100000);
RV=[];
for k=1:(252*40)
    notok=false;
    while(~notok)
        tU=round((round(100000*rand)/100000)*100000);
        [r,c,v]=find(tF2==tU);
        if(~isempty(r))
            notok=true;
        end
    end
    if(length(r)>1)
        rv=round(2+(length(r)-2)*rand);
        row=r(rv);
    else
        row=r(1);
    end
    % therefore, xi(row) is a number representing a rv
    % drawn from f(x) pdf; we store 252*40 of those
    % new rvs in the following matrix:
    RV=[RV; xi(row) yi(row)];
end
% mark all corresponding rvs on the cumulative pdf
hold on; plot(RV(:,1),RV(:,2),'rx');

Finally, as the main course, we obtain and verify the distribution of a large number of new rvs drawn from the $f(x)$ pdf. It is crucial to check whether our generating algorithm provides uniform coverage across the entire $F(x)$ plot,

where, in order to obtain more statistically reliable results, we generate 10080 rvs, corresponding to simulated 1-day stock returns over 252 trading days times 40 years.

5. Black Swan Detection

A -22% collapse in the markets on Oct 19, 1987 served as the day when the name Black Swan event was born, or at least reinforced, in the financial community. Are black swans extremely rare? It depends. If you live, for example, in Perth, Western Australia, you can see a lot of them wandering around. So what defines an extremely rare loss in the sense of a financial event? Let’s assume, by definition, that a Black Swan event is a daily loss of 20% or more. If so, using the procedure described in this post, we are tempted to pass from the main course to dessert.

Our modeling concentrates on finding the most appropriate contribution of $w_1g(x)$ to the resulting $f(x)$ pdf. As an outcome of a few runs of Monte Carlo simulations with different values of $w_1$, we find that for $w_1=[0.0010, 0.0005, 0.0001]$ we detect in the simulations, respectively, 9, 5, and 2 events (rvs) displaying a one-day loss of 20% or more.
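A minimal Python sketch of that counting experiment, again under assumed mixture parameters, might look as follows (the counts will differ from run to run and from the Matlab results quoted above, since different random draws are involved):

```python
# A minimal sketch of the detection step: simulate 252*40 daily returns from the
# mixture f(x) for a few values of w1 and count the "black swan" days, i.e.
# one-day losses of 20% or more. Parameters are assumed for illustration.
import numpy as np
from scipy.stats import gumbel_l, norm

rng = np.random.default_rng(2016)
x = np.arange(-1.0, 1.0, 0.01)

for w1 in [0.0010, 0.0005, 0.0001]:
    pdfmix = (w1 * gumbel_l.pdf(x, loc=-0.23, scale=0.11)
              + (1 - w1) * norm.pdf(x, loc=0.0012, scale=0.0308))
    F = np.cumsum(pdfmix * 0.01)
    F /= F[-1]
    rv = x[np.searchsorted(F, rng.random(252 * 40))]
    print(w1, int((rv <= -0.20).sum()))        # number of simulated black swan days
```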

Therefore, the simulated daily returns for AAPL, assuming $w_1=0.0001$, contain two Black Swan events in the distribution, i.e. one event per 5040 trading days, or one per 20 years:

That result agrees quite well with what has been observed so far, i.e. including Black Monday in 1987 and the Flash Crash in intra-day trading on May 6, 2010 for some of the stocks.

Acknowledgements

I am grateful to Peter Urbani from New Zealand for directing my attention towards Gumbel distribution for modeling very rare events.

## VaR and Expected Shortfall vs. Black Swan

Value-at-Risk (VaR) is one of the most fundamental approaches to measuring risk, yet its calculation is truly worth revising. It is a magical number quoted at the end of the day by banks’ financial risk managers, portfolio risk managers, and risk managers simply concerned that the expected risk threshold will not, hopefully, be exceeded on the next day. What is so magical about it? By definition, given a specific time horizon and taking into account the last $N$ days of our trading activity, we can always calculate a number that provides a simple answer to the question: how large a loss should we be prepared for on the next day, at a given confidence level? Well, welcome to the world of VaR!

More formally, given a daily (or weekly, monthly, etc.) distribution of returns (of, for example, a single stock, a portfolio of assets, etc.), we are interested in finding the value of $VaR_\alpha$ for, say, $\alpha=0.05$ (five percent), which tells us that there is a 5% chance that $VaR_{0.05}$ will be exceeded in trading on the next day. This value is located in the left tail of the distribution and, by convention, is given as a positive number. Therefore, we define $VaR_\alpha$ as:
$$P[X\le VaR_\alpha] = 1 - \alpha$$ where $X$ is a random variable and can, as in our assumed case, represent a 1-day return on a traded asset or a portfolio of assets.

Beautiful! We have the risk threshold provided as a number. But in practice, how should we calculate $VaR_\alpha$? Here some of the shortcomings of VaR come into the spotlight. VaR depends on the time-frame of the daily returns we examine: $VaR_\alpha$ as a number will be different if you include the last 252 trading days in the estimation compared with the last 504 trading days (2 years). Intuitively, the more data (more days, years, etc.) we have, the better the estimation of $VaR_\alpha$, right? Yes and no. Keep in mind that the dynamics of the trading market change continuously. In the early 80s there was no high-frequency trading (HFT). It came to dominate trading traffic after 2000, and now HFT plays a key role in driving prices. So it would be a bit unfair to compare $VaR_\alpha$ estimated in the 80s with what is going on right now. In its parametric form, $VaR_\alpha$ assumes that the distribution of returns is normal: not the most beautiful assumption, but the most straightforward one for computing $VaR_\alpha$. Should we use $VaR_\alpha$? Yes, but with caution.
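As a quick illustration of the parametric variant, here is a minimal Python sketch (not the author's code) that fits a normal distribution to a simulated, purely illustrative return series and reads off the $\alpha$-quantile as $VaR_\alpha$, reported as a positive number:

```python
# A minimal sketch of the parametric (normal) VaR: fit a normal distribution to
# a return series and take the alpha-quantile, reported as a positive number.
# The return series below is simulated purely for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.02, 252)              # hypothetical last 252 daily returns

alpha = 0.05
mu, sigma = norm.fit(R)                        # MLE of the normal parameters
VaR = -(mu + sigma * norm.ppf(alpha))          # 1-day 95% VaR, positive by convention
print("VaR_0.05 = %.4f" % VaR)
```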

Okay then, having $VaR_\alpha$ calculated, we know how far the loss could reach at the $1-\alpha$ confidence level. The next, extremely important question in risk management, which every risk manager should ask or at least be interested in reporting, is: if the $VaR_\alpha$ event occurs on the next day, what loss might we expect to suffer (say, in dollars)? $VaR_\alpha$ is the threshold; now we are interested in the expected value of the loss given the exceedance. This is the definition of the expected shortfall, and it is based on the concept of conditional probability as follows:
$$E[X\ |\ X\le VaR_\alpha] = ES \ .$$ In general, if our daily distribution of returns can be described by a function $f(x)$ representing its probability density function (pdf), then:
$$ES = \frac{1}{\alpha} \int_{-\infty}^{VaR_\alpha} xf(x)dx \ .$$
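Sticking to the normal fit, the integral above can be evaluated either numerically on a grid or with the closed-form expression for the normal tail expectation, $E[X\,|\,X\le VaR_\alpha] = \mu - \sigma\,\phi(z_\alpha)/\alpha$. A minimal Python sketch on a simulated return series:

```python
# A minimal sketch of the ES integral under a fitted normal pdf: once evaluated
# numerically on a grid and once via the closed-form normal tail expectation.
# The return series is simulated purely for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.02, 252)

alpha = 0.05
mu, sigma = norm.fit(R)
z = norm.ppf(alpha)
q = mu + sigma * z                             # VaR threshold in return space

dx = 1e-5                                      # integration step
x = np.arange(mu - 10 * sigma, q, dx)
ES_num = (x * norm.pdf(x, mu, sigma) * dx).sum() / alpha   # (1/alpha) * int x f(x) dx
ES_cf  = mu - sigma * norm.pdf(z) / alpha                  # E[X | X <= VaR threshold]

print(ES_num, ES_cf)                           # negative returns; report |ES| by convention
```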

Given any data set of returns, $R_t\ (t=1,…,N)$, calculated from our price history,
$$R_t = \frac{P_{t+1}}{P_t} -1$$ both numbers, $VaR_\alpha$ and $ES$, can in fact be calculated in two ways. The first method is based on the empirical distribution, i.e. using the data as given:
$$VaR_\alpha = h_{i^{VaR}} \ \ \mbox{where}\ \ i^{VaR}\ \mbox{is the smallest}\ i\ \mbox{such that}\ \ \sum_{j=1}^{i} H_j(h_{j+1}-h_j) \ge \alpha\ ,$$ where $H_j$ represents the normalized histogram of $R_t$ (i.e., its integral is equal to 1), $h_j$ denotes the bin locations, and $j$ runs over the $M$ histogram bins. Similarly for $ES$, we get:
$$ES = \frac{1}{\alpha}\sum_{j=1}^{i^{VaR}} h_jH_j(h_{j+1}-h_j) \ ,$$ where $i^{VaR}$ is the bin index corresponding to $VaR_\alpha$. The second method is based on integration, given the best fit to the histogram of $R_t$ with $f(x)$ taken to be a normal distribution. As we will see in the practical example below, both approaches return different values, and extra caution should be taken when reporting the final risk measures.
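A minimal Python sketch of the histogram-based (first) method, mirroring the two formulas above on a simulated return series:

```python
# A minimal sketch of the empirical, histogram-based VaR and ES.
# The return series is simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.02, 252)

alpha = 0.05
H, edges = np.histogram(R, bins=100, density=True)   # normalised histogram H_j
h  = 0.5 * (edges[:-1] + edges[1:])                   # bin centres h_j
dh = np.diff(edges)                                   # bin widths

cum   = np.cumsum(H * dh)                             # accumulated probability mass
i_var = np.searchsorted(cum, alpha)                   # first bin where the mass reaches alpha
VaR   = -h[i_var]                                     # positive by convention

ES = -(h[: i_var + 1] * H[: i_var + 1] * dh[: i_var + 1]).sum() / alpha
print("VaR = %.4f, ES = %.4f" % (VaR, ES))
```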

Case Study: IBM Daily Returns in 1987

The year 1987 went down in history as one of the most dramatic in stock trading. Why? Simply because of the so-called Black Monday of Oct 19, when markets recorded losses of over 20% in a single trading day. Let’s analyse the daily returns and their distribution for IBM stock, as seen by a risk manager after the close of the market on Thu, Dec 31, 1987. The risk manager is interested in the estimation of the 1-day 95% VaR threshold, i.e. $VaR_{0.05}$, and the expected shortfall if the exceedance event were to occur on the next trading day (Jan 4, 1988). Therefore, let’s assume that our portfolio is composed solely of IBM stock and that we have an allocation of capital of $C$ dollars in it as of Dec 31, 1987.

Using a record of historical trading, IBM.dat, of the form:

IBM
DATES          OPEN      HIGH      LOW       CLOSE       VOLUME
03-Jan-1984    0.000000  0.000000  0.000000  30.437500   0.000000
04-Jan-1984    0.000000  0.000000  0.000000  30.968750   0.000000
05-Jan-1984    0.000000  0.000000  0.000000  31.062500   0.000000
06-Jan-1984    0.000000  0.000000  0.000000  30.875000   0.000000
...
04-Mar-2011    0.000000  0.000000  0.000000  161.830000  0.000000
07-Mar-2011    0.000000  0.000000  0.000000  159.930000  0.000000

containing the close price of IBM stock, in Matlab we extract the data for the year 1987 and then construct a vector of daily returns $R_t$ for the stock,

% Reading in the data
fn=['IBM.dat'];
ftsD=ascii2fts(fn,1,2);
range=[datenum('02-Jan-1987'),datenum('31-Dec-1987')];
drr=[datestr(range(1)),'::',datestr(range(2))];
fts=ftsD(drr);
data=fts2mat(fts,1);

% R_t vector
rdata=data(2:end,5)./data(1:end-1,5)-1;

% Plotting
figure(1);
subplot(2,1,1);
plot(data(:,1)-datenum('2-Jan-1987'),data(:,5));
xlim([0 365]); ylabel('IBM Close Price ($)');
subplot(2,1,2);
plot(data(2:end,1)-datenum('2-Jan-1987'),rdata);
xlim([0 365]); ylabel('R_t');
xlabel('t (days)');

and present the data graphically. Having our data, let’s find $VaR_{0.05}$ and the corresponding $ES$,

% Construct the normalized histogram
[H,h]=hist(rdata,100);
hist(rdata,100);
sum=0;
for i=1:length(H)-1
    sum=sum+(H(i)*(h(i+1)-h(i)));
end
figure(2);
H=H/sum;
bar(h,H,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.7 0.7 0.7]);

% Calculate VaR 95 based on empirical distribution
sum=0; i=1;
while (i<length(H))
    sum=sum+(H(i)*(h(i+1)-h(i)));
    if(sum>=0.05)
        break;
    end
    i=i+1;
end
VaR=h(i)
hold on; plot(h(i),0,'ro');   % mark VaR in the plot

% Fit the normal distribution to R_t data
[muhat,sigmahat]=normfit([rdata]');
Y=normpdf(h,muhat(1),sigmahat(1));
hold on; plot(h,Y,'b-');   % and plot the fit using blue line

% Find for the fitted N(muhat,sigmahat) VaR 95%
sum=0; i=1;
while (i<length(h))
    sum=sum+(Y(i)*(h(i+1)-h(i)));
    if(sum>=0.05)
        break;
    end
    i=i+1;
end
VaR_fit=h(i)
hold on; plot(h(i),0,'ro','markerfacecolor','r');   % mark VaR_fit
ivar=i;

% Calculate the Expected Shortfall
% based on fitted normal distribution
sum=0; i=1;
while (i<=ivar)
    sum=sum+(h(i)*Y(i)*(h(i+1)-h(i)));
    i=i+1;
end
ES_fit=sum/0.05
hold on; plot(ES_fit,0,'ko','markerfacecolor','k');

% Add numbers to the plot
text(-0.23,25,['1-Day VaR and Expected Shortfall']);
text(-0.23,23.5,['as for 31/12/1987']);
text(-0.23,20,['VaR = ',num2str(VaR*-1,3),' (',num2str(VaR*-100,2),'%)']);
text(-0.23,18,['ES = ',num2str(ES_fit*-1,3),' (',num2str(ES_fit*-100,2),'%)']);
text(-0.23,13,['\mu = ',num2str(muhat,2)]);
text(-0.23,11,['\sigma = ',num2str(sigmahat,2)]);
xlabel('R_t');

marking all the numbers in the resulting common plot.

In the plot, the filled red and black markers correspond to $VaR_{0.05}$ and $ES$, respectively, and are calculated using the fitted normal distribution. Both values are displayed in the plot as well. In addition, the plot contains a red open circle marker denoting the value of $VaR_{0.05}$ obtained from the discrete integration of the empirical histogram. Please note how the choice of method influences the value of VaR.

The derived numbers tell the risk manager that in 95% of cases one should feel confident that the value of $VaR_{0.05}$ will not be exceeded in trading of IBM on Jan 4, 1988 (the next trading day). $C\times VaR_{0.05} = 0.028C$ would provide us with the loss amount (in dollars) if such an undesired event occurred. And if that were indeed the case, the expected loss (in dollars) we would potentially suffer would be $C\times ES = 0.058C$.

Now it is easy to understand why the Black Swan event of Oct 19, 1987 falls into the category of the least likely moves in the markets. It places itself in the far left tail of both the empirical and the fitted distributions, denoting an impressive loss of 23.52% for IBM on that day.
If we assume that the daily price changes are normally distributed, then the probability of a drop of at least 22 standard deviations is $$P(Z\le -22) = \Phi(-22) = 1.4 \times 10^{-107} \ .$$ This is an astronomically rare event! They say that lightning does not strike twice, but if a 1-day drop in the market of the same magnitude had occurred again on Jan 4, 1988, our long position in IBM would have caused us to suffer not a loss of $0.058C$ dollars but approximately $0.23C$. Ahhhuc!
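For completeness, the quoted tail probability can be reproduced with a one-liner, assuming SciPy's normal cdf remains accurate this far out in the tail:

```python
# A quick check of the probability of a 22-standard-deviation daily drop
from scipy.stats import norm
print(norm.cdf(-22))   # ~1.4e-107
```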