Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

# Probability of Black Swan Events at NYSE

Featured by a Russian website Financial One (Apr 26, 2016)

The prediction of extreme rare events (EREs) in the financial markets remains one of the toughest problems. Firstly because of a very limited knowledge we have on their distribution and underlying correlations across the markets. Literally, we walk in dark, hoping it won’t happen today, not to the stocks we hold in our portfolios. But is that darkness really so dark?

In this post we will use a textbook knowledge on classical statistics in order to analyze the universe of 2500+ stocks traded at New York Stock Exchange (NYSE) that experienced the most devastating daily loss in their history (1 data point per 1 stock). Based on that, we will try to shed a new light on the most mind-boggling questions: when the next ERE will happen, how long we need to wait for it, and if it hits what its magnitude “of devastation” will be?

1. Extreme Rare Events of NYSE

An ERE meeting certain criteria has its own name in the financial world: a black swan. We will define it in the next Section. Historically speaking, at some point, people were convinced that only white swans existed until someone noticed a population of black swans in Western Australia. The impossible became possible.

The same we observe in the stock markets. Black Monday on Oct 19, 1987 no doubt is the greatest historical event where the markets suffered from 20%+ losses. The event was so rare that any risk analyst could not predict it and/or protect against it. The history repeated itself in 2000 when the Internet bubble caused many IT stocks to lose a lot over the period of few weeks. An event of 9/11 in New York forced the closure of US stock exchange for 6 days, and on Sep 17, as a consequence of fear, about 25 stocks at NYSE lost more than 15% in one day. The uncertainty cannot be avoided but it can be minimized. Is it so? The financial crisis of 2007-2008 triggered a significant number of black swans to fly again and fly high…

Did we learn enough from that? Do we protect more efficiently against black swans after 2009 market recovery? Will you be surprised if I tell you “no”?!

Let’s conduct a data analysis of the most EREs ever spotted at NYSE among all its stocks. To do that, we will use Python language and Yahoo! Finance database. The quality of data are not required to be superbly accurate and we will download only adjusted-close prices with the corresponding dates. A list of companies (inter alia their tickers) traded at NYSE you can extract from a .xls file available for download at http://www.nasdaq.com/screening/company-list.aspx. It contains 3226 tickers, while, as we will see below, Yahoo! Finance recognizes only 2716 of them. Since 2716 is a huge data set meeting a lot of requirements for the statistical significance, I’m more than happy to go along with it.

In the process of stock data download, we look for information on the most extreme daily loss per stock and collect them all in pandas’ DataFrame dfblack. It will contain a Date, the magnitude of ERE (Return), and Delta0. The latter we calculate on spot as a number of business days between some initial point in time (here, 1990-01-01) and the day of ERE occurrence. The processing time may take up to 45 min depending on your Internet speed. Be patient, it will be worth waiting that long. Using HDF5 we save dfblack DataFrame on the disk.

 # Probability of Black Swan Events at NYSE # (c) 2016 by Pawel Lachowicz, QuantAtRisk.com   import numpy as np import pandas as pd import matplotlib.pyplot as plt import pandas_datareader.data as web import datetime as dt from scipy.stats import gumbel_l, t, norm, expon, poisson from math import factorial   # Constants c = (.7, .7, .7) # grey color   col = ['Date', 'Return', 'Delta0'] dfblack = pd.DataFrame(columns=col) # create an empty DataFrame   i = n = 0 for lstline in open("nyseall.lst",'r').readlines(): ticker=lstline[:-1] i += 1 print(i, ticker) try: ok = True data = web.DataReader(str(ticker), data_source='yahoo', start='1900-01-01', end='2016-04-14') except: ok = False if(ok): n += 1 data['Return'] = data['Adj Close' ] / data['Adj Close'].shift(1) - 1 data = data[1:] # skip first row   df = data['Return'].copy() dfs = df.sort_values() start = dt.date(1900, 1, 1) end = dt.date(dfs.index[0].year, dfs.index[0].month, dfs.index[0].day) delta0 = np.busday_count(start,end) tmin = end rmin = dfs[0] dfblack.loc[len(dfblack)]=[tmin, rmin, delta0] # add new row to dfblack   else: print(' not found')   dfblack.set_index(dfblack.Date, inplace=True) # set index by Date dfblack.sort_index(inplace=True) # sort DataFrame by Date del dfblack['Date'] # remove Date column   print("No. of stocks with valid data: %g" % n)   # saving to file store = pd.HDFStore('dfblack.h5') store['dfblack'] = dfblack store.close()   # reading from file store = pd.HDFStore('dfblack.h5') dfblack = pd.read_hdf('dfblack.h5', 'dfblack') store.close()   dfblack0 = dfblack.copy() # a backup copy

where nyseall.lst is a plain text file listing all NYSE tickers:

 DDD MMM WBAI WUBA AHC ...

The first instinct of a savvy risk analyst would be to plot the distribution of the magnitudes of EREs as extracted for the NYSE universe. We achieve that by:

 plt.figure(num=1, figsize=(9, 5)) plt.hist(dfblack.Return, bins=50, color='grey') # a histrogram plt.xlabel('Daily loss', fontsize = 12) plt.xticks(fontsize = 12)   plt.savefig('fig01.png', format='png')

what reveals:

Peaked around -18% with a long left tail of losses reaching nearly -98%. Welcome to the devil’s playground! But, that’s just the beginning. The information on dates corresponding to the occurrence of those EREs allows us to build a 2D picture:

 # Black Monday data sample df_bm = dfblack[dfblack.index == dt.date(1987, 10, 19)] # Reopening of NYSE after 9/11 data sample df_wtc = dfblack[dfblack.index == dt.date(2001, 9, 17)] # Financial Crisis of 2008 data sample df_cre = dfblack[(dfblack.index >= dt.date(2008, 9, 19)) & (dfblack.index <= dt.date(2009, 3, 6))] # Drawdown of 2015/2016 data sample df_16 = dfblack[(dfblack.index >= dt.date(2015, 12, 1)) & (dfblack.index <= dt.date(2016, 2, 11))]   plt.figure(figsize=(13, 6)) plt.plot(dfblack.Return*100, '.k') plt.plot(df_bm.Return*100, '.r') plt.plot(df_wtc.Return*100, '.b') plt.plot(df_cre.Return*100, '.m') plt.plot(df_16.Return*100, '.y') plt.xlabel('Time Interval [%s to %s]' % (dfblack.index[0], dfblack.index[-1]), fontsize=12) plt.ylabel('Magnitude [%]', fontsize=12) plt.title('Time Distribution of the Most Extreme Daily Losses of %g NYSE Stocks (1 point per stock)' % (len(dfblack)), fontsize=13) plt.xticks(fontsize = 12) plt.yticks(fontsize = 12) plt.xlim([dfblack.index[0], dt.date(2016, 4, 14)])   plt.savefig('fig02.png', format='png')

delivering:

Keeping in mind that every (out of 2716) stock is represented only by 1 data point, the figure reveals a dramatic state of the play: significantly lower number of EREs before 1995 and progressing increase after that with a surprisingly huge number in the past 5 years! As discussed in my previous post, Detecting Human Fear in Electronic Trading, the surge in EREs could be explained by a domination of algorithmic trading over manual trades. It may also suggest that the algorithmic risk management for many stocks is either not as good as we want it to be or very sensitive to the global stock buying/selling pressure across the markets (the butterfly effect). On the other side, the number of new stocks being introduced to NYSE may cause them “to fight for life” and those most vulnerable, short-lived, of unestablished position, reputation, financial history, or least liquidity can be the subject to ERE occurrence more likely. Regardless of the cause number one, there are more and more “sudden drops in the altitude” now than ever. I find it alarming.

For clarity, in the above figure, the historical events of Black Monday 1978-10-19 (89 points), reopening of NYSE after 9/11 (25 points), Financial Crisis 2007-2008 (858 points), and a drawdown of 2015/2016 (256 points!) have been marked by red, blue, purple, and dark yellow colours, respectively.

2. Black Swans of NYSE

The initial distribution of Extreme Rare Events of NYSE (Figure 1) provides us with an opportunity to define a black swan event. There is a statistical theory, Extreme Value Theory (EVT), delivering an appealing explanation on behaviour of rare events. We have discussed it in detail in Extreme VaR for Portfolio Managers and Black Swan and Extreme Loss Modeling articles. Thanks to that, we gain a knowledge and tools so needed to describe what we observe. As previously, in this post we will be only considering the Gumbel distribution $G$ of the corresponding probability density function (pdf) $g$ given as:
$$G(z;\ a,b) = e^{-e^{-z}} \ \ \mbox{for}\ \ z=\frac{x-b}{a}, \ x\in\Re$$ and
$$g(z;\ a,b) = b^{-1} e^{-z}e^{-e^{-z}} \ .$$ where $a$ and $b$ are the location parameter and scale parameter, respectively. You can convince yourself that fitting $g(z;\ a,b)$ function truly works in practice:

 # fitting locG, scaleG = gumbel_l.fit(dfblack.Return) # location, scale parameters   dx = 0.0001 x = [dx*i for i in range(-int(1/dx), int(1/dx)+1)] x2 = x.copy()   plt.figure(num=3, figsize=(9, 4)) plt.hist(dfblack.Return, bins=50, normed=True, color=c, label="NYSE Data of 2716 EREs") pdf1 = gumbel_l.pdf(x2, loc=locG, scale=scaleG) y = pdf1.copy() plt.plot(x2, y, 'r', linewidth=2, label="Gumbel PDF fit") plt.xlim([-1, 0]) plt.xlabel('Daily loss [%]', fontsize = 12) plt.xticks(fontsize = 12) plt.legend(loc=2) plt.savefig('fig03.png', format='png')

By deriving:

 print(locG, scaleG, locG-scaleG, locG+scaleG)   # Pr(Loss < locG-scaleG) Pr_bs = gumbel_l.cdf(locG-scaleG, loc=locG, scale=scaleG) print(Pr_bs)

we see that

 -0.151912998711 0.0931852781844 -0.245098276896 -0.058727720527 0.307799372445

i.e. the Gumbel pdf’s location parameter (peak) is at -15.2% with scale of 9.3%. Using an analogy to Normal distribution, we define two brackets, namely:
$$a-b \ \ \mbox{and} \ \ a+b$$ to be -24.5% and -5.9%, respectively. Given that, we define the black swan event as a daily loss of:
$$L < a-b$$ magnitude. Based on NYSE data sample we find that 30.8% of EREs (i.e. greater than 24.5% in loss) meet that criteria. We re-plot their time-magnitude history as follows:

 # skip data after 2015-04-10 (last 252 business days) dfb = dfblack0[dfblack0.index < dt.date(2015, 4, 10)] # re-fit Gumbel (to remain politically correct in-sample ;) locG, scaleG = gumbel_l.fit(dfb.Return) print(locG, locG-scaleG) # -0.16493276305 -0.257085820659   # extract black swans from EREs data set dfb = dfb[dfb.Return < locG-scaleG] dfb0 = dfb.copy()   plt.figure(figsize=(13, 6)) plt.plot(dfb.Return*100, '.r') plt.xlabel('Time Interval [%s to %s]' % (dfb.index[0], dfb.index[-1]), fontsize=14) plt.ylabel('Black Swan Magnitude [%]', fontsize=14) plt.title('Time Distribution of Black Swans of %g NYSE Stocks (L < %.1f%%)' % (len(dfb), (locG-scaleG)*100), fontsize=14) plt.xticks(fontsize = 14) plt.yticks(fontsize = 14) plt.ylim([-100, 0]) plt.xlim([dfb.index[0], dt.date(2016, 4, 14)]) plt.savefig('fig04.png', format='png')

where we deliberately skipped last 252+ data points to allow ourselves for some backtesting later on (out-of-sample). That reduces the number of black swans from 807 down to 627 (in-sample).

3. Modelling the Probability of an Event Based on Observed Occurrences

Let’s be honest. It’s nearly impossible to predict in advance the exact day when next black swan will appear, for which stocks, will it be isolated or trigger some stocks to fall down on the same day? Therefore, what follows from now on, you should consider as a common sense in quantitative modelling of black swans backed by our knowledge of statistics. And yes, despite the opinions of Taleb (2010) in his book of The Black Swan: The Impact of the Highly Improbable we might derive some risk uncertainties and become more aware of the impact if the worst strikes.

The classical guideline we find in Vose’s book Quantitative Risk Analysis: A Guide to Monte Carlo Simulation Modelling (1996). David delivers not only a solid explanations of many statistical distributions but also excellent hints on their practical usage for risk analysts. Since the discovery of this book in 2010 in Singapore, it is my favourite reference manual of all times.

The modelling of the probability of an event based on observed occurrences may follow two distinct processes, analogous to the difference between discrete and continuous distributions: (1) an event that may occur only among a set of specific discrete opportunities, and (2) an event that may occur over a continuum of opportunity.

Considering the black swan events we are not limited by a finite number of instances, therefore we will be taking into account the continuous exposure probability modelling. A continuous exposure process is characterized by the mean interval between events (MIBE) $\beta$. The underlying assumption refers to the uncertainty displaying the properties of a Poisson process, i.e. the probability of an event is independent of however many events have occurred in the past or how recently. And this is our starting point towards a better understanding of NYSE black swans.

3.1. Mean Interval Between Black Swan Events (MIBE)

The MIBE is the average interval between $n$ observed occurrences of a black swan event. Its true value can be estimated from the observed occurrences using central limit theorem:
$$\mbox{MIBE}\beta = \mbox{Normal} \left( \bar{t}, \frac{\sigma}{\sqrt{n-1}} \right)$$ where $\bar{t}$ is the average of the $n-1$ observed intervals $t_i$ between the $n$ observed contiguous black swan events and $\sigma$ is the standard deviation of the $t_i$ intervals. The larger the value of $n$, the greater our confidence on knowing its true value.

First, we estimate $\bar{t}$ from the data. Using NumPy’s histogram function, extra, we confirm that a plain mean is the same as the expected value, $E(\Delta t)$, derived based on the histogram,
$$E(\Delta t) = \sum_{i} p_i x_i \ ,$$ i.e. $\bar{t} = E(\Delta t)$ where there is $p_i$ probability of observing $x_i$. In Python we execute that check as follows:

 deltat = np.diff(dfb.Delta0) # time differences between points   print('tbar = %.2f [days]' % np.mean(deltat)) print('min, max = %.2f, %.2f [days]' % (np.min(deltat), np.max(deltat))) print()   # Confirm the mean using histogram data (mean = the expected value) y, x = np.histogram(deltat, bins=np.arange(np.ceil(np.max(deltat)+2)), density=True) hmean = np.sum(x[:-1]*y) print('E(dt) = %.2f [days]' % hmean)   # plot historgam with a fitted Exp PDF plt.figure(figsize=(9, 5)) plt.bar(x[:-1], y, width=1, color=c) plt.xlim([0, 20]) plt.title('Time Differences between Black Swan Events', fontsize=12) plt.xlabel('Days', fontsize=12) plt.savefig('fig05.png', format='png')

what returns

 tbar = 17.38 [days] min, max = 0.00, 950.00 [days]   E(dt) = 17.38 [days]

What is worth noticing is the fact that the largest recorded separation among historical black swans at NYSE was 950 days, i.e. 2.6 years. By plotting the distribution of time differences, we get:

what could be read out more clearly if we type, e.g.:

 # the first 5 data points from the historgram print(y[0:5]) print(x[0:5])

returning

 [ 0.30990415 0.12300319 0.08466454 0.04472843 0.03354633] [ 0. 1. 2. 3. 4.]

and means that our data suggest that there is a probability of 31.0% of observing the black swan events on the same day, 12.3% with a 1 day gap, 8.5% with a 2 day separation, and so on.

Since MIBE$\beta$ is not given by a single number, the recipe we have for its estimation (see the formula above) allows us solely for modelling of Mean Interval Between Black Swan Events as follows. First, we calculate the corresponding standard deviation:

 sig = np.std(deltat, ddof=1) # standard deviation   print(tbar, sig) print(tbar, sig/np.sqrt(ne-1))
 15.8912 57.5940828519 15.8912 2.30192251209

and next

 ne = len(dfb) # 627, a number of black swans in data sample   betaMIBE_sample = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=10000)   plt.figure(figsize=(9, 5)) plt.hist(betaMIBE_sample, bins=25, color=c) plt.xlabel("MIBE beta [days]", fontsize=12) plt.title("Mean Interval Between Black Swan Events", fontsize=12) plt.savefig('fig06.png', format='png')

3.2. Time Till Next Black Swan Event

Once we have estimated the MIBE $\beta$, it is possible to use its value in order to estimate the time till the next black swan event. It obeys
$$\mbox{TTNE} = \mbox{Expon}(\beta)$$ i.e. the exponential distribution given by the probability density function of:
$$f(x; \beta) = \lambda \exp(-\lambda x) = \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right) \ .$$ In Python we model TTNE in the following way:

 betaMIBE_one = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=1)   # The time till next event = Expon(beta) ttne = expon.rvs(loc=0, scale=betaMIBE, size=100000)   y, x = np.histogram(ttne, bins=np.arange(int(np.max(ttne))), density=True)   plt.figure(figsize=(9, 5)) plt.bar(x[:-1], y, width=1, color=c, edgecolor=c) plt.xlim([0, 120]) plt.title("Time Till Next Black Swan Event", fontsize=12) plt.xlabel("Days", fontsize=12) plt.savefig('fig07.png', format='png')

where by checking

 print(y[0:5]) print(x[0:5])

we get

 [0.05198052 0.0498905 0.04677047 0.04332043] [1 2 3 4]

id est there is 5.20% of probability that the next black swan event at NYSE will occur on the next day, 4.99% in 2 days, 4.68% in 3 days, etc.

3.3. The Expected Number of Black Swans per Business Day

The distribution of the number of black swan events to occur at NYSE per unit time (here, per business day) can be modelled using Poisson distribution:
$$\mbox{Poisson}(\lambda) \ \ \mbox{where} \ \ \lambda = \frac{1}{\beta} \ .$$ The Poisson distribution models the number of occurrences of an event in a time $T$ when the time between successive events follows a Poisson process. If $\beta$ is the mean time between events, as used by the Exponential distribution, then $\lambda = T/\beta$. We will use it in the “out-of-sample” backtesting in Section 4.

Let’s turn words into action by executing the following Python code:

 # generate a set of Poisson rvs nep = poisson.rvs(1/betaMIBE_one, size=100000)   y, x = np.histogram(nep, bins=np.arange(np.max(nep)+1), density=True)   plt.figure(figsize=(9, 5)) plt.title("The Expected Number of Black Swans per Business Day", fontsize=12) plt.ylabel("N", fontsize=12) plt.ylabel("Probability", fontsize=12) plt.xticks(x) plt.xlim([-0.1, 2.1]) plt.bar(x[:-1], y, width=0.05, color='r', edgecolor='r', align='center') plt.savefig('fig08.png', format='png')

The derived probabilities:

 print(y) print(x[:-1])
 [0.95465 0.04419 0.00116] [0 1 2]

inform us that there is 95.5% of chances that there will be no occurrence of the black swan event per business day, 4.4% that we will record only one, and 0.12% that we may find 2 out of 2500+ NYSE stocks that will suffer from the loss greater than 25.7% on the next business day (unit time).

3.4. Probability of the Occurrence of Several Events in an Interval

The Poisson $(1/\beta)$ distribution calculates the distribution of the number of events that will occur in a single unit interval. The probability of exactly $k$ black swan events in $m$ trials is given by:
$$\mbox{Pr}[X = N] = \frac{\lambda^k}{k!} \exp(-\lambda) = \frac{1}{\beta^k k!} \exp\left(-\frac{1}{\beta}\right)$$
with $\beta$ rescaled to reflect the period of exposure, e.g.:
$$\beta_{yr} = \beta/252$$ where 252 stands for a number of business days in a calendar year. Therefore, the number of black swan events in next business year can be modelled by:
$$N = \mbox{Poisson}(\lambda) = \mbox{Poisson}(1/\beta_{yr}) = \mbox{Poisson}(252/\mbox{Normal}(\bar{t}, \sigma/\sqrt{n-1}))$$ what we can achieve in Python, coding:

 inNdays = 252   N = poisson.rvs(inNdays/norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1)), size=100000) exN = np.round(np.mean(N)) stdN = np.round(np.std(N, ddof=1))   y, x = np.histogram(N, bins=np.arange(int(np.max(N))), density=True)   _ = plt.figure(figsize=(9, 5)) _ = plt.title("The Expected Number of Black Swans in next %g Business Days" % inNdays, fontsize=12) _ = plt.ylabel("N", fontsize=12) _ = plt.ylabel("Probability", fontsize=12) _ = plt.bar(x[:-1], y, width=1, color=c) plt.savefig('fig09.png', format='png')   # probability that it will occur events in next 'inNdays' days print('E(N) = %g' % exN) print('stdN = %g' % stdN)   tmp = [(i,j) for i, j in zip(x,y) if i == exN] print('Pr(X=%g) = %.4f' % (exN, tmp[0][1]))   tmp = [(i,j) for i, j in zip(x,y) if i == 0] print('Pr(X=%g) = %.4f' % (0, tmp[0][1]))

Our Monte Carlo simulation may return, for instance:

where the calculated measures are:

 E(N) = 16 stdN = 4 Pr(X=16) = 0.0992 Pr(X=0) = 0.0000

The interpretation is straightforward. On average, we expect to record $16\pm 4$ black swan events in next 252 business days among 2500+ stocks traded at NYSE as for Apr 10, 2015 COB. The probability of observing exactly 16 black swans is 9.92%. It is natural, based on our data analysis, that the resultant probability of the “extreme luck” of not having any black swan at NYSE, Pr$(X=0)$, in the following trading year is zero. But, miracles happen!

9.92% is derived based on a single run of the simulation. Using the analytical formula for Pr$(X=N)$ given above, we compute the uncertainty of
$$\mbox{Pr}[X = (N=16)] = \frac{1}{(\beta_{yr})^{16} 16!} \exp\left(-\frac{1}{\beta_{yr}}\right) \ ,$$

 Pr = [] for nsim in range(100000): betayr = norm.rvs(loc=tbar, scale=sig/np.sqrt(ne-1), size=1) / inNdays p = 1/(betayr**exN * factorial(exN)) * np.exp(-1/betayr) Pr.append(p)   print('Pr[X = E(N) = %g] = %.2f +- %.2f' % (exN, np.mean(Pr), np.std(Pr, ddof=1)))

that yields

 Pr[X = E(N) = 16] = 0.08 +- 0.02

and stays in and agreement with our result.

4. Prediction and Out-of-Sample Backtesting

All right, time to verify a theoretical statistics in practice. Our goal is to analyze the sample of black swan data skipped so far but available to us. This procedure usually is known as an “out-of-sample” backtesting, i.e. we know “the future” as it has already happened but we pretend it is unknown.

In first step, we look for a suitable time interval of exactly 252 business days “in the future”. It occurs to be:

 print(np.busday_count(dt.date(2015, 4, 10) , dt.date(2016,3, 29))) # 252

Next, we extract out-of-sample (oos) DataFrame and illustrate them in a similar fashion as previously for in-sample data:

 oos = dfblack0[(dfblack0.index >= dt.date(2015, 4, 10)) & (dfblack0.index <= dt.date(2016, 3, 29))]   # extract black swans oos = oos[oos.Return < locG-scaleG]   plt.figure(figsize=(13, 6)) plt.plot(oos.Return*100, '.b') plt.xlabel('Time Interval [%s to %s]' % (dfblack.index[0], oos.index[-1]), fontsize=14) plt.ylabel('Black Swan Magnitude [%]', fontsize=14) plt.title('Time Distribution of Out-of-Sample Black Swans of %g NYSE Stocks (L < %.1f%%)' % (len(oos), (locG-scaleG)*100), fontsize=14) plt.xticks(fontsize = 14) plt.yticks(fontsize = 14) plt.ylim([-100, 0]) plt.xlim([dfblack.index[0], oos.index[-1]]) plt.savefig('fig10.png', format='png')

Based on that, we compare the MIBE as predicted with what the next 252 business days reveal:

 deltat_oos = np.diff(oos.Delta0) # time differences between points   tbar_oos = np.mean(deltat_oos) print('MIBE (predicted) = %.2f +- %.2f [days]' % (tbar, sig/np.sqrt(ne-1))) print('MIBE out-of-sample = %.2f [days]' % tbar_oos) print()   y, x = np.histogram(deltat_oos, bins=np.arange(np.ceil(np.max(deltat)+2)), density=True)   print(y[0:5]) # [ 0.3814433 0.20618557 0.17525773 0.06185567 0.05154639] print(x[0:5]) # [ 0. 1. 2. 3. 4.]   # Predicted probabilities were: # [ 0.30990415 0.12300319 0.08466454 0.04472843 0.03354633]

The outcome surprises, i.e.

 MIBE (predicted) = 17.38 +- 2.30 [days] MIBE out-of-sample = 2.32 [days]

clearly indicating that Apr 2015–Apr 2016 period was nearly 7.5 times more “active” than “all-time” NYSE black swan data would predict to be. Have I already said “it’s alarming”?! Well, at least scary enough for all who trade on the daily basis holding portfolios rich in NYSE assets. The probability of having black swan on the same day increased from 31% to 38%. A new, pure definition of sophisticated gambling.

A verification of the time till next black swan event requires a gentle modification of oos DataFrame. Namely, we need to eliminate all data points with the same Delta0‘s:

 oos2 = oos.Delta0.drop_duplicates(inplace=False) tdiff = np.diff(oos2)   y, x = np.histogram(tdiff, bins=np.arange(int(np.max(tdiff))), density=True)   _ = plt.figure(figsize=(9, 5)) _ = plt.bar(x[:-1], y, width=1, color=c, edgecolor=c) _ = plt.xlim([1, 30]) _ = plt.title("Time Till Next Black Swan Event (Out-of-Sample)", fontsize=12) _ = plt.xlabel("Days", fontsize=12) _ = plt.ylabel("Probability", fontsize=12) plt.savefig('fig11.png', format='png')   print(y[1:5]) print(x[1:5])

Again, from our modeling of Poisson($1/\beta$) we expected to have about 5% chances of waiting for the next black swan event a day, two, or three. Out-of-sample data suggest,

 [0.33898305 0.28813559 0.10169492 0.08474576] [1 2 3 4]

that we are 6.3 times more likely to record black swan event at NYSE on the next business day. Terrifying.

The calculation of the expected number of black swans per business day in out-of-sample data I’m leaving with you as a homework in Python programming. Our earlier prediction of those events in next 252 business days returned 16$\pm$4. As we already have seen it, in a considered time interval after Apr 10, 2015 we recorded 98 swans which is $6.1^{+2.1}_{-1.2}$ times more than forecasted based on a broad sample.

In next post of this series, we will try to develop a methodology for a quantitative derivation of the probability of black swan for $N$-asset portfolio.

Reflection

Investors invest in stocks with anticipation of earning profit. As we have witnessed above, the birds become a serious threat to flying high with a significant increase in this investment air-traffic. Your plane (stock) can be grounded and your airline (portfolio) can suffer from huge losses. Hoping is not the solution. The better ways of protecting against the unavoidable should be implemented for investments while your money are in the game.

But what you should do when black swan strikes? Denzel Washington once said If you pray for rain, you gotta deal with the mud too.

• John Hedge

I wonder if the rise in derivatives and short trading is partly to blame? The mid 70’s downturn was the last before the computer age with 1987 and 2008 spectacularly driven by them.

• http://www.stuartreid.co.za Stuart Reid

Awesome and terrifying post Pawel. I’m really looking forward to your next post which looks at the portfolio construction side of side of the coin.