
Modern Time Analysis of Black Swans


I decided to take the data analysis of Black Swan and Extreme Loss Modeling to the next level and examine the time distribution of extreme losses across the entire S&P 500 universe of traded stocks. Previously, we were interested, first, in finding the maximum loss among all trading days for a given stock in a specified time interval (the stock's life-time on the trading floor) and, secondly, in plotting the distribution of those extreme losses (found to be well fitted by the Gumbel distribution). Investigating when all those losses occurred in time can provide us with an extra vantage point and, if we are lucky researchers, we may discover some new information on the population of Black Swans.

An excellent approach to data analysis is through time analysis. It is one of many, many data analytic techniques available, but it is the one closest to my heart, and I would like to provide you with a taste of easy-to-digest mathematics applied to financial time-series analysis. So what is the data analysis about in our case? Should we code it ourselves or be tempted by ready-to-use dedicated data analytic software? Both ways are accessible, but let's do it the more educational way and treat it as a computer programming class in Matlab.

By time analysis we understand the behaviour of a given quantity in time, e.g. the price of an asset traded in the exchange market. It is pretty natural that our eye looks for patterns and tries to classify them somehow. If we observe a repeating structure, sequence, or shape, that may help us to code its recognition within our trading platforms. The most fundamental pattern of hidden volatility to hunt for in the data is an obvious one: a sine wave. Fairly painless to understand, covered by maths teachers in high school, an 'up and down' approach, good moods versus bad moods, bear markets followed by bull markets. Sinusoidal behaviour is everywhere around us. That is why it is so important to know how to find it!

In quantitative finance and risk modeling, periodicity or cyclicality constitutes an attractive and simple model of volatility. To figure out what spectrum of characteristic frequencies a given data set conceals, we need a proper tool. In time-series analysis this tool is known as a periodogram. However, before we start using it, it is essential to understand the theoretical background.

1. The Method of Period Detection

1.1. General Formulations and Model Orthogonality

In general, we learn from experiments by fitting data, $x$, with a model, $x_{\|}$. The data contain $n$ measurements, and the model $n_{\|}$ free parameters. The consistency of the data with the model is measured by a function, $\Theta$, called a statistic. A given model, $x_{\|}$, using a given statistic (e.g., $\chi^2$), yields its particular value, $\Theta_1$. The various methods used in the analysis of time series differ in both their choice of model and statistic; hence they are difficult to compare directly. To enable such a comparison, and for determining the significance of results, $\Theta$ is converted into the false alarm probability, $P_1$. This is done by considering a hypothetical situation, $H_1$, in which $x$ is pure white noise. Then each pair $(x_{\|},\Theta)$ corresponds to a certain cumulative probability distribution of $\Theta$, namely $P(n_{\|},n;\Theta)$, with $P_1$ being the tail probability that under the hypothesis $H_1$ the experiment yields $\Theta>\Theta_1$, i.e., $P_1(\Theta>\Theta_1) = 1-P(n_{\|},n;\Theta_1)$.

Up to here, we have just outlined the classical Neyman-Pearson procedure of statistics. The specific method for analysis of time series used here differs from those commonly encountered in astronomy only in the choices of $x_{\|}$ and $\Theta$. Then, our accounting for variance, correlation and multiple frequencies in calculating $P$ is dictated by the laws of statistics. The probabilities derived by us from the data are the false alarm probabilities. However, we also call them below just probabilities or significance levels.

We note then that Fourier harmonics are not orthogonal in terms of the scalar product with weights at unevenly distributed observations. Certain statistical procedures employing classical probability distributions hold for orthogonal models only and fail in other cases. To avoid that, a popular variant of the power spectrum, the Lomb-Scargle (hereafter LS) periodogram (Lomb 1976; Scargle 1982), $P_{\rm LS}(\nu)$, relies on a special choice of phase such that the sine and cosine functions become orthogonal:
$$
\begin{equation}
P_{\rm LS}(\nu) = A_{\rm LS} \left|\hat{x}_{\rm LS}(\nu) \right|^2 .
\end{equation}
$$ The square of the Fourier amplitude, $\hat{x}(\nu)$, takes the form:
$$
\begin{eqnarray}
\left|\hat{x}_{\rm LS}(\nu)\right|^2 & = & \left[ \sum_{k=1}^{n} (x_k-\bar{x})
\cos (2\pi\nu (t_k-\tau)) \right]^2 + \nonumber \\
& & \left[ \sum_{k=1}^{n} (x_k-\bar{x}) \sin (2\pi\nu (t_k-\tau)) \right]^2 .
\end{eqnarray}
$$ The phase $\tau$ is defined as:
$$
\begin{equation}
\tan(4\pi\nu\tau) = \frac{\sum_{k=1}^{n} \sin(4\pi\nu t_k)}
{\sum_{k=1}^{n} \cos(4\pi\nu t_k)}.
\end{equation}
$$ where, as usual, we consider a time-series $\{x_i\}\ (i=1,…,n)$ sampled at times $\{t_i\}$; $\bar{x}$ denotes the subtracted mean value, and the discrete Fourier transform takes our signal from the time to the frequency domain. Originally, the normalization of the LS periodogram was proposed as $A_{\rm LS}=1/(2\sigma^2)$ in order to normalize to the level of white noise, but different variants are available as well.
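For the record, the equations above translate into Matlab almost verbatim. A minimal sketch is given below; the function name lsper and its frequency-grid interface are my own choices (not a toolbox API), and the grid must not contain zero frequency:

% Minimal sketch of the full LS periodogram, eqs. (1)-(3);
%  t, x : time vector and time-series; nu : trial frequencies (nu>0)
function P=lsper(t,x,nu)
    t=t(:); x=x(:)-mean(x);      % subtract the mean value
    A=1/(2*var(x));              % normalization A_LS=1/(2 sigma^2)
    P=zeros(size(nu));
    for j=1:length(nu)
        w=2*pi*nu(j);
        % phase tau from tan(4 pi nu tau), eq. (3); note 4*pi*nu=2*w
        tau=atan2(sum(sin(2*w*t)),sum(cos(2*w*t)))/(2*w);
        P(j)=A*(sum(x.*cos(w*(t-tau)))^2+sum(x.*sin(w*(t-tau)))^2);
    end
end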

In practice, some simplifications of the full LS periodogram are applied when we are interested in understanding the power density spectrum, i.e. the $|\hat{x}(\nu)|^2$ distribution. The following Matlab function allows us to obtain a modified LS solution for a time-series $[t,y]$:

% Function calculates the power density spectrum (periodogram) for
%  a time-series [t,y] based on the discrete Fourier transform. The
%  series is expected to be evenly distributed, but gaps are allowed.
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
%
% Input:
%           t       : time vector   [1xn]
%           y       : time-series   [1xn]
%           dt      : average time-series sampling time
% Output:
%           freq    : frequency vector        [1xm]
%           pds     : power spectral density  [1xm]
 
function [freq,pds]=qarPDS(t,y,dt)
    n=length(t);                      % number of data points
    X=fft(y,n);                       % discrete Fourier transform
    Pyy=X.*conj(X);                   % raw power, |x^(nu)|^2
    freq=(1:floor(n/2))*(1/(n*dt));   % frequency grid in physical units
    pds=Pyy(2:(floor(n/2)+1));        % drop the zero-frequency (mean) term
    pds=pds';                         % return power as a column vector
end

We will use it later on in the time analysis of our S&P 500 data of extreme losses across the whole universe of traded stocks. The function computes the periodogram for any time-series, so don't worry too much about the initial data units.
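Before we get there, a quick synthetic test never hurts. In this toy example (my own; a 100-day sine buried in Gaussian noise) the periodogram should show a clear peak near 0.01 [1/d]:

% toy check of qarPDS: a 100 d periodicity buried in noise
t=1:1024; dt=1;
y=sin(2*pi*t/100)+0.5*randn(1,1024);
[freq,pds]=qarPDS(t,y,dt);
plot(freq,pds); xlabel('Frequency [1/d]'); ylabel('Power');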

Now, back to first base. We may extend the approach used within the LS method by employing Szego orthogonal trigonometric polynomials as model functions. A series of $n_{\|}=2N+1$ polynomials corresponds to the orthogonal combinations of the $N$ lowest Fourier harmonics (Schwarzenberg-Czerny 1996). Orthogonal series are optimal from the statistical point of view because, by virtue of the Fisher lemma (Fisz 1963; Schwarzenberg-Czerny 1998), they guarantee the minimum variance of the fit residuals for a given model complexity (given by $n_{\|}$). Szego polynomials are also convenient in computations since the least-squares solution may be obtained using recurrences and orthogonal projections, resulting in high computational efficiency, with the number of steps $\propto N$ instead of $N^3$ for $N$ harmonics.

1.2. Variance, the AoV Statistic, and Model Complexity


The LS method employs the sine as a model and the quadratic norm, $$\Theta_{\chi^2}=\|x-x_{\|}\|^2 , $$ as the statistic. The corresponding probability distribution is $\chi^2$ with 2 degrees of freedom. Prior to use of the $\chi^2$ distribution, $\Theta_{\chi^2}$ has to be divided by the signal variance, $V$. However, $V$ is usually not known and has to be estimated from the data themselves. Then neither the $\Theta_{\chi^2}$ and variance estimates are independent, nor does their ratio follow the $\chi^2$ distribution, an effect which has to be accounted for. A simple way to do it is to apply the Fisher Analysis of Variance (AoV) statistic, $$\Theta\equiv (n-n_{\|}) \|x_{\|}\|^2/ (n_{\|}\|x - x_{\|}\|^2) .$$ Hence we call our method, involving the Szego polynomial model and the AoV statistic, the multi-harmonic analysis of variance or mhAoV periodogram (Schwarzenberg-Czerny 1996). The probability distribution is then the Fisher-Snedecor distribution, $F$, rather than $\chi^2$, and $P_1= 1-F(n_{\|},n_{\perp};\Theta)$ where $n_{\perp}=n-n_{\|}$. With everything else fixed, replacing $\chi^2$ with $F$ for $n=100$ yields an increase from $P_1(\chi^2)=0.001$ to $P_1(F)=0.01$. Thus, accounting for the unknown variance makes the mhAoV detection less significant, but more trustworthy. In this work $n$ is usually larger, for which $P_1(F)/P_1(\chi^2)$ reduces to a factor of a few.
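To make the definition of $\Theta$ concrete, here is a hedged, direct transcription into Matlab for a least-squares fit of $N$ harmonics plus a constant at a single trial frequency. The helper name aovtheta is mine, and the naive design-matrix solve below stands in for the recurrence-based Szego projections of the actual mhAoV code:

% AoV statistic for an N-harmonic + constant model at frequency nu
function theta=aovtheta(t,x,nu,N)
    t=t(:); x=x(:);
    n=length(x);
    npar=2*N+1;                      % n_par=2N+1 free parameters
    A=ones(n,npar);                  % design matrix of model functions
    for k=1:N
        A(:,2*k)  =cos(2*pi*k*nu*t);
        A(:,2*k+1)=sin(2*pi*k*nu*t);
    end
    xmod=A*(A\x);                    % least-squares model, x_par
    theta=(n-npar)*sum(xmod.^2)/(npar*sum((x-xmod).^2));
end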

Apart from the choice of the statistic, our method for $N=1$ differs from the LS one in that the average flux is subtracted in the latter (thus yielding $n_\|=2$), whereas a constant term is fitted in the former (which can often be of significant advantage, see Foster 1995). If the periodic modulation in the data differs significantly from a sinusoid (e.g., due to dips, eclipses, etc.), then our $N>1$ models account for that more complex shape and perform considerably better than the LS one.

1.3. Multiple trials

Probability can be assigned to a period found in data according to one of two statistical hypotheses. Namely, (i) one knows the trial frequency, $\nu_0$, in advance (from other data) and would like to check whether it is also present in a given data set, or (ii) one searches a whole range, $\Delta\nu$, of $N_{\rm eff}$ frequencies and finds the frequency, $\nu$, corresponding to the most significant modulation. The two cases correspond to the probabilities $P_1$ and $P_{N_{\rm eff}}$ of winning a lottery after 1 and $N_{\rm eff}$ trials, respectively, i.e., they represent the false alarm probabilities in single and multiple experiments, respectively. They are related by
$$ P_{N_{\rm eff}}= 1-(1-P_1)^{N_{\rm eff}} .$$ Note that hypothesis (ii) and the probability $P_{N_{\rm eff}}$ must always be employed in order to claim any new frequency in the object under study. Hypothesis (i) is rarely used. However, since $P_1\lt P_{N_{\rm eff}}$, it is the more sensitive one. For this reason, we advocate its use in situations where the modulation frequency is already known and we aim at checking for its manifestation in the same object but in a new band, a new data set, etc. We stress that we do not use hypothesis (i) to claim any new frequency.

An obstacle hampering the use of hypothesis (ii) is that no analytical method is known to calculate $N_{\rm eff}$. The number $N_{\rm eff}$ corresponds to independent trials, whereas the values of a periodogram at many frequencies are correlated because of the finite width of the peaks, $\delta\nu$, and because of aliasing. As no analytical method is known to determine $N_{\rm eff}$, Monte Carlo simulations have been used (e.g., Paltani 2004). Here, we use a simple conservative estimate, $N_{\rm eff}= \min(\Delta\nu/\delta\nu, N_{\rm calc},n)$, where $N_{\rm calc}$ is the number of values at which the periodogram is calculated. The estimate is conservative in the sense that it corresponds to the upper limit on $P_{N_{\rm eff}}$, and thus to the minimum significance of detection. This effect applies to all methods of period search (Horne & Baliunas 1986). In general, it may reduce the significance of a new frequency detection for large $N_{\rm eff}$, as $P_{N_{\rm eff}}\gg P_1$. In practice, it underscores the role of any prior knowledge, in a way similar to Bayesian statistics: with prior knowledge of the given frequency we are able to use hypothesis (i) to claim the detection with large significance (small $P_1$).
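In code, the conservative estimate is a one-liner. The numbers below are purely illustrative; I assume the common rule of thumb $\delta\nu\approx 1/T$ for a series of total time span $T$:

% conservative estimate of the number of independent trials, N_eff
T=9500;                          % time span of the series [d] (illustrative)
dnu=0.002-0.0001;                % searched frequency range, Delta nu
deltanu=1/T;                     % peak width, delta nu ~ 1/T
Ncalc=176; n=38;                 % grid points and observations (illustrative)
Neff=min([dnu/deltanu Ncalc n]);
P1=0.0138;                       % single-trial false alarm probability
Pneff=1-(1-P1)^Neff              % multi-trial false alarm probability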

1.4. Correlation length

$P_1$, and the other common probability distributions used to set the detection criteria, are derived under the assumption that the noise is statistically independent. Often this is not the case, as seen, e.g., in light curves of cataclysmic variables (CVs). Correlated noise, often termed red noise, obeys a different probability distribution than the standard one, and hence may have a profound effect. For example, noise with a Gaussian autocorrelation function (ACF), correlated over a time interval $\delta t$, yields a power spectrum with a Gaussian shape centered at $\nu=0$ and of width $\delta\nu=1/\delta t$. It may be demonstrated that the net effect of the correlation on $P_1$ in the analysis of low-frequency processes is to decimate the number of independent observations by a factor $n_{\rm corr}$, the average number of observations in the correlation interval $\delta t$ (Schwarzenberg-Czerny 1991). Effectively, one should use $n_{\perp}/n_{\rm corr}$ and $\Theta/n_{\rm corr}$ instead of $n_{\perp}$ and $\Theta$ in calculating $P_1$. This result holds generally, for both least-squares and maximum likelihood analyses of time series.

For independent observations, $m=2$ consecutive residuals have the same sign on average (e.g., Fisz 1963). Thus, counting the average length, $m$, of the series of residuals of the same sign provides an estimate of the number of consecutive correlated observations, $n_{\rm corr}$. Note that $m=n/l$, where $l$ is the number of such series (both positive and negative). For correlated observations, the average length of a series with the same sign is $m=2n_{\rm corr}$, which allows us to calculate $n_{\rm corr}$.
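A hedged sketch of that estimator (the helper name corrlen is mine; r stands for the vector of fit residuals):

% estimate the correlation length n_corr from sign runs of residuals
function ncorr=corrlen(r)
    r=r(:);
    l=sum(diff(sign(r))~=0)+1;   % number of same-sign series, l
    m=length(r)/l;               % average run length, m=n/l (m=2 for iid)
    ncorr=m/2;                   % since m=2*n_corr for correlated noise
end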

Let $\Theta$ denote the Fisher-Snedecor statistics from the mhAoV periodogram (i.e. from Fourier series fit) computed for $n_{\|}=2N+1$ parameters, $n$ observations and $n_{\perp}=n-n_{\|}$ degrees of freedom. To account for $n_{\rm corr}$, we calculate $P_1$ as follows,
\begin{equation}
P_1 = 1- F\left(n_{\|},\frac{n_{\perp}}{n_{\rm corr}};\frac{\Theta}{n_{\rm corr}}\right) = I_{z}\left(\frac{n_{\perp}}{2n_{\rm corr}},\frac{n_{\|}}{2}\right)\label{P1},
\end{equation}
where $ z= n_{\perp}/(n_{\perp}+n_{\|}\Theta)$ and $I_z(a,b)$ is the incomplete (regularized) beta function (Abramowitz & Stegun 1971), see Schwarzenberg-Czerny 1998 and references therein. In the popular Mathematica (Wolfram 1996) that function is called BetaRegularized. In Matlab, the following function does the calculations for us:

% Function computes the mhAoV periodogram peak significance
% Usage: [P1,Pneff]=pneff(n,nh,ncorr,neff,theta)
%
% Input:
%           n       : number of observations
%           nh      : number of harmonics, N
%           ncorr   : correlation length, n_corr
%           neff    : number of independent trial frequencies, N_eff
%           theta   : mhAoV statistic at the periodogram peak
 
function [P1,Pneff]=pneff(n,nh,ncorr,neff,theta)
    npar=2.0*nh+1;               % n_par = 2N+1 model parameters
    nper=n-npar;                 % n_perp = n - n_par degrees of freedom
    z=nper/(nper+npar*theta);
    a=nper/(2.0*ncorr);          % degrees of freedom decimated by n_corr
    b=npar/2.0;
    P1=betainc(z,a,b);           % regularized incomplete beta, I_z(a,b)
    Pneff=1.0-(1.0-P1)^neff;     % multi-trial false alarm probability
end

In the following section we will apply both approaches, the modified Lomb-Scargle (LS) and the multi-harmonic AoV periodogram, to financial data, and we will discuss the potential consequences of the time analysis of the largest daily losses for stocks traded publicly within the S&P 500 universe. So buckle up, as we are ready for take-off!

2. The Annual Migration Routes of Black Swans

A theory can be both beautiful and exhausting. So let's do some work to capture the beauty. Our goal is to re-analyze the data of extreme losses extracted by us previously within the Black Swan and Extreme Loss Modeling article. First, we extract the value of the maximum loss for each stock and store them in a Matlab matrix as follows:

% Modern Time Analysis of Black Swans among 
%  Traded Stocks in S&P 500
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 
 
clear all; close all; clc;
tic;
 
%% DATA READING AND VISUALIZATION
 
% read a list of stock names
StockNames=dataread('file',['sp500u.lst'],'%s','delimiter', '\n');
K=length(StockNames); % the number of stocks in the universe
% path to data files
path=['./SP500_1984_2011'];
 
data=[];
fprintf('data reading and preprocessing..\n');
for si=1:K
    % --stock name
    stock=StockNames{si};
    fprintf('%4.0f  %7s\n',si,stock);
    % --load data
    n=[path,'/',stock,'.dat'];
    % check for NULL and change to NaN (using the 'sed' command
    % in a Unix/Linux/MacOS environment)
    cmd=['sed -i ''s/NULL/NaN/g''',' ',n]; [status,result]=system(cmd);
    % construct FTS object for daily data
    FTS=ascii2fts(n,1,2);
    % fill any missing values denoted by NaNs
    FTS=fillts(FTS);
    % extract the close price of the stock
    cp=fts2mat(FTS.CLOSE,0);
    dd=fts2mat(FTS.CLOSE,1);
    % extract Matlab matrix containing value of maximum
    % loss per stock and corresponding day
    rtmp=cp(2:end)./cp(1:end-1)-1; % daily returns
    dtmp=dd(2:end,1); % time vector
    tmp{si}=[dtmp rtmp]; 
    [tmp1,tmp2]=min(tmp{si}(:,2)); % maximum loss 
    data=[data; dtmp(tmp2) tmp1];  % [time of maximum loss, loss value]
end
 
data=sortrows(data,1); % sort data according to time of loss occurrence

where, again, the required data files can be downloaded here as sp500u.zip (23.8 MB) and sp500u.lst, respectively.

The visualization of collected data provides us with a new dimension on time distribution of the maximum losses (Black Swans events) across the S&P 500 universe as traded between 3-Jan-1984 and 8-Mar-2011:

plot(data(:,1),data(:,2)*100,'.-','color',[0.7 0.7 0.7])
xlim([min(data(:,1)) max(data(:,1)) ]);
xlabel('Time (days)');
ylabel('R_{\rm min} (%)');

[Figure 1: Maximum daily losses of individual S&P 500 stocks as a function of time, 3-Jan-1984 to 8-Mar-2011]

All individual stock maximum losses have been depicted with dot markers. As we found within the Gumbel distribution analysis, the expected value was -22.6%, with a heavy tail extending up to nearly complete losses of $\sim$98%.

Changing the way we look at our data, we connect the dots and think about the data as a new time-series $x_i\ (i=1,…,n=954)$. From this standpoint we can continue our analysis in various directions. Let's have a look at one case in particular: annual average maximum losses as a function of time. Why? Such an approach has been suggested as interesting by McNeil, Frey, and Embrechts in their book Quantitative Risk Management, section 7.1.4, making use of the block maxima method in order to find return levels for stress losses. We turn this idea into practice by rebinning our time-series $\{x_i\}$ with a new time step of 252 (trading) days, utilizing the code published within my past post on Rebinning of Financial Time-Series (a minimal stand-in is also sketched below, after the next figure), as follows:

[rx,ry]=rebin(data(:,1),data(:,2),[],252);
hold on; plot(rx,ry*100,'or');

and, committing a small act of data profanity, gently interpolating between the points:

xi=rx(1):0.01:rx(end);
rdatai=interp1(rx,ry,xi,'pchip');
rdatai=[xi; rdatai]';
hold on; plot(rdatai(:,1),rdatai(:,2)*100,'r-');

resulting in:

[Figure 2: Annual (252-day) average maximum losses (red points) with pchip interpolation (red line), over-plotted on the individual losses]
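If you do not have the rebin function at hand, here is a minimal stand-in assuming plain block averaging over consecutive 252-day windows; the original helper from the Rebinning post may differ in details (e.g. the third argument, ignored below):

% minimal stand-in for rebin: average y within windows of width dt
function [rx,ry]=rebin(x,y,~,dt)
    edges=min(x):dt:max(x)+dt;
    rx=[]; ry=[];
    for i=1:length(edges)-1
        m=(x>=edges(i) & x<edges(i+1));
        if any(m)
            rx=[rx; mean(x(m))];    % window-averaged time
            ry=[ry; mean(y(m))];    % window-averaged loss
        end
    end
end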

Next, based on non-interpolated data we compute the Fourier power spectrum (a modified LS periodogram) as follows:

% Periodogram
[freq,pds]=qarPDS(rx,ry,252);
 
figure(2);
plot(freq,pds);
xlabel('Frequency [1/d]');
ylabel('Power Spectrum');

which returns:

[Figure 3: Modified LS periodogram of the rebinned time-series of annual maximum losses]

Note that the periodogram is calculated on a fixed frequency grid with a frequency step of $\Delta\nu = 1/T = 0.000104$ [1/d], where $T$ is the total time span of the series. The peak of highest power corresponds to a sine modulation detected in the time-series whose period equals $1/0.001462$, or 684 days. The maximal allowed frequency is the Nyquist frequency of $1/(2\Delta t)$, or 0.00198 [1/d]. Honestly, the plot is terrible. To improve its quality it is permissible in the spectral analysis of time-series to apply over-sampling in frequency, i.e. to adopt a frequency grid with a step of $\Delta\nu = 1/(kT)$, where $k$ denotes the (integer) over-sampling factor. Why do we need the over-sampling? One of the reasons is: to locate the periodicity as accurately as possible.
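As a sketch, such an over-sampled grid (here with an assumed $k=5$) can be set up as follows; note that qarPDS works on the fixed FFT grid, so the finer grid would be fed to a direct-sum routine like the lsper sketch from Section 1.1 instead:

% over-sampled frequency grid for the rebinned series [rx,ry]
T=rx(end)-rx(1);           % total time span of the series [d]
k=5;                       % over-sampling factor
numax=1/(2*252);           % Nyquist frequency for the 252 d sampling
nu=(1/T):(1/(k*T)):numax;  % frequency grid with step 1/(kT)
P=lsper(rx,ry,nu);         % e.g. the LS sketch from Section 1.1
plot(nu,P);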

Let's see how the mhAoV periodogram copes with this task in practice. The source codes for mhAoV can be downloaded directly from Alex's webpage (available in Fortran 95 and Python), though I still make use of our old version, executable directly in a Unix/Linux environment: aov. Let's first store the rebinned data (with the 252 d step) in an external file, rdata.dat:

rdata=[rx ry];
fn=['./rdata.dat'];
fid=fopen(fn,'wt');
fprintf(fid,'%f %f\n',rdata');
fclose(fid);

and, next, let’s compute aov periodogram:

./aov -f=0.000104,0.002 -nh=1 -fos=5 rdata.dat 
 
mhAov periodogram, by Alex Schwarzenberg-Czerny
ver. 27.9.2006 updated by Pawel Lachowicz
 
datname=rdata.dat trfname=rdata.trf maxname=rdata.max nobs=38
method=ORTAOV nh=1 nbf=20 fos=5.00
frstart=0.0001040 frend=0.0019809 frstep=0.0000107 nfr=176
  frequency    period     theta    quality
  0.0014747 678.1132747    5.53729   0.743
  0.0013152 760.3542866    1.39906   0.146
  0.0016301 613.4435376    1.37416   0.138
  0.0003351 2984.5922602   1.30742   0.116
  0.0001733 5771.4262041   1.22450   0.088
  0.0011538 866.7094426    1.12090   0.050

i.e. employing the model which contains a single sinusoid (nh=1) and adopting over-sampling in frequency with $k=5$ (fos). It turns out that the highest value of the mhAoV statistic, $\Theta=5.537$, corresponds to a period of 678 days.

Fitting the annual data with the model defined as:
$$
f(t) = c + A\sin(2\pi t/P - g\pi)
$$ we find for $P_1=678$ d the estimates of the amplitude, $A_1=0.12$, and of the phase, $g_1=0.79$, and the best fit we over-plot as follows:
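(The fitting code itself was not listed; a minimal sketch using fminsearch at the fixed period $P=678$ d, with $c$, $A_1$ and $g_1$ free, could look as follows.)

% least-squares fit of f(t)=c+A*sin(2*pi*t/P-g*pi) at fixed P=678 d
P=678;
f=@(p,t) p(1)+p(2)*sin(2*pi*t/P-p(3)*pi);   % p=[c A g]
sse=@(p) sum((ry-f(p,rx)).^2);              % sum of squared residuals
p=fminsearch(sse,[mean(ry) 0.1 0.5]);       % crude starting guesses
hold on; plot(rdatai(:,1),f(p,rdatai(:,1))*100,'b-');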

[Figure 4: Best-fit sinusoid with P=678 d over-plotted on the annual average maximum losses]

This model is not perfect but gives us a good taste of a concealed periodic pattern, following a sinusoidal model for roughly 60% of the time between 1984 and 2011. This is an interesting result, though the computation of:

>> [P1,Pneff]=pneff(38,1,1.3,7.1,5.53729)
 
P1 =
    0.0138
Pneff =
    0.0941

indicates a false alarm probability of about 9% for this periodicity (i.e. a detection at only the $\sim$91% confidence level). This can be understood as a poor fit of the model to the complicated and variable shape of the annual changes in maximal losses across the different traded stocks.

Some improvement of the model could be achieved by including a variation of the amplitude as a function of time, i.e. $A_1(t)$. This can in practice be extracted from wavelet analysis via computation of the continuous wavelet transform. If this subject is of interest to you, check a draft of this approach in the paper of Czerny et alii (2010) that I co-authored.

3. Conclusions

Was Joseph Fourier a birdwatcher? We don't know. But his approach to time-series analysis allowed us to check whether any periodic patterns exist in the annual migration (occurrences) of Black Swan events. With low significance, we found a cyclical trend repeating every 678 days. Will that allow us to forecast the future density of massive losses incurred by individual stocks? Well, now, equipped with the power tools of a modern approach to time analysis, we can always come back and verify our hypotheses.

References

The theory on the methods of period detection is based on the publication of Lachowicz et alii (2006). For deeper references to the source readings mentioned in the text, check the reference section inside the aforementioned publication.

Black Swan and Extreme Loss Modeling

When I read Nassim Nicholas Taleb's book The Black Swan, my mind was captured by the beauty of extremely rare events and, concurrently, devastated by the message the book sent: the non-computability of the probability of consequential rare events using scientific methods (owing to the very nature of small probabilities). I rushed to the local library to find out what had been written on the subject. Surprisingly, I discovered the book by Embrechts, Kluppelberg & Mikosch on Modelling Extremal Events for Insurance and Finance, which appeared to me very inaccessible, loaded with heavy mathematical theorems and proofs, and a negligible number of practical examples. I left it on the shelf to gather dust for a longer while, until last week, when a fresh drive to decompose the problem came back to me again.

In this article I will try to take you on a short but efficient journey through a part of classical extreme value theory, namely the fluctuations of maxima, and round out the story with easy-to-follow procedures on how one may run simulations of the occurrences of extremely rare losses in financial return series. Having this experience, I will briefly discuss how the resulting model can be blended into the future returns of a stock.

1. The Theory of Fluctuations of Maxima

Let's imagine we have rich historical data (a time-series) of returns for a specific financial asset or portfolio of assets. A good and easy example is the daily rate of return, $R_t$, for a stock traded e.g. on the NASDAQ Stock Market,
$$
R_t = \frac{P_t}{P_{t-1}} - 1 \ ,
$$ where $P_t$ and $P_{t-1}$ denote the stock price on days $t$ and $t-1$, respectively. The longer the time coverage, the more valuable information can be extracted. Given the time-series of daily stock returns, $\{R_i\}\ (i=1,…,N)$, we can create a histogram, i.e. the distribution of returns. By a rare event or, more precisely here, a rare loss, we will refer to the returns placed in the far left tail of the distribution. As an assumption we also take $R_1,R_2,…$ to be a sequence of iid non-degenerate rvs (random variables) with a common df (distribution function) $F$. We define the fluctuations of the sample maxima as:
$$
M_1 = R_1, \ \ \ M_n = \max(R_1,…,R_n) \ \mbox{for}\ \ n\ge 2 \ .
$$ That simply says that for any time-series $\{R_i\}$ there is one maximum, corresponding to the rv (random variable) with the most extreme value. Since the main line of this post is the investigation of maximum losses in return time-series, we are entitled to think about negative values (losses) in terms of maxima (and therefore conduct the theoretical reasoning) thanks to the identity:
$$
\min(R_1,…,R_n) = -\max(-R_1,…,-R_n) \ .
$$ The distribution function of the maximum $M_n$ is given as:
$$
P(M_n\le x) = P(R_1\le x, …, R_n\le x) = P(R_1\le x)\cdots P(R_n\le x) = F^n(x)
$$ for $x\in\Re$ and $n\in\mathbb{N}$, where the product form follows from the independence of the $R_i$.
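A quick Monte Carlo check of this product rule, using standard normal rvs purely for illustration:

% verify P(M_n<=x)=F(x)^n for n=5 iid standard normal rvs
n=5; x=1.0; Nsim=1e6;
M=max(randn(n,Nsim),[],1);     % Nsim sample maxima of size-n samples
[mean(M<=x) normcdf(x)^n]      % empirical vs F(x)^n; should agree closely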

What extreme value theory first 'investigates' are the limit laws for the maxima $M_n$. An important question here emerges: is there, somewhere out there, any distribution which satisfies for all $n\ge 2$ the identity in law
$$
\max(R_1,…,R_n) = c_nR + d_n
$$ for appropriate constants $c_n>0$ and $d_n\in\Re$ or, simply speaking, which classes of distributions $F$ are closed under maxima? The theory next defines the max-stable distribution: a random variable $R$ is called max-stable if it satisfies the aforegoing relation for iid $R_1,…,R_n$. If we assume that $\{R_i\}$ is a sequence of iid max-stable rvs, then:
$$
R = c_n^{-1}(M_n-d_n)
$$ and one can say that every max-stable distribution is a limit distribution for maxima of iid rvs. That brings us to the fundamental Fisher-Tippett theorem, which says that if there exist constants $c_n>0$ and $d_n\in\Re$ such that:
$$
c_n^{-1}(M_n-d_n) \rightarrow H, \ \ n\rightarrow\infty\ ,
$$ then $H$ must be of the type of one of the three so-called standard extreme value distributions, namely: Fréchet, Weibull, or Gumbel. In this post we will only be considering the Gumbel distribution $G$, with the corresponding probability density function (pdf) $g$ given as:
$$
G(z;\ a,b) = e^{-e^{-z}} \ \ \mbox{for}\ \ z=\frac{x-a}{b}, \ x\in\Re
$$ and
$$
g(z;\ a,b) = b^{-1} e^{-z}e^{-e^{-z}} \ .
$$ where $a$ and $b$ are the location and scale parameters, respectively. Having defined the extreme value distribution, and being now equipped with a better understanding of the theory, we are ready for a test drive over the daily roads of profits and losses in the trading markets. This is the moment which separates men from boys.
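One practical remark before the test drive: Matlab's evfit/evpdf (used below) implement the type I extreme value distribution for minima, i.e. the mirror image ($z\to -z$) of the maxima form above; since we will fit the minimal returns directly, no sign flipping will be needed. A quick numerical sanity check of that convention, with illustrative parameter values:

% evpdf models minima: pdf=(1/b)*exp(z).*exp(-exp(z)), z=(x-a)/b
a=-0.23; b=0.11;               % illustrative location and scale
x=-1:0.01:0;
z=(x-a)/b;
g=exp(z).*exp(-exp(z))/b;      % handwritten minima-Gumbel pdf
max(abs(g-evpdf(x,a,b)))       % ~0 up to numerical precision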

2. Gumbel Extreme Value Distribution for S&P500 Universe


As usual, we start with an entrée. Our goal is to find the empirical distribution of maxima (i.e. maximal daily losses) for all stocks belonging to the S&P 500 universe between 3-Jan-1984 and 8-Mar-2011. There were $K=954$ stocks traded within this period, and their data can be downloaded here as a sp500u.zip file (23.8 MB). The full list of stock names is provided in the sp500u.lst file. Performing the data processing in Matlab, we first need to compute a vector storing the daily returns for each stock, and next find the corresponding minimal value $M_n$, where $n$ stands for the length of each return vector:

% Black Swan and Extreme Loss Modeling
%  using Gumbel distribution and S&P500 universe
%
% (c) 2013 QuantAtRisk, by Pawel Lachowicz
 
 
clear all; close all; clc;
tic;
 
%% DATA READING AND PREPROCESSING
 
% read a list of stock names
StockNames=dataread('file',['sp500u.lst'],'%s','delimiter', '\n');
K=length(StockNames); % the number of stocks in the universe
% path to data files
path=['./SP500u'];
 
fprintf('data reading and preprocessing..\n');
for si=1:K
    % --stock name
    stock=StockNames{si};
    fprintf('%4.0f  %7s\n',si,stock);
    % --load data
    n=[path,'/',stock,'.dat'];
    % check for NULL and change to NaN (using 'sed' command
    % in Unix/Linux/MacOS environment)
    cmd=['sed -i ''s/NULL/NaN/g''',' ',n]; [status,result]=system(cmd);
    % construct FTS object for daily data
    FTS=ascii2fts(n,1,2);
    % fill any missing values denoted by NaNs
    FTS=fillts(FTS);
    % extract the close price of the stock
    cp=fts2mat(FTS.CLOSE,0);
    % calculate a vector with daily stock returns and store it in
    % the cell array
    R{si}=cp(2:end)./cp(1:end-1)-1;
end
 
%% ANALYSIS
 
% find the minimum daily return value for each stock
Rmin=[];
for si=1:K
    Mn=min(R{si},[],1);
    Rmin=[Rmin; Mn];
end

Having that ready, we fit the data with the Gumbel function which (as we believe) would describe the distribution of maximal losses in the S&P500 universe best:

% fit the empirical distribution with Gumbel distribution and 
% estimate the location, a, and scale, b, parameter
[par,parci]=evfit(Rmin);
a=par(1); 
b=par(2);
 
% plot the distribution
x=-1:0.01:1;
hist(Rmin,-1:0.01:0);
h=findobj(gca,'Type','patch');
set(h,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.6 0.6 0.6]);
h=findobj(gca,'Type','box');
set(h,'Color','k');
 
% add a plot of Gumbel pdf 
pdf1=evpdf(x,a,b);
y=0.01*length(Rmin)*pdf1;
line(x,y,'color','r'); box on;
xlabel('R_{min}');
ylabel(['f(R_{min}|a,b)']);
text(-1,140,['a = ',num2str(a,3)]);
text(-1,130,['b = ',num2str(b,3)]); 
xlim([-1 0]);

The maximum likelihood estimates of the parameters $a$ and $b$ and corresponding 95% confidence intervals we can find as follows:

>> [par,parci]=evfit(Rmin)
 
par =
   -0.2265    0.1135
 
parci =
   -0.2340    0.1076
   -0.2190    0.1197

That brings us to a visual representation of our analysis:

[Figure: Histogram of maximal daily losses across the S&P 500 universe with the fitted Gumbel pdf (red)]

This is a very important result, communicating that the expected value of the extreme daily loss is equal to about -22.6%. However, the left tail of the fitted Gumbel distribution extends far, up to nearly -98%, although the probability of the occurrence of such a massive daily loss is rather low.

On the other hand, the expected value of -22.6% is surprisingly close to the down-movement in the markets on Oct 19, 1987, known as Black Monday, when the Dow Jones Industrial Average (DJIA) dropped by 508 points to 1738.74, i.e. by 22.61%!

3. Blending Extreme Loss Model with Daily Returns of a Stock

You probably wonder how we can include the results coming from the Gumbel modeling in the prediction of rare losses in the future daily returns of a particular stock. This can be done pretty straightforwardly by combining the best-fitted model (pdf) for extreme losses with the stock's pdf. To do it properly we need to employ the concept of mixture distributions. Michael B. Miller in his book Mathematics and Statistics for Financial Risk Management provides a clear idea of this procedure. In our case, the mixture density function $f(x)$ can be denoted as:
$$
f(x) = w_1 g(x) + (1-w_1) n(x)
$$ where $g(x)$ is the Gumbel pdf, $n(x)$ represents fitted stock pdf, and $w_1$ marks the weight (influence) of $g(x)$ into resulting overall pdf.

In order to illustrate this process, let’s select one stock from our S&P500 universe, say Apple Inc. (NASDAQ: AAPL), and fit its daily returns with a normal distribution:

% AAPL daily returns (3-Jan-1984 to 11-Mar-2011)
rs=R{18};
figure(2);
hist(rs,50);
h=findobj(gca,'Type','patch');
set(h,'FaceColor',[0.7 0.7 0.7],'EdgeColor',[0.6 0.6 0.6]);
h=findobj(gca,'Type','box');
set(h,'Color','k');
% fit the normal distribution and plot the fit
[muhat,sigmahat]=normfit(rs)
x =-1:0.01:1;
pdf2=normpdf(x,muhat,sigmahat);
y=0.01*length(rs)*pdf2;
hold on; line(x,y,'color','r');
xlim([-0.2 0.2]); ylim([0 2100]);
xlabel('R');

[Figure: Histogram of AAPL daily returns with the fitted normal pdf (red)]

where the red line represents the fit with a mean of $\mu=0.0012$ and a standard deviation of $\sigma=0.0308$.

We can obtain the mixture distribution $f(x)$ executing a few more lines of code:

% Mixture Distribution Plot
figure(3);
w1=0.001; % enter your favorite value here, e.g. 0.001
w2=1-w1;
pdfmix=w1*(pdf1*0.01)+w2*(pdf2*0.01);  % note: sum(pdfmix)=1 as expected
x=-1:0.01:1;
plot(x,pdfmix);
xlim([-0.6 0.6]);

It is important to note that our modeling hinges on the $w_1$ parameter. It can be intuitively understood as follows. Let's say that we choose $w_1=0.01$. That would mean that the Gumbel pdf contributes 1% to the overall pdf. In the following section we will see that if a random variable is drawn from the distribution given by $f(x)$, then $w_1=0.01$ simply means (not exactly, but to a sufficient approximation) that there is a 99% chance of drawing this variable from $n(x)$ and only a 1% chance of drawing it from $g(x)$. The dependence of $f(x)$ on $w_1$ is illustrated in the next figure:

[Figure: Mixture pdf f(x) for different values of w_1]

It is clearly visible that a selection of $w_1>0.01$ would be a significant contributor to the left tail, making it fat. This is not what is observed in the empirical distribution of daily returns for AAPL (and, in general, for the majority of stocks); therefore we rather expect $w_1$ to be much, much smaller than 1%.
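Incidentally, the mixing-probability interpretation above suggests a simple direct sampling scheme, sketched below; it assumes the fitted parameters a, b, muhat, sigmahat from the previous sections and uses evrnd/normrnd from the Statistics Toolbox. The inverse transform method of the next section is more general, as it works for any tabulated pdf:

% direct sampling from f(x): with probability w1 draw from g(x),
%  otherwise from n(x)
w1=0.0001; Nsim=252*40;
fromG=rand(Nsim,1)<w1;                        % which draws come from g(x)
rv=zeros(Nsim,1);
rv(fromG) =evrnd(a,b,sum(fromG),1);           % Gumbel (minima) draws
rv(~fromG)=normrnd(muhat,sigmahat,sum(~fromG),1);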

4. Drawing Random Variables from Mixture Distribution

A short break between entrée and main course we fill with a sip of red wine. Given the discrete form of $f(x)$, we would like to be able to draw a random variable from this distribution. Again, this is easy. Following a general recipe, for instance the one given in Chapter 12.2.2 of Philippe Jorion's book Value at Risk: The New Benchmark for Managing Financial Risk, we use the concept of the inverse transform method. In the first step we take the output (a random variable) coming from a pseudo-random generator drawing its rvs from the uniform distribution $U(x)$. This rv is always between 0 and 1, and in the last step it is projected onto the cumulative distribution of our interest, $F(x)$, which in our case corresponds to the cumulative distribution of the $f(x)$ pdf. Finally, we read out the corresponding value on the x-axis: a rv drawn from the $f(x)$ pdf. Philippe illustrates that procedure more intuitively:

[Figure: Drawing a random variable via the inverse transform method (after Jorion)]

This method works smoothly when we know the analytical form of $F(x)$. However, if this is not on the menu, we need to use a couple of technical tricks. First, we calculate $F(x)$ based on $f(x)$. Next, we set a very fine grid for the $x$ domain, and we perform interpolation between the given data points of $F(x)$.

% find cumulative pdf, F(x)
figure(4);
s=0;
x=-1;
F=[];
for i=1:length(pdfmix);
    s=s+pdfmix(i);
    F=[F; x s];
    x=x+0.01;
end
plot(F(:,1),F(:,2),'k')
xlim([-1 1]); ylim([-0.1 1.1]);
% perform interpolation of cumulative pdf using very fine grid
xi=(-1:0.0000001:1);
yi=interp1(F(:,1),F(:,2),xi,'linear'); % use linear interpolation method
hold on; plot(xi,yi);

The second sort of difficulty lies in finding a good match between the rv drawn from the uniform distribution and the approximated value of our $F(x)$. That is why a very fine grid is required, supplemented with some matching techniques. The following code that I wrote deals with this problem pretty efficiently:

% draw a random variable from f(x) pdf: xi(row)
tF2=round((round(100000*yi')/100000)*100000);
RV=[];
for k=1:(252*40)
    notok=false;
    while(~notok)
        tU=round((round(100000*rand)/100000)*100000);
        [r,c,v]=find(tF2==tU);
        if(~isempty(r))
            notok=true;
        end
    end
    if(length(r)>1)
        rv=round(2+(length(r)-2)*rand);
        row=r(rv); 
    else
        row=r(1);
    end
    % therefore, xi(row) is a number representing a rv
    % drawn from the f(x) pdf; we store 252*40 of those
    % new rvs in the following matrix:
    RV=[RV; xi(row) yi(row)];
end
% mark all corresponding rvs on the cumulative pdf
hold on; plot(RV(:,1),RV(:,2),'rx');

Finally, as the main course, we obtain and verify the distribution of a large number of new rvs drawn from the $f(x)$ pdf. It is crucial to check whether our generating algorithm provides us with a uniform coverage across the entire $F(x)$ plot,

[Figure: Simulated rvs (red crosses) covering the cumulative distribution F(x)]

where, in order to get more statistically reliable results, we generate 10,080 rvs, corresponding to simulated 1-day stock returns for 252 trading days times 40 years.

5. Black Swan Detection

The -22% collapse in the markets on Oct 19, 1987 served as the day when the name of the Black Swan event took its birth, or at least was reinforced, in the financial community. Are black swans extremely rare? It depends. If you live, for example, in Perth, Western Australia, you can see a lot of them wandering around. So what defines an extremely rare loss in the sense of a financial event? Let's assume, by definition, that by a Black Swan event we will understand a daily loss of 20% or more. If so, using the procedure described in this post, we are tempted to pass from the main course to dessert.

Our modeling concentrates on finding the most proper contribution of $w_1g(x)$ to the resulting $f(x)$ pdf. As an outcome of a few runs of Monte Carlo simulations with different values of $w_1$, we find that for $w_1=[0.0010,0.0005,0.0001]$ we detect in the simulations, respectively, 9, 5, and 2 events (rvs) displaying a one-day loss of 20% or more.
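(For reference, the counting itself is a one-liner over the simulated sample RV from the previous section; the conversion to years assumes 252 trading days per year and at least one detected event.)

% count simulated Black Swan events: a daily loss of 20% or more
nBS=sum(RV(:,1)<=-0.20);
fprintf('%d Black Swan events in %d simulated days\n',nBS,size(RV,1));
fprintf('i.e. one per %.1f years\n',size(RV,1)/(252*nBS));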

Therefore, the simulated daily returns for AAPL, assuming $w_1=0.0001$, generate in the distribution two Black Swan events, i.e. one event per 5040 trading days, or one per 20 years:

[Figure: Simulated AAPL daily returns with two Black Swan events for w_1=0.0001]

That result agrees quite well with what has been observed so far, i.e. including Black Monday in 1987 and the Flash Crash in intra-day trading on May 6, 2010 for some of the stocks.

Acknowledgements

I am grateful to Peter Urbani from New Zealand for directing my attention towards Gumbel distribution for modeling very rare events.
