
## Computation of the Loss Distribution not only for Operational Risk Managers

In operational risk management, given a number of risk types and/or business line combinations, the task is to provide the risk management board with an estimate of the losses the bank (or any other financial institution, hedge fund, etc.) can suffer. If you think about it for a second, the spectrum of things that might go wrong is wide, e.g. the failure of a computer system, internal or external fraud, clients, products and business practices, damage to physical assets, and so on. Blended with business lines, e.g. corporate finance, trading and sales, retail banking, commercial banking, payment and settlement, agency services, asset management, or retail brokerage, these give over 50 combinations of operational risk factors that one needs to consider. Separately and carefully. And it's a tough one.

Why? A good question! Simply because of two main reasons. First, for an operational risk manager the sample of data describing the risk is usually insufficient (statistically speaking, the sample is small over the lifetime of the financial institution). Second, when something goes wrong, the next event of the same kind may take place in the not-too-distant future or in the far-distant future. The biggest problem the operational risk manager faces in his/her daily work is the prediction of all/any losses due to operational failures. Therefore, the timing of the (next) event enters the problem as a random variable, described by the loss frequency distribution. The second player in the game is the loss severity distribution, i.e. if the worst strikes, how much can the bank/financial body/investor/trader lose?

From a trader's perspective, we know well that Value-at-Risk (VaR) and Expected Shortfall (ES) are two quantitative risk measures that address similar questions. From the viewpoint of operational risk, however, the estimation of losses requires a different approach.

In this post, following Hull (2015), we present an algorithm in Python for computing the loss distribution given the best estimates of the loss frequency and loss severity distributions. Though designed with operational risk analysts in mind, in the end we argue for its usefulness to any algo-trader and/or portfolio risk manager.

1. Operational Losses: Case Study of the Vanderloo Bank

Access to operational loss data is much, much harder than in the case of stocks traded on an exchange. Such data usually stay within the walls of the bank, with internal access only. A recommended practice for operational risk managers around the world is to share these unique data despite confidentiality concerns. Only then can we build a broader knowledge and understanding of the risks and of the losses incurred due to operational activities.

Let's consider a case study of a hypothetical Vanderloo Bank. The bank was founded in 1988 in the Netherlands, and its main line of business was concentrated around building unique customer relationships and providing loans to small local businesses. Despite a vivid vision and firmly set goals for the future, Vanderloo Bank could not avoid a number of operational roadblocks that led to substantial operational losses:

| # | Year | Month | Day | Business Line | Risk Category | Loss ($M) |
|---|---|---|---|---|---|---|
| 0 | 1989 | 1 | 13 | Trading and Sales | Internal Fraud | 0.530597 |
| 1 | 1989 | 2 | 9 | Retail Brokerage | Process Failure | 0.726702 |
| 2 | 1989 | 4 | 14 | Trading and Sales | System Failure | 1.261619 |
| 3 | 1989 | 6 | 11 | Asset Management | Process Failure | 1.642279 |
| 4 | 1989 | 7 | 23 | Corporate Finance | Process Failure | 1.094545 |
| 5 | 1990 | 10 | 21 | Trading and Sales | Employment Practices | 0.562122 |
| 6 | 1990 | 12 | 24 | Payment and Settlement | Process Failure | 4.009160 |
| 7 | 1991 | 8 | 23 | Asset Management | Business Practices | 0.495025 |
| 8 | 1992 | 1 | 28 | Asset Management | Business Practices | 0.857785 |
| 9 | 1992 | 3 | 14 | Commercial Banking | Damage to Assets | 1.257536 |
| 10 | 1992 | 5 | 26 | Retail Banking | Internal Fraud | 1.591007 |
| 11 | 1992 | 8 | 9 | Corporate Finance | Employment Practices | 0.847832 |
| 12 | 1993 | 1 | 11 | Corporate Finance | System Failure | 1.314225 |
| 13 | 1993 | 1 | 19 | Retail Banking | Internal Fraud | 0.882371 |
| 14 | 1993 | 2 | 24 | Retail Banking | Internal Fraud | 1.213686 |
| 15 | 1993 | 6 | 12 | Commercial Banking | System Failure | 1.231784 |
| 16 | 1993 | 6 | 16 | Agency Services | Damage to Assets | 1.316528 |
| 17 | 1993 | 7 | 11 | Retail Banking | Process Failure | 0.834648 |
| 18 | 1993 | 9 | 21 | Retail Brokerage | Process Failure | 0.541243 |
| 19 | 1993 | 11 | 11 | Asset Management | Internal Fraud | 1.380636 |
| 20 | 1994 | 11 | 22 | Retail Banking | External Fraud | 1.426433 |
| 21 | 1995 | 2 | 14 | Commercial Banking | Process Failure | 1.051281 |
| 22 | 1995 | 11 | 21 | Commercial Banking | External Fraud | 2.654861 |
| 23 | 1996 | 8 | 17 | Agency Services | Process Failure | 0.837237 |
| 24 | 1997 | 7 | 13 | Retail Brokerage | Internal Fraud | 1.107019 |
| 25 | 1997 | 7 | 24 | Agency Services | External Fraud | 1.513146 |
| 26 | 1997 | 8 | 8 | Retail Banking | Process Failure | 1.002040 |
| 27 | 1997 | 9 | 2 | Agency Services | Damage to Assets | 0.646596 |
| 28 | 1997 | 9 | 12 | Retail Banking | Employment Practices | 0.966086 |
| 29 | 1998 | 1 | 8 | Retail Banking | Internal Fraud | 0.938803 |
| 30 | 1998 | 1 | 12 | Retail Banking | System Failure | 0.922069 |
| 31 | 1998 | 2 | 5 | Asset Management | Process Failure | 1.042259 |
| 32 | 1998 | 4 | 18 | Commercial Banking | External Fraud | 0.969562 |
| 33 | 1998 | 5 | 12 | Retail Banking | External Fraud | 0.683715 |
| 34 | 1999 | 1 | 3 | Trading and Sales | Internal Fraud | 2.035785 |
| 35 | 1999 | 4 | 27 | Retail Brokerage | Business Practices | 1.074277 |
| 36 | 1999 | 5 | 8 | Retail Banking | Employment Practices | 0.667655 |
| 37 | 1999 | 7 | 10 | Agency Services | System Failure | 0.499982 |
| 38 | 1999 | 7 | 17 | Retail Brokerage | Process Failure | 0.803826 |
| 39 | 2000 | 1 | 26 | Commercial Banking | Business Practices | 0.714091 |
| 40 | 2000 | 7 | 23 | Trading and Sales | System Failure | 1.479367 |
| 41 | 2001 | 6 | 16 | Retail Brokerage | System Failure | 1.233686 |
| 42 | 2001 | 11 | 5 | Agency Services | Process Failure | 0.926593 |
| 43 | 2002 | 5 | 14 | Payment and Settlement | Damage to Assets | 1.321291 |
| 44 | 2002 | 11 | 11 | Retail Banking | External Fraud | 1.830254 |
| 45 | 2003 | 1 | 14 | Corporate Finance | System Failure | 1.056228 |
| 46 | 2003 | 1 | 28 | Asset Management | System Failure | 1.684986 |
| 47 | 2003 | 2 | 28 | Commercial Banking | Damage to Assets | 0.680675 |
| 48 | 2004 | 1 | 11 | Asset Management | Process Failure | 0.559822 |
| 49 | 2004 | 6 | 19 | Commercial Banking | Internal Fraud | 1.388681 |
| 50 | 2004 | 7 | 3 | Retail Banking | Internal Fraud | 0.886769 |
| 51 | 2004 | 7 | 21 | Retail Brokerage | Employment Practices | 0.606049 |
| 52 | 2004 | 7 | 27 | Asset Management | Employment Practices | 1.634348 |
| 53 | 2004 | 11 | 26 | Asset Management | Damage to Assets | 0.983355 |
| 54 | 2005 | 1 | 9 | Corporate Finance | Damage to Assets | 0.969710 |
| 55 | 2005 | 9 | 17 | Commercial Banking | System Failure | 0.634609 |
| 56 | 2006 | 2 | 24 | Agency Services | Business Practices | 0.637760 |
| 57 | 2006 | 3 | 21 | Retail Banking | Employment Practices | 1.072489 |
| 58 | 2006 | 6 | 25 | Payment and Settlement | System Failure | 0.896459 |
| 59 | 2006 | 12 | 25 | Trading and Sales | Process Failure | 0.731953 |
| 60 | 2007 | 6 | 9 | Commercial Banking | System Failure | 0.918233 |
| 61 | 2008 | 1 | 5 | Corporate Finance | External Fraud | 0.929702 |
| 62 | 2008 | 2 | 14 | Retail Brokerage | System Failure | 0.640201 |
| 63 | 2008 | 2 | 14 | Commercial Banking | Internal Fraud | 1.580574 |
| 64 | 2008 | 3 | 18 | Corporate Finance | Process Failure | 0.731046 |
| 65 | 2009 | 2 | 1 | Agency Services | System Failure | 0.630870 |
| 66 | 2009 | 2 | 6 | Retail Banking | External Fraud | 0.639761 |
| 67 | 2009 | 4 | 14 | Payment and Settlement | Internal Fraud | 1.022987 |
| 68 | 2009 | 5 | 25 | Retail Banking | Business Practices | 1.415880 |
| 69 | 2009 | 7 | 8 | Retail Banking | Business Practices | 0.906526 |
| 70 | 2009 | 12 | 26 | Agency Services | System Failure | 1.463529 |
| 71 | 2010 | 2 | 13 | Asset Management | Damage to Assets | 0.664935 |
| 72 | 2010 | 3 | 24 | Payment and Settlement | Process Failure | 1.848318 |
| 73 | 2010 | 10 | 16 | Commercial Banking | External Fraud | 1.020736 |
| 74 | 2010 | 12 | 27 | Retail Banking | Employment Practices | 1.126265 |
| 75 | 2011 | 2 | 5 | Retail Brokerage | Process Failure | 1.549890 |
| 76 | 2011 | 6 | 24 | Corporate Finance | Damage to Assets | 2.153238 |
| 77 | 2011 | 11 | 6 | Asset Management | System Failure | 0.601332 |
| 78 | 2011 | 12 | 1 | Payment and Settlement | External Fraud | 0.551183 |
| 79 | 2012 | 2 | 21 | Corporate Finance | External Fraud | 1.866740 |
| 80 | 2013 | 4 | 22 | Retail Brokerage | External Fraud | 0.672756 |
| 81 | 2013 | 6 | 27 | Payment and Settlement | Employment Practices | 1.119233 |
| 82 | 2013 | 8 | 17 | Commercial Banking | System Failure | 1.034078 |
| 83 | 2014 | 3 | 1 | Asset Management | Employment Practices | 2.099957 |
| 84 | 2014 | 4 | 4 | Retail Brokerage | External Fraud | 0.929928 |
| 85 | 2014 | 6 | 5 | Retail Banking | System Failure | 1.399936 |
| 86 | 2014 | 11 | 17 | Asset Management | Process Failure | 1.299063 |
| 87 | 2014 | 12 | 3 | Agency Services | System Failure | 1.787205 |
| 88 | 2015 | 2 | 2 | Payment and Settlement | System Failure | 0.742544 |
| 89 | 2015 | 6 | 23 | Commercial Banking | Employment Practices | 2.139426 |
| 90 | 2015 | 7 | 18 | Trading and Sales | System Failure | 0.499308 |
| 91 | 2015 | 9 | 9 | Retail Banking | Employment Practices | 1.320201 |
| 92 | 2015 | 9 | 18 | Corporate Finance | Business Practices | 2.901466 |
| 93 | 2015 | 10 | 21 | Commercial Banking | Internal Fraud | 0.808329 |
| 94 | 2016 | 1 | 9 | Retail Banking | Internal Fraud | 1.314893 |
| 95 | 2016 | 3 | 28 | Asset Management | Business Practices | 0.702811 |
| 96 | 2016 | 3 | 25 | Payment and Settlement | Internal Fraud | 0.840262 |
| 97 | 2016 | 4 | 6 | Retail Banking | Process Failure | 0.465896 |

Having a record of 98 events (indexed 0 to 97 above), we can now begin building a statistical picture of the loss frequency and loss severity distributions.

2. Loss Frequency Distribution

For loss frequency, the natural probability distribution to use is the Poisson distribution. It assumes that losses happen randomly through time, so that in any short period of time $\Delta t$ there is a probability of $\lambda \Delta t$ of a loss occurring. The probability of $n$ losses in time $T$ [years] is:

$$\Pr(n) = \exp{(-\lambda T)} \frac{(\lambda T)^n}{n!}$$

where the parameter $\lambda$ can be estimated as the average number of losses per year (Hull 2015). Given our table stored as a pandas DataFrame, `df`, we code:

```python
# Computation of the Loss Distribution not only for Operational Risk Managers
# (c) 2016 QuantAtRisk.com, Pawel Lachowicz

from scipy.stats import lognorm, poisson
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# read the Vanderloo Bank operational loss data
df = pd.read_hdf('vanderloo.h5', 'df')

# count the number of loss events in each year
fre = df.groupby("Year").size()
print(fre)
```

where the last operation groups and displays the number of losses in each year:

```
Year
1989.0    5
1990.0    2
1991.0    1
1992.0    4
1993.0    8
1994.0    1
1995.0    2
1996.0    1
1997.0    5
1998.0    5
1999.0    5
2000.0    2
2001.0    2
2002.0    2
2003.0    3
2004.0    6
2005.0    2
2006.0    4
2007.0    1
2008.0    4
2009.0    6
2010.0    4
2011.0    4
2012.0    1
2013.0    3
2014.0    5
2015.0    6
2016.0    4
dtype: int64
```

The estimation of Poisson's $\lambda$ requires solely the computation of:

```python
# estimate lambda: total number of losses / number of years spanned by the data
lam = np.sum(fre.values) / (df["Year"].max() - df["Year"].min())
print(lam)
```

```
3.62962962963
```

which tells us that during the 1989–2016 period, i.e. over the past 27 years, there were on average $\lambda \approx 3.6$ losses per year.
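As a quick sanity check on the Poisson formula, the sketch below evaluates $\Pr(n)$ directly from $\exp(-\lambda T)(\lambda T)^n/n!$ and compares it with `scipy.stats.poisson.pmf`. The helper `poisson_pmf` is hypothetical, written only for this illustration, and `lam` is hard-coded to 98/27, which is the value 3.6296... printed above.

```python
import math

from scipy.stats import poisson


def poisson_pmf(n, lam, T=1.0):
    """Pr(n losses in T years) = exp(-lam*T) * (lam*T)**n / n!  (hypothetical helper)."""
    mu = lam * T
    return math.exp(-mu) * mu**n / math.factorial(n)


lam = 98 / 27  # same value as the estimate printed above (3.6296...)

for n in range(8):
    # hand-written formula vs SciPy's implementation of the same pmf
    print(n, round(poisson_pmf(n, lam), 4), round(poisson.pmf(n, lam), 4))

# probability of at least one loss event in a year (T = 1)
print(1 - poisson_pmf(0, lam))
# probability of more than 5 loss events in a year, two ways
print(1 - sum(poisson_pmf(n, lam) for n in range(6)), poisson.sf(5, lam))
```

The parameter `T` makes the role of the time horizon explicit: over two years the same formula applies with $\lambda T = 2\lambda$.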
Assuming the Poisson distribution is the best descriptor of the loss frequency distribution, we model the probability of operational losses at Vanderloo Bank in the following way:

```python
# draw random variates from a Poisson distribution with lambda = lam
prvs = poisson.rvs(lam, size=10000)

# empirical probability mass function (loss frequency distribution)
counts, edges = np.histogram(prvs, bins=range(0, 11))
y = counts / np.sum(counts)
x = edges

plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y, width=0.7, align='center', color="#2c97f1")
plt.xlim([-1, 11])
plt.ylim([0, 0.25])
plt.xlabel("Number of Loss Events per Year", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Loss Frequency Distribution", fontsize=14)
plt.savefig("f01.png")
```

The resulting chart, saved as f01.png, shows the loss frequency distribution.

3. Loss Severity Distribution

The data collected in the last column of `df` allow us to plot the loss severity distribution and estimate its best fit. In the practice of operational risk managers, the lognormal distribution is a common choice:

```python
c = .7, .7, .7  # define grey color

plt.figure(figsize=(10, 6))
# density=True replaces the deprecated (now removed) 'normed' argument
plt.hist(df["Loss ($M)"], bins=25, color=c, density=True)
plt.xlabel("Incurred Loss ($M)", fontsize=12)
plt.ylabel("Probability Density", fontsize=12)
plt.title("Loss Severity Distribution", fontsize=14)

# fit a lognormal (shape, loc, scale) to the losses and overlay its pdf
x = np.arange(0, 5, 0.01)
sig, loc, scale = lognorm.fit(df["Loss ($M)"])
pdf = lognorm.pdf(x, sig, loc=loc, scale=scale)
plt.plot(x, pdf, 'r')
plt.savefig("f02.png")

print(sig, loc, scale)  # lognormal pdf's parameters
```

```
0.661153638163 0.328566816132 0.647817560825
```

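where the lognormal probability density function (pdf), in the parametrisation used by `scipy.stats.lognorm`, is given by:
$$p(y; \sigma, loc, scale) = \frac{1}{scale} \, \frac{1}{x\sigma\sqrt{2\pi}} \exp{ \left[ -\frac{1}{2} \left(\frac{\log{x}}{\sigma} \right)^2 \right] }$$
where $x = (y - loc)/scale > 0$, and $p = 0$ for $y \le loc$. The fitted pdf, plotted over the normalised histogram of incurred losses, is saved as f02.png.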
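To make the parametrisation concrete, the sketch below codes the density by hand and checks it against `scipy.stats.lognorm.pdf` at the fitted parameters printed above (hard-coded here). The helper `lognormal_pdf` is hypothetical. It also shows two handy consequences of SciPy's convention: $\log(scale)$ is the mean of $\log(y - loc)$, and the expected severity is $loc + scale \cdot e^{\sigma^2/2}$.

```python
import numpy as np
from scipy.stats import lognorm


def lognormal_pdf(y, sigma, loc, scale):
    """Lognormal pdf in SciPy's (shape, loc, scale) form; zero for y <= loc (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = (y - loc) / scale
    safe_x = np.where(x > 0, x, 1.0)  # dummy value avoids log(0) outside the support
    dens = np.exp(-0.5 * (np.log(safe_x) / sigma) ** 2) / (safe_x * sigma * np.sqrt(2 * np.pi))
    return np.where(x > 0, dens, 0.0) / scale


# fitted parameters printed above
sig, loc, scale = 0.661153638163, 0.328566816132, 0.647817560825

y = np.linspace(0.0, 5.0, 501)
max_diff = np.max(np.abs(lognormal_pdf(y, sig, loc, scale)
                         - lognorm.pdf(y, sig, loc=loc, scale=scale)))
print(max_diff)  # should be at floating-point round-off level

mu = np.log(scale)                            # mean of log(y - loc)
mean_loss = loc + scale * np.exp(sig**2 / 2)  # expected severity of a single loss, in $M
print(mu, mean_loss, lognorm.mean(sig, loc=loc, scale=scale))
```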

4. Loss Distribution

The loss frequency distribution must be combined with the loss severity distribution for each risk type/business line combination in order to determine a loss distribution. The most common assumption here is that loss severity is independent of loss frequency. Hull (2015) suggests the following steps for building a Monte Carlo simulation of the loss distribution:

1. Sample from the frequency distribution to determine the number of loss events ($n$).
2. Sample $n$ times from the loss severity distribution to determine the loss experienced for each loss event ($L_1, L_2, \ldots, L_n$).
3. Determine the total loss experienced ($L_1 + L_2 + \ldots + L_n$).
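The three steps can also be written in vectorised NumPy form. The sketch below is an alternative to the loop-based code of this post, not the post's own implementation: `simulate_total_losses` is a hypothetical helper, the parameters are hard-coded from the estimates printed above, and, unlike the loop version, it does not cap the number of events per trial. Since $\lambda$ is a per-year rate, each trial is one year, and under the independence assumption the mean simulated loss should be close to $\lambda \, E[L_i]$.

```python
import numpy as np


def simulate_total_losses(lam, sig, loc, scale, n_trials=100_000, seed=42):
    """Compound Poisson-lognormal Monte Carlo (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: number of loss events n in each trial
    n_events = rng.poisson(lam, size=n_trials)
    # Step 2: one severity per event; loc + scale * exp(sig * Z), Z ~ N(0, 1),
    # has the same distribution as scipy.stats.lognorm(sig, loc=loc, scale=scale)
    severities = loc + scale * rng.lognormal(mean=0.0, sigma=sig, size=n_events.sum())
    # Step 3: total loss per trial, L_1 + ... + L_n (0 when n = 0)
    trial_ids = np.repeat(np.arange(n_trials), n_events)
    return np.bincount(trial_ids, weights=severities, minlength=n_trials)


lam = 98 / 27  # loss events per year, as estimated above
sig, loc, scale = 0.661153638163, 0.328566816132, 0.647817560825  # fitted severity

totals = simulate_total_losses(lam, sig, loc, scale)
expected_total = lam * (loc + scale * np.exp(sig**2 / 2))  # lambda * E[L_i]
print(totals.mean(), expected_total)
print(np.mean(totals == 0.0), np.exp(-lam))  # share of loss-free years vs Pr(n = 0)
```

Drawing all severities in one call and summing them per trial with `np.bincount` avoids a Python-level loop over the $10^5$ trials.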

When many simulation trials are used, we obtain a total distribution for losses of the type being considered. In Python we code those steps in the following way:

```python
def loss(r, loc, sig, scale, lam):
    # Poisson cdf F(x) for loss numbers x = 0, 1, ..., 10
    cdf = poisson.cdf(np.arange(11), lam)
    X = []
    for F in cdf:
        if r < F:
            out = 0  # r below F(x): no loss event for this x
        else:
            out = lognorm.rvs(s=sig, loc=loc, scale=scale)  # one more loss event
        X.append(out)
    return np.sum(X)  # = L_1 + L_2 + ... + L_n

# run 1e5 Monte Carlo simulations
losses = []
for _ in range(100000):
    r = np.random.random()
    losses.append(loss(r, loc, sig, scale, lam))

counts, edges = np.histogram(losses, bins=range(0, 16))
y = counts / np.sum(counts)
x = edges

plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y, width=0.7, align='center', color="#ff5a19")
plt.xlim([-1, 16])
plt.ylim([0, 0.20])
plt.title("Modelled Loss Distribution", fontsize=14)
plt.xlabel("Loss ($M)", fontsize=12)
plt.ylabel("Probability of Loss", fontsize=12)
plt.savefig("f03.png")
```

The chart saved as f03.png shows the modelled loss distribution.

The function `loss` considers at most 11 loss events per trial, one for each loss number $x = 0, 1, \ldots, 10$. We run $10^5$ simulations. In each trial, we first draw a random number $r$ from the uniform distribution on $[0, 1)$. Then, for each $x$, if $r$ is less than the value of the Poisson cumulative distribution function $F(x)$ (with $\lambda \approx 3.6$), we assume no loss is incurred; otherwise, we draw a random variate from the lognormal distribution (with the parameters found by the fitting procedure above). Since $F$ increases with $x$, the number of lognormal draws equals the number of $x$ with $F(x) \le r$, which is an inverse-transform draw of $n$ from the Poisson distribution (truncated at 11). Summing the draws then completes steps 1–3 above. Simple as that.
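Why does the loop in `loss` produce the right number of events? The sketch below checks it numerically under the same $\lambda$ (hard-coded as 98/27): for many uniform draws $r$ it counts how many thresholds $F(0), \ldots, F(10)$ satisfy $r \ge F(x)$, i.e. how many lognormal draws `loss` would make, and compares the resulting frequencies with the Poisson pmf. The rare draws with $r \ge F(10)$ make all 11 thresholds fire, so the probability mass above 10 events is lumped into $n = 11$.

```python
import numpy as np
from scipy.stats import poisson

lam = 98 / 27
thresholds = poisson.cdf(np.arange(11), lam)  # F(0), ..., F(10), as tested inside loss()

rng = np.random.default_rng(0)
r = rng.random(200_000)  # one uniform draw per trial, like np.random.random() in the loop
# loss() draws a lognormal severity for every x with r >= F(x)
n_draws = (r[:, None] >= thresholds[None, :]).sum(axis=1)

empirical = np.bincount(n_draws, minlength=12) / r.size
expected = poisson.pmf(np.arange(12), lam)
expected[11] = poisson.sf(10, lam)  # Pr(N > 10): all of it ends up at the cap n = 11

for n in range(12):
    print(n, round(empirical[n], 4), round(expected[n], 4))
```

The count equals $k$ exactly when $F(k-1) \le r < F(k)$, an event of probability $F(k) - F(k-1) = \Pr(n = k)$, which is the inverse-transform method for discrete distributions.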

This loss distribution describes the probabilities of total losses of a given size due to operational failures at Vanderloo Bank. Since $\lambda$ is a rate per year, each simulation trial represents one year, so the distribution refers to the total annual operational loss.
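The simulated annual totals can be summarised with the same measures a trader would use, VaR and Expected Shortfall. Below is a minimal sketch: `var_es` is a hypothetical helper that takes the empirical quantile as VaR and the mean of the losses at or beyond it as ES. To keep the block self-contained, the annual totals are re-simulated from the Poisson-lognormal model with the parameters printed above. The 99.9% one-year level is the soundness standard Basel II set for operational risk capital under the Advanced Measurement Approach; the figures here are only illustrative for a hypothetical bank.

```python
import numpy as np


def var_es(losses, alpha=0.999):
    """Empirical VaR (alpha-quantile) and Expected Shortfall (mean loss at or beyond VaR)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)
    es = losses[losses >= var].mean()
    return var, es


# re-simulate annual totals from the compound Poisson-lognormal model
lam = 98 / 27
sig, loc, scale = 0.661153638163, 0.328566816132, 0.647817560825
rng = np.random.default_rng(7)
n_events = rng.poisson(lam, size=100_000)
losses = np.array([(loc + scale * rng.lognormal(0.0, sig, size=n)).sum() for n in n_events])

for alpha in (0.95, 0.99, 0.999):
    var, es = var_es(losses, alpha)
    print(f"alpha = {alpha}: VaR = {var:.2f} $M, ES = {es:.2f} $M")
```

With $10^5$ trials, only about 100 simulated years lie beyond the 99.9% quantile, so the ES estimate at that level is noticeably noisier than at 95% or 99%.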

5. Beyond Operational Risk Management

A natural next step for the numerical procedure applied here is modelling, e.g., the anticipated (predicted) loss distribution of any portfolio of N assets, estimated from the track record of trading losses incurred to date. By doing so, we gain an additional tool in our arsenal of quantitative risk measures and modelling. Stay tuned, as a new post will illustrate that case.