Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

## Python for Algo-Trading: Workshops in Poland

QuantAtRisk.com would love to invite You to attend

where we will cover all essential aspects of algo-trading on the Warsaw Stock Exchange and selected EU/UK/US exchanges plus FOREX markets.

Organised and Led by
Dr. Pawel Lachowicz (Sydney, Australia)

## Recovery of Financial Price-Series based on Daily Returns Matrix in Python


As a financial analyst or algo trader, you are often faced with information on, inter alia, daily asset trading in the form of a daily returns matrix. In many cases it is easier to operate with return-series than with price-series, and there are excellent reasons behind such a decision, e.g. the possibility of plotting the histogram of daily returns, the calculation of daily Value-at-Risk (VaR), etc.

When you use Python (not Matlab), the recovery of price-series from return-series may be a bit of a challenge, especially when you face the problem for the first time. The technical problem, i.e. “how to do it?!” within Python, requires you to switch your thinking mode and adjust your vantage point from Matlab-ish to Pythonic. Therefore, let’s see what the best recipe is to turn your world upside down!

Say we start with an $N$-asset portfolio traded over the last $L+1$ days. It will require the use of Python’s NumPy arrays. We begin with:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(2014)

# define portfolio
N = 5  # number of assets
L = 4  # number of days

# asset close prices
p = np.random.randint(10, 30, size=(N, 1)) + \
    np.random.randn(N, 1)  # a mixture of uniform and N(0,1) rvs
print(p)
print(p.shape)
print()
```

where we specify the number of assets in the portfolio and the number of days. $L = 4$ has been selected for the clarity of the printed outcomes below; however, feel free to increase that number (or both) anytime you rerun this code.

Next, we create an ($N\times 1$) matrix with starting random prices for all $N$ assets, each between \$10 and \$30 (a random integer) supplemented by an N(0, 1) fractional part. Printing p returns:

```
[[ 25.86301396]
 [ 19.82072772]
 [ 22.33569347]
 [ 21.38584671]
 [ 24.56983489]]
(5, 1)
```

Now, let’s generate a matrix of random daily returns over next 4 days for all 5 assets:

```python
r = np.random.randn(N, L)/50
print(r)
print(r.shape)
```

delivering:

```
[[ 0.01680965 -0.00620443 -0.02876535 -0.03946471]
 [-0.00467748 -0.0013034   0.02112921  0.01095789]
 [-0.01868982 -0.01764086  0.01275301  0.00858922]
 [ 0.01287237 -0.00137129 -0.0135271   0.0080953 ]
 [-0.00615219 -0.03538243  0.01031361  0.00642684]]
(5, 4)
```

Having that, our wish is, for each asset, to take its first close-price value from the p array and, using the information on daily returns stored row-by-row (i.e. asset per asset) in the r array, reconstruct the close-price time-series:

```python
for i in range(r.shape[0]):
    tmp = []
    for j in range(r.shape[1]):
        if(j == 0):
            tmp.append(p[i][0].tolist())
            y = p[i] * (1 + r[i][j])
        else:
            y = y * (1 + r[i][j])
        tmp.append(y.tolist()[0])

    if(i == 0):
        P = np.array(tmp)
    else:
        P = np.vstack([P, np.array(tmp)])

print()
print(P)
print(P.shape)
print()
```

That returns:

```
[[ 25.86301396 26.29776209 26.13459959 25.38282873 24.38110263]
 [ 19.82072772 19.72801677 19.70230331 20.11859739 20.33905475]
 [ 22.33569347 21.91824338 21.53158666 21.8061791  21.99347727]
 [ 21.38584671 21.66113324 21.63142965 21.33881915 21.51156325]
 [ 24.56983489 24.41867671 23.55468454 23.79761839 23.95056194]]
(5, 5)
```

Thus, we have two loops: the outer one over rows/assets (index i) and the inner one over columns/days (index j). For j = 0 we copy the price of the asset from p as the “starting close price” on the first day; concurrently, using the first entry of the r matrix, we compute the change in price for the next day. In the tmp list we store (per asset) the history of close-price changes over all L+1 days. These operations are based on simple list processing. Finally, having complete information on the i-th asset and its price changes after r.shape[1] + 1 days, we build the new array P with the aid of the np.vstack function (see more in Section 3.3.4 of Python for Quants. Volume I). Therefore, P stores the simulated close-price time-series for N assets.
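As a side note, the same reconstruction can be written without explicit loops. A vectorised sketch using np.cumprod and broadcasting (with the same seed as above, it should reproduce the same P):

```python
import numpy as np

np.random.seed(2014)
N, L = 5, 4
p = np.random.randint(10, 30, size=(N, 1)) + np.random.randn(N, 1)
r = np.random.randn(N, L) / 50

# growth factors per day, with a leading column of ones
# so that the first column of P equals the starting prices p
growth = np.hstack([np.ones((N, 1)), 1 + r])
P = p * np.cumprod(growth, axis=1)  # p is broadcast over the L+1 columns

print(P.shape)  # (5, 5)
```

The cumulative product along each row accumulates the per-day growth factors, which is exactly what the inner loop above does step by step.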

We can display them by adding to our main code:

```python
plt.figure(num=1, figsize=(8, 5))
plt.plot(P.T, '+-')
plt.xlabel("Days")
plt.ylabel("Asset Close Price (\$)")
```

which reveals the plot of all five price-series, where the transposition of P has been applied to deliver asset-by-asset price-series (try to plot the array without it, see what happens, and understand why it is so).

The easiest solutions are the most romantic, as we have proven above. You don’t impress your girlfriend by buying 44 red roses. 44 lines of Python code will do the same! Trust me on that! ;-)

This lesson comes from a fragment of my newest book, Python for Quants. Volume I. The bestselling book within the first 3 days after publishing!

## 5 Words on How To Write A Quant Blog

An extract from Jacques Joubert’s newest article on How To Write A Great Quant Blog.

by Pawel Lachowicz

Do not commence working on your blog without a vision. “If you don’t know where you are going, any road will get you there!” You want to avoid that mistake. Spend some time dreaming of the final form of your site. Highly sought-after content is important, but not as much as your commitment to excel in its delivery.

Write from your heart. Listen to your inner voice. Follow your own curiosity. Not the trends. Not what would be profitable. Forget about the noise in the blogosphere. If you want to shine as a diamond, you need to get cut as a diamond. Your blog should be a reflection of your personality.

Forget about achieving success quickly. It will take time. It will count in years. Just remember: “Success [in this venture] is something that you attract by becoming an attractive person”. Don’t rush. Make a plan. Build your blog around the uniqueness of your writing. And don’t worry about “instant” followers.

Lastly, write less frequently. However, deliver pearls. Not plums.

## Applied Portfolio VaR Decomposition. (3) Incremental VaR and Portfolio Revaluation.

Portfolios are like commercial aircraft. They rely on computers. They are dotted around the world.
They are vulnerable to ever-changing weather conditions, designed to bring benefits, and crafted to take risks and survive when the unexpected happens. As portfolio managers, we have the ability to control those risk levels and adjust positions accordingly. Turbulence may occur anytime. A good pilot then knows what to do.

In Part 1 we implemented the analytical VaR methodology for an exemplary 7-asset portfolio of equities, deriving a running portfolio VaR, marginal VaR, and component VaR. In Part 2 we examined the approximate impact of historical data on the covariance matrix estimation used in the portfolio VaR calculations. Today, we will add a third component: the incremental VaR.

### Altitude Adjustment

For a current portfolio holding $N$ assets with a total exposure of $C$ (dollars), we get a 95% portfolio VaR estimation simply by computing:

$$ \mbox{VaR}_P = 1.65 \sigma_P C = 1.65\left(\boldsymbol{d}' \boldsymbol{M}_2 \boldsymbol{d} \right)^{1/2} $$

This can be estimated any time. For example, if we evaluate a portfolio of US stocks on a daily basis, and new data are available for download after 4pm when the markets close, the 1-day VaR can be recalculated. If it increases day-over-day and, say, one of the component VaRs behaves in an abnormal way (raising a warning flag), we may want to do something about that particular position: remove it completely or decrease the exposure.

A second scenario could be more optimistic. Say we have extra cash to invest and we want to increase one (or a few) currently held positions in our portfolio. Adding more shares of the $i$-th asset will increase the portfolio VaR. The secret is in not pushing the running engines too much and keeping the oil temperature within acceptable brackets.

Alternatively, as a risk manager working in a global bank, the bank’s portfolio lists 10,453 open positions (across various markets and continents) and you have an incoming proposal of a new trade from one of the bank’s clients.
He wants to invest \$2,650,000 in a highly illiquid asset. It is physically possible, but the bank’s VaR may increase significantly overnight. How much? Well, in this or any similar scenario, you need to perform a portfolio revaluation employing the incremental VaR procedure. Your aircraft may be brand new, but remember that big planes fall from the sky too.

In general, by the incremental VaR we mean the difference between the new portfolio VaR (including (a) new trade(s) or a change in (a) current holding(s)) and the recent portfolio VaR:

$$ \mbox{iVaR}_P = \mbox{VaR}_{P+a} - \mbox{VaR}_P $$

where $a$ symbolises the proposed change. The incremental VaR changes in a non-linear way, since the position size can be increased/reduced by any amount. That is the main difference as compared to the marginal VaR. It is possible to expand $\mbox{VaR}_{P+a}$ in a series around the original point, ignoring second-order terms if the deviations $a$ are small:

$$ \mbox{VaR}_{P+a} = \mbox{VaR}_{P} + (\Delta \mbox{VaR})' a + \dots $$

and by “small” I mean a dollar change in (a) current holding(s) much, much less than the market exposure $C$. In such a case,

$$ \mbox{iVaR}_P^{\rm approx} \approx (\Delta \mbox{VaR})' a $$

where the apostrophe denotes the transposition of the vector. This approximation becomes important for really large portfolios (e.g. in banks) where a revaluation and a new estimation of the bank’s VaR need to be derived quickly. A time saver at a reliable precision of landing. In all other cases, a new vector of extra exposures can be added to the present position sizes and reevaluated. Let’s see how it works in practice!

### Case Study A: Increasing the Position in One Asset

Let’s recall the same initial settings as in Part 1. We deal with a 7-asset portfolio of US equities as of the 12th of January, 2015. There is nearly 3 million dollars invested in the stock market (I wish it were mine! ;) and, based on 3-year historical data of all underlying stocks, the estimated 1-day 95% portfolio VaR is \$33,027.94 as forecasted for the next trading day. Id est, there is a 5% chance that we can lose 1.1% of the portfolio value before Englishmen in New York raise their cups of tea:

```matlab
% Applied Portfolio VaR Decomposition. (3) Incremental VaR and Portfolio
% Revaluation.
%
% (c) 2015 QuantAtRisk.com, by Pawel Lachowicz

clear all; close all; clc;
format bank

%{
% Read the list of the portfolio components
fileID = fopen('portfolio.lst');
tmp = textscan(fileID, '%s');
fclose(fileID);
pc=tmp{1};  % a list as a cell array

% fetch stock data for last 3 years since:
t0=735976;  % 12-Jan-2015
date2=datestr(t0,'yyyy-mm-dd');        % from
date1=datestr(t0-3*365,'yyyy-mm-dd');  % to

% create an empty array for storing stock data
stockd={};
% scan the tickers and fetch the data from Quandl.com
for i=1:length(pc)
    quandlc=['WIKI/',pc{i}];
    fprintf('%4.0f %s\n',i,quandlc);
    % fetch the data of the stock from Quandl
    % using recommended Quandl's command and
    % saving them directly into Matlab's FTS object (fts)
    fts=0;
    [fts,headers]=Quandl.get(quandlc,'type','fints', ...
                  'authcode','YourQuandlCode',...
                  'start_date',date1,'end_date',date2);
    stockd{i}=fts;  % entire FTS object in an array's cell
end
save data
%}

load data

% limit data to 3 years of business days, select adjusted
% close price time-series and calculate return-series
rs=[];
for i=1:length(pc)
    cp=fts2mat(stockd{i}.Adj_Close,0);
    rv=cp(2:end,1)./cp(1:end-1,1)-1;
    rs=[rs rv(end-(3*5*4*12):end)];
end
rs(isnan(rs))=0.0;

% covariance matrix
M2=cov(rs)

% an initial dollar exposure per position
d=[  55621.00; ...
    101017.00; ...
     23409.00; ...
   1320814.00; ...
    131145.00; ...
    321124.00; ...
   1046867.00]

% invested capital of C
C=sum(d)

% the portfolio volatility
vol_P=sqrt(d'*M2*d);

% diversified portfolio VaR
VaR_P=1.65*vol_P

% volatility of assets (based on daily returns)
v=sqrt(diag(cov(rs)));

% individual and undiversified portfolio VaR
VaRi=1.65*abs(v.*d);
uVaR_P=sum(VaRi);

% the portfolio betas
beta=C*(M2*d)/vol_P^2;

% the portfolio marginal VaR
DVaR=1.65*(M2*d)/vol_P;

% the component VaR
cVaR=100*DVaR.*d/VaR_P;  % [percent]
cVaRd=DVaR.*d;           % dollars
```
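As a numerical sanity check of the first-order iVaR approximation described above, here is a minimal Python sketch. The covariance matrix is a hypothetical one (1% daily volatility, 0.3 pairwise correlation — an assumption, not estimated from the case-study data); the dollar exposures are the seven positions from the Matlab listing:

```python
import numpy as np

# hypothetical covariance: 1% daily vol, 0.3 cross-correlation (assumption)
N = 7
M2 = 1e-4 * (0.3 * np.ones((N, N)) + 0.7 * np.eye(N))

# the seven dollar exposures from the Matlab listing
d = np.array([55621., 101017., 23409., 1320814., 131145., 321124., 1046867.])

vol_P = np.sqrt(d @ M2 @ d)      # dollar volatility of the portfolio
VaR_P = 1.65 * vol_P             # 1-day 95% portfolio VaR
DVaR = 1.65 * (M2 @ d) / vol_P   # marginal VaR vector

da = np.zeros(N)
da[1] = 9999.15                  # add ~$10k to the 2nd position

VaR_Pa = 1.65 * np.sqrt((d + da) @ M2 @ (d + da))
iVaR = VaR_Pa - VaR_P            # exact incremental VaR
iVaR_approx = DVaR @ da          # first-order approximation

print(iVaR, iVaR_approx)
print(abs(iVaR - iVaR_approx) < 1.0)  # True: they agree to within $1
```

For a ~\$10k change against a ~\$3M exposure the second-order terms are negligible, which is exactly the regime in which the approximation is meant to be used.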

Now, we want to add \$10,000 to our exposure in Walt Disney Co. stock. As of Jan/12 2015 DIS closed at \$94.48; therefore we consider buying \$10,000/\$94.48 = 105.8 shares on the next day. We are flexible, so we buy at the market price of \$95.23 on Jan/13, i.e. an extra 105 shares of DIS. Therefore we added 105 × \$95.23, or \$9,999.15, to our portfolio:

```matlab
% proposed changes to current exposure
da=[0; ...
    9999.15; ...
    0; ...
    0; ...
    0; ...
    0; ...
    0 ];
```

Of course, we can now estimate a new portfolio VaR with the extra 105 shares of DIS assumed. In real weather conditions, this number can be lower. Keep that in mind. The remaining part seems to be easy to code:

```matlab
% new portfolio VaR
VaR_Pa=1.65*sqrt((d+da)'*M2*(d+da))
% incremental VaR
iVaR=VaR_Pa-VaR_P;
% incremental VaR (approximation; see text)
iVaR_approx=DVaR'*da;
% new component VaR
ncVaR=100*DVaR.*(d+da)/VaR_P;  % [percent]
ncVaRd=DVaR.*(d+da);           % dollars
```

followed by a new risk report on the impact of the proposed changes:

```matlab
fprintf('Current exposure:\t\t\t$%1s\n',bankformat(C));
fprintf('Current Portfolio VaR (diversified):\t $%1s\n',bankformat(VaR_P));
fprintf('Proposed changes in holdings:\n');
for i=1:length(d)
    fprintf('   %s\t$%-s\n',pc{i},bankformat(da(i)) );
end
fprintf('New Portfolio VaR (diversified):\t $%1s\n',bankformat(VaR_Pa));
fprintf('Incremental VaR:\t\t\t$%1s\n',bankformat(iVaR));
fprintf('Incremental VaR (approximation):\t $%1s\n',bankformat(iVaR_approx));
fprintf('Change in exposure:\t\t\t %-1.4f%%\n',sum(da)/C);
fprintf('Component VaR:\n');
fprintf('\t Current\t\t New\t\t\tChange in CVaR\n');
for i=1:length(d)
    fprintf('   %s\t%6.2f%%$%-10s\t%6.2f%% $%-10s\t$%-10s\n', ...
        pc{i},cVaR(i), ...
        bankformat(cVaRd(i)),ncVaR(i), ...
        bankformat(ncVaRd(i)),bankformat(ncVaRd(i)-cVaRd(i)) );
end
```

In this case study, the report says:

```
Current exposure:                        $2,999,997.00
Current Portfolio VaR (diversified):     $33,027.94
Proposed changes in holdings:
   AAPL   $0.00
   DIS    $9,999.15
   IBM    $0.00
   JNJ    $0.00
   KO     $0.00
   NKE    $0.00
   TXN    $0.00
New Portfolio VaR (diversified):         $33,084.26
Incremental VaR:                         $56.31
Incremental VaR (approximation):         $55.84
Change in exposure:                      0.0033%
Component VaR:
          Current                 New                  Change in CVaR
   AAPL    0.61% $202.99           0.61% $202.99       $0.00
   DIS     1.71% $564.13           1.88% $619.97       $55.84
   IBM     0.26% $84.43            0.26% $84.43        $0.00
   JNJ    30.99% $10,235.65       30.99% $10,235.65    $0.00
   KO      1.19% $392.02           1.19% $392.02       $0.00
   NKE     9.10% $3,006.55         9.10% $3,006.55     $0.00
   TXN    56.14% $18,542.15       56.14% $18,542.15    $0.00
```

That’s a great example of how the approximation of the incremental VaR works in practice. \$10,000 added to DIS is “small” as compared to $C$ of nearly 3 million dollars in the game. By adding 105 shares of DIS we didn’t move the 1-day portfolio VaR significantly, which is good. Let’s play again.

### Case Study B: Increase and Decrease Exposure

From the initial risk report we learnt that TXN contributed most to the overall portfolio VaR. Let’s say we have an extra \$126,000 to invest and we think that AAPL would be the next shining star. At the same time, we want to reduce our market exposure to TXN by selling stocks of a total worth of \$500,000. This operation would be denoted as:

```matlab
% proposed changes to current exposure
da=[ 126000; ...
     0; ...
     0; ...
     0; ...
     0; ...
     0; ...
    -500000 ];
```

and

```
Current exposure:                        $2,999,997.00
Current Portfolio VaR (diversified):     $33,027.94
Proposed changes in holdings:
   AAPL   $126,000.00
   DIS    $0.00
   IBM    $0.00
   JNJ    $0.00
   KO     $0.00
   NKE    $0.00
   TXN    $-500,000.00
New Portfolio VaR (diversified):         $25,869.04
Incremental VaR:                         $-7,158.90
Incremental VaR (approximation):         $-8,396.16
Change in exposure:                      -0.1247%
Component VaR:
          Current                 New                  Change in CVaR
   AAPL    0.61% $202.99           2.01% $662.85       $459.85
   DIS     1.71% $564.13           1.71% $564.13       $0.00
   IBM     0.26% $84.43            0.26% $84.43        $0.00
   JNJ    30.99% $10,235.65       30.99% $10,235.65    $0.00
   KO      1.19% $392.02           1.19% $392.02       $0.00
   NKE     9.10% $3,006.55         9.10% $3,006.55     $0.00
   TXN    56.14% $18,542.15       29.33% $9,686.13     $-8,856.02
```

The incremental VaR reacts to a massive sell-out of TXN, reducing the overall portfolio VaR significantly. Adding \$126k to AAPL (share-number precision omitted for the sake of simplicity) does not affect the portfolio risk levels in any dramatic way.

### Case Study C: Adding Two New Assets

You are a child of a new era, the era of credit cards. More and more people use them worldwide. You have this brilliant idea to add to your portfolio an extra \$100,000 invested in (split equally between) Mastercard Inc. and Visa Inc. Considering such an operation requires a new portfolio revaluation. Let’s do it step by step. First, let’s fetch historical data for both MA and V stocks, the same way as we did it at the very beginning.
We save the ticker names in a separate file named proposal.lst:

```
MA
V
```

We supplement our Matlab code with a few extra lines:

```matlab
%{
% Read the list of the proposed portfolio components
fileID = fopen('proposal.lst');
tmp = textscan(fileID, '%s');
fclose(fileID);
pcE=tmp{1};  % a list as a cell array

% fetch stock data for last 3 years since:
t0=735976;  % 12-Jan-2015
date2=datestr(t0,'yyyy-mm-dd');        % from
date1=datestr(t0-3*365,'yyyy-mm-dd');  % to

% create an empty array for storing stock data
stockd={};
% scan the tickers and fetch the data from Quandl.com
for i=1:length(pcE)
    quandlc=['WIKI/',pcE{i}];
    fprintf('%4.0f %s\n',i,quandlc);
    % fetch the data of the stock from Quandl
    % using recommended Quandl's command and
    % saving them directly into Matlab's FTS object (fts)
    fts=0;
    [fts,headers]=Quandl.get(quandlc,'type','fints', ...
                  'authcode','YourQuandlCode',...
                  'start_date',date1,'end_date',date2);
    stockdp{i}=fts;  % entire FTS object in an array's cell
end
save datap
%}

load datap

% limit data to 3 years of business days, select adjusted
% close price time-series and calculate return-series
rsp=[];
for i=1:length(pcE)
    cp=fts2mat(stockdp{i}.Adj_Close,0);
    rv=cp(2:end,1)./cp(1:end-1,1)-1;
    rsp=[rsp rv(end-(3*5*4*12):end)];
end
rsp(isnan(rsp))=0.0;
```

As soon as the new data arrive, we build an extended return-series matrix, now not 721×7 but 721×9 in its dimensions:

```matlab
rsE=[rs rsp];
```

What follows is pretty intuitive in coding:

```matlab
% covariance matrix
M2E=cov(rsE);

% proposed changes to current exposure
daE=[0; ...
     0; ...
     0; ...
     0; ...
     0; ...
     0; ...
     0; ...
     50000; ...
     50000 ];

% new portfolio VaR
d=[d; 0; 0]  % we have to extend it to allow d+daE
VaR_PaE=1.65*sqrt((d+daE)'*M2E*(d+daE))
% a running list of tickers, too
pcE=[pc; pcE]

% incremental VaR
iVaRE=VaR_PaE-VaR_P;

fprintf('Current exposure:\t\t\t $%1s\n',bankformat(C));
fprintf('Current Portfolio VaR (diversified):\t $%1s\n',bankformat(VaR_P));
fprintf('Proposed changes in holdings:\n');
for i=1:length(daE)
    fprintf('   %s\t $%-s\n',pcE{i},bankformat(daE(i)) );
end
fprintf('New Portfolio VaR (diversified):\t $%1s\n',bankformat(VaR_PaE));
fprintf('Incremental VaR:\t\t\t $%1s\n',bankformat(iVaRE));
```

generating new risk estimations:

```
Current exposure:                        $2,999,997.00
Current Portfolio VaR (diversified):     $33,027.94
Proposed changes in holdings:
   AAPL   $0.00
   DIS    $0.00
   IBM    $0.00
   JNJ    $0.00
   KO     $0.00
   NKE    $0.00
   TXN    $0.00
   MA     $50,000.00
   V      $50,000.00
New Portfolio VaR (diversified):         $33,715.09
Incremental VaR:                         $687.15
```

From this example we can see that extending our portfolio with these two assets would lead to an increase of the 1-day portfolio VaR by approximately \$687. A digestible amount.
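The portfolio-extension step generalises easily. A Pythonic sketch of the same mechanics, with synthetic return-series standing in for the Quandl data (the dollar exposures are the case-study ones; the random returns are an assumption for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2015)
T = 721
rs  = rng.normal(0.0, 0.01, size=(T, 7))  # current 7 assets (synthetic)
rsp = rng.normal(0.0, 0.01, size=(T, 2))  # proposed two assets (synthetic)

rsE = np.hstack([rs, rsp])                # extended 721x9 return matrix
M2E = np.cov(rsE, rowvar=False)           # extended covariance matrix

d   = np.array([55621., 101017., 23409., 1320814.,
                131145., 321124., 1046867.])
dE  = np.concatenate([d, [0.0, 0.0]])     # extend holdings with zeros
daE = np.concatenate([np.zeros(7), [50000.0, 50000.0]])

VaR_P   = 1.65 * np.sqrt(d @ np.cov(rs, rowvar=False) @ d)
VaR_PaE = 1.65 * np.sqrt((dE + daE) @ M2E @ (dE + daE))
iVaRE   = VaR_PaE - VaR_P                 # incremental VaR of the extension

print(rsE.shape)  # (721, 9)
```

The key point is the same as in the Matlab listing: the current exposure vector is padded with zeros so that it conforms to the widened covariance matrix before the proposed trades are added.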

Stay tuned as in Part 4 we go extreme with Extreme VaR.

### Endnote

When you pilot an aircraft from Amsterdam to Tokyo, the path usually runs over Norway, the North Pole, and eastern Russia. If one of the engines goes down, you switch it off and land the plane at the nearest airport. The risk in commercial aviation is too high to continue the flight.

As a portfolio manager you deal with similar situations, but you do not necessarily close your portfolio at once. Instead, you reduce the exposure to the troublesome assets. You can still fly in the markets, though with reduced thrust and hopes for big gains. But don’t worry! Every plane lands with its nose up towards the sky. You can be losing but stay positive till the very end. You never know when cross-winds will change your situation. You may be back in the game before the bell closes the session.

## Quants, NumPy, and LOTTO


Since I’m working on Volume I of the Python for Quants ebook and going through NumPy’s abilities, they leave me speechless despite the rain. Somehow. Every time. There is so much flexibility in expressing your thoughts and ideas, so much freedom in coding logical and mathematical concepts as a quant. With its grace of syntax, with its purity of simplicity.

When I was 16 I used to play LOTTO: the game where for your 3 favourite combinations of 6 numbers out of 49 you pay \$1. It was obvious that hitting the “six” was equivalent to the probability of:

$$ \frac{1}{C^6_{49}} = \left( \frac{49!}{6!(49-6)!} \right)^{-1} = \frac{1}{13983816} \ \ . $$

I had my favourite 6 numbers. The magic was that I believed I could win the entire prize. Somehow, the game was worth trying. You never know when your lucky day comes!

### Monte-Carlo Simulation for LOTTO

It is beyond any doubt that our mind can create certain circumstances in the real world (that we do not immediately understand) that allow us to bend reality to our knees. How? Well, if you attend any mind-control seminar, you learn about the power of your mind and how to trick mathematical, logical reasoning. That applies to the LOTTO game as well. There are nearly 14M combinations and we aim to win that game. A low probability. A high reward if you are lucky.

I used to play my favourite combination: 7, 9, 11, 21, 35, 45 for nearly 3 months. I was visualising a situation in which exactly those numbers would appear in the LOTTO drawing machine next Wednesday or Saturday. It took me 3 months to hit 5 out of 6 numbers! The financial reward was nice, trust me! But because I was of small faith, I did not hit the bull between its eyes.

Now it’s your turn! The Python language is superb. It offers you an opportunity to simulate your luck in just a few minutes. Say you are like me: you randomly pick 6 out of 49 numbers, you play two times a week, and… wait for your early retirement. With the NumPy library in Python, the estimation of the waiting time is straightforward.
Analyse the following code:

```python
 1  # Quants, NumPy, and LOTTO
 2  # (c) 2014 QuantAtRisk.com, by Pawel Lachowicz
 3
 4  import numpy as np
 5
 6  def lotto_numbers():
 7      ok = False
 8      while not ok:
 9          x = np.random.choice(49, 6, replace=False)
10          x.sort()
11          tmp = np.where(x == 0)
12          (m, ) = tmp[0].shape
13          if(m == 0):
14              ok = True
15      return x
16
17  fav = lotto_numbers()  # choose your favourite 6 numbers
18  print(fav)
19
20  match = False; i = 0
21  while not match:
22      tmp = lotto_numbers()
23      cmp = (tmp == fav)
24      i += 1
25      if cmp.all():
26          match = True
27
28  print("Iterations: %g\nProbability = %.2e" % (i, 1./i))
29  print(tmp)
```

It’s a nice combination of the most useful NumPy functions in action: .random.choice(), .where(), .shape, .sort(), and .all(). In line #9 we create a 6-element vector of random integers drawn from the set $\{0, 1, \dots, 48\}$ (note that np.random.choice(49, ...) never returns 49, so strictly speaking the sketch plays a 48-number game; np.random.choice(np.arange(1, 50), 6, replace=False) would cover the full range without any zero-check), and in line #10 we force them to be sorted. Because $0$ is not part of the game, we need to check for its presence and draw again if it is detected. Line #11 returns a temporary array storing the indexes where $x$ has a zero element. Since we use the .random.choice function with the parameter replace=False, we can be sure that there is at most one zero (if drawn). Therefore, the result of line #12 should be $m$ equal to 0 or 1.

In the main program (lines #17-29) we repeat the loop until we find a match between a (random) selection of our favourite 6 numbers and a new LOTTO draw. In line #23 the array cmp holds 6 boolean elements as the result of the comparison; therefore, keeping the arrays sorted is an essential part of the code. Finally, in line #25 the NumPy function .all() returns True if all elements are True, and False otherwise. Nice and easy. An exemplary outcome you can get is:

```
[ 1  2  4 11 13 18]
Iterations: 37763
Probability = 2.65e-05
[ 1  2  4 11 13 18]
```

To provide you with a feeling of how long you would need to wait in that run to hit your favourite “six”, let’s calculate:

```
>>> 37763/(2*52)
363
```

which returns a number of years.
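A single simulated run can be extremely (un)lucky, of course. For reference, the analytical expectation of the waiting time follows directly from the combinatorics quoted earlier; a sketch using Python 3’s math.comb:

```python
from math import comb

# total number of 6-out-of-49 combinations
n = comb(49, 6)
print(n)  # 13983816

# expected number of draws until a given six matches
# (geometric distribution with p = 1/n), at two LOTTO draws per week
years = n / (2 * 52)
print(round(years))  # 134460
```

So on average you would wait roughly 134 thousand years for a full “six”, which puts the lucky 37,763-draw run above into perspective.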
Trust me, if you really, really, really want to win playing only with your lucky numbers, you can. It is a matter of faith, luck, and… time. As homework, try to modify the code to run it $10^4$ times and plot the histogram of probabilities. Is it Normal or Uniform? Now I guess I should say: Good luck!

## Setting up Python for Quantitative Analysis in OS X 10.10 Yosemite

Welcome! The most recent update of Apple’s OS X 10.10 Yosemite makes its mark with a bit of splendour. But, under the hood, not too much has changed: the same Unix shell, the same way to perform our daily programming at the same level. Setting up Python in Yosemite appears today to be easier than ever (compare my post on performing the same task in OS X 10.9 Mavericks a year ago).

### Download and Installation

The evolution of Continuum Analytics, with its unique mission to deliver Python to everyone, has started to pay off. They created Anaconda, a completely free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing. It’s free, including for commercial purposes. You can get a working version for Yosemite (as well as for Linux and Microsoft Windows). Just go to the Anaconda download page, fill in your email details, choose the grand version (2.7 vs 3.4) you prefer, and download the .dmg file to begin the installation on your Mac in graphical mode.

It installs into a single directory and does not affect other Python installations on your system. No admin/root privileges are required either. You can switch among Python interpreters in versions 2.6, 2.7, 3.3, and 3.4. After installation you get instant access to 195+ of the most popular Python packages for science, math, engineering, and data analysis, including NumPy, SciPy, Pandas, IPython, Matplotlib, Numba, Blaze, and others.

Don’t rush with payment for the additional Anaconda Accelerate package making use of GPUs.
As I will address this thread in my next post, Python using CUDA within the NumbaPro module offered by Continuum Analytics disappoints.

## Visualisation of N-Asset Portfolio in Matlab

Many of you who wished to use Matlab’s Financial Toolbox for portfolio visualisation most probably realised that something was wrong: not every time did the annualised risk and return for different time-series match what was expected. Below you can find an improved piece of code which does the job correctly. The input sampling of your financial time-series can now be specified directly as daily, weekly, monthly, or quarterly, and the risk-return diagram will be displayed with annualised results. Otherwise, you have the option to specify another time-horizon instead of the annualised one. All smooth and easy.

```matlab
% Specialised plotting function for visualisation of portfolios
% based on the Financial Toolbox's portfolio objects
%
% Modified by Pawel Lachowicz, 2014/03/01
%
% Remarks: Essential corrections introduced in the calculation and plotting
%   of the annualised risk and returns coded incorrectly by the team at
%   The MathWorks, Inc.

function plotportfolio(varargin)

    plottitle = varargin{1};  % plot title, e.g. 'Efficient Frontier'
    plotfreq  = varargin{2};  % frequency of raw data (dt; or 'daily', etc.)
    plotfreq2 = varargin{3};  % e.g. 'annualised'
    plotfreq3 = varargin{4};

    plotlegend = [];

    switch plotfreq2
        case 'annualised'
            switch plotfreq
                case 'daily'
                    periods=252;
                case 'weekly'
                    periods=52;
                case 'monthly'
                    periods=12;
                case 'quarterly'
                    periods=4;
            end
        otherwise
            periods=plotfreq3;
    end

    for i = 5:nargin
        plotinfo = varargin{i};
        plottype = plotinfo{1};
        plotrsk  = plotinfo{2};
        plotret  = plotinfo{3};
        if numel(plotinfo) > 3
            plotlabel = plotinfo{4};
        else
            plotlabel = [];
        end
        if numel(plotinfo) > 4
            plotstyle = plotinfo{5};
            if isempty(plotstyle)
                plotstyle = 'b';
            end
        else
            if strcmpi(plottype,'line')
                plotstyle = 'b';
            else
                plotstyle = 'g';
            end
        end
        if numel(plotinfo) > 5
            plotline = plotinfo{6};
            if isempty(plotline)
                plotline = 1;
            end
        else
            if strcmpi(plottype,'line')
                plotline = 2;
            else
                plotline = [];
            end
        end

        % line plot
        if strcmpi(plottype,'line')
            hold on;
            for k = 1:size(plotrsk,2)
                plot(plotrsk(:,k)*sqrt(periods),((plotret(:,k)+1).^periods-1), ...
                     plotstyle, 'LineWidth', plotline);
                if i == 2 && k == 1
                    hold on
                end
                if ~isempty(plotlabel) && ~isempty(plotlabel{k})
                    plotlegend = [ plotlegend, {plotlabel{k}} ];
                end
            end
        % scatter plot
        else
            if any(plotstyle == '.')
                scatter(plotrsk*sqrt(periods),((plotret+1).^periods-1),'or','Filled');
            else
                scatter(plotrsk*sqrt(periods),((plotret+1).^periods-1),'og','Filled');
            end
            if i==2
                hold on
            end
            if ~isempty(plotlabel)
                for k = 1:numel(plotrsk)
                    if ~isempty(plotlabel{k})
                        text(plotrsk(k)*sqrt(periods)+0.005, ...
                             ((plotret(k)+1).^periods-1), plotlabel{k},'FontSize', 9);
                    end
                end
            end
        end
    end

    if ~isempty(plotlegend)
        legend(plotlegend,'Location','SouthEast');
    end
    if strcmpi(plotfreq2,'annualised')
        xlabel('Annualised Volatility');
        ylabel('Annualised Expected Returns');
    else
        xlabel('Volatility');
        ylabel('Returns');
    end
    grid on

end
```

Now you can call it with various parameters of your interest, matching the data directly, for instance:

```matlab
plotportfolio('Portfolio',dt,'annualised',[], ...
    {'line', prsk, pret}, ...
    {'scatter',CashRsk,CashRet, {'Cash'}}, ...
    {'scatter', sqrt(diag(p.AssetCovar)), p.AssetMean, p.AssetList,'.r'});
```

For some awesome practical applications of this function in the process of construction and analysis of an $N$-asset portfolio, grab the Applied Portfolio Optimization with Risk Management using Matlab ebook today.

## Merry… – from QuantAtRisk!

Stay tuned, as still this year we are coming back to the problem of Portfolio Optimization and a new lesson on Accelerated Python for Quants. Be happy, relax, take your time, but strive for excellence, and never be satisfied with who you are and what you know! There is always someone out there who works harder than you! Lots of Positive Vibes!

-Pawel

## QuantAtRisk on YouTube!

Since Dec/1 you can follow the QuantAtRisk YouTube Channel! My goal is to deliver quality information and visualisations of solutions to complex financial problems. We kick off this session with the very first video on Accelerated Matlab for Quants, a tutorial for complete beginners in Matlab programming. Enjoy!

## Join us at HPC Technologies in Finances in Singapore (15-16 July 2013)

To All of You who dare to know more, to understand more, to become amazing at work! Don’t miss the opportunity of your life and sign up for this spectacular event in the picturesque city of Singapore! Enhance your skills, broaden your financial experience, make your boss proud of you! Visit us and enjoy this state-of-the-art workshop (theory + practice) just one degree north of the equator. See you in Singapore very soon!

Marek T. Michalewicz, PhD, Director of A*STAR
Pawel Lachowicz, PhD, QaR

## VaR and Expected Shortfall vs. Black Swan

It is one of the most fundamental approaches to measuring risk, but truly worth revising its calculation: Value-at-Risk (VaR).
A magical number quoted at the end of the day by the banks' financial risk managers, portfolio risk managers, and risk managers simply concerned that the expected risk threshold will not, hopefully, be exceeded on the next day. What is so magical about it? According to the definition, given a specific time horizon and taking into account the last $N$ days of our trading activity, we can always calculate a number which provides a simple answer to the question: if something really unexpected happened on the next day, what would be the loss margin we could suffer? Welcome to the world of VaR!

More formally, given a daily (or weekly, monthly, etc.) distribution of returns (of, for example, a single stock or a portfolio of assets), we are interested in finding the value $VaR_\alpha$ for, say, $\alpha=0.05$ (five per cent), which tells us that there is a 5% chance that a return worse than $VaR_{0.05}$ will be realised in trading on the next day. This value is located in the left tail of the distribution and, by convention, the figure quoted is its absolute value, i.e. a positive number. We define $VaR_\alpha$ as the $\alpha$-quantile of the return distribution:

$$ P[X\le VaR_\alpha] = \alpha $$

where $X$ is a random variable which can, as in our assumed case, represent a 1-day return on a traded asset or a portfolio of assets.

Beautiful! We have the risk threshold provided as a number. But in practice, how should we calculate $VaR_\alpha$? Here some of the shortcomings of VaR in general come into the spotlight.

VaR depends on the time-frame of the daily returns we examine: $VaR_\alpha$ will be a different number if you include the last 252 trading days (1 year) in the estimation compared to the last 504 trading days (2 years). Intuitively, the more data (more days, years, etc.) we have, the better the estimation of $VaR_\alpha$, right? Yes and no. Keep in mind that the dynamics of the trading market change continuously. In the early 80s there was no high-frequency trading (HFT).
It came to dominate trading traffic after the year 2000, and now HFT plays a key role (an influential factor) in making an impact on prices. So it would be a bit unfair to compare $VaR_\alpha$ estimated in the 80s with what is going on right now.

The parametric calculation of $VaR_\alpha$ assumes that the distribution of returns is normal. Not the most realistic assumption, but the most straightforward one to compute $VaR_\alpha$ with. Should we use $VaR_\alpha$? Yes, but with caution.

Okay then, having $VaR_\alpha$ calculated, we know how far the loss could reach at the $1-\alpha$ confidence level. The next super important question in risk management every risk manager should ask, or at least be interested in reporting, is: if the $VaR_\alpha$ event occurs on the next day, what is the loss we might expect to suffer (say, in dollars)? $VaR_\alpha$ is the threshold. Now we are interested in the expected value of the loss given the exceedance. That is the definition of the expected shortfall, and it is based on the concept of conditional expectation as follows:

$$ ES = E[X\ |\ X \le VaR_\alpha] \ . $$

In general, if our daily distribution of returns can be described by a function $f(x)$ representing its probability density function (pdf), then:

$$ ES = \frac{1}{\alpha} \int_{-\infty}^{VaR_\alpha} x f(x)\,dx \ . $$

Given any data set of returns, $R_t\ (t=1,\dots,N)$, calculated from our price history,

$$ R_t = \frac{P_{t+1}}{P_t} - 1 \ , $$

both numbers, $VaR_\alpha$ and $ES$, can in fact be calculated in two ways. The first method is based on the empirical distribution, i.e. using the data as given:

$$ VaR_\alpha = h_{i^{VaR}} \ \ \mbox{for the smallest index}\ i^{VaR}\ \mbox{such that}\ \ \sum_{i=1}^{i^{VaR}} H_i(h_{i+1}-h_i) \ge \alpha \ , $$

where $H$ represents the normalised histogram of $R_t$ (i.e., its integral is equal to 1), $h_i$ are the bin positions, and $M$ is the number of histogram bins. Similarly for $ES$ we get:

$$ ES = \frac{1}{\alpha} \sum_{i=1}^{i^{VaR}} h_i H_i (h_{i+1}-h_i) \ . $$

The second method is based on integration given the best fit to the histogram of $R_t$, taking $f(x)$ to be a normal distribution.
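The empirical recipe above can be sketched quickly in Python, the language used elsewhere on this blog. This is a minimal, illustrative example on simulated data, using the sorted sample directly instead of histogram bins; the return vector `r` and its distribution parameters are hypothetical stand-ins for a real $R_t$ series:

```python
import numpy as np

np.random.seed(2014)
# hypothetical stand-in for the daily return series R_t
r = np.random.normal(0.0005, 0.02, size=1000)

alpha = 0.05
r_sorted = np.sort(r)
k = int(np.floor(alpha * len(r)))   # index of the alpha-quantile

VaR = -r_sorted[k]                  # reported, by convention, as a positive number
ES  = -r_sorted[:k].mean()          # mean loss beyond the VaR threshold

print(f"VaR_0.05 = {VaR:.4f}, ES = {ES:.4f}")
```

Note that the empirical ES is always at least as large as the empirical VaR, since it averages the returns lying deeper in the left tail.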
As we will see in the practical example below, both approaches return different values, and extra caution should be exercised when reporting the final risk measures.

Case Study: IBM Daily Returns in 1987

The year 1987 went down in history as one of the most infamous in stock trading. Why? Simply because of the so-called Black Monday of Oct 19, when markets suffered over 20% losses in a single trading day.

Let's analyse the daily returns and their distribution for the IBM stock, as seen by a risk manager after the close of the market on Thu, Dec 31, 1987. The risk manager is interested in the estimation of the 1-day 95% VaR threshold, i.e. $VaR_{0.05}$, and the expected shortfall if the exceedance event were to occur on the next trading day (Jan 4, 1988). Therefore, let's assume that our portfolio is composed solely of the IBM stock and that we have an allocation of capital of $C$ dollars in it as of Dec 31, 1987.

Using a record of historical trading, IBM.dat, of the form:

IBM
DATES        OPEN      HIGH      LOW       CLOSE      VOLUME
03-Jan-1984  0.000000  0.000000  0.000000  30.437500  0.000000
04-Jan-1984  0.000000  0.000000  0.000000  30.968750  0.000000
05-Jan-1984  0.000000  0.000000  0.000000  31.062500  0.000000
06-Jan-1984  0.000000  0.000000  0.000000  30.875000  0.000000
...
04-Mar-2011  0.000000  0.000000  0.000000  161.830000  0.000000
07-Mar-2011  0.000000  0.000000  0.000000  159.930000  0.000000

containing the close price of the IBM stock, in Matlab we extract the data for the year 1987, then construct a vector of daily returns $R_t$ on the stock,

% Reading in the data
fn = ['IBM.dat'];
ftsD = ascii2fts(fn, 1, 2);
range = [datenum('02-Jan-1987'), datenum('31-Dec-1987')];
drr = [datestr(range(1)), '::', datestr(range(2))];
fts = ftsD(drr);
data = fts2mat(fts, 1);
% R_t vector
rdata = data(2:end,5)./data(1:end-1,5) - 1;
% Plotting
figure(1);
subplot(2,1,1);
plot(data(:,1)-datenum('2-Jan-1987'), data(:,5));
xlim([0 365]);
ylabel('IBM Close Price ($)');
subplot(2,1,2);
plot(data(2:end,1)-datenum('2-Jan-1987'), rdata);
xlim([0 365]);
ylabel('R_t')
xlabel('t (days)')

and present the data graphically:

Having our data, let's find $VaR_{0.05}$ and the corresponding $ES$,

% Construct the normalised histogram
[H,h] = hist(rdata, 100);
csum = 0;                          % accumulator (avoid shadowing Matlab's built-in sum)
for i = 1:length(H)-1
    csum = csum + (H(i)*(h(i+1)-h(i)));
end
figure(2)
H = H/csum;
bar(h, H, 'FaceColor', [0.7 0.7 0.7], 'EdgeColor', [0.7 0.7 0.7])

% Calculate VaR 95% based on the empirical distribution
csum = 0; i = 1;
while (i < length(H))
    csum = csum + (H(i)*(h(i+1)-h(i)));
    if (csum >= 0.05)
        break;
    end
    i = i + 1;
end
VaR = h(i)
hold on; plot(h(i), 0, 'ro');      % mark VaR in the plot

% Fit the normal distribution to R_t data
[muhat, sigmahat] = normfit([rdata]');
Y = normpdf(h, muhat(1), sigmahat(1));
hold on; plot(h, Y, 'b-');         % and plot the fit using a blue line

% Find VaR 95% for the fitted N(muhat,sigmahat)
csum = 0; i = 1;
while (i < length(h))
    csum = csum + (Y(i)*(h(i+1)-h(i)));
    if (csum >= 0.05)
        break;
    end
    i = i + 1;
end
VaR_fit = h(i)
hold on; plot(h(i), 0, 'ro', 'markerfacecolor', 'r');  % mark VaR_fit
ivar = i;

% Calculate the Expected Shortfall
% based on the fitted normal distribution
csum = 0; i = 1;
while (i <= ivar)
    csum = csum + (h(i)*Y(i)*(h(i+1)-h(i)));
    i = i + 1;
end
ES_fit = csum/0.05
hold on; plot(ES_fit, 0, 'ko', 'markerfacecolor', 'k');

% Add numbers to the plot
text(-0.23, 25,   ['1-Day VaR and Expected Shortfall']);
text(-0.23, 23.5, ['as for 31/12/1987']);
text(-0.23, 20, ['VaR = ', num2str(VaR*-1,3), ' (', num2str(VaR*-100,2), '%)']);
text(-0.23, 18, ['ES = ', num2str(ES_fit*-1,3), ' (', num2str(ES_fit*-100,2), '%)']);
text(-0.23, 13, ['\mu = ', num2str(muhat,2)]);
text(-0.23, 11, ['\sigma = ', num2str(sigmahat,2)]);
xlabel('R_t');

marking all the numbers in the resulting plot:

In the plot, the filled red and black markers correspond to $VaR_{0.05}$ and $ES$, respectively, both calculated using the fitted normal distribution; their values are displayed in the plot as well. In addition, the plot contains a red open-circle marker denoting the value of $VaR_{0.05}$ obtained by discrete integration of the empirical histogram. Please note how the choice of the method influences the value of VaR.
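The fitted-normal branch of the Matlab listing can also be cross-checked against the closed-form expressions for a normal distribution: if $X\sim N(\mu,\sigma)$, then $VaR_\alpha = -(\mu + \sigma\,\Phi^{-1}(\alpha))$ and $ES = -(\mu - \sigma\,\phi(\Phi^{-1}(\alpha))/\alpha)$, where $\phi$ and $\Phi$ are the standard normal pdf and cdf. A short Python sketch; the values of $\mu$ and $\sigma$ below are hypothetical, not the fitted IBM numbers:

```python
import math
from statistics import NormalDist

alpha = 0.05
mu, sigma = 0.0007, 0.011            # hypothetical normfit-style parameters

z = NormalDist().inv_cdf(alpha)      # left-tail quantile of N(0,1), about -1.645
phi_z = math.exp(-0.5*z*z) / math.sqrt(2.0*math.pi)

VaR_fit = -(mu + sigma*z)            # parametric 1-day VaR_0.05
ES_fit = -(mu - sigma*phi_z/alpha)   # parametric expected shortfall

print(f"VaR = {VaR_fit:.4f}, ES = {ES_fit:.4f}")
```

The closed form avoids the discretisation error of summing histogram bins, which is one source of the differences visible in the plot.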

The derived numbers tell the risk manager that in 95% of cases one should feel confident that the value of $VaR_{0.05}$ will not be exceeded in trading of IBM on Jan 4, 1988 (the next trading day). $C\times VaR_{0.05} = 0.028C$ provides the loss amount (in dollars) at that threshold. And if such an undesired event did occur, the expected loss (in dollars) we would potentially suffer is $C\times ES = 0.058C$.

Now it is easy to understand why the Black Swan event of Oct 19, 1987 falls into the category of the least likely moves in the markets. It places itself in the far left tail of both the empirical and the fitted distributions, denoting an impressive loss of 23.52% for IBM on that day. If we assume that the daily price changes are normally distributed, then the probability of a drop of at least 22 standard deviations is
$$P(Z\le -22) = \Phi(-22) = 1.4 \times 10^{-107} \ .$$ This is an astronomically rare event!
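This tail probability can be verified with the Python standard library; `math.erfc` remains accurate deep in the tail, where a naive `1 - cdf(22)` would be lost entirely to floating-point rounding:

```python
import math

# Phi(x) = 0.5 * erfc(-x / sqrt(2)); here x = -22
p = 0.5 * math.erfc(22.0 / math.sqrt(2.0))
print(p)   # on the order of 1.4e-107
```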

They say that lightning does not strike twice, but if a 1-day drop in the market of the same magnitude had occurred again on Jan 4, 1988, our long position in IBM would have caused us to suffer not a loss of $0.058C$ dollars but approximately $0.23C$. Ouch!

## If there is No Risk, there is No Reward

Today I allowed myself to post something different. Something that could help you understand that the true risks in life are not only those related to the financial markets or portfolio management. I learned a lot from this video clip and now, every time I go back to it, over and over again, I find more encouragement and power to simply follow what is worth taking the risk for. Enjoy your weekend break!