Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

## Rebinning Tick-Data for FX Algo Traders

If you work, or intend to work, with FX data in order to build and backtest your own FX models, the Historical Tick-Data of Pepperstone.com is probably the best place to kick off your algorithmic experience. As of now, they offer tick-data sets for the 15 most frequently traded currency pairs since May 2009. Some of the unzipped files (one month of data) exceed 400 MB in size, i.e. they store 8.5+ million lines at tick resolution for both bid and ask “prices”. The good thing is that you can download them all free of charge and their quality is regarded as very high. The bad thing is that there is a 3-month delay in data accessibility.

Rebinning tick-data is a different story, and the subject of this post. We will see how efficiently you can turn Pepperstone’s tick-data set(s) into a 5-min time-series, as an example. We will make use of scripting in bash (Linux/OS X) supplemented with data processing in Python.

Data Structure

You can download Pepperstone’s historical tick-data from here, month by month, pair by pair. All files follow the same inner structure, namely:

```
$ head AUDUSD-2014-09.csv
AUD/USD,20140901 00:00:01.323,0.93289,0.93297
AUD/USD,20140901 00:00:02.138,0.9329,0.93297
AUD/USD,20140901 00:00:02.156,0.9329,0.93298
AUD/USD,20140901 00:00:02.264,0.9329,0.93297
AUD/USD,20140901 00:00:02.265,0.9329,0.93293
AUD/USD,20140901 00:00:02.265,0.93289,0.93293
AUD/USD,20140901 00:00:02.268,0.93289,0.93295
AUD/USD,20140901 00:00:02.277,0.93289,0.93296
AUD/USD,20140901 00:00:02.278,0.9329,0.93296
AUD/USD,20140901 00:00:02.297,0.93288,0.93296
```

The columns, from left to right, represent respectively: the pair name, the date and tick time, the bid price, and the ask price.

Pre-Processing

Here, for each .csv file, we aim to split the date into year, month, and day, and to remove commas and colons, so that the raw data are ready to be read in as a matrix (array) in any other programming language (e.g. Matlab or Python). A matrix is a mathematically intuitive data structure, and being able to refer directly to any specific column of it lets a backtesting engine run at full thrust.

Let’s play with the AUDUSD-2014-09.csv data file. Working in the same directory where the file is located, we begin by writing a bash script (pp.scr) that contains:

```bash
# pp.scr
# Rebinning Pepperstone.com Tick-Data for FX Algo Traders
# (c) 2014 QuantAtRisk, by Pawel Lachowicz

clear
echo "..making a sorted list of .csv files"
for i in $1-*.csv; do echo ${i##$1-} $i ${i##.csv};
done | sort -n | awk '{print $2}' > $1.lst

python pp.py
head AUDUSD.pp
```

that you run in Terminal:

```
$ chmod +x pp.scr
$ ./pp.scr AUDUSD
```

where the first command makes sure the script becomes executable (you need to perform this task only once). Lines #7-8 of our script, in fact, look for all .csv data files in the local directory starting with the AUDUSD- prefix and write their list to the AUDUSD.lst file. Since we work with the AUDUSD-2014-09.csv file only, the AUDUSD.lst file will contain:

```
$ cat AUDUSD.lst
AUDUSD-2014-09.csv
```

as expected. Next, we utilise the power and flexibility of Python in the following way:

```python
# pp.py
import csv

fnlst="AUDUSD.lst"
fnout="AUDUSD.pp"

# open the output once, so that several input files do not overwrite each other
with open(fnout,'w') as f:
    writer=csv.writer(f,delimiter=" ")

    for lstline in open(fnlst,'r'):
        fncur=lstline.strip()
        #print(fncur)

        i=1 # counts a number of lines with tick-data
        for line in open(fncur,'r'):  # iterate lazily, one line at a time
            if(i<=5200): # replace with (i>0) to process an entire file
                #print(line)
                year=line[8:12]
                month=line[12:14]
                day=line[14:16]
                hh=line[17:19]
                mm=line[20:22]
                ss=line[23:29]
                bidask=line[30:]
                writer.writerow([year,month,day,hh,mm,ss,bidask])
                i+=1
```

It is a pretty efficient way to open a really big file and process its information line by line. Just for the purpose of display, in the code we told the computer to process only the first 5,200 lines. The output of lines #10-11 of pp.scr is the following:

```
2014 09 01 00 00 01.323 "0.93289,0.93297
"
2014 09 01 00 00 02.138 "0.9329,0.93297
"
2014 09 01 00 00 02.156 "0.9329,0.93298
"
2014 09 01 00 00 02.264 "0.9329,0.93297
"
2014 09 01 00 00 02.265 "0.9329,0.93293
"
```

since we allowed Python to save the bid and ask information as one string (due to a variable number of decimal digits). In order to clean this mess we continue:

```bash
# pp.scr (continued)

echo "..removing token: comma"
sed 's/,/ /g' AUDUSD.pp > $1.tmp
rm AUDUSD.pp

echo "..removing token: double quotes"
sed 's/"/ /g' $1.tmp > $1.tmp2
rm $1.tmp

echo "..removing empty lines"
sed -i '/^[[:space:]]*$/d' $1.tmp2
mv $1.tmp2 AUDUSD.pp

echo "head..."
head AUDUSD.pp
echo "tail..."
tail AUDUSD.pp
```

which brings us to the pre-processed data:

```
..removing token: comma
..removing token: double quotes
..removing empty lines
head...
2014 09 01 00 00 01.323 0.93289 0.93297
2014 09 01 00 00 02.138 0.9329 0.93297
2014 09 01 00 00 02.156 0.9329 0.93298
2014 09 01 00 00 02.264 0.9329 0.93297
2014 09 01 00 00 02.265 0.9329 0.93293
2014 09 01 00 00 02.265 0.93289 0.93293
2014 09 01 00 00 02.268 0.93289 0.93295
2014 09 01 00 00 02.277 0.93289 0.93296
2014 09 01 00 00 02.278 0.9329 0.93296
2014 09 01 00 00 02.297 0.93288 0.93296
tail...
2014 09 02 00 54 39.324 0.93317 0.93321
2014 09 02 00 54 39.533 0.93319 0.93321
2014 09 02 00 54 39.543 0.93318 0.93321
2014 09 02 00 54 39.559 0.93321 0.93321
2014 09 02 00 54 39.784 0.9332 0.93321
2014 09 02 00 54 39.798 0.93319 0.93321
2014 09 02 00 54 39.885 0.93319 0.93325
2014 09 02 00 54 39.886 0.93319 0.93321
2014 09 02 00 54 40.802 0.9332 0.93321
2014 09 02 00 54 48.829 0.93319 0.93321
```
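For readers who prefer to stay in one language, the slicing done in pp.py and the three sed passes can be folded into a single parsing step. The following is only a sketch: the function name `parse_tick` is hypothetical (it is not part of the original scripts), and it assumes every line follows exactly the `PAIR,YYYYMMDD HH:MM:SS.mmm,bid,ask` layout shown earlier.

```python
def parse_tick(line):
    """Split one raw tick line into numeric fields.

    Assumes the layout 'PAIR,YYYYMMDD HH:MM:SS.mmm,bid,ask'.
    Returns (year, month, day, hour, minute, second, bid, ask).
    """
    pair, stamp, bid, ask = line.strip().split(",")
    date, clock = stamp.split(" ")
    hh, mm, ss = clock.split(":")
    return (int(date[0:4]), int(date[4:6]), int(date[6:8]),
            int(hh), int(mm), float(ss), float(bid), float(ask))


if __name__ == "__main__":
    raw = "AUD/USD,20140901 00:00:01.323,0.93289,0.93297\n"
    # print the fields separated by single spaces, as in AUDUSD.pp
    print(" ".join(str(x) for x in parse_tick(raw)))
```

Splitting on the separators rather than on fixed character positions keeps the parser working when the number of decimal digits in the bid or ask varies.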

Personally, I love that part, as you can learn how to do simple but necessary text-file operations by typing single lines of Unix/Linux commands. Good luck to those who try to repeat the same in Microsoft Windows without spending more than 30 seconds on it.

Rebinning: 5-min Data

Rebinning has many schools. For some people it is an art. We just want to get the job done. I opt for simplicity and an understanding of the data we deal with. Imagine we have two adjacent 5-min bins with a tick history of trading:

We want to derive the closest possible (or fairest) price estimate every 5 min, denoted in the figure above by a red marker. The old-school approach is to take the average over a number (larger than 5) of tick data points from the left and from the right. That leads to an under- or overestimation of the mid-price.

If we trade live, every 5 min we receive information on the last tick before the minute hits a multiple of 5, and we wait for the next tick after it (blue markers). Taking the average of their prices (the mid-price) makes the most sense. The precision we look at here is sometimes $10^{-5}$. It is of little significance if our position is small, but if it is not, the mid-price may start playing a crucial role.

The cons of the old-school approach: a possibly high volatility among all tick-data within the last 5 minutes, which we neglect.
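The two-tick estimate described above takes only a few lines once the bracketing ticks are in hand. A minimal sketch, assuming each tick is given as a (bid, ask) pair; the helper name `boundary_mid` is hypothetical and the prices in the usage example are made up for illustration:

```python
def boundary_mid(last_before, first_after):
    """Mid-price at a 5-min boundary from the two ticks that bracket it.

    Each argument is a (bid, ask) tuple: the last tick before the
    boundary and the first tick after it (the two blue markers).
    """
    bid = (last_before[0] + first_after[0]) / 2.0   # averaged bid
    ask = (last_before[1] + first_after[1]) / 2.0   # averaged ask
    return (bid + ask) / 2.0                        # the red marker


if __name__ == "__main__":
    # illustrative (bid, ask) pairs, not taken from the data set
    print("%.6f" % boundary_mid((0.93300, 0.93306), (0.93298, 0.93304)))
```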

The following Python code (pp2.py) performs 5-min rebinning for our pre-processed AUDUSD-2014-09 file:

```python
# pp2.py
import numpy as np

fname="AUDUSD.pp"

# columns: year month day hour minute second bid ask
a=np.loadtxt(fname)
N=a.shape[0]
#print("N = %d" % N)

# print the first output line (built from row 1 of the array)
tmp=np.array([a[1][0],a[1][1],a[1][2],a[1][3],a[1][4],0.0,(a[1][6]+a[1][7])/2])
print("%.0f %2.0f %2.0f %2.0f %2.0f %6.3f %10.6f" %
      (tmp[0],tmp[1],tmp[2],tmp[3],tmp[4],tmp[5],tmp[6]))
m=tmp

# check the boundary conditions (5 min bins)
for i in range(2,N-1):
    if( (a[i-1][4]%5!=0.0) and (a[i][4]%5==0.0)):

        # BLUE MARKER No. 1
        # (print for i-1)
        #print(" %.0f %2.0f %2.0f %2.0f %2.0f %6.3f %10.6f %10.6f" %
        #      (a[i-1][0],a[i-1][1],a[i-1][2],a[i-1][3],a[i-1][4],a[i-1][5],a[i-1][6],a[i-1][7]))
        b1=a[i-1][6]
        b2=a[i][6]
        a1=a[i-1][7]
        a2=a[i][7]
        # mid-price, and new date for 5 min bin
        bm=(b1+b2)/2
        am=(a1+a2)/2
        Ym=a[i][0]
        Mm=a[i][1]
        Dm=a[i][2]
        Hm=a[i][3]
        MMm=a[i][4]
        Sm=0.0 # set seconds to zero

        # RED MARKER
        print("%.0f %2.0f %2.0f %2.0f %2.0f %6.3f %10.6f" %
              (Ym,Mm,Dm,Hm,MMm,Sm,(bm+am)/2))
        tmp=np.array([Ym,Mm,Dm,Hm,MMm,Sm,(bm+am)/2])
        m=np.vstack([m, tmp])

        # BLUE MARKER No. 2
        # (print for i)
        #print(" %.0f %2.0f %2.0f %2.0f %2.0f %6.3f %10.6f %10.6f" %
        #      (a[i][0],a[i][1],a[i][2],a[i][3],a[i][4],a[i][5],a[i][6],a[i][7]))
```

which you run from the pp.scr file as:

```bash
# pp.scr (continued)

python pp2.py > AUDUSD.dat
```

in order to get 5-min rebinned FX time-series as follows:

```
$ head AUDUSD.dat
2014  9  1  0  0  0.000   0.932935
2014  9  1  0  5  0.000   0.933023
2014  9  1  0 10  0.000   0.932917
2014  9  1  0 15  0.000   0.932928
2014  9  1  0 20  0.000   0.932937
2014  9  1  0 25  0.000   0.933037
2014  9  1  0 30  0.000   0.933075
2014  9  1  0 35  0.000   0.933070
2014  9  1  0 40  0.000   0.933092
2014  9  1  0 45  0.000   0.933063
```

That concludes our efforts. Happy rebinning!

## Gap-on-Open Profitable Trading Strategy

After a longer while, QuantAtRisk is back to business. As an algo trader I have always been tempted to test a gap-on-open trading strategy. There were various reasons behind it, but the most popular one was always widely discussed: good/bad news on the stock. And then? The stock price skyrocketed/dropped on the following days. When we approach such price patterns, we talk about triggers or triggered events. The core of the algorithm’s activity is identifying the trigger and taking the proper action: to go long or short. That’s it. In both cases we want to make money.

In this post we will design the initial conditions for our gap-on-open trading strategy acting as the triggers, and we will backtest a realistic scenario of betting our money on those stocks that opened higher on the next trading day. Our goal is to find the optimal holding period for such trades to be closed with a profit.

Portfolio

Our strategy can be backtested using any $N$-asset portfolio. Here, for simplicity, let us use a random subset of 10 stocks (portfolio.lst) that are part of the current Dow Jones Index:

```
AXP
CSCO
DIS
IBM
JNJ
KO
NKE
PG
UTX
XOM
```

In Matlab, we fetch the stock prices from the Google Finance data provider accessible via Quandl.com’s Matlab API (see this post for its setup in Matlab).
We commence writing our main backtesting code as follows:

```matlab
% Gap on Open Trading Strategy
% Fetching stock prices via Quandl and Strategy Backtesting
%
% (c) 2014 by Pawel Lachowicz, QuantAtRisk.com


clear all; close all; clc;

fname=['portfolio.lst'];

% Model's parameter #1 (years)
parm1=1;
ndays=parm1*365;
lday=datenum('2014-08-05');
% fetching stock data
[Top,Thp,Tlp,Tcp,N,ntdays]=FetchQuandl(fname,ndays,lday);
```

where we use a pre-designed function, FetchQuandl, to import 4 separate price-series of each stock’s open (Top), high (Thp), low (Tlp), and close (Tcp) daily prices:

```matlab
function [Top,Thp,Tlp,Tcp,N,ntdays]=FetchQuandl(fname,ndays,lday)
    % Read the list of Dow Jones components
    fileID = fopen(fname);
    tmp = textscan(fileID,'%s');
    fclose(fileID);
    components=tmp{1};  % a list as a cell array

    % Read in the list of tickers and internal codes from Quandl.com
    [~,text,~] = xlsread('QuandlStockCodeListUS.xlsx');
    quandlc=text(:,1);    % again, as a list in a cell array
    quandlcode=text(:,3); % corresponding Quandl's Price Code

    % fetch stock data for last 'ndays'
    date2=datestr(lday,'yyyy-mm-dd');       % to
    date1=datestr(lday-ndays,'yyyy-mm-dd'); % from

    Top={}; Thp={}; Tlp={}; Tcp={};
    % scan all tickers and fetch the data from Quandl.com
    for i=1:length(components)
        for j=1:length(quandlc)
            if(strcmp(components{i},quandlc{j}))
                fprintf('%4.0f %s\n',i,quandlc{j});
                fts=0;
                [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
                              'authcode','PutHereYourQuandlCode',...
                              'start_date',date1,'end_date',date2);
                cp=fts2mat(fts.Close,1); Tcp{i}=cp; % close price-series
                op=fts2mat(fts.Open,1);  Top{i}=op; % open price-series
                hp=fts2mat(fts.High,1);  Thp{i}=hp; % high price
                lp=fts2mat(fts.Low,1);   Tlp{i}=lp; % low price
                %Rcp{i}=cp(2:end,2)./cp(1:end-1,2)-1; % return-series cp
            end
        end
    end
    N=length(components);
    ntdays=length(Tcp{1});
end
```

Please note that in line #12 (parm1) we specified the number of years, i.e. how far our backtest should be extended backward in time (or the number of calendar days; see ndays in line #13), counting from the last day specified in line #14 (lday).

Trading Model

First, let us design the trading strategy. We scan the four price-series concurrently, for each stock separately. We define the strategy’s trigger as follows:
$$O_t > C_{t-1} \quad \text{and} \quad L_t > H_{t-1}$$
i.e. the stock’s open price $O_t$ on day $t$ was higher than the close price $C_{t-1}$ on day $t-1$, and the lowest price $L_t$ on day $t$ was higher than the highest price $H_{t-1}$ on day $t-1$. Having that, we make a BUY LONG decision! We buy that stock on the next day at its market price (close price). This approach should remove the slippage bias effectively (see more on slippage in stock trading here).

Now, we run the backtest on each stock and each open trade. We select the second parameter (parm2) to be a number of days, i.e. how long we hold the stock. In the following piece of code (lines #18-60 of the script), let us allow the stock to be sold after 1 to 21 calendar days ($\pm$ weekends or public holidays):

```matlab
% pre-defined matrix for backtest final results
results=[];

for parm2=0:20

    cR=[];
    for i=1:N
        % just for a purpose of plotting of price-series
        if(i==1)
            % open (blue color)
            plot(Top{i}(:,1),Top{i}(:,2),'')
            hold on
            % close (red color)
            plot(Tcp{i}(:,1),Tcp{i}(:,2),'r')
            hold on
            % high (green color)
            plot(Thp{i}(:,1),Thp{i}(:,2),'g')
            %
            xlabel('Days');
            ylabel('AXP Stock Prices [US$]');
        end

        Tbuy=[];
        for t=2:ntdays
            % define indicators
            ind1=Tcp{i}(t-1,2); % cp on (t-1)day
            ind2=Thp{i}(t-1,2); % hp on (t-1)day
            ind3=Top{i}(t,2);   % op on (t)day
            ind4=Tlp{i}(t,2);   % lp on (t)day
            % detect trigger
            if(ind1<ind3)&&(ind2<ind4)
                % plotting only for AXP
                if(i==1)
                    hold on;
                    plot(Top{i}(t,1),Top{i}(t,2),'o');
                end
                % date of a trigger
                tday=Top{i}(t,1);
                nextbusdate=busdate(tday,1); % find next trading date
                Tbuy=[Tbuy; nextbusdate];
            end
        end
        Tsell=busdate(Tbuy+parm2,1);
```

Here, in lines #57 and #60, we constructed the time arrays Tbuy and Tsell, which store the dates on which each trade is opened and closed. Now, we will use them to look up the price at each trade’s open and close, and to derive the profit and loss for each stock:

```matlab
        R=[];
        for k=1:length(Tbuy)
            j=find(Tbuy(k)==Tcp{i}(:,1));
            pbuy=Tcp{i}(j,2);
            j=find(Tsell(k)==Tcp{i}(:,1));
            psell=Tcp{i}(j,2);
            ret=(psell/pbuy-1); % return per trade
            R=[R; ret];
        end

        compR=prod(R+1)-1; % compound return per stock
        cR=[cR; compR];

    end

    results=[results cR];

end
```

In the inner loop (lines #24 to #75, i.e. tracking a number of stocks in portfolio; index $i$, here 1 to 10) we capture all trades per stock (lines #63-70) and calculate a multi-period compound return (line #72) as if we were trading that stock solely using our model.
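The two building blocks of this backtest, the trigger test and the compound return, are small enough to sketch outside Matlab. The snippet below is an illustration only: the function names and the toy price lists are hypothetical, and prices are assumed to be plain Python lists in chronological order.

```python
def gap_triggers(op, hp, lp, cp):
    """Indices t where open[t] > close[t-1] and low[t] > high[t-1]."""
    return [t for t in range(1, len(cp))
            if op[t] > cp[t - 1] and lp[t] > hp[t - 1]]


def compound_return(trade_returns):
    """Multi-period compound return, prod(1 + r_k) - 1."""
    total = 1.0
    for r in trade_returns:
        total *= 1.0 + r
    return total - 1.0


if __name__ == "__main__":
    # toy open/high/low/close series for three days (made-up numbers)
    op = [10.0, 10.6, 10.4]
    hp = [10.5, 11.0, 10.9]
    lp = [9.8, 10.55, 10.2]
    cp = [10.2, 10.8, 10.5]
    print(gap_triggers(op, hp, lp, cp))
    print(compound_return([0.10, -0.05]))
```

Note that the compound return of an empty list of trades is zero, which matches what `prod(R+1)-1` gives in Matlab for an empty `R`.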

For instance, for stock $i=1$ (AXP) from our portfolio, our code displays 1-year price-series:
where days meeting our trigger criteria have been denoted by open-circle markers. If you now re-run the backtest with a gentle substitution in line #24, so that it reads:

```matlab
for i=1:1
```

and then run some extra lines of code:

```matlab
figure(2)
stem((0:20),100*results)
xlabel('Holding Period [days]');
ylabel('AXP: Compound Return [%]');
```

you obtain an appealing result:

The chart reveals that for AXP, over the past 251 days (counting backwards from Aug/4 2014), we had 16 triggers and therefore 16 trades and, surprisingly, regardless of the holding period, the compound return from all closed trades was highly positive (profitable).

This is not the case if we consider, for example, $i=4$, the IBM stock:
This result indicates that, for different holding periods (and different stocks, of course), certain extra trading indicators should be applied to limit the losses (e.g. profit targets).

If we traded the whole portfolio using our gap-on-open model, we would end up with a very encouraging result:
where for each holding period we display the compound return averaged over the 10 stocks. Taking into account the global up-trend in the US stock markets between August 2013 and August 2014, this strategy is worth considering with further modifications (e.g. considering short or both long and short triggers, FX time-series, etc.).

Someone wise once said: Sometimes you win. Sometimes you learn. In algo trading we all learn to win.



## Ideal Stock Trading Model for the Purpose of Backtesting Only

There is only one goal in algorithmic trading: to win the game. To emerge victorious. To feel the sunlight again after the months of walking through the valley of shadows being stuck in the depths of algo drawdowns. An endless quest for the most brilliant minds, to find the way to win over and over, and over again. To discover a perfect solution as your new private weapon in this battleground. Is it possible? If the answer were so simple, we wouldn’t be so engaged in seeking for a golden mean.

However, algo trading is a journey, and sometimes in the process of programming and testing of our trading systems, we need to have an ideal trading model ready-to-use straight away! What I mean by the ideal model is a sort of template, a benchmark that will allow us to list a number of successful trades, their starting and closing times, open and close price of the trades being executed, and the realized return coming down to our pocket from each trade. Such a trading model template also allows us to look at the trading data from a different perspective and re-consider and/or apply an additional risk management framework. In fact, the benefits are limitless for the backtesting purposes.

In this post we will explore one of the easiest ways of programming a perfect model: by re-directing the time-axis backwards. Using the example of Google, Inc. (GOOG) stock listed on NASDAQ, we will analyse the stock’s trading history and find all possible trades returning at least 3% over the past decade.

The results of this strategy I will use within the upcoming (this week!) series of new posts on Portfolio Optimization in Matlab for Algo Traders.

Model and Data

Let’s imagine we are interested in finding a large number of trades, with the expected return from each trade being at least 3%. We consider GOOG stock (daily close prices) with data spanning 365$\times$10 days back in time from the present day (the last 10 years). We will make use of Google Finance data powered by Quandl as described in one of my previous posts, namely, Create a Portfolio of Stocks based on Google Finance Data fed by Quandl. Shall we begin?

```matlab
% Ideal Stock Trading Model for the Purpose of Backtesting Only
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz


clear all; close all; clc;

stocklist={'GOOG'};

% Read in the list of tickers and internal code from Quandl.com
[ndata, text, alldata] = xlsread('QuandlStockCodeListUS.xlsx');
quandlc=text(:,1);   % again, as a list in a cell array
quandlcode=text(:,3) % corresponding Quandl's Price Code

% fetch stock data for last 10 years
date2=datestr(today,'yyyy-mm-dd')        % to
date1=datestr(today-365*10,'yyyy-mm-dd') % from

stockd={};
for i=1:length(stocklist)
    for j=1:length(quandlc)
        if(strcmp(stocklist{i},quandlc{j}))
            fprintf('%4.0f %s\n',i,quandlc{j});
            % fetch the data of the stock from Quandl
            % using recommended Quandl's command and
            % saving them directly into FTS object (fts)
            fts=0;
            [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
                          'authcode','ENTER_YOUR_CODE',...
                          'start_date',date1,'end_date',date2);
            stockd{i}=fts; % entire FTS object in an array's cell
        end
    end
end
```

The extracted data of GOOG from Google Finance via Quandl we can visualize immediately as follows:

```matlab
% plot the close prices of GOOG
cp=fts2mat(stockd{1}.Close,1);
plot(cp(:,1),cp(:,2),'color',[0.6 0.6 0.6])
xlim([min(cp(:,1)) max(cp(:,1))]);
ylim([min(cp(:,2)) max(cp(:,2))]);
xlabel('Nov 2003 - Nov 2013 (days)');
ylabel('GOOG Close Price ($)');
```

We open the trade on the first day (long position) and, as time goes by, on each following day we check whether the stock value has increased by more than 3%. If not, we advance the current time position by one day and check again. If the condition is met, we close the open trade on the following trading (business) day at the stock’s market price. We allow a new trade to be opened at the stock’s market price on the next business day after the day on which the previous trade was closed (exercised). Additionally, we check whether the ‘running’ return from the open trade has fallen to -5% or below. If so, we replace the open time of the trade with the time when the latter condition was fulfilled (and likewise the open price of that trade). This allows us to restart the backtesting process, and therefore the search for a new profitable trade.
```matlab
t0=cp(1,1);         % starting day for backtesting
tN=cp(end,1);       % last day
trades=[];          % open a log-book for all executed trades
status=0;           % status meaning: 0-no open trade, 1-open trade
t=t0;
% we loop over time (t) [days]
while(t<tN)
    [r,~,~]=find(t==cp(:,1)); % check the row in cp vector
    if(~isempty(r))
        if(~status)
            topen=t;        % time when we open the trade
            popen=cp(r,2);  % assuming market price of the stock
            status=1;
        else
            ptmp=cp(r,2);       % running close price
            rtmp=ptmp/popen-1;  % running return of the open trade
            if(rtmp>0.03)       % check 3% profit condition
                % if met, then
                tclose=busdate(t,1); % close time of the trade
                                     % assumed on the next business day
                t=busdate(tclose,1); % next day in the loop
                if(tclose<=tN)
                    [r,~,~]=find(tclose==cp(:,1));
                    pclose=cp(r,2);     % close price
                    ret=pclose/popen-1; % realized profit/loss
                    % save the trade details into log-book
                    trades=[trades; topen tclose popen pclose ret*100];
                    status=0; % change status of trading to not-open
                    % mark the opening of the trade as blue dot marker
                    hold on; plot(topen,popen,'b.','markerfacecolor','b');
                    % mark the end time of the trade
                    hold on; plot(tclose,pclose,'r.','markerfacecolor','r');
                end
            elseif(rtmp<=-0.05) % check an additional condition
                topen=t;        % overwrite the time
                popen=cp(r,2);  % and the price
                status=1;       % sustain the status of the trade as 'open'
            else
                t=t+1;
            end
        end
    else
        t=t+1;
    end
end
```

In this piece of code, in the matrix trades (a log-book of all exercised trades) we store the history of all successful trades meeting our earlier assumed criteria. The only uncertainty that we allow to slip into our perfect solution is related to the case when the close price on the next business day turns out to be lower, making the realized profit from the trade less than 3%.
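To see the state machine without the calendar bookkeeping, here is a simplified Python sketch of the same logic. It is an approximation under stated assumptions, not a port of the Matlab script: time is replaced by the row index of a chronologically ordered price list (so "next business day" becomes "next row"), and the function name `ideal_trades` and its parameters are hypothetical.

```python
def ideal_trades(prices, target=0.03, restart=-0.05):
    """Log trades that reach `target` running return.

    prices  : close prices, oldest first
    target  : running return that triggers a close on the next row
    restart : running return at or below which the open is re-set
    Returns a list of (i_open, i_close, p_open, p_close, return_pct).
    """
    trades = []
    i_open = None                      # None means no open trade
    i, n = 0, len(prices)
    while i < n - 1:
        if i_open is None:
            i_open = i                 # open a trade at the current row
            i += 1
            continue
        running = prices[i] / prices[i_open] - 1.0
        if running > target:
            i_close = i + 1            # close on the following row
            ret = prices[i_close] / prices[i_open] - 1.0
            trades.append((i_open, i_close, prices[i_open],
                           prices[i_close], 100.0 * ret))
            i_open = None
            i = i_close + 1            # a new trade may open one row later
        elif running <= restart:
            i_open = i                 # overwrite open time and price
            i += 1
        else:
            i += 1
    return trades


if __name__ == "__main__":
    for trade in ideal_trades([100, 101, 104, 103, 97, 98, 102, 101]):
        print(trade)
```

As in the Matlab version, the realized return is measured at the close one row after the signal, so it can end up below the 3% target.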
By plotting all good trades with the ending day $tN$ set to Nov 18, 2013, we get a messy picture:

which translates into a more intuitive one once we examine the distribution of profits from all trades:

```matlab
figure(3);
hist(trades(:,5),50);
xlabel('Profit/loss (%)');
ylabel('Number of trades');
```

At this point the most valuable information is contained in the log-book, whose content we can analyze trade by trade:

```
>> format shortg
>> trades

trades =

   7.3218e+05   7.3218e+05       100.34        109.4       9.0293
   7.3218e+05   7.3220e+05       104.87          112       6.7989
   7.3221e+05   7.3221e+05       113.97       119.36       4.7293
   7.3221e+05   7.3222e+05       117.84       131.08       11.236
   7.3222e+05   7.3222e+05        129.6       138.37        6.767
   7.3223e+05   7.3224e+05       137.08       144.11       5.1284
   7.3224e+05   7.3224e+05       140.49       172.43       22.735
   7.3224e+05   7.3225e+05        187.4       190.64       1.7289
   ...
   7.3533e+05   7.3535e+05       783.05       813.45       3.8823
   7.3535e+05   7.3536e+05        809.1       861.55       6.4825
   7.3536e+05   7.3537e+05       857.23       915.89        6.843
   7.3546e+05   7.3549e+05       856.91       888.67       3.7063
   7.3549e+05   7.3553e+05       896.19       1003.3       11.952
```

where the columns correspond to the open and close time of the trade (Matlab’s continuous time measure for financial time-series; see the datestr command for getting the yyyy-mm-dd date format), the open and close price of GOOG stock, and the realized profit/loss of the trade, respectively.

## Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting

Within the evolution of Mathworks’ MATLAB programming environment, finally, in the most recent version labelled 2013a, we received a long-awaited command-line facility for pulling stock data directly from the Yahoo! servers. What does that mean for quants and algo traders? Honestly, a lot. Now, by simply writing a few commands, we can have nearly all we want. However, please keep in mind that Yahoo! data are free, so their precision does not always match the quality of data downloaded from, e.g., Bloomberg resources. Anyway, just for pure backtesting of your models, this step is a big leap in dealing with daily stock data. As usual, we can get the open, high, low, close, and adjusted close prices of stocks, supplemented with traded volume and the dates plus values of dividends.

In this post I present a short example of how one can retrieve the data of SPY (tracking the performance of the S&P500 index) using Yahoo! data in the new Matlab 2013a, and I show a simple code for testing the time period of buying-holding-and-selling SPY (or any other stock paying dividends) to make a profit every time.

The beauty of the new Yahoo! feature in Matlab 2013a has been fully described in the official article Request data from Yahoo! data servers, where you can find all the details required to build the code into your Matlab programs.

Model for Dividends

It is a well-known opinion (based on many years of market observations) that one may expect a drop in the stock price within a short timeframe (e.g. a few days) after the day when the stock’s dividends have been announced. And probably every quant, sooner or later, is tempted to verify that hypothesis. It’s your homework. However, today, let’s look at a slightly differently defined problem based on the omni-working reversed rule: what goes down, must go up. Let’s consider the exchange traded fund SPDR S&P 500 ETF Trust, listed on NYSE as SPY.

First, let’s pull out the Yahoo! data of adjusted Close prices of SPY from Jan 1, 2009 up to Aug 27, 2013

```matlab
% Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz

close all; clear all; clc;

date_from=datenum('Jan 1 2009');
date_to=datenum('Aug 27 2013');

stock='SPY';

adjClose = fetch(yahoo,stock,'adj close',date_from,date_to);
div = fetch(yahoo,stock,date_from,date_to,'v')
returns=(adjClose(2:end,2)./adjClose(1:end-1,2)-1);

% plot adjusted Close price of SPY and mark days when dividends
% have been announced
plot(adjClose(:,1),adjClose(:,2),'color',[0.6 0.6 0.6])
hold on;
plot(div(:,1),min(adjClose(:,2))+10,'ob');
ylabel('SPY (US$)');
xlabel('Jan 1 2009 to Aug 27 2013');
```

and visualize them:

Having the data ready for backtesting, let’s look for the most profitable period of time of buying-holding-and-selling SPY assuming that we buy SPY one day after the dividends have been announced (at the market price), and we hold for $dt$ days (here, tested to be between 1 and 40 trading days).

```matlab
% find the most profitable period of holding SPY (long position)
neg=[];
for dt=1:40

    buy=[]; sell=[];
    for i=1:size(div,1)
        % find the dates when the dividends have been announced
        [r,c,v]=find(adjClose(:,1)==div(i,1));
        % mark the corresponding SPY price with blue circle marker
        hold on; plot(adjClose(r,1),adjClose(r,2),'ob');
        % assume you buy long SPY next day at the market price (close price)
        buy=[buy; adjClose(r-1,1) adjClose(r-1,2)];
        % assume you sell SPY in 'dt' days after you bought SPY at the market
        % price (close price)
        sell=[sell; adjClose(r-1-dt,1) adjClose(r-1-dt,2)];
    end

    % calculate profit-and-loss of each trade (excluding transaction costs)
    PnL=sell(:,2)./buy(:,2)-1;
    % summarize the results
    neg=[neg; dt sum(PnL<0) sum(PnL<0)/length(PnL)];

end
```
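One detail of the loop above is worth spelling out: buying at row `r-1` and selling at row `r-1-dt` moves forward in time only if the price rows are ordered newest first, which is what the index arithmetic appears to assume about the fetched data. The sketch below restates the same counting in Python for a chronologically ordered list (oldest first), so the signs flip. The function name `losing_fraction` and the toy numbers are hypothetical.

```python
def losing_fraction(prices, event_idx, dt):
    """Fraction of losing trades for a holding period of dt rows.

    prices    : close prices in chronological order (oldest first)
    event_idx : row indices of the dividend dates
    dt        : holding period in trading days (rows)
    """
    pnl = []
    for r in event_idx:
        buy = r + 1                 # buy one trading day after the event
        sell = buy + dt             # sell dt trading days later
        if sell < len(prices):      # skip trades that run past the data
            pnl.append(prices[sell] / prices[buy] - 1.0)
    losses = sum(1 for x in pnl if x < 0)
    return losses / len(pnl) if pnl else float("nan")


if __name__ == "__main__":
    prices = [100, 101, 99, 102, 98, 103, 104]   # made-up closes
    for dt in (1, 2, 3):
        print(dt, losing_fraction(prices, [0, 2], dt))
```

Unlike the Matlab loop, this version silently drops trades whose sell date falls beyond the available data rather than failing on an out-of-range index.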

If we now sort the results according to the percentage of negative returns (column 3 of neg matrix), we will be able to get:

```
>> sortrows(neg,3)

ans =
   18.0000    2.0000    0.1111
   17.0000    3.0000    0.1667
   19.0000    3.0000    0.1667
   24.0000    3.0000    0.1667
    9.0000    4.0000    0.2222
   14.0000    4.0000    0.2222
   20.0000    4.0000    0.2222
   21.0000    4.0000    0.2222
   23.0000    4.0000    0.2222
   25.0000    4.0000    0.2222
   28.0000    4.0000    0.2222
   29.0000    4.0000    0.2222
   13.0000    5.0000    0.2778
   15.0000    5.0000    0.2778
   16.0000    5.0000    0.2778
   22.0000    5.0000    0.2778
   27.0000    5.0000    0.2778
   30.0000    5.0000    0.2778
   31.0000    5.0000    0.2778
   33.0000    5.0000    0.2778
   34.0000    5.0000    0.2778
   35.0000    5.0000    0.2778
   36.0000    5.0000    0.2778
    6.0000    6.0000    0.3333
    8.0000    6.0000    0.3333
   10.0000    6.0000    0.3333
   11.0000    6.0000    0.3333
   12.0000    6.0000    0.3333
   26.0000    6.0000    0.3333
   32.0000    6.0000    0.3333
   37.0000    6.0000    0.3333
   38.0000    6.0000    0.3333
   39.0000    6.0000    0.3333
   40.0000    6.0000    0.3333
    5.0000    7.0000    0.3889
    7.0000    7.0000    0.3889
    1.0000    9.0000    0.5000
    2.0000    9.0000    0.5000
    3.0000    9.0000    0.5000
    4.0000    9.0000    0.5000
```

which simply indicates that the optimal period of holding the long position in SPY equals 18 days. We can mark all trades (18-day holding period) in the chart:

where the trade open and close prices (according to our model described above) have been marked in the plot by black and red circle markers, respectively. Only 2 out of 18 trades (PnL matrix) turned out to be negative, with losses of 2.63% and 4.26%. The complete distribution of profits and losses from all trades can be obtained in the following way:

```matlab
figure(2);
hist(PnL*100,length(PnL))
ylabel('Number of trades')
xlabel('Return (%)')
```

returning

Let’s make some money!

The above Matlab code delivers a simple application of the newest built-in connectivity with the Yahoo! server and the ability to download the stock data of our interest. We have tested the optimal holding period for SPY from the beginning of 2009 till now (global uptrend). The same code can be easily used and/or modified to verify any period and any stock for which dividends have been released in the past. This fairly simple approach, though not a source of frequent trades, provides us with some extra ideas on how we can beat the market, assuming that the future is going to be/remain more or less the same as the past. So, let’s make some money!

## Simulation of Portfolio Value using Geometric Brownian Motion Model

Having in mind the upcoming series of articles on building a backtesting engine for algo traded portfolios, today I decided to drop a short post on a simulation of the portfolio realised profit and loss (P&L). In the future I will use some results obtained below for a discussion of key statistics used in the evaluation of P&L at any time when it is required by the portfolio manager.

Assuming that we trade a portfolio of any assets, its P&L can be simulated in a number of ways. One of the quickest method is the application of geometric brownian motion (GBM) model with a drift in time of $\mu_t$ and the process standard deviation of $\sigma_t$ over its total time interval. The model takes its form as follows:
$$dS_t = \mu_t S_t dt + \sigma_t S_t dz$$ where $dz\sim N(0,dt)$, i.e. the increment has variance equal to $dt$ (the process is Brownian). Let $t$ be the present time, at which the portfolio has an initial value of $S_t$ dollars. The target time is $T$; therefore the time horizon of the portfolio evaluation is $\tau=T-t$, covered in $N$ time steps. Since the GBM model assumes no correlation between the values of the portfolio on two consecutive days (in general, over time), by integrating $dS/S$ over a finite interval we get a discrete change of the portfolio value:
$$\Delta S_t = S_{t-1} (\mu_t\Delta t + \sigma_t\epsilon \sqrt{\Delta t}) \ .$$ For simplicity, one can assume that both parameters of the model, $\mu_t$ and $\sigma_t$, are constant over time, and that the random variable $\epsilon\sim N(0,1)$. In order to simulate the path of the portfolio value, we go through $N$ iterations following the formula:
$$S_{t+1} = S_t + S_t(\mu_t\Delta t + \sigma_t \epsilon_t \sqrt{\Delta t})$$ where $\Delta t$ denotes a local volatility defined as $\sigma_t/\sqrt{N}$ and $t=1,\ldots,N$.
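The iteration above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `simulate_path`, its arguments, and the default parameter values are hypothetical, and $\Delta t$ is taken as $\sigma_t/\sqrt{N}$, exactly as defined above. Only the standard library is used.

```python
import math
import random


def simulate_path(s1=10_000.0, mu=0.05, sigma=0.05, n_steps=252, rng=None):
    """Return one simulated portfolio value path as a list of n_steps values.

    Each step applies S_{t+1} = S_t + S_t*(mu*dt + sigma*eps*sqrt(dt)),
    with eps drawn from N(0,1) and mu, sigma held constant.
    """
    if rng is None:
        rng = random.Random()
    # Assumption: dt follows the definition used in this post, sigma/sqrt(N)
    dt = sigma / math.sqrt(n_steps)
    path = [s1]
    for _ in range(n_steps - 1):
        eps = rng.gauss(0.0, 1.0)
        previous = path[-1]
        path.append(previous + previous * (mu * dt + sigma * eps * math.sqrt(dt)))
    return path


# Five realisations drawn from one seeded generator (seed chosen arbitrarily)
rng = random.Random(42)
paths = [simulate_path(rng=rng) for _ in range(5)]
for k, path in enumerate(paths, start=1):
    total_return = path[-1] / path[0] - 1.0
    print(f"path {k}: final value {path[-1]:,.2f} (return {total_return:+.2%})")
```

Passing a seeded `random.Random` instance makes a run reproducible, which is handy when comparing statistics across different parameter choices.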

Example

Let’s assume that the initial portfolio value is $S_1=\$10,000$ and that it is traded over 252 days. We allow the underlying process to have a drift of $\mu_t=0.05$ and an overall volatility of $\sigma_t=5\%$, both constant over time. Since the simulation in each of the 252 steps depends on $\epsilon$ drawn from the normal distribution $N(0,1)$, we can obtain any number of possible realisations of the simulated portfolio value path. Coding the above model quickly in Matlab,

```matlab
mu=0.05;      % drift
sigma=0.05;   % std dev over total interval
S1=10000;     % initial capital ($)
N=252;        % number of time steps (trading days)
K=1;          % number of simulations

dt=sigma/sqrt(N);   % local volatility

St=[S1];
for k=1:K
    St=[S1];
    for i=2:N
        epsilon=randn;   % named epsilon to avoid shadowing Matlab's built-in eps
        S=St(i-1)+St(i-1)*(mu*dt+sigma*epsilon*sqrt(dt));
        St=[St; S];
    end
    hold on; plot(St);
end
xlim([1 N]);
xlabel('Trading Days')
ylabel('Simulated Portfolio Value ($)');
```

leads us to one of the possible process realisations, not bad at all, with an annual rate of return of about 13%. Visual inspection of the path suggests no major drawdowns and low volatility, and therefore pretty good trading decisions made by the portfolio manager or trading algorithm. Of course, the simulation does not tell us anything about the model, the strategy involved, etc., but the result we obtained is sufficient for further discussion of portfolio key metrics and VaR evolution.

## Slippage in Model Backtesting

A precious lesson I learned while programming an independent backtesting engine for a new trading model concerned slippage. Simply speaking, slippage is a fraction of the stock price which you need to assume as a deviation from the price you are willing to pay.

In model backtesting, slippage is extremely important. Why? Let’s imagine your model generates a signal to buy or sell a stock on day $t_i$, i.e. after the market has closed and your stock trading history has been updated with the stock’s close price.
Since you can’t buy/sell this stock on day $t_i$, your algo-trading system, following your model rules, places a new order to be executed on day $t_{i+1}$. Regardless of the position held in the stock, you don’t know the price at the opening of the market on the following day. Well, that is the case in real-time trading. However, in the backtesting of your model this information is available, e.g. you have the historical stock prices of IBM in Aug 2008, so you know the future.

Now, you may wish to program your backtesting engine to buy/sell this stock for you on day $t_{i+1}$ at the open, mid-day, intra-day, or even close price. The choice is yours. There are different strategies. The close price is a good option to consider as long as you also keep track of intra-day trading on $t_{i+1}$, so that you have time to analyze the intra-day variability, apply an extra correction for extreme volatility or black swans, and proceed with your order with extra caution. But if you program a simple approach to order execution (e.g. buy at the open price), you assume some risk that the price will not be in your favour.

A quite conservative approach to compensating for systematic unexpected slippage in the stock price once your order has been sent to the broker is to assume in simulations (backtesting) a fixed slippage working against your profits every time. Namely, you don’t buy/sell your stock at the price given for day $t_{i+1}$ in your historical price table. You assume a slippage of $\Delta S$. If the price of the stock is $P$, your slippage affects the price:
$$P' = P \pm (P\times \Delta S)$$ where $P'$ is the executed price of your simulated order. The sign $\pm$ has a double meaning. To help you understand it, let me state two basic rules of slippage in backtesting:

If your trading decision is to go long, you always buy at a price higher than $P$ by $P\times \Delta S$ and you sell the stock at a price lower than $P$, again by $P\times \Delta S$.

Conversely, if you open a short position, you sell at a price lower than $P$ by $P\times \Delta S$ and, when closing the same position, you buy back at a price higher by the same amount, so the slippage again works against your profit.
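The slippage formula can be illustrated with a short Python sketch. The function names `executed_price` and `round_trip_return` and the sample prices are hypothetical, not part of any backtesting engine discussed here. The sketch assumes that slippage works against the trader on every fill: each buy is executed higher and each sell lower by $P\times \Delta S$, for long and short positions alike.

```python
def executed_price(price, slippage, side):
    """Simulated fill price P' = P +/- P*slippage.

    Assumption: buys fill higher and sells fill lower than the
    historical price, so slippage never helps the trader.
    """
    if side == "buy":
        return price * (1.0 + slippage)
    if side == "sell":
        return price * (1.0 - slippage)
    raise ValueError("side must be 'buy' or 'sell'")


def round_trip_return(entry_price, exit_price, slippage, direction="long"):
    """Return of one closed trade, relative to the entry fill price."""
    if direction == "long":
        buy = executed_price(entry_price, slippage, "buy")    # open long
        sell = executed_price(exit_price, slippage, "sell")   # close long
        return (sell - buy) / buy
    if direction == "short":
        sell = executed_price(entry_price, slippage, "sell")  # open short
        buy = executed_price(exit_price, slippage, "buy")     # close short
        return (sell - buy) / sell
    raise ValueError("direction must be 'long' or 'short'")


# Illustrative prices only: the same long trade with and without 0.5% slippage
print(round_trip_return(100.0, 105.0, 0.0))
print(round_trip_return(100.0, 105.0, 0.005))
```

Keeping the slippage in a single helper means that every simulated order passes through the same adjustment, so the backtest cannot accidentally fill one leg of a trade at the unadjusted historical price.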

The amount of slippage you should assume varies depending on the conditions. If you are involved in a lot of algorithmic trading operations, you are probably able to estimate your slippage. In general, the simulated slippage shouldn’t be more than 2%.

If you forget to include slippage in your backtesting black box, your model may appear extremely profitable while you risk a lot in practice. On the other hand, adding slippage to your test may make your day less bright than it started. But don’t worry. Keep smiling, as a new day is a new opportunity, and life is not about avoiding risks but managing them right.