Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

Rebinning Tick-Data for FX Algo Traders

If you work, or intend to work, with FX data in order to build and backtest your own FX models, the Historical Tick-Data of Pepperstone.com is probably the best place to kick off your algorithmic experience. As of now, they offer tick-data sets for the 15 most frequently traded currency pairs, going back to May 2009. Some of the unzipped files (one month of data) exceed 400 MB in size, i.e. they store over 8.5 million lines at tick resolution for both bid and ask “prices”. The good news is that you can download them all free of charge, and their quality is regarded as very high. The bad news is that the data become available only with a 3-month delay.

Rebinning tick-data into coarser time bins is a different story, and it is the subject of this post. As an example, we will see how efficiently you can turn Pepperstone’s tick-data set(s) into a 5-min time-series. We will use bash scripting (Linux/OS X) supplemented with data processing in Python.

Data Structure

You can download Pepperstone’s historical tick-data from here, month by month, pair by pair. All files follow the same internal structure, namely:

$ head AUDUSD-2014-09.csv 
AUD/USD,20140901 00:00:01.323,0.93289,0.93297
AUD/USD,20140901 00:00:02.138,0.9329,0.93297
AUD/USD,20140901 00:00:02.156,0.9329,0.93298
AUD/USD,20140901 00:00:02.264,0.9329,0.93297
AUD/USD,20140901 00:00:02.265,0.9329,0.93293
AUD/USD,20140901 00:00:02.265,0.93289,0.93293
AUD/USD,20140901 00:00:02.268,0.93289,0.93295
AUD/USD,20140901 00:00:02.277,0.93289,0.93296
AUD/USD,20140901 00:00:02.278,0.9329,0.93296
AUD/USD,20140901 00:00:02.297,0.93288,0.93296

The columns, from left to right, represent respectively: a pair name, the date and tick-time, the bid price, and the ask price.
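As a quick illustration of this layout, the hypothetical helper below parses one line into its four fields. It is not part of the original scripts. It splits on commas instead of using the fixed character offsets that pp.py relies on further down, and it assumes the exact format shown above with no header line.

```python
from datetime import datetime

def parse_tick(line):
    """Split one Pepperstone tick line into (pair, timestamp, bid, ask).

    Hypothetical helper; assumes the layout
    'AUD/USD,20140901 00:00:01.323,0.93289,0.93297'.
    """
    pair, stamp, bid, ask = line.strip().split(",")
    # %f accepts the 3-digit millisecond part (padded to microseconds)
    ts = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")
    return pair, ts, float(bid), float(ask)


pair, ts, bid, ask = parse_tick("AUD/USD,20140901 00:00:01.323,0.93289,0.93297\n")
print(pair, ts.isoformat(), bid, ask)
```

Splitting on commas also copes with the variable number of decimal digits in the prices.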

Pre-Processing

Here, for each .csv file, we aim to split the date into year, month, and day, and to remove commas and colons. The raw data are then ready to be read in as a matrix (array) by another programming language (e.g. Matlab or Python). A matrix is a mathematically intuitive data structure, and direct access to any of its columns lets a backtesting engine run at full speed.

Let’s play with the AUDUSD-2014-09.csv data file. Working in the directory where the file is located, we begin by writing a bash script (pp.scr) that contains:

1
2
3
4
5
6
7
8
9
10
11
# pp.scr
# Rebinning Pepperstone.com Tick-Data for FX Algo Traders 
# (c) 2014 QuantAtRisk, by Pawel Lachowicz
 
clear
echo "..making a sorted list of .csv files"
for i in $1-*.csv; do echo ${i##$1-} $i ${i##.csv};
done | sort -n | awk '{print $2}' > $1.lst
 
python pp.py
head AUDUSD.pp

that you run in Terminal:

$ chmod +x pp.scr
$ ./pp.scr AUDUSD

where the first command makes the script executable (you need to do this only once). Lines #7-8 of our script look for all .csv data files in the local directory that start with the AUDUSD- prefix, sort them, and write their list to the AUDUSD.lst file. Since we work with the AUDUSD-2014-09.csv file only, AUDUSD.lst will contain:

$ cat AUDUSD.lst 
AUDUSD-2014-09.csv

as expected. Next, we utilise the power and flexibility of Python in the following way:

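# pp.py
import csv
 
fnlst="AUDUSD.lst"
fnout="AUDUSD.pp"
 
# open the output file once, so that the rows from every listed file are kept
with open(fnout,'w') as f:
    writer=csv.writer(f,delimiter=" ",lineterminator="\n")
 
    for lstline in open(fnlst,'r'):
        fncur=lstline.strip()   # file name without the trailing newline
        if not fncur:
            continue            # skip blank lines in the list
 
        i=1 # counts a number of lines with tick-data
        with open(fncur,'r') as fin:
            for line in fin:    # read the file one line at a time
                if(i<=5200): # replace with (i>0) to process an entire file
                    year=line[8:12]     # 'AUD/USD,' takes characters 0-7
                    month=line[12:14]
                    day=line[14:16]
                    hh=line[17:19]
                    mm=line[20:22]
                    ss=line[23:29]
                    bidask=line[30:]    # bid,ask (still with its newline)
                    writer.writerow([year,month,day,hh,mm,ss,bidask])
                    i+=1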

The script processes the file line by line. For display purposes only, we told it to process just the first 5,200 lines. The output of lines #10-11 of pp.scr is the following:

2014 09 01 00 00 01.323 "0.93289,0.93297
"
2014 09 01 00 00 02.138 "0.9329,0.93297
"
2014 09 01 00 00 02.156 "0.9329,0.93298
"
2014 09 01 00 00 02.264 "0.9329,0.93297
"
2014 09 01 00 00 02.265 "0.9329,0.93293
"

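This happens because we let Python save the bid and ask information as one string, to avoid dealing with their variable number of decimal digits. That string still ends with the line's newline character, which makes the csv writer wrap it in double quotes. To clean up this mess, we continue: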

# pp.scr (continued)
echo "..removing token: comma"
sed 's/,/ /g' $1.pp > $1.tmp
rm $1.pp
 
echo "..removing token: double quotes"
sed 's/"/ /g' $1.tmp > $1.tmp2
rm $1.tmp
 
echo "..removing empty lines"
# write to a new file instead of 'sed -i', whose syntax differs in BSD sed (OS X)
sed '/^[[:space:]]*$/d' $1.tmp2 > $1.pp
rm $1.tmp2
 
echo "head..."
head $1.pp
echo "tail..."
tail $1.pp

which brings us to the pre-processed data:

..removing token: comma
..removing token: double quotes
..removing empty lines
head...
2014 09 01 00 00 01.323  0.93289 0.93297
2014 09 01 00 00 02.138  0.9329 0.93297
2014 09 01 00 00 02.156  0.9329 0.93298
2014 09 01 00 00 02.264  0.9329 0.93297
2014 09 01 00 00 02.265  0.9329 0.93293
2014 09 01 00 00 02.265  0.93289 0.93293
2014 09 01 00 00 02.268  0.93289 0.93295
2014 09 01 00 00 02.277  0.93289 0.93296
2014 09 01 00 00 02.278  0.9329 0.93296
2014 09 01 00 00 02.297  0.93288 0.93296
tail...
2014 09 02 00 54 39.324  0.93317 0.93321
2014 09 02 00 54 39.533  0.93319 0.93321
2014 09 02 00 54 39.543  0.93318 0.93321
2014 09 02 00 54 39.559  0.93321 0.93321
2014 09 02 00 54 39.784  0.9332 0.93321
2014 09 02 00 54 39.798  0.93319 0.93321
2014 09 02 00 54 39.885  0.93319 0.93325
2014 09 02 00 54 39.886  0.93319 0.93321
2014 09 02 00 54 40.802  0.9332 0.93321
2014 09 02 00 54 48.829  0.93319 0.93321

Personally, I love this part, as you can learn how to do simple but necessary text-file operations by typing single-line Unix/Linux commands. Good luck to those who try to repeat the same in Microsoft Windows in under 30 seconds.
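If you prefer to skip the sed clean-up, the same layout can be produced in one pass by stripping the trailing newline before the fields are written. The function below is a minimal sketch under that assumption. It is a hypothetical `preprocess` helper, not the author's script. It separates fields with single spaces, which whitespace-splitting readers such as Matlab's load or NumPy's loadtxt treat the same as the double spaces above.

```python
def preprocess(lines, max_lines=None):
    """Convert raw Pepperstone tick lines into 'year month day hh mm ss bid ask'.

    Hypothetical helper: `lines` is any iterable of text lines (e.g. an open
    file); `max_lines` optionally limits how many lines are processed.
    """
    out = []
    for n, line in enumerate(lines):
        if max_lines is not None and n >= max_lines:
            break
        # drop the line ending first, so nothing needs quoting or cleaning later
        _pair, stamp, bid, ask = line.rstrip("\r\n").split(",")
        day, clock = stamp.split(" ")        # '20140901', '00:00:01.323'
        hh, mm, ss = clock.split(":")
        out.append(" ".join([day[0:4], day[4:6], day[6:8], hh, mm, ss, bid, ask]))
    return out


raw = ["AUD/USD,20140901 00:00:01.323,0.93289,0.93297\n",
       "AUD/USD,20140901 00:00:02.138,0.9329,0.93297\n"]
for row in preprocess(raw):
    print(row)
```

For example, `with open("AUDUSD-2014-09.csv") as f: rows = preprocess(f, 5200)` would mirror the 5,200-line test run.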

Rebinning: 5-min Data

Rebinning has many schools; for some people it is an art. We just want to get the job done. I opt for simplicity and for understanding the data we deal with. Imagine we have two adjacent 5-min bins with a tick history of trading:

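[Figure: tick history across two adjacent 5-min bins; the red marker denotes the price estimate at the bin boundary, the blue markers the last tick before and the first tick after it]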
We want to derive the closest possible (or fairest) price estimate every 5 min, denoted in the figure above by a red marker. The old-school approach is to average over a number (larger than 5) of tick data points on the left and on the right of the boundary. That can under- or overestimate the mid-price.

If we trade live, every 5 min we receive information on the last tick before the minute hits a multiple of 5, and we wait for the first tick after it (blue markers). Taking the average of their prices (the mid-price) makes the most sense. The precision we look at here is sometimes $10^{-5}$. This matters little if our position is small, but for a large position the mid-price may start playing a crucial role.

The downside of the old-school approach: a possibly high volatility among all the tick-data within the last 5 minutes, which we neglect.
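In symbols, let $b_1, a_1$ be the bid and ask of the last tick before the 5-min mark, and $b_2, a_2$ those of the first tick after it (the blue markers). The price assigned to the mark (the red marker) is then the average of the bid and ask mid-prices, which is what pp2.py below computes. The worked numbers use the first two ticks of AUDUSD-2014-09.csv purely as an arithmetic illustration; they do not straddle an actual 5-min boundary.

$$
P_{\rm 5min} = \frac{1}{2}\left(\frac{b_1+b_2}{2} + \frac{a_1+a_2}{2}\right) = \frac{b_1+a_1+b_2+a_2}{4}
$$

$$
P = \frac{0.93289+0.93297+0.93290+0.93297}{4} = \frac{3.73173}{4} = 0.9329325
$$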

The following Python code (pp2.py) performs 5-min rebinning for our pre-processed AUDUSD-2014-09 file:

# pp2.py
import numpy as np
 
fname="AUDUSD.pp"
 
# read the whole pre-processed file as a matrix in one call; columns:
# year, month, day, hour, minute, seconds, bid, ask
a=np.loadtxt(fname)
N=a.shape[0]   # number of ticks
#print("N = %d" % N)
 
# print the first line (as in the run shown below, it takes the
# mid-price of the tick at row index 1)
tmp=np.array([a[1][0],a[1][1],a[1][2],a[1][3],a[1][4],0.0,(a[1][6]+a[1][7])/2])
print("%.0f %2.0f %2.0f %2.0f %2.0f %6.3f %10.6f" % tuple(tmp))
m=tmp
 
# check the boundary conditions (5 min bins): the minute of tick i-1
# is not a multiple of 5, while the minute of tick i is
for i in range(1,N):
    if( (a[i-1][4]%5!=0.0) and (a[i][4]%5==0.0)):
 
        # BLUE MARKER No. 1: last tick before the boundary (row i-1)
        b1=a[i-1][6]
        a1=a[i-1][7]
        # BLUE MARKER No. 2: first tick after the boundary (row i)
        b2=a[i][6]
        a2=a[i][7]
        # mid-price, and new date for 5 min bin
        bm=(b1+b2)/2
        am=(a1+a2)/2
        Ym,Mm,Dm,Hm,MMm=a[i][0],a[i][1],a[i][2],a[i][3],a[i][4]
        Sm=0.0        # set seconds to zero
 
        # RED MARKER
        print("%.0f %2.0f %2.0f %2.0f %2.0f %6.3f %10.6f" %
              (Ym,Mm,Dm,Hm,MMm,Sm,(bm+am)/2))
        tmp=np.array([Ym,Mm,Dm,Hm,MMm,Sm,(bm+am)/2])
        m=np.vstack([m, tmp])   # keep the 5-min series in memory as well

which you run from the pp.scr file as:

# pp.scr (continued)
 
python pp2.py > AUDUSD.dat

to get the 5-min rebinned FX time-series as follows:

$ head AUDUSD.dat
2014  9  1  0  0  0.000   0.932935
2014  9  1  0  5  0.000   0.933023
2014  9  1  0 10  0.000   0.932917
2014  9  1  0 15  0.000   0.932928
2014  9  1  0 20  0.000   0.932937
2014  9  1  0 25  0.000   0.933037
2014  9  1  0 30  0.000   0.933075
2014  9  1  0 35  0.000   0.933070
2014  9  1  0 40  0.000   0.933092
2014  9  1  0 45  0.000   0.933063
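The boundary rule of pp2.py does not depend on NumPy. Below is a dependency-free sketch of the same rule, written as a hypothetical `rebin_5min` function that takes rows already parsed into numbers. It is convenient for checking the logic on a handful of invented toy ticks.

```python
def rebin_5min(rows):
    """Apply pp2.py's 5-min boundary rule to parsed tick rows.

    Each row is (year, month, day, hour, minute, seconds, bid, ask), in time
    order. A bin is emitted when the minute jumps from a non-multiple of 5 to
    a multiple of 5; its price is the mean of the bid and ask mid-prices of
    the two ticks around the boundary. Seconds are reported as 0.
    """
    out = []
    for prev, cur in zip(rows, rows[1:]):
        if prev[4] % 5 != 0 and cur[4] % 5 == 0:
            bm = (prev[6] + cur[6]) / 2      # bid mid-price
            am = (prev[7] + cur[7]) / 2      # ask mid-price
            out.append((cur[0], cur[1], cur[2], cur[3], cur[4], 0.0, (bm + am) / 2))
    return out


ticks = [                                    # invented toy ticks
    (2014, 9, 1, 0, 4, 59.0, 1.0, 1.2),
    (2014, 9, 1, 0, 5, 0.5, 1.1, 1.3),
    (2014, 9, 1, 0, 5, 10.0, 1.2, 1.4),
    (2014, 9, 1, 0, 9, 59.0, 1.3, 1.5),
    (2014, 9, 1, 0, 10, 1.0, 1.5, 1.7),
]
for b in rebin_5min(ticks):
    print("%.0f %2.0f %2.0f %2.0f %2.0f %6.3f %10.6f" % b)
```

Like pp2.py, this rule emits a bin only when a tick falls in a minute that is a multiple of 5 and the tick before it does not. In a quiet market (e.g. no tick between 00:09 and 00:11), that 5-min mark is simply skipped, so it is worth checking the output for gaps.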

That concludes our efforts. Happy rebinning!

Gap-on-Open Profitable Trading Strategy


After a long while, QuantAtRisk is back in business. As an algo trader, I have always been tempted to test a gap-on-open trading strategy. There were various reasons behind it, but the most popular one was always the much-discussed good/bad news on the stock. And then? The stock price skyrocketed/dropped on the following days. When we approach such price patterns, we talk about triggers or triggered events. The core of the algorithm’s activity is identifying the trigger and taking the proper action: going long or short. That’s it. In both cases we want to make money.

In this post we will design the initial conditions that act as triggers for our gap-on-open trading strategy. We will then backtest a realistic scenario of betting our money on those stocks that opened higher on the next trading day. Our goal is to find the optimal holding period for such trades to close with a profit.

Portfolio

Our strategy can be backtested using any $N$-asset portfolio. Here, for simplicity, let us use a random subset of 10 stocks (portfolio.lst) that are current components of the Dow Jones index:

AXP   CSCO   DIS   IBM   JNJ   KO   NKE   PG   UTX   XOM

In Matlab, we fetch the stock prices from the Google Finance data provider, accessible via Quandl.com’s Matlab API (see this post for its setup in Matlab). We begin writing our main backtesting code as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
% Gap on Open Trading Strategy
%  Fetching stock prices via Quandl and Strategy Backtesting
%
% (c) 2014 by Pawel Lachowicz, QuantAtRisk.com
 
 
clear all; close all; clc;
 
fname=['portfolio.lst'];
 
% Model's parameter #1 (years)
parm1=1;  
ndays=parm1*365;
lday=datenum('2014-08-05');
% fetching stock data
[Top,Thp,Tlp,Tcp,N,ntdays]=FetchQuandl(fname,ndays,lday);

where we use a custom function, FetchQuandl, to import four separate price-series for each stock: its open (Top), high (Thp), low (Tlp), and close (Tcp) daily prices:

function [Top,Thp,Tlp,Tcp,N,ntdays]=FetchQuandl(fname,ndays,lday)
    % Read the list of Dow Jones components
    fileID = fopen(fname);
    tmp = textscan(fileID,'%s');
    fclose(fileID);
    components=tmp{1};  % a list as a cell array
 
    % Read in the list of tickers and internal codes from Quandl.com
    [~,text,~] = xlsread('QuandlStockCodeListUS.xlsx');
    quandlc=text(:,1);    % again, as a list in a cell array
    quandlcode=text(:,3); % corresponding Quandl's Price Code
 
    % fetch stock data for last ‘ndays’
    date2=datestr(lday,'yyyy-mm-dd');       % to (last day)
    date1=datestr(lday-ndays,'yyyy-mm-dd'); % from
 
    Top={}; Thp={}; Tlp={}; Tcp={};   % cell arrays for the price-series
    % scan all tickers and fetch the data from Quandl.com
    for i=1:length(components)
        for j=1:length(quandlc)
            if(strcmp(components{i},quandlc{j}))
                fprintf('%4.0f %s\n',i,quandlc{j});
                fts=0;
                [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
                              'authcode','PutHereYourQuandlCode',...
                              'start_date',date1,'end_date',date2);
                cp=fts2mat(fts.Close,1); Tcp{i}=cp;     % close price-series
                op=fts2mat(fts.Open,1);  Top{i}=op;     % open price-series
                hp=fts2mat(fts.High,1);  Thp{i}=hp;     % high price
                lp=fts2mat(fts.Low,1);   Tlp{i}=lp;     % low price
                %Rcp{i}=cp(2:end,2)./cp(1:end-1,2)-1;   % return-series cp
            end
        end
    end
    N=length(components);
    ntdays=length(Tcp{1});
end

Please note that in line #12 we specify the number of years, i.e. how far back in time our backtest should extend. Line #13 converts it into a number of calendar days, counted back from the day specified in line #14 (the last day).

Trading Model

First, let us design the trading strategy. For each stock separately, we scan its four price-series concurrently. We define the strategy’s trigger as follows:
$O_t > C_{t-1}$ and $L_t > H_{t-1}$, where $O$, $H$, $L$, and $C$ denote a stock's daily open, high, low, and close prices. In words: the stock's open price on day $t$ was higher than its close price on day $t-1$, and its lowest price on day $t$ was higher than its highest price on day $t-1$. When that happens, we make a BUY LONG decision! We buy the stock on the next day at its market price (close price). This approach should effectively remove the slippage bias (see more on slippage in stock trading here).
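Here is a minimal Python sketch of this trigger, for readers who want to test it outside Matlab. It assumes the four daily series are aligned lists. The hypothetical `next_business_day` helper skips weekends only, whereas Matlab's busdate, used in the backtest below, also accounts for holidays. The prices are invented toy values.

```python
from datetime import date, timedelta

def next_business_day(d):
    """Next weekday after d; weekends only, exchange holidays are ignored."""
    d += timedelta(days=1)
    while d.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def gap_on_open_buys(days, op, hp, lp, cp):
    """Return buy dates for the gap-on-open trigger.

    Day t triggers when op[t] > cp[t-1] and lp[t] > hp[t-1]; we then buy
    on the next business day at its close price.
    """
    buys = []
    for t in range(1, len(days)):
        if op[t] > cp[t - 1] and lp[t] > hp[t - 1]:
            buys.append(next_business_day(days[t]))
    return buys


days = [date(2014, 8, 7), date(2014, 8, 8), date(2014, 8, 11)]
op = [10.0, 11.0, 11.1]
hp = [10.5, 11.5, 11.6]
lp = [9.5, 10.6, 10.9]
cp = [10.0, 11.2, 11.3]
print(gap_on_open_buys(days, op, hp, lp, cp))
```

Only the second day gaps up above the previous high, and since it is a Friday the buy lands on the following Monday.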

Now, we run the backtest on each stock and each open trade. We select the second parameter (parm2) to be the number of days we hold the stock. In the following piece of code, we allow the stock to be sold after 1 to 21 calendar days (plus or minus weekends and public holidays):

18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
% pre-defined matrix for backtest final results
results=[];
 
for parm2=0:20
 
    cR=[];
    for i=1:N
        % just for a purpose of plotting of price-series
        if(i==1)
            % open (blue color)
            plot(Top{i}(:,1),Top{i}(:,2),'b')
            hold on
            % close (red color)
            plot(Tcp{i}(:,1),Tcp{i}(:,2),'r')
            hold on
            % high (green color)
            plot(Thp{i}(:,1),Thp{i}(:,2),'g')
            %
            xlabel('Days');
            ylabel('AXP Stock Prices [US$]');
        end
 
        Tbuy=[];
        for t=2:ntdays
            % define indicators 
            ind1=Tcp{i}(t-1,2);  % cp on (t-1)day
            ind2=Thp{i}(t-1,2);  % hp on (t-1)day
            ind3=Top{i}(t,2);    % op on (t)day
            ind4=Tlp{i}(t,2);    % lp on (t)day
            % detect trigger
            if(ind1<ind3)&&(ind2<ind4)
                % plotting only for AXP
                if(i==1)
                    hold on;
                    plot(Top{i}(t,1),Top{i}(t,2),'o');
                end
                % date of a trigger
                tday=Top{i}(t,1);
                nextbusdate=busdate(tday,1); % find next trading date
                Tbuy=[Tbuy; nextbusdate];
            end
        end
        Tsell=busdate(Tbuy+parm2,1);

Here, in lines #57 and #60 we construct the time arrays Tbuy and Tsell, storing the buy and sell dates of the trades. Now, we use them to look up the prices at each trade’s open and close and to derive the profit and loss for each stock:

62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
        R=[];
        for k=1:length(Tbuy)
            j=find(Tbuy(k)==Tcp{i}(:,1));
            pbuy=Tcp{i}(j,2);
            j=find(Tsell(k)==Tcp{i}(:,1));
            psell=Tcp{i}(j,2);
            ret=(psell/pbuy-1); % return per trade
            R=[R; ret];
        end
 
        compR=prod(R+1)-1;  % compound return per stock
        cR=[cR; compR];
 
    end
 
    results=[results cR];
 
end

In the inner loop (lines #24 to #75, i.e. running over the stocks in the portfolio; index $i$, here 1 to 10), we capture all trades per stock (lines #63-70). We then calculate a multi-period compound return (line #72) as if we were trading that stock alone using our model.
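The compound return in line #72 is $R_c = \prod_k (1 + r_k) - 1$, with $r_k = p_{\rm sell}/p_{\rm buy} - 1$ for trade $k$. Below is a small Python sketch of the same calculation, using a hypothetical helper name and toy prices. As in the Matlab code, transaction costs are ignored.

```python
def compound_return(p_buy, p_sell):
    """Compound return of consecutive trades: prod(1 + r_k) - 1.

    r_k = p_sell[k] / p_buy[k] - 1 is the return of trade k (no costs).
    With no trades the result is 0.0, just as prod([]) = 1 in Matlab.
    """
    growth = 1.0
    for pb, ps in zip(p_buy, p_sell):
        growth *= ps / pb            # equals 1 + r_k
    return growth - 1.0


# two toy trades, +10% and -5%, compound to about +4.5%
print(compound_return([100.0, 50.0], [110.0, 47.5]))
```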

For instance, for stock $i=1$ (AXP) from our portfolio, our code displays the 1-year price-series:
[Figure: AXP daily open (blue), close (red), and high (green) prices over one year, with trigger days marked]
where the days meeting our trigger criteria are denoted by open-circle markers. If you now re-run the backtest after a small substitution in line #24, namely:

24
    for i=1:1

and run these extra lines of code:

figure(2)
stem((0:20),100*results)
xlabel('Holding Period [days]');
ylabel('AXP: Compound Return [%]');

we obtain an appealing result:
[Figure: AXP compound return (%) versus holding period (days)]
The chart reveals that for AXP, over the past 251 trading days (counting back from Aug 4, 2014), we had 16 triggers and hence 16 trades. Surprisingly, regardless of the holding period, the compound return from all closed trades was highly positive (profitable).

This is not the case if we consider, for example, $i=4$, the IBM stock:
[Figure: IBM compound return (%) versus holding period (days)]
This result suggests that for different holding periods (and, of course, different stocks) certain extra trading indicators should be applied to limit the losses (e.g. profit targets).

If we traded the whole portfolio using our gap-on-open model, we would end up with a very encouraging result:
[Figure: compound return (%) averaged over the 10 stocks versus holding period (days)]
where for each holding period we display the compound return averaged over the 10 stocks. The US stock markets were in a global up-trend between August 2013 and August 2014, so this should be taken into account. Even so, the strategy is worth considering further, with modifications (e.g. short triggers, both long and short triggers, FX time-series, etc.).
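The code for this portfolio-level chart is not shown above. The Python sketch below shows one way to obtain it. It assumes an equally weighted average of the per-stock compound returns stored in the results matrix, with stocks in rows and holding periods in columns; in Matlab, the same average would be mean(results,1). The function name and numbers are illustrative only.

```python
def average_over_stocks(results):
    """Equally weighted average over stocks for each holding period.

    results[i][h] is the compound return of stock i for holding period h
    (the layout of the Matlab `results` matrix).
    """
    n_stocks = len(results)
    n_periods = len(results[0])
    return [sum(row[h] for row in results) / n_stocks for h in range(n_periods)]


toy = [[0.10, 0.20, -0.05],   # stock 1, holding periods 0, 1, 2
       [0.30, -0.20, 0.05]]   # stock 2
print(average_over_stocks(toy))
```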

Someone wise once said: Sometimes you win. Sometimes you learn. In algo trading we all learn to win.


Ideal Stock Trading Model for the Purpose of Backtesting Only


There is only one goal in algorithmic trading: to win the game. To emerge victorious. To feel the sunlight again after months of walking through the valley of shadows, stuck in the depths of algo drawdowns. An endless quest for the most brilliant minds: to find the way to win over and over, and over again. To discover a perfect solution as your new private weapon on this battleground. Is it possible? If the answer were so simple, we wouldn’t be so engaged in seeking the golden mean.

However, algo trading is a journey. Sometimes, while programming and testing our trading systems, we need an ideal trading model ready to use straight away! By the ideal model I mean a sort of template: a benchmark that lets us list a number of successful trades, their opening and closing times, the open and close prices of the executed trades, and the realized return from each trade that goes into our pocket. Such a template also lets us look at the trading data from a different perspective, and reconsider and/or apply an additional risk management framework. In fact, the benefits for backtesting purposes are many.

In this post we will explore one of the easiest ways of programming a perfect model: looking at the time axis backwards (i.e. with hindsight). Using the data of Google, Inc. (GOOG) stock listed on NASDAQ as an example, we will analyse the stock's trading history and find all possible trades returning at least 3% over the past decade.

I will use the results of this strategy in the upcoming (this week!) series of new posts on Portfolio Optimization in Matlab for Algo Traders.

Model and Data

Let’s imagine we are interested in finding a large number of trades, each with an expected return of at least 3%. We consider GOOG stock (daily close prices), with data spanning 365$\times$10 days back in time from the present day (the last 10 years). We will use Google Finance data powered by Quandl, as described in one of my previous posts, namely, Create a Portfolio of Stocks based on Google Finance Data fed by Quandl. Shall we begin?

% Ideal Stock Trading Model for the Purpose of Backtesting Only
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 
 
clear all; close all; clc;
 
stocklist={'GOOG'};
 
% Read in the list of tickers and internal code from Quandl.com
[ndata, text, alldata] = xlsread('QuandlStockCodeListUS.xlsx');
quandlc=text(:,1); % again, as a list in a cell array
quandlcode=text(:,3); % corresponding Quandl's Price Code
 
% fetch stock data for last 10 years
date2=datestr(today,'yyyy-mm-dd');        % to (today)
date1=datestr(today-365*10,'yyyy-mm-dd'); % from
 
stockd={};
for i=1:length(stocklist)
    for j=1:length(quandlc)
        if(strcmp(stocklist{i},quandlc{j}))
            fprintf('%4.0f %s\n',i,quandlc{j});
            % fetch the data of the stock from Quandl
            % using the recommended Quandl command and
            % saving them directly into FTS object (fts)
            fts=0;
            [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
                          'authcode','ENTER_YOUR_CODE',...
                          'start_date',date1,'end_date',date2);
            stockd{i}=fts; % entire FTS object in an array's cell
        end
    end
end

We can immediately visualize the GOOG data extracted from Google Finance via Quandl as follows:

% plot the close prices of GOOG
cp=fts2mat(stockd{1}.Close,1);
plot(cp(:,1),cp(:,2),'color',[0.6 0.6 0.6])
xlim([min(cp(:,1)) max(cp(:,1))]);
ylim([min(cp(:,2)) max(cp(:,2))]);
xlabel('Nov 2003 - Nov 2013 (days)');
ylabel('GOOG Close Price ($)');

[Figure: GOOG daily close prices, Nov 2003 to Nov 2013]

We open the trade (long position) on the first day. As time goes by, we check on each following day whether the stock value has increased by more than 3%. If not, we move forward by one day and check again. If the condition is met, we close the open trade on the following trading (business) day at the stock’s market price. A new trade may be opened at the stock’s market price one business day after the day when the previous trade was closed (exercised). Additionally, we check whether the ‘running’ return of the open trade has fallen to -5% or below. If so, we replace the open time of the trade with the time when this condition was fulfilled, and likewise its open price. This rule lets us restart the backtesting process, and therefore the search for a new profitable trade.
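The rules above can be condensed into a short Python sketch. It is a simplified illustration under explicit assumptions. `prices` is a hypothetical list of daily close prices with one entry per business day, so the "next business day" is simply the next index. The +3% target and the -5% reset level are parameters. As in the Matlab loop, the last day serves only as a possible close date. The Matlab implementation used in this post follows.

```python
def ideal_trades(prices, target=0.03, reset=-0.05):
    """Hindsight 'ideal' long trades on a list of daily close prices.

    Returns tuples (i_open, i_close, p_open, p_close, return in %).
    """
    trades = []
    last = len(prices) - 1
    t = 0
    in_trade = False
    while t < last:
        if not in_trade:
            i_open, p_open = t, prices[t]   # open at the market (close) price
            in_trade = True
            t += 1
            continue
        r = prices[t] / p_open - 1          # running return of the open trade
        if r > target:
            i_close = t + 1                 # close on the next business day
            t = i_close + 1                 # reopen one business day later
            if i_close <= last:
                p_close = prices[i_close]
                trades.append((i_open, i_close, p_open, p_close,
                               100 * (p_close / p_open - 1)))
                in_trade = False
        elif r <= reset:
            i_open, p_open = t, prices[t]   # restart the trade from here
            t += 1
        else:
            t += 1
    return trades


toy_prices = [100, 101, 104, 105, 106, 100, 94, 94, 98, 99]   # invented
for trade in ideal_trades(toy_prices):
    print(trade)
```

The second toy trade shows the reset rule at work: after the drop to 94, the open price is moved there, and the realized return is then measured from 94.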

t0=cp(1,1);    % starting day for backtesting
tN=cp(end,1);  % last day
 
trades=[]; % open a log-book for all executed trades
status=0; % status meaning: 0-no open trade, 1-open trade
t=t0;
% we loop over time (t) [days]
while(t<tN)
    [r,~,~]=find(t==cp(:,1)); % check the row in cp vector
    if(~isempty(r))
       if(~status)
           topen=t; % time when we open the trade
           popen=cp(r,2); % assuming market price of the stock
           status=1;
       else
           ptmp=cp(r,2); % running close price
           rtmp=ptmp/popen-1; % running return of the open trade
           if(rtmp>0.03) % check 3% profit condition
               % if met, then
               tclose=busdate(t,1); % close time of the trade
                                    %  assumed on the next business day
               t=busdate(tclose,1); % next day in the loop
               if(tclose<=tN)
                   [r,~,~]=find(tclose==cp(:,1));
                   pclose=cp(r,2); % close price
                   ret=pclose/popen-1; % realized profit/loss
                   % save the trade details into log-book
                   trades=[trades; topen tclose popen pclose ret*100];
                   status=0; % change status of trading to not-open
                   % mark the opening of the trade as blue dot marker
                   hold on; plot(topen,popen,'b.','markerfacecolor','b');
                   % mark the end time of the trade
                   hold on; plot(tclose,pclose,'r.','markerfacecolor','r');
               end
           elseif(rtmp<=-0.05) % check an additional condition
               topen=t; % overwrite the time
               popen=cp(r,2); % and the price
               status=1; % sustain the status of the trade as 'open'
           else
               t=t+1;
           end
       end
    else
        t=t+1;
    end
end

In this piece of code, the matrix trades (a log-book of all exercised trades) stores the history of all successful trades that meet the criteria assumed earlier. The only uncertainty we allow into our perfect solution is the case when the close price on the next business day turns out lower, so that the realized profit from the trade is less than 3%. By plotting all good trades with the end day $tN$ set to Nov 18, 2013, we get a messy picture:

[Figure: GOOG close prices with trade openings (blue dots) and closings (red dots)]

which becomes more intuitive once we examine the distribution of profits from all trades:

figure(3);
hist(trades(:,5),50);
xlabel('Profit/loss (%)');
ylabel('Number of trades');

[Figure: histogram of the profit/loss (%) of all trades]

At this point, the most valuable information is contained in the log-book, whose content we can analyze trade by trade:

>> format shortg
>> trades
 
trades =
 
   7.3218e+05   7.3218e+05       100.34        109.4       9.0293
   7.3218e+05   7.3220e+05       104.87          112       6.7989
   7.3221e+05   7.3221e+05       113.97       119.36       4.7293
   7.3221e+05   7.3222e+05       117.84       131.08       11.236
   7.3222e+05   7.3222e+05        129.6       138.37        6.767
   7.3223e+05   7.3224e+05       137.08       144.11       5.1284
   7.3224e+05   7.3224e+05       140.49       172.43       22.735
   7.3224e+05   7.3225e+05        187.4       190.64       1.7289
   ...
   7.3533e+05   7.3535e+05       783.05       813.45       3.8823
   7.3535e+05   7.3536e+05        809.1       861.55       6.4825
   7.3536e+05   7.3537e+05       857.23       915.89        6.843
   7.3546e+05   7.3549e+05       856.91       888.67       3.7063
   7.3549e+05   7.3553e+05       896.19       1003.3       11.952

where the columns correspond, respectively, to the open and close times of the trade, the open and close prices of GOOG stock, and the realized profit/loss of the trade (in %). The times are Matlab serial date numbers; see the datestr command for converting them to the yyyy-mm-dd format.
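If you export the log-book to Python, the first two columns need converting from Matlab serial date numbers. The minimal sketch below relies on the fact that Matlab's datenum counts 1-Jan-0000 as day 1, while Python's date ordinal counts 1-Jan-0001 as day 1, a 366-day offset. The fractional (time-of-day) part is dropped. The values printed above under format shortg are rounded, so the full-precision numbers should be used.

```python
from datetime import date

def datenum_to_date(dn):
    """Convert a Matlab serial date number to a Python date (time of day dropped)."""
    return date.fromordinal(int(dn) - 366)


print(datenum_to_date(730486))       # datenum('2000-01-01') in Matlab
```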


Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting


MathWorks’ MATLAB programming environment keeps evolving, and its most recent version, labelled 2013a, finally brought a long-awaited command-line facility for pulling stock data directly from the Yahoo! servers. What does that mean for quants and algo traders? Honestly, a lot. Now, by writing just a few commands, we can get nearly everything we want. However, please keep in mind that Yahoo! data are free, so their precision does not always match the quality of, e.g., data downloaded from Bloomberg. Still, for pure backtesting of your models, this is a big leap in dealing with daily stock data. As usual, we can get the open, high, low, close, and adjusted close prices of stocks, supplemented with the traded volume and the dates and values of dividends.

In this post I present a short example of how to retrieve the data of SPY (which tracks the performance of the S&P 500 index) from Yahoo! in the new Matlab 2013a. I also show simple code for testing which buy-hold-and-sell period for SPY (or any other dividend-paying stock) turns a profit most often.

The new Yahoo! feature in Matlab 2013a is fully described in the official article “Request data from Yahoo! data servers”, where you can find all the details required to build it into your Matlab programs.

Model for Dividends

It is a widely held opinion (based on many years of market observations) that one may expect a drop in the stock price within a short timeframe (e.g. a few days) after the day the stock’s dividends have been announced. Probably every quant, sooner or later, is tempted to verify that hypothesis. That is your homework. Today, however, let’s look at a slightly different problem, based on the reversed rule that seems to work everywhere: what goes down must come up. Let’s consider the exchange-traded fund SPDR S&P 500 ETF Trust, listed on the NYSE as SPY.

First, let’s pull the Yahoo! data of adjusted close prices of SPY from Jan 1, 2009 up to Aug 27, 2013:

% Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 
close all; clear all; clc;
 
date_from=datenum('Jan 1 2009');
date_to=datenum('Aug 27 2013');
 
stock='SPY';
 
adjClose = fetch(yahoo,stock,'adj close',date_from,date_to);
div = fetch(yahoo,stock,date_from,date_to,'v');
returns=(adjClose(2:end,2)./adjClose(1:end-1,2)-1);
 
% plot adjusted Close price of SPY and mark days when dividends
% have been announced
plot(adjClose(:,1),adjClose(:,2),'color',[0.6 0.6 0.6])
hold on;
plot(div(:,1),min(adjClose(:,2))+10,'ob');
ylabel('SPY (US$)');
xlabel('Jan 1 2009 to Aug 27 2013');

and visualize them:

[Figure: SPY adjusted close prices, Jan 1 2009 to Aug 27 2013, with dividend days marked]

With the data ready for backtesting, let’s look for the most profitable buy-hold-and-sell period for SPY. We assume that we buy SPY one day after the dividends have been announced (at the market price) and hold it for $dt$ days (here, tested between 1 and 40 trading days).

% find the most profitable period of holding SPY (long position)
neg=[];
for dt=1:40
 
buy=[]; sell=[];
for i=1:size(div,1)
    % find the dates when the dividends have been announced
    [r,c,v]=find(adjClose(:,1)==div(i,1));
    % mark the corresponding SPY price with blue circle marker
    hold on; plot(adjClose(r,1),adjClose(r,2),'ob');
    % assume you buy long SPY next day at the market price (close price);
    % the code steps back one row (r-1) for the next day, i.e. it relies
    % on the fetched data being ordered newest first
    buy=[buy; adjClose(r-1,1) adjClose(r-1,2)];
    % assume you sell SPY in 'dt' days after you bought SPY at the market
    % price (close price)
    sell=[sell; adjClose(r-1-dt,1) adjClose(r-1-dt,2)];
end
 
% calculate profit-and-loss of each trade (excluding transaction costs)
PnL=sell(:,2)./buy(:,2)-1;
% summarize the results
neg=[neg; dt sum(PnL<0) sum(PnL<0)/length(PnL)];
 
end

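For readers without the Datafeed Toolbox, the holding-period scan can be sketched in plain Python on any price list. Two assumptions differ from the Matlab code above. First, prices are taken in chronological order; the Matlab code steps backwards through the rows (r-1 and r-1-dt), which presumes the fetched series is ordered newest first. Second, trades whose sell day would fall beyond the data are skipped rather than causing an indexing error. The function name and the toy prices are invented for illustration.

```python
def holding_period_scan(prices, div_idx, max_dt=40):
    """Count losing trades for each holding period dt = 1..max_dt.

    prices  : close prices in chronological order (oldest first)
    div_idx : indices in `prices` of the days with a dividend
    Buy one day after each dividend day, sell dt trading days later.
    Returns rows (dt, number of losing trades, fraction of losing trades);
    a holding period with no complete trade gets NaN as its fraction.
    """
    rows = []
    for dt in range(1, max_dt + 1):
        pnl = []
        for r in div_idx:
            i_buy = r + 1
            i_sell = i_buy + dt
            if i_sell < len(prices):        # skip trades beyond the data
                pnl.append(prices[i_sell] / prices[i_buy] - 1)
        n_neg = sum(1 for x in pnl if x < 0)
        frac = n_neg / len(pnl) if pnl else float("nan")
        rows.append((dt, n_neg, frac))
    return rows


toy = [10, 10, 11, 12, 11, 10, 12, 13]       # invented prices
for row in sorted(holding_period_scan(toy, [0, 3], max_dt=2), key=lambda x: x[2]):
    print(row)
```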
If we now sort the results by the fraction of losing trades (column 3 of the neg matrix), we get:

>> sortrows(neg,3)
 
ans =
   18.0000    2.0000    0.1111
   17.0000    3.0000    0.1667
   19.0000    3.0000    0.1667
   24.0000    3.0000    0.1667
    9.0000    4.0000    0.2222
   14.0000    4.0000    0.2222
   20.0000    4.0000    0.2222
   21.0000    4.0000    0.2222
   23.0000    4.0000    0.2222
   25.0000    4.0000    0.2222
   28.0000    4.0000    0.2222
   29.0000    4.0000    0.2222
   13.0000    5.0000    0.2778
   15.0000    5.0000    0.2778
   16.0000    5.0000    0.2778
   22.0000    5.0000    0.2778
   27.0000    5.0000    0.2778
   30.0000    5.0000    0.2778
   31.0000    5.0000    0.2778
   33.0000    5.0000    0.2778
   34.0000    5.0000    0.2778
   35.0000    5.0000    0.2778
   36.0000    5.0000    0.2778
    6.0000    6.0000    0.3333
    8.0000    6.0000    0.3333
   10.0000    6.0000    0.3333
   11.0000    6.0000    0.3333
   12.0000    6.0000    0.3333
   26.0000    6.0000    0.3333
   32.0000    6.0000    0.3333
   37.0000    6.0000    0.3333
   38.0000    6.0000    0.3333
   39.0000    6.0000    0.3333
   40.0000    6.0000    0.3333
    5.0000    7.0000    0.3889
    7.0000    7.0000    0.3889
    1.0000    9.0000    0.5000
    2.0000    9.0000    0.5000
    3.0000    9.0000    0.5000
    4.0000    9.0000    0.5000

which indicates that the optimal period for holding a long position in SPY is 18 days: only 2 of the 18 trades (11.1%) close at a loss. We can mark all trades with the 18-day holding period in the chart:

[Figure: SPY price with the 18-day holding-period trades marked]

where the trade open and close prices (according to the model described above) are marked in the plot by black and red circle markers, respectively. Only 2 of the 18 trades (stored in PnL) turned out negative, with losses of 2.63% and 4.26%. The complete distribution of profits and losses from all trades can be obtained as follows:

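figure(2);
% after the loop above, PnL holds the trades of the last dt tested (dt=40);
% rerun the loop body with dt=18 first to plot the 18-day trades
hist(PnL*100,length(PnL))
ylabel('Number of trades')
xlabel('Return (%)')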

returning

[Figure: distribution of trade returns, Number of trades vs. Return (%)]
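Matlab's `hist(y,n)` spreads `n` equal-width bins over the range of the data. A rough standard-library Python analogue is sketched below as a text histogram. It is an assumption-laden sketch: the function name `equal_width_histogram` is hypothetical, the toy returns are illustrative rather than the SPY trades, and edge handling may differ slightly from Matlab's.

```python
def equal_width_histogram(values, n_bins):
    """Count values into n_bins equal-width bins spanning [min(values), max(values)].
    Returns (bin_centers, counts); the maximum value is placed in the last bin."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # all values identical: a single bin
        return [lo], [len(values)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)   # clamp v == hi into the last bin
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    return centers, counts


# Toy returns in percent (illustrative only, not the SPY trades)
toy_pnl_pct = [-4.0, -2.5, 0.5, 1.0, 1.2, 2.8, 3.5, 6.0]
centers, counts = equal_width_histogram(toy_pnl_pct, len(toy_pnl_pct))
for c, n in zip(centers, counts):
    print(f"{c:6.2f}% | {'#' * n}")
```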

Let’s make some money!

The above Matlab code is a simple application of the newest built-in connectivity with the Yahoo! server, which lets us download the stock data of interest. We have tested the optimal holding period for SPY from the beginning of 2009 until now (Aug 2013), a period of global uptrend. The same code can easily be used or modified to test any period and any stock that has paid dividends in the past. This fairly simple approach, though not common in trading, gives us an extra idea of how we might beat the market, assuming that the future remains more or less the same as the past. So, let’s make some money!

Simulation of Portfolio Value using Geometric Brownian Motion Model


With the upcoming series of articles on building a backtesting engine for algo-traded portfolios in mind, today I decided to write a short post on simulating a portfolio's realised profit and loss (P&L). In the future I will use some of the results obtained below to discuss the key statistics used to evaluate P&L whenever the portfolio manager requires it.

Assuming that we trade a portfolio of any assets, its P&L can be simulated in a number of ways. One of the quickest methods is to apply the geometric Brownian motion (GBM) model with a drift $\mu_t$ and a process standard deviation $\sigma_t$ over the total time interval. The model takes the following form:
$$
dS_t = \mu_t S_t dt + \sigma_t S_t dz
$$ where $dz\sim N(0,dt)$, i.e. the Brownian increment $dz$ has variance equal to $dt$. Let $t$ be the present time and $S_t$ (in dollars) the initial value of the portfolio. The target time is $T$, so the portfolio's evaluation horizon is $\tau=T-t$, covered in $N$ time steps. The GBM model assumes no correlation between the portfolio's returns on any two consecutive days (in general, over time). Discretising $dS/S$ over a finite interval therefore gives the discrete change in portfolio value:
$$
\Delta S_t = S_{t-1} (\mu_t\Delta t + \sigma_t\epsilon \sqrt{\Delta t}) \ .
$$ For simplicity, one can assume that both parameters of the model, $\mu_t$ and $\sigma_t$, are constant over time, and that the random variable $\epsilon\sim N(0,1)$. To simulate the path of the portfolio value, we go through $N$ iterations of the formula:
$$
S_{t+1} = S_t + S_t(\mu_t\Delta t + \sigma_t \epsilon_t \sqrt{\Delta t})
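$$ where $t=1,\ldots,N$ and $\Delta t$ is the time step. In this post, $\Delta t$ is set to the local volatility $\sigma_t/\sqrt{N}$, as in the code below. In the standard discretisation of GBM, the time step is $\Delta t=\tau/N$, e.g. $1/252$ for one year of daily steps.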
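The iteration above can be sketched in Python with the standard library. This is a minimal sketch rather than the author's implementation. The time step `dt` is passed in explicitly, so it can be set either to $\sigma_t/\sqrt{N}$ as in this post or to $\tau/N$. The function name `simulate_gbm_path` and the seed value are hypothetical choices.

```python
import math
import random


def simulate_gbm_path(S1, mu, sigma, dt, N, rng):
    """Return a discrete path S_1..S_N built with
    S_{t+1} = S_t + S_t * (mu*dt + sigma*eps_t*sqrt(dt)), eps_t ~ N(0, 1)."""
    path = [S1]
    for _ in range(N - 1):
        eps = rng.gauss(0.0, 1.0)   # independent standard normal draw per step
        S = path[-1]
        path.append(S + S * (mu * dt + sigma * eps * math.sqrt(dt)))
    return path


N = 252
rng = random.Random(2013)           # seeded so the path is reproducible
path = simulate_gbm_path(10000.0, 0.05, 0.05, 0.05 / math.sqrt(N), N, rng)
print(f"final value: ${path[-1]:,.2f}  return: {path[-1] / path[0] - 1:.2%}")
```

Passing a seeded `random.Random` instance keeps each realisation reproducible. Drawing with a different seed gives another of the possible paths.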

Example

Let’s assume that the initial portfolio value is $S_1=\$10,000$ and that the portfolio is traded over 252 days. We allow the underlying process a drift of $\mu_t=0.05$ and an overall volatility of $\sigma_t=5\%$, both constant over time. Since each of the 252 steps depends on a fresh $\epsilon$ drawn from the normal distribution $N(0,1)$, we can generate any number of possible realisations of the simulated portfolio value path.

Quickly coding the above model in Matlab,

mu=0.05;      % drift
sigma=0.05;   % std dev over the total interval
S1=10000;     % initial capital ($)
N=252;        % number of time steps (trading days)
K=1;          % number of simulations

dt=sigma/sqrt(N);   % time step, set to the local volatility as in the text

for k=1:K
    St=zeros(N,1);  % preallocate the simulated path
    St(1)=S1;
    for i=2:N
        e=randn;    % epsilon ~ N(0,1); named e to avoid shadowing Matlab's eps
        St(i)=St(i-1)+St(i-1)*(mu*dt+sigma*e*sqrt(dt));
    end
    hold on; plot(St);
end
xlim([1 N]);
xlabel('Trading Days')
ylabel('Simulated Portfolio Value ($)');

leads us to one of the possible realisations of the process:

[Figure: Simulated Portfolio Value ($) vs. Trading Days]

not too bad, with an annual rate of return of about 13%.

Visual inspection of the path shows no major drawdowns and low volatility, which would suggest pretty good trading decisions by the portfolio manager or trading algorithm. Of course, the simulation tells us nothing about the underlying model or the strategy involved. Still, the result we obtained is sufficient for further discussion of key portfolio metrics and VaR evolution.
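A single path is only one draw, so its 13% return says little about how typical that outcome is. One way to gauge it is to repeat the simulation many times and look at the distribution of final returns. The sketch below does this with the Python standard library. It assumes the same parameters and the same time step as the Matlab example. The function names, the number of paths `K`, and the seed are hypothetical choices.

```python
import math
import random
import statistics


def final_return(S1, mu, sigma, dt, N, rng):
    """Simple return S_N/S_1 - 1 of one simulated path (Euler update from the text)."""
    S = S1
    for _ in range(N - 1):
        S += S * (mu * dt + sigma * rng.gauss(0.0, 1.0) * math.sqrt(dt))
    return S / S1 - 1.0


def monte_carlo_returns(K, S1=10000.0, mu=0.05, sigma=0.05, N=252, dt=None, seed=1):
    """Final returns of K independent realisations of the simulated portfolio."""
    if dt is None:
        dt = sigma / math.sqrt(N)   # time step as set in the Matlab example
    rng = random.Random(seed)
    return [final_return(S1, mu, sigma, dt, N, rng) for _ in range(K)]


rets = monte_carlo_returns(K=2000)
print(f"mean return {statistics.mean(rets):.2%}, std {statistics.stdev(rets):.2%}")
print(f"share of paths returning at least 13%: {sum(r >= 0.13 for r in rets) / len(rets):.1%}")
```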

Slippage in Model Backtesting


A valuable lesson I learned while programming an independent backtesting engine for a new trading model concerned slippage. Simply put, slippage is the fraction of the stock price by which you should assume the executed price deviates from the price you are willing to pay. In model backtesting, slippage is extremely important. Why? Imagine that your model generates a signal to buy or sell a stock on day $t_i$, i.e. after the market has closed and your stock price history has been updated with that day's close price. Since you can no longer buy or sell the stock on day $t_i$, your algo-trading system follows your model's rules and places a new order to be executed on day $t_{i+1}$. Whatever position you are about to take, you do not know the price at the next day's market open. That is true in real-time trading. In the backtesting of your model, however, this information is available: e.g. if you have historical prices of IBM for Aug 2008, you already know the future.

Now, you may wish to program your backtesting engine to buy or sell this stock on day $t_{i+1}$ at the open, mid-day, intra-day, or even close price. The choice is yours, and different strategies are possible. The close price is a good option as long as you also track intra-day trading on $t_{i+1}$. You then have time to analyse the intra-day variability, correct for extreme volatility or black swans, and proceed with your order with extra caution. But if you program a simple approach to order execution (e.g. buy at the open price), you take on the risk that the price will not move in your favour.
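The rule that an order generated after the close of $t_i$ can only be filled on $t_{i+1}$ can be enforced explicitly in a backtest. The following is a minimal sketch in Python. It assumes a hypothetical bar structure: a list of dicts with `open` and `close` prices, oldest first. The choice of execution price is a parameter. The function name `next_day_fills` is illustrative only.

```python
def next_day_fills(signals, bars, price_field="open"):
    """Fill each signal on the bar after the one that generated it.

    signals: list of (day_index, side), generated after the close of day_index,
             side = +1 for buy, -1 for sell.
    bars:    list of dicts with 'open' and 'close' prices, oldest first.
    Returns a list of (fill_day, side, fill_price)."""
    fills = []
    for day, side in signals:
        fill_day = day + 1                 # cannot trade on the signal day itself
        if fill_day >= len(bars):
            continue                       # no next bar yet: the order stays unfilled
        fills.append((fill_day, side, bars[fill_day][price_field]))
    return fills


bars = [{"open": 10.0, "close": 11.0},
        {"open": 12.0, "close": 13.0},
        {"open": 14.0, "close": 15.0}]
print(next_day_fills([(0, +1), (2, -1)], bars))              # fill at next open
print(next_day_fills([(0, +1)], bars, price_field="close"))  # fill at next close
```

Looking up only `bars[day + 1]` keeps the engine from using prices it would not have known at the time of the signal.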

A fairly conservative way to compensate for systematic, unexpected slippage in the stock price once your order has been sent to the broker is to assume, in simulations (backtesting), a fixed slippage that works against your profits every time. Namely, you don’t buy or sell your stock at the price given for day $t_{i+1}$ in your historical price table. Instead, you assume a slippage of $\Delta S$ (a fraction of the price). If the price of the stock is $P$, the slippage affects the price as follows:
$$
P' = P \pm (P\times \Delta S)
$$ where $P'$ is the executed price of your simulated order. The sign $\pm$ depends on the direction of the order, which leads to two basic rules of slippage in backtesting:

[Figure: slippage rules for a long position]

If you go long, you always buy at a price higher than $P$ by $P\times \Delta S$ and sell at a price lower than $P$, again by $P\times \Delta S$. Conversely, if you open a short position, you sell at a price lower than $P$ by $P\times \Delta S$. When you close it, you buy back at a price higher than $P$ by $P\times \Delta S$. In both cases the slippage works against you.
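The fixed-slippage rule $P' = P \pm (P\times \Delta S)$ can be sketched as a small helper. The sign follows from whether the order is a buy or a sell, so that slippage always works against your profits. This is an illustrative Python sketch; the function name `executed_price` and its `side` convention (+1 buy, -1 sell) are assumptions.

```python
def executed_price(P, slippage, side):
    """Simulated fill price with adverse slippage: P' = P + side * P * slippage.

    side = +1 for a buy (opening a long or covering a short),
           -1 for a sell (closing a long or opening a short).
    slippage is a fraction of the price, e.g. 0.01 for 1%."""
    if side not in (1, -1):
        raise ValueError("side must be +1 (buy) or -1 (sell)")
    return P + side * P * slippage


# Long round trip: buy higher, sell lower than the quoted price
print(executed_price(100.0, 0.01, +1), executed_price(110.0, 0.01, -1))
# Short round trip: sell lower when opening, buy higher when covering
print(executed_price(100.0, 0.01, -1), executed_price(90.0, 0.01, +1))
```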

The amount of slippage you should assume depends on market conditions. If you carry out many algorithmic trades, you can probably estimate your own slippage from your executed orders. In general, the simulated slippage shouldn’t exceed 2%.
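The sketch below shows how quickly slippage eats into a backtest. It computes the return of one round trip with adverse slippage applied to both legs, measured relative to the entry fill. It is a Python sketch with toy prices. The function name `round_trip_return` is hypothetical. With 1% slippage per leg, a long trade whose quoted price rises 2% ends slightly negative.

```python
def round_trip_return(entry, exit_price, slippage, direction):
    """Return of one round trip with adverse slippage on both legs,
    measured relative to the entry fill.

    direction = +1: long  (buy at entry, sell at exit)
    direction = -1: short (sell at entry, buy back at exit)"""
    if direction == 1:
        buy, sell = entry * (1 + slippage), exit_price * (1 - slippage)
        return sell / buy - 1.0
    if direction == -1:
        sell, buy = entry * (1 - slippage), exit_price * (1 + slippage)
        return 1.0 - buy / sell
    raise ValueError("direction must be +1 (long) or -1 (short)")


# A long trade whose quoted prices rise 2% (toy numbers)
for s in (0.0, 0.005, 0.01, 0.02):
    print(f"slippage {s:.1%}: long return {round_trip_return(100.0, 102.0, s, +1):+.3%}")
```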

If you forget to include slippage in your backtesting black box, your model may appear extremely profitable while in practice you risk a lot. On the other hand, adding slippage to your test may make your day less bright than it started. But don’t worry. Keep smiling, as a new day is a new opportunity, and life is not about avoiding risks but managing them right.
