Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting


With release 2013a, MathWorks' MATLAB finally gained a long-awaited command-line facility for pulling stock data directly from the Yahoo! servers. What does that mean for quants and algo traders? Honestly, a lot. With just a few commands we can get nearly everything we want. Keep in mind, however, that Yahoo! data are free, so their accuracy is not always on par with data downloaded from, e.g., Bloomberg. Still, for pure backtesting of your models, this is a big leap in working with daily stock data. As usual, we can get open, high, low, close, and adjusted close prices of stocks, supplemented with traded volume and the dates and values of dividends.

In this post I present a short example of how to retrieve data for SPY (tracking the performance of the S&P 500 index) from Yahoo! in the new Matlab 2013a, and I show simple code for testing which holding period of a buy-hold-and-sell trade in SPY (or any other stock paying dividends) has most often ended in a profit.

The new Yahoo! feature in Matlab 2013a is fully described in the official article Request data from Yahoo! data servers, where you can find all the details required to build it into your Matlab programs.

Model for Dividends

It is a widely held opinion (based on many years of market observations) that one may expect a drop in a stock's price within a short timeframe (e.g. a few days) after the day the stock's dividend has been announced. Probably every quant, sooner or later, is tempted to verify that hypothesis. It's your homework. Today, however, let's look at a slightly different problem based on the ever-working reversed rule: what goes down must come up. Let's consider the exchange-traded fund SPDR S&P 500 ETF Trust, listed on NYSE as SPY.

First, let’s pull the Yahoo! adjusted close prices of SPY from Jan 1, 2009 to Aug 27, 2013

% Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 
close all; clear all; clc;
 
date_from=datenum('Jan 1 2009');
date_to=datenum('Aug 27 2013');
 
stock='SPY';
 
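% columns: [date, value]; rows are assumed to run newest-first, which is
% what the r-1 "next day" indexing in the backtest relies on
adjClose = fetch(yahoo,stock,'adj close',date_from,date_to);
div = fetch(yahoo,stock,date_from,date_to,'v');   % dividend dates and values
% daily simple returns (newer price over older price)
returns=(adjClose(1:end-1,2)./adjClose(2:end,2)-1);
 
% plot adjusted close price of SPY and mark days when dividends
% have been announced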
plot(adjClose(:,1),adjClose(:,2),'color',[0.6 0.6 0.6])
hold on;
plot(div(:,1),min(adjClose(:,2))+10,'ob');
ylabel('SPY (US$)');
xlabel('Jan 1 2009 to Aug 27 2013');

and visualize them:

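[Figure spy-1: adjusted close price of SPY, Jan 1 2009 to Aug 27 2013, with dividend dates marked by blue circles]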

Having the data ready for backtesting, let’s look for the most profitable holding period for SPY, assuming that we buy SPY one day after the dividend has been announced (at the market price) and hold it for $dt$ days (here tested between 1 and 40 trading days).

% find the most profitable period of holding SPY (long position)
neg=[];
for dt=1:40
 
    buy=[]; sell=[];
    for i=1:size(div,1)
        % find the row of the date when the dividend has been announced
        r=find(adjClose(:,1)==div(i,1));
        % mark the corresponding SPY price with a blue circle marker
        hold on; plot(adjClose(r,1),adjClose(r,2),'ob');
        % rows run newest-first, so the next trading day is row r-1;
        % assume you buy SPY there at the market price (close price)
        buy=[buy; adjClose(r-1,1) adjClose(r-1,2)];
        % assume you sell SPY 'dt' trading days after you bought it,
        % at the market price (close price)
        sell=[sell; adjClose(r-1-dt,1) adjClose(r-1-dt,2)];
    end
 
    % calculate profit-and-loss of each trade (excluding transaction costs)
    PnL=sell(:,2)./buy(:,2)-1;
    % summarize the results: [dt, number of losing trades, their fraction]
    neg=[neg; dt sum(PnL<0) sum(PnL<0)/length(PnL)];
 
end
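
The row arithmetic in the loop is easy to get backwards, so here is a minimal sketch of a single trade on toy data. All numbers are hypothetical, and the newest-first row order is an assumption that matches the r-1 indexing used above.

```matlab
% Toy illustration of the buy/hold/sell indexing (all numbers hypothetical).
% Rows are sorted newest-first, so "later in time" means a smaller row index.
dates  = (10:-1:1)';                                  % day numbers, newest first
prices = [110 108 107 105 104 103 102 101 100 99]';   % matching close prices

divRow = 8;    % row of the hypothetical dividend date (day 3)
dt     = 4;    % holding period in trading days

buyRow  = divRow - 1;     % one trading day after the dividend date (day 4)
sellRow = buyRow - dt;    % dt trading days after the purchase (day 8)

pnl = prices(sellRow)/prices(buyRow) - 1;   % simple return of the trade
fprintf('buy day %d at %.2f, sell day %d at %.2f, PnL = %.4f\n', ...
    dates(buyRow), prices(buyRow), dates(sellRow), prices(sellRow), pnl);
```

With real data, sellRow can fall below 1 when a dividend date sits closer than dt+1 rows to the newest observation, so a guard on r-1-dt may be worth adding before indexing.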

If we now sort the results by the fraction of negative returns (column 3 of the neg matrix), we get:

>> sortrows(neg,3)
 
ans =
   18.0000    2.0000    0.1111
   17.0000    3.0000    0.1667
   19.0000    3.0000    0.1667
   24.0000    3.0000    0.1667
    9.0000    4.0000    0.2222
   14.0000    4.0000    0.2222
   20.0000    4.0000    0.2222
   21.0000    4.0000    0.2222
   23.0000    4.0000    0.2222
   25.0000    4.0000    0.2222
   28.0000    4.0000    0.2222
   29.0000    4.0000    0.2222
   13.0000    5.0000    0.2778
   15.0000    5.0000    0.2778
   16.0000    5.0000    0.2778
   22.0000    5.0000    0.2778
   27.0000    5.0000    0.2778
   30.0000    5.0000    0.2778
   31.0000    5.0000    0.2778
   33.0000    5.0000    0.2778
   34.0000    5.0000    0.2778
   35.0000    5.0000    0.2778
   36.0000    5.0000    0.2778
    6.0000    6.0000    0.3333
    8.0000    6.0000    0.3333
   10.0000    6.0000    0.3333
   11.0000    6.0000    0.3333
   12.0000    6.0000    0.3333
   26.0000    6.0000    0.3333
   32.0000    6.0000    0.3333
   37.0000    6.0000    0.3333
   38.0000    6.0000    0.3333
   39.0000    6.0000    0.3333
   40.0000    6.0000    0.3333
    5.0000    7.0000    0.3889
    7.0000    7.0000    0.3889
    1.0000    9.0000    0.5000
    2.0000    9.0000    0.5000
    3.0000    9.0000    0.5000
    4.0000    9.0000    0.5000

which simply indicates that the optimal holding period for a long position in SPY equals 18 days. We can mark all trades (18-day holding period) in the chart:

[Figure spy-2: SPY adjusted close price with the trades for the 18-day holding period marked]

where the trade open and close prices (according to the model described above) are marked by black and red circle markers, respectively. Only 2 out of 18 trades (PnL matrix) turned out to be negative, with losses of 2.63% and 4.26%. The complete distribution of profits and losses from all trades can be obtained in the following way:

figure(2);
hist(PnL*100,length(PnL))
ylabel('Number of trades')
xlabel('Return (%)')

returning

[Figure spy-3: histogram of trade returns (%)]
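
The loss statistics quoted for the trades can be read straight off the PnL vector. Below is a minimal sketch using a hypothetical PnL vector (these are not the SPY results). Note also that once the scan over dt has finished, PnL holds the trades of the last tested holding period, so the inner loop would have to be rerun with dt=18 to reproduce the trades discussed here.

```matlab
% Summary statistics of a set of trades (PnL values are hypothetical)
PnL = [0.031; -0.012; 0.054; 0.008; -0.027; 0.019];

nLoss    = sum(PnL < 0);          % number of losing trades
lossFrac = nLoss/length(PnL);     % fraction of losing trades (column 3 of neg)
worst    = min(PnL);              % largest single loss
avgRet   = mean(PnL);             % average return per trade

fprintf('%d of %d trades negative (%.1f%%), worst %.2f%%, mean %.2f%%\n', ...
    nLoss, length(PnL), 100*lossFrac, 100*worst, 100*avgRet);
```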

Let’s make some money!

The above Matlab code is a simple application of the new built-in connectivity with the Yahoo! server and its ability to download the stock data of interest. We have tested the optimal holding period for SPY from the beginning of 2009 until now (a global uptrend). The same code can easily be used and/or modified to examine any period and any stock that has paid dividends in the past. This fairly simple approach, though it does not trade often, gives us an extra idea of how we might beat the market, assuming that the future remains more or less the same as the past. So, let’s make some money!

Trend Identification for FX Traders


When you think about inventing a new model for algorithmic trading, there are only three key elements you need to start with: creativity, data, and a programming tool. Assuming that the last two are already in your possession, all that remains is seeking and finding a great new idea! No offense, but that’s the hardest part of the game.

To be successful in discovering new trading solutions you have to be completely open-minded, relaxed, and well oriented in the information pertaining to your topic. Personally, after many years of programming and playing with digital signal processing techniques, I have found that the most essential aspect of well-grounded research is the data itself. The more, literally, I stared at time-series changing their properties, the more I was able to capture subtle differences I had often overlooked before, and with the aid of intuition and scientific experience some new ideas simply popped up.

Here I would like to share with you a part of this process.

In the Extracting Time-Series from Tick-Data article I outlined one of many possible ways of extracting FX time-series from very fine data sets. As a final product we obtained two files, namely:

audusd.bid.1h
audusd.ask.1h

corresponding to Bid and Ask prices for the Forex AUDUSD pair’s trading history between Jan 2000 and May 2010. Each file contains two columns of numbers: Time (Modified Julian Day) and Price. The time resolution was chosen to be 1 hour.

FOREX trading runs from Monday to Friday, continuously for 24 hours. The data therefore contain regular gaps corresponding to weekends. Because the data coverage is more abundant than, for example, the much shorter trading windows of equities or ETFs around the world, it gives us a better understanding of trading directions within every weekly time frame. Keeping that in mind, we might be interested in looking at the directional information conveyed by the data as a seed of a potential new FX model.

For now, let’s focus solely on the initial pre-processing of the Bid and Ask time-series and on splitting the data into weeks stored in a common cell array.
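
The weekend gaps mentioned above are easy to locate from the time column alone. The following is a minimal sketch on hypothetical hourly time stamps (in days); the one-day threshold is an assumption chosen for illustration, not a value taken from the AUDUSD files.

```matlab
% Two hypothetical 5-day trading weeks sampled hourly, time in days
t = [(0:119)/24, 7 + (0:119)/24]';

gap = diff(t);            % spacing between consecutive samples
idx = find(gap > 1);      % positions where more than one day is missing

for k = 1:numel(idx)
    fprintf('gap of %.2f days after sample %d (t = %.4f)\n', ...
        gap(idx(k)), idx(k), t(idx(k)));
end
```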

% FX time-series analysis
% (c) Quant at Risk, 2012
%
% Part 1: Separation of the weeks
 
close all; clear all; clc;
 
% --analyzed FX pair
pair=['audusd'];
 
% --data
n=['./',pair,'/',pair];     % a common path to files
na=[n,'.ask.1h']; 
nb=[n,'.bid.1h'];
d1=load(na); d2=load(nb);   % loading Ask and Bid data
d=(d1+d2)/2;                % blending (simple Bid/Ask average)
clear d1 d2

For the sake of simplicity, we use a simple average of the Bid and Ask 1-hour prices (the blending step, d=(d1+d2)/2, line 16 of the listing) for our further research. Next, we create a weekly template, $x$, for our data classification, and we find the total number of weeks available for analysis:

% time constraints from the data
t0=min(d(:,1));
tN=max(d(:,1));
t1=t0-1;
 
% weekly template for data classification
x=t1:7:tN+7;
 
% total number of weeks
nw=length(x)-1;
 
fprintf(upper(pair));
fprintf(' time-series: %3.0f weeks (%5.2f yrs)\n',nw,nw/52);

which in our case returns encouraging information:

AUDUSD time-series: 539 weeks (10.37 yrs)
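
To see how the weekly template works, here is a minimal sketch on toy data. The time stamps are hypothetical daily values with weekends missing; the template construction and the strict inequalities follow the code used in this post.

```matlab
% Hypothetical daily time stamps for three trading weeks (weekends missing)
t = [1:5, 8:12, 15:19]';

t0 = min(t); tN = max(t);
t1 = t0 - 1;               % start the template one day before the first sample
x  = t1:7:tN+7;            % weekly edges: 0 7 14 21
nw = length(x) - 1;        % number of weeks covered by the template

% count the samples falling strictly inside each weekly bin
cnt = zeros(nw,1);
for i = 1:nw
    cnt(i) = sum(t > x(i) & t < x(i+1));
end
fprintf('%d weeks, samples per week: %s\n', nw, mat2str(cnt'));
```

Because both comparisons are strict, a sample that falls exactly on a weekly edge would belong to no week. With the edges placed in the weekend gap, as here, that case does not arise.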

The core of the programming exercise is to split all 539 weeks and save them into a cell array, $week$. As we will see in the code section below, we may want to ensure that each week contains the same number of points, so any data missing from our FX data provider will be interpolated. To do that efficiently, we use the following function, which applies Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation to fill gapped data points in the series:

function [x2,y2]=gapinterpol(x,y,dt)
    % specify a regular time axis spanning the input data
    x_min=x(1);
    x_max=x(length(x));
    x2=(x_min:dt:x_max);
    % interpolate gaps
    y2=pchip(x,y,x2);
end
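
As a quick illustration of what the gap filling does, here is a minimal sketch that applies the same steps inline to a short hypothetical series with two missing hourly points. The data values are made up, and the regular grid is built as x(1)+(0:n)*dt rather than with the colon operator, a small variation chosen here to keep the number of grid points predictable.

```matlab
% Hypothetical hourly series (time in days) with hours 3 and 4 missing
x  = [0 1 2 5 6]/24;
y  = [0.90 0.91 0.92 0.95 0.96];   % prices, linear in time for easy checking
dt = 1/24;                         % target sampling: 1 hour

% regular time axis spanning the data
n  = round((x(end) - x(1))/dt);
x2 = x(1) + (0:n)*dt;

% fill the gaps with piecewise cubic Hermite interpolation
y2 = pchip(x, y, x2);

fprintf('%d input points -> %d regularly sampled points\n', numel(x), numel(x2));
```

For data that are linear in time, as in this toy case, the interpolated values at hours 3 and 4 fall on the same straight line; real price data will of course give smoother, non-trivial fills.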

In our program, the separation of weeks is carried out by:

week={};  % an empty cell array
avdt=[]; 
 
for i=1:nw
    % split FX signal according to week
    r=find(d(:,1)>x(i) & d(:,1)<x(i+1));
    x1=d(r,1); y1=d(r,2);
    % interpolate gaps, use 1-hour bins
    dt=1/24;
    [x2,y2]=gapinterpol(x1,y1,dt);
    % check the average sampling time, should be equal to dt
    avdt=[avdt; mean(diff(x2))];
    % store the week signal in a cell array as [time price] columns
    week{i}=[x2; y2]';
end
fprintf('average sampling after interpolation = %10.7f [d]\n',max(avdt));

where, as a check, we get:

average sampling after interpolation =  0.0416667 [d]

which corresponds to the expected value of $1/24$ day to sufficient precision.
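
Once the weeks are stored in the cell array, per-week quantities are one cellfun call away. The following is a minimal sketch on a hypothetical two-week cell array with the same [time price] layout; the net weekly change is only an illustrative example of directional information, not part of the original program.

```matlab
% Hypothetical cell array of weeks, each element holding [time price] columns
week = { [ (0:4)'  [0.90 0.91 0.93 0.92 0.94]' ], ...
         [ (7:10)' [0.94 0.93 0.92 0.91]' ] };

npts = cellfun(@(w) size(w,1), week);            % points per week
chg  = cellfun(@(w) w(end,2) - w(1,2), week);    % last minus first price

for i = 1:numel(week)
    fprintf('week %d: %d points, net change %+.2f\n', i, npts(i), chg(i));
end
```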

A quick visual verification of our signal processing,

scrsz = get(0,'ScreenSize');
h=figure('Position',[70 scrsz(4)/2 scrsz(3)/1.1 scrsz(4)/2],'Toolbar','none');
hold off;
for i=1:nw
    w=week{i};
    x=w(:,1); y=w(:,2);
    % plot weekly signal
    hold on; plot(x,y,'k');
end
xlim([0 100]);

uncovers our desired result:

[Figure: AUD/USD weekly time-series]
