Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

## Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting

Within the evolution of MathWorks' MATLAB programming environment, the most recent version, labelled 2013a, finally delivered a long-awaited command-line facility for pulling stock data directly from the Yahoo! servers. What does that mean for quants and algo traders? Honestly, a lot. With just a few commands we can now fetch nearly everything we need. Please keep in mind, however, that Yahoo! data are free, so their precision does not always match the quality of data downloaded from, e.g., Bloomberg resources. Still, for pure backtesting of your models, this step is a big leap in dealing with daily stock data. As usual, we can obtain open, high, low, close, and adjusted close prices of stocks, supplemented with traded volume, dates, and dividend values.

In this post I present a short example of how one can retrieve the data of SPY (an ETF tracking the performance of the S&P 500 index) using the new Yahoo! interface in Matlab 2013a, and I show a simple code for testing which buy-hold-sell period for SPY (or any other dividend-paying stock) would have turned a profit most consistently.

The beauty of the new Yahoo! feature in Matlab 2013a has been fully described in the official MathWorks article Request data from Yahoo! data servers, where you can find all the details required to build this functionality into your Matlab programs.

### Model for Dividends

It is a well-known opinion (based on many years of market observations) that one may expect a drop in a stock's price within a short timeframe (e.g. a few days) after the day the stock's dividend has been announced. Probably every quant, sooner or later, is tempted to verify that hypothesis; consider it your homework. Today, however, let's look at a slightly different problem, based on the ever-working reversed rule: what goes down must go up. Let's consider the exchange-traded fund SPDR S&P 500 ETF Trust, listed on NYSE as SPY.

First, let's pull the Yahoo! data of adjusted close prices of SPY from Jan 1, 2009 up to Aug 27, 2013,

```matlab
% Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz

close all; clear all; clc;

date_from=datenum('Jan 1 2009');
date_to=datenum('Aug 27 2013');

stock='SPY';

adjClose = fetch(yahoo,stock,'adj close',date_from,date_to);
div = fetch(yahoo,stock,date_from,date_to,'v');
returns=(adjClose(2:end,2)./adjClose(1:end-1,2)-1);

% plot adjusted Close price and mark days when dividends
% have been announced
plot(adjClose(:,1),adjClose(:,2),'color',[0.6 0.6 0.6])
hold on;
plot(div(:,1),min(adjClose(:,2))+10,'ob');
ylabel('SPY (US$)');
xlabel('Jan 1 2009 to Aug 27 2013');
```

and visualize them.

Having the data ready for backtesting, let's look for the most profitable period of buying, holding, and selling SPY, assuming that we buy SPY one day after a dividend has been announced (at the market price) and hold it for `dt` days (here tested to be between 1 and 40 trading days).
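As a side note, the one-line return calculation in the listing above, `adjClose(2:end,2)./adjClose(1:end-1,2)-1`, translates directly into NumPy. Here is a minimal sketch using made-up prices (stand-ins, not actual SPY quotes):

```python
import numpy as np

# hypothetical adjusted-close prices (stand-ins, not real SPY data)
adj_close = np.array([100.0, 101.0, 99.0, 102.0])

# simple daily returns, the analogue of
# adjClose(2:end,2)./adjClose(1:end-1,2)-1 in the Matlab listing
returns = adj_close[1:] / adj_close[:-1] - 1.0
print(returns)  # first element: 101/100 - 1 = 0.01
```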
```matlab
% find the most profitable period of holding SPY (long position)
neg=[];
for dt=1:40
    buy=[]; sell=[];
    for i=1:size(div,1)
        % find the dates when the dividends have been announced
        [r,c,v]=find(adjClose(:,1)==div(i,1));
        % mark the corresponding SPY price with blue circle marker
        hold on; plot(adjClose(r,1),adjClose(r,2),'ob');
        % assume you buy long SPY next day at the market price (close price);
        % the fetched series is ordered newest first, hence row r-1
        buy=[buy; adjClose(r-1,1) adjClose(r-1,2)];
        % assume you sell SPY 'dt' days after you bought it, at the market
        % price (close price)
        sell=[sell; adjClose(r-1-dt,1) adjClose(r-1-dt,2)];
    end
    % calculate profit-and-loss of each trade (excluding transaction costs)
    PnL=sell(:,2)./buy(:,2)-1;
    % summarize the results
    neg=[neg; dt sum(PnL<0) sum(PnL<0)/length(PnL)];
end
```

If we now sort the results according to the percentage of negative returns (column 3 of the `neg` matrix), we get:

```
>> sortrows(neg,3)
ans =
   18.0000    2.0000    0.1111
   17.0000    3.0000    0.1667
   19.0000    3.0000    0.1667
   24.0000    3.0000    0.1667
    9.0000    4.0000    0.2222
   14.0000    4.0000    0.2222
   20.0000    4.0000    0.2222
   21.0000    4.0000    0.2222
   23.0000    4.0000    0.2222
   25.0000    4.0000    0.2222
   28.0000    4.0000    0.2222
   29.0000    4.0000    0.2222
   13.0000    5.0000    0.2778
   15.0000    5.0000    0.2778
   16.0000    5.0000    0.2778
   22.0000    5.0000    0.2778
   27.0000    5.0000    0.2778
   30.0000    5.0000    0.2778
   31.0000    5.0000    0.2778
   33.0000    5.0000    0.2778
   34.0000    5.0000    0.2778
   35.0000    5.0000    0.2778
   36.0000    5.0000    0.2778
    6.0000    6.0000    0.3333
    8.0000    6.0000    0.3333
   10.0000    6.0000    0.3333
   11.0000    6.0000    0.3333
   12.0000    6.0000    0.3333
   26.0000    6.0000    0.3333
   32.0000    6.0000    0.3333
   37.0000    6.0000    0.3333
   38.0000    6.0000    0.3333
   39.0000    6.0000    0.3333
   40.0000    6.0000    0.3333
    5.0000    7.0000    0.3889
    7.0000    7.0000    0.3889
    1.0000    9.0000    0.5000
    2.0000    9.0000    0.5000
    3.0000    9.0000    0.5000
    4.0000    9.0000    0.5000
```

which simply points at 18 days as the optimal period for holding a long position in SPY.
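The same scan can be sketched in a few lines of Python. Everything below is hypothetical: the `scan_holding_periods` helper and its inputs are mine, and the sketch works on a synthetic, oldest-first price series rather than the reverse-ordered Yahoo! data used in the Matlab code above:

```python
import numpy as np

def scan_holding_periods(prices, event_idx, max_dt=40):
    """For each holding period dt, count the trades ending with a
    negative PnL.  prices is oldest-first; event_idx marks dividend
    announcement days.  Returns rows [dt, n_negative, frac_negative],
    sorted by the fraction of losing trades (like sortrows(neg,3))."""
    rows = []
    for dt in range(1, max_dt + 1):
        pnl = []
        for i in event_idx:
            buy, sell = i + 1, i + 1 + dt   # buy next day, sell dt days later
            if sell < len(prices):
                pnl.append(prices[sell] / prices[buy] - 1.0)
        pnl = np.array(pnl)
        rows.append([dt, int((pnl < 0).sum()), float((pnl < 0).mean())])
    return sorted(rows, key=lambda row: row[2])

# toy monotone uptrend with 'dividend days' at indices 2 and 10
prices = np.linspace(100.0, 120.0, 21)
table = scan_holding_periods(prices, [2, 10], max_dt=3)
print(table)  # no losing trades on a monotone uptrend
```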
We can mark all trades (18-day holding period) in the chart, where the trade open and close prices (according to our model described above) have been marked by black and red circle markers, respectively. Only 2 out of 18 trades (see the `PnL` matrix) turned out to be negative, with losses of 2.63% and 4.26%. The complete distribution of profits and losses from all trades can be obtained in the following way:

```matlab
figure(2);
hist(PnL*100,length(PnL))
ylabel('Number of trades')
xlabel('Return (%)')
```

Let's make some money! The above Matlab code delivers a simple application of the newest built-in connectivity with the Yahoo! server and the ability to download the stock data of our interest. We have tested the optimal holding period for SPY from the beginning of 2009 until now (a global uptrend). The same code can easily be used and/or modified to verify any period and any stock for which dividends have been released in the past. This fairly simple approach, though not too frequent in trading, provides us with an extra idea of how we can beat the market, assuming that the future is going to remain more or less the same as the past. So, let's make some money!

## Trend Identification for FX Traders

When you think about inventing a new model for algorithmic trading, there are only three key elements you need to start your work with: creativity, data, and a programming tool. Assuming that the last two are already in your possession, all that remains is seeking and finding a great new idea! With no offense, that's the hardest part of the game. To be successful in discovering new trading solutions you have to be completely open-minded, relaxed, and fully oriented within the information pertaining to your topic. Personally, after many years of programming and playing with digital signal processing techniques, I have discovered that the most essential aspect of well-grounded research is the data itself.
The more, literally, I stared at time-series changing their properties, the more I was able to capture subtle differences, often overlooked by me before, and with the aid of intuition and scientific experience some new ideas simply popped up. Here I would like to share a part of that process with you.

In the Extracting Time-Series from Tick-Data article I outlined one of many possible ways of extracting FX time-series from the very fine data sets. As the final product we obtained two files, namely:

```
audusd.bid.1h
audusd.ask.1h
```

corresponding to Bid and Ask prices of the Forex AUDUSD pair's trading history between Jan 2000 and May 2010. Each file contains two columns of numbers: Time (Modified Julian Day) and Price. The time resolution has been selected to be 1 hour.

FOREX trading lasts from Monday to Friday, continuously for 24 hours. Therefore the data contain regular gaps corresponding to weekends. As the data coverage is more abundant compared to, for example, the much shorter trading windows of equities or ETFs around the world, it provides us with a better understanding of trading directions within every weekly time frame. Keeping that in mind, we might be interested in looking at the directional information conveyed by the data as the seed of a potential new FX model. For now, let's focus solely on the initial pre-processing of the Bid and Ask time-series and on splitting each week into a common cell array.

```matlab
% FX time-series analysis
% (c) Quant at Risk, 2012
%
% Part 1: Separation of the weeks

close all; clear all; clc;

% --analyzed FX pair
pair=['audusd'];

% --data
n=['./',pair,'/',pair];    % a common path to files
na=[n,'.ask.1h'];
nb=[n,'.bid.1h'];
d1=load(na); d2=load(nb);  % loading data
d=(d1+d2)/2;               % blending
clear d1 d2
```

For the sake of simplicity, in the blending step we decided to use a simple average of the Bid and Ask 1-hour prices for our further research.
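The blend is nothing more than the Bid-Ask mid-price. A NumPy sketch with hypothetical quotes (made-up numbers, not real AUDUSD data) shows the same operation:

```python
import numpy as np

# hypothetical 1-hour [time_mjd, price] rows for Bid and Ask;
# the time stamps are identical in both files, so averaging
# the two arrays leaves the time column unchanged
bid = np.array([[55000.0000, 0.9010],
                [55000.0417, 0.9020]])
ask = np.array([[55000.0000, 0.9014],
                [55000.0417, 0.9026]])

# mid-price blend, the analogue of d=(d1+d2)/2 in the Matlab code
mid = (bid + ask) / 2.0
print(mid[:, 1])  # mid prices: 0.9012 and 0.9023
```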
Next, we create a weekly template, `x`, for our data classification, and we find the total number of weeks available for analysis:

```matlab
% time constraints from the data
t0=min(d(:,1));
tN=max(d(:,1));
t1=t0-1;

% weekly template for data classification
x=t1:7:tN+7;

% total number of weeks
nw=length(x)-1;

fprintf(upper(pair));
fprintf(' time-series: %3.0f weeks (%5.2f yrs)\n',nw,nw/52);
```

which in our case returns a reassuring piece of information:

```
AUDUSD time-series: 539 weeks (10.37 yrs)
```

The core of the programming exercise is to split all 539 weeks and save them into a cell array named `week`. As we will see in the code section below, for some reasons we may want to make sure that each week contains the same number of points; therefore any data missing from our FX data provider will be interpolated. To do that efficiently, we use the following function, which applies Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation to fill the gapped data points in the series:

```matlab
function [x2,y2]=gapinterpol(x,y,dt)
    % specify axis
    x_min=x(1);
    x_max=x(length(x));
    x2=(x_min:dt:x_max);
    % interpolate gaps
    y2=pchip(x,y,x2);
end
```

The separation of weeks is realized in our program by:

```matlab
week={};   % an empty cell array
avdt=[];
for i=1:nw
    % split FX signal according to week
    [r,c,v]=find(d(:,1)>x(i) & d(:,1)<x(i+1));
    x1=d(r,1);
    y1=d(r,2);
    % interpolate gaps, use 1-hour bins
    dt=1/24;
    [x2,y2]=gapinterpol(x1,y1,dt);
    % check the average sampling time; it should equal dt
    s=0;
    for j=1:length(x2)-1
        s=s+(x2(j+1)-x2(j));
    end
    tmp=s/(length(x2)-1);
    avdt=[avdt; tmp];
    % store the week signal in a cell array
    tmp=[x2; y2];
    tmp=tmp';
    week{i}=tmp;
end
fprintf('average sampling after interpolation = %10.7f [d]\n',max(avdt));
```

where as a check-up we get:

```
average sampling after interpolation = 0.0416667 [d]
```

which corresponds to the expected value of 1/24 day with sufficient accuracy.
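The gap-filling resampler can be sketched in Python as well. The `gap_interpol` helper below is my own analogue of the `gapinterpol` function; for brevity it uses linear `np.interp`, whereas SciPy's `PchipInterpolator` would reproduce Matlab's `pchip` behaviour exactly:

```python
import numpy as np

def gap_interpol(x, y, dt):
    """Resample (x, y) onto a regular grid of step dt, filling gaps
    by interpolation -- a sketch of the Matlab gapinterpol helper.
    Linear np.interp is used here; SciPy's PchipInterpolator would
    match Matlab's shape-preserving pchip exactly."""
    x2 = np.arange(x[0], x[-1] + dt / 2, dt)
    y2 = np.interp(x2, x, y)
    return x2, y2

# hourly toy series (times in days) with one missing sample at t = 2/24
t = np.array([0.0, 1/24, 3/24, 4/24])
p = np.array([1.00, 1.01, 1.03, 1.04])
x2, y2 = gap_interpol(t, p, 1/24)
print(len(x2))   # 5 points: the gap at t = 2/24 has been filled in
```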

A quick visual verification of our signal processing,

```matlab
scrsz = get(0,'ScreenSize');
h=figure('Position',[70 scrsz(4)/2 scrsz(3)/1.1 scrsz(4)/2],'Toolbar','none');
hold off;
for i=1:nw
    w=week{i};
    x=w(:,1);
    y=w(:,2);
    % plot weekly signal
    hold on; plot(x,y,'k');
end
xlim([0 100]);
```

uncovers our desired result: