Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

Pre-Processing of Asset Price Series for Portfolio Optimization

Portfolio Optimization is a significant component of Matlab’s Financial Toolbox. It provides us with ready-to-use solution in finding optimal weights of assets that we consider for trading deriving them based on the historical asset performance. From a practical point of view, we can include it in our algorithmic trading strategy and backtest its applicability under different initial conditions. This is a subject of my next up-coming post. However, before we can enjoy the view from the peak, we need to climb the mountain first.

In Matlab, the portfolio is created as a dedicated object of the same name. It doesn’t read the raw stock data. We need to feed that beast. Two major ingredients satisfy the input: a vector of the expected asset returns and a covariance matrix. Matlab helps us to estimate these moments but first we need to deliver asset data in a digestable form.

In this post we will see how one can quickly download the stock data from the Internet based on our own stock selection and pre-process them for solving portfolio optimization problem in Matlab.

Initial Setup for Portfolio Object

Let’s say that at any point of time you have your own list of stocks you wish to buy. For simplicity let’s also assume that the list contains stocks traded on NYSE or NASDAQ. Since you have been a great fun of this game, now you are almost ready to buy what you jotted down on your ShoppingList.lst. Here, an example of 10 tech stocks:

AAPL   AOL   BIDU   GOOG   HPQ   IBM   INTC   MSFT   NVDA   TXN

They will constitute your portfolio of stocks. The problem of portfolio optimization requires a look back in time in the space of returns obtained in trading by each stock. Based on them the Return Proxy and Risk Proxy can be found.

The return matrix $R$ of dimensions $(N-1)\times M$ where $N$ stands for number of historical prices (e.g. derived daily, or monthly, etc.) and $M$ for the number of stocks in our portfolio, is required by Matlab as an input. We will see how does it work in next post. For now let’s solely focus on creation of this matrix.

In the article Create a Portfolio of Stocks based on Google Finance Data fed by Quandl I discussed Quandl.com as an attractive data provider for US stocks. Here, we will follow this solution making use of Quandl resources to pull out the stock price series for our shopping list. Ultimately, we aim at building a function, here: QuandlForPortfolio, that does the job for us:

% Pre-Processing of Asset Price Series for Portfolio Optimization in Matlab
%  (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
 
clear all; close all; clc;
 
% Input Parameters
n=1*365;
tickers='ShoppingList.lst';
qcodes='QuandlStockCodeListUS.xlsx';
 
[X,Y,R,AssetList] = QuandlForPortfolio(n,tickers,qcodes);

We call this function with three input parameters. The first one, $n$, denotes a number of calendar days from today (counting backwards) for which we wish to retrieve the stock data. Usually, 365 days will correspond to about 250$-$252 trading days. The second parameter is a path/file name to our list of stock (desired to be taken into account in the portfolio optimisation process) while the last input defines the path/file name to the file storing stocks’ tickers and associated Quandl Price Codes (see here for more details).

Feeding the Beast

The QuandlForPortfolio Matlab function is an extended version of the previously discussed solution. It contains an important correcting procedure for the data fetched from the Quandl servers. First, let’s have a closer look on the function itself:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
% Function assists in fetching Google Finance data from the Quandl.com
%  server for a given list of tickers of stocks traded on NYSE or
%  NASDAQ. Data are retrieved for last 'n' days with daily sampling.
%
% INPUT
%   n       : number of calendar days from 'today' (e.g. 365 would
%             correspond to about 252 business days)
%   tickers : a path/file name of a text file listing tickers
%   qcodes  : a path/file name of Excel workbook (.xlsx) containing a list
%              of tickers and Quandl Price Codes in the format of
%              [Ticker,Stock Name,Price Code,Ratios Code,In Market?]
% OUTPUT
%   X0        : [Nx1] column vector with days
%   Y0        : [NxM] matrix with Close Prices for M stocks
%   R0        : [(N-1)xM] matrix of Retruns
%   AssetList : a list of tickers (cell array)
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
 
function [X0,Y0,R0,AssetList0] = QuandlForPortfolio(n,tickers,qcodes)
    fileID = fopen(tickers);
    tmp = textscan(fileID, '%s');
    fclose(fileID);
    AssetList=tmp{1};  % a list as a cell array
 
    % Read in the list of tickers and internal Quandl codes
    %
    [~,text,~] = xlsread(qcodes);
    quandlc=text(:,1); % again, as a list in a cell array
    quandlcode=text(:,3); % corresponding Quandl's Price Code
 
    date1=datestr(today-n,'yyyy-mm-dd'); % from
    date2=datestr(today,'yyyy-mm-dd');   % to
 
    % Fetch the data from Quandl.com
    %
    QData={};
    for i=1:length(AssetList)
        for j=1:length(quandlc)
            if(strcmp(AssetList{i},quandlc{j}))
                fprintf('%4.0f %s\n',i,quandlc{j});
                fts=0;
                [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
                   'authcode','x',...
                   'start_date',date1,'end_date',date2,'collapse','daily');
                QData{i}=fts;
            end
        end
    end
 
    % Post-Processing of Fetched Data
    %
    % create a list of days across all tickers
    TMP=[];
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        tmp=tmp(:,1);
        TMP=[TMP; tmp];
    end
    ut=unique(TMP);
    % use that list to find these days that are not present
    %  among all data sets
    TMP=[];
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        tmp=tmp(:,1);
        TMP=[TMP; setdiff(ut,tmp)];
    end
    ut=unique(TMP);
    % finally, extract Close Prices from FTS object and store them
    %  in Y0 matrix, plus corresponding days in X0
    X0=[];
    Y0=[]; 
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        cp=[];
        for j=1:size(tmp,1)
            [r,~,~]=find(ut==tmp(j,1));
            if(isempty(r))
                cp=[cp; tmp(j,5)]; % column 5 corresponds to Close Price
                if(i<2)
                    % create a time column vector listing days
                    % common among all data sets
                    X0=[X0; tmp(j,1)];
                end
            end
        end
        Y0=[Y0 cp];
    end
    % transform Close Prices into Returns, R(i)=cp(i)/cp(i-1)-1
    R0=tick2ret(Y0);
    AssetList0=AssetList';
end

The main bottleneck comes from the fact that Matlab’s portfolio object demands an equal number of historical returns ($N-1$) in the matrix of $R$ for all $M$ assets. We design the function in the way that it sets the common timeframe for all stocks listed on our shopping list. Of course, we ensure that all stocks were traded in the markets for about $n$ last days (rough estimation).

Now, the timeframe of $n$ last days should be understood as a first approximation. We fetch the data from Quandl (numeric date, Open, High, Low, Close, Volume) and save them in the cell array QData (lines #37-49) for each stock separately as FTS objects (Financial Time-Series objects; see Financial Toolbox). However, it may occur that not every stock we fetched displays the same amount of data. That is why we need to investigate for what days and for what stocks we miss the data. We achieve that by scanning each FTS object and creating a unique list of all days for which we have data (lines #54-60).

Next, we loop again over the same data sets but now we compare that list with a list of all dates for each stock individually (lines #63-69), capturing (line #67) those dates that are missing. Their complete list is stored as a vector in line #69. Eventually, given that, we are able to compile the full data set (e.g. Close Prices; here line #80) for all stocks in our portfolio ensuring that we will include only those dates for which we have prices across all $M$ assets (lines #70-91).

Beast Unleashed

We test our data pre-processing simply by running the block of code listed above engaging QuandlForPortfolio function and we check the results in the Matlab’s command window as follows:

>> whos X Y R AssetList
  Name             Size            Bytes  Class     Attributes
 
  AssetList        1x10             1192  cell                
  R              250x10            20000  double              
  X              251x1              2008  double              
  Y              251x10            20080  double

what confirms the correctness of dimensions as expected.

At this stage, the aforementioned function can be used two-fold. First, we are interested in the portfolio optimisation and we look back at last $n$ calendar days since the most current one (today). The second usage is handy too. We consider our stocks on the shopping list and fetch for their last, say, $n=7\times365$ days with data. If all stocks were traded over past 7 years we should be able to collect a reach data set. If not, the function will adjust the beginning and end date to meet the initial time constrains as required for $R$ matrix construction. For the former case, we can use 7-year data sample for direct backtesting of algo models utilizing Portfolio Optimization.

Stay tuned as we will rock this land in the next post!

Any Questions?

Share them across QuantCove.com – the official Forum of QuantAtRisk.

Download Intraday and Tick-Data from QaR

Since this week you have an opportunity to get an access to random sets of historical intraday trading and tick-data sets across different markets. Simply scroll down Download menu option at the top of the screen. In time, more data sets will be added.

Happy backtesting!

Sincerely Yours,
QaR Team

Create a Portfolio of Stocks based on Google Finance Data fed by Quandl

There is an old saying Be careful what you wish for. Many quants and traders wished for a long time for a better quality of publicly available data. It wasn’t an easy task to accomplish but with a minimum of effort, it has become possible to get all what you truly dreamt of. When I started my experience with daily backtesting of my algo strategies I initially relied on Yahoo! data. Yeah, easily accessible but of low quality. Since the inception of Google Finance the same dream had been reborn. More precise data, covering more aspects of trading, platforms, markets, instruments. But how to get them all?


Quandl.com provides us with an enormous access to an enormous database. Just visit the webpage to find out what I mean by barley putting my thoughts all together in one sentence. In fact, the portal’s accessibility via different programming languages (e.g. Matlab, C++, Python, etc.) makes that database more alive than ever. The space of parameters the quants get an access to becomes astronomical in number!

Alleluja! Let’s ride this horse. Enough of watching the online charts changing a direction of our investing moods with a one-mintute time resolution. We were born to be smarter than stochastic processes taking control over our daily strategies. In this short post, making use of the Matlab R2013a platform, I want to show you how quickly you can construct any portfolio of US stocks taken from the Dow Jones Index universe (as an example). We will use the Google Finance stock data fed by Quandl.com.

Google Finance Stock Data for Model Feeding

Similar to Yahoo! Finance’s resources, Quandl.com has an own system of coding the names of all US stocks. Therefore, we need to know it before doing any further step. The .csv file containing all of the codes for Quandl’s stock data can be obtained via API for Stock Data webpage. The file contains the header of the form:

Ticker      Stock Name      Price Code      Ratios Code      In Market?

linking Google Finance ticker with Quandl’s code (Price Code). Doing some extra job with data filtering, I created the pre-processed set of all tickers excluding those non-active in the markets and removing all invalid data. You can download it here QuandlStockCodeListUS.xlsx as an Excel workbook.

The second list we need to have is a list of potential stock we want to deal with in our portfolio. As for today, the current list of Dow Jones Index’s components you can download here: dowjones.lst. Let’s start with it:

1
2
3
4
5
6
7
8
9
10
11
12
13
% Create a Portfolio from Dow Jones Stocks based on Google Finance
%  Data fed by Quandl.com
%
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 
 
close all; clear all; clc;
 
% Read the list of Dow Jones components
fileID = fopen('dowjones.lst');
tmp = textscan(fileID, '%s');
fclose(fileID);
dowjc=tmp{1};  % a list as a cell array

The guys from Mathworks suggest to make a new and good practice of using textscan command (lines 10-12) instead of dataread for reading data from the text files. The latter will be removed from the next Matlab release.

Next, we need to import the current list of tickers for US stocks and their corresponding codes in the Quandl’s database. We will make the use of my pre-processed data set of QuandlStockCodeListUS.xlsx as mentioned earlier in the following way:

15
16
17
% Read in the list of tickers and internal code from Quandl.com
[ndata, text, alldata] = xlsread('QuandlStockCodeListUS.xlsx');
quandlc=text(:,1); % again, as a list in a cell array

For the simplicity of our example, we will be considering all 30 stocks as a complete sample set of stocks in our portfolio. A portfolio construction is not solely a selection of stocks. It usually links a time period of the past performance of each individual stock. In the following move, we will collect daily stock returns over last 365 calendar year (corresponding to about 252 trading days).

This framework is pretty handful. Why? If you build or run live a model, most probably your wish is to update your database when US markets are closed. You fetch Quandl.com for data. Now, if your model employs risk management or portfolio optimisation as an integrated building-block, it’s very likely, you are interested in risk(-return) estimation, e.g. by calculating Value-at-Risk (VaR) or similar risk measures. Of course, the choice of the time-frame is up to you based on your strategy.

Having that said,

19
20
21
% fetch stock data for last 365 days
date2=datestr(today,'yyyy-mm-dd')     % from
date1=datestr(today-365,'yyyy-mm-dd') % to

what employs an improved Matlab’s command of datestr with formatting as a parameter. It’s really useful when it comes to data fetching from Quandl.com as the format of the date they like to see in your code is yyyy-mm-dd. The above two commands return in Matlab the following values:

date2 =
  2013-11-03
date1 =
  2012-11-03

as for the time of writing this article (string type).

Get an Access to Quandl’s Resources


Excellent! Let’s catch some fishes in the pond! Before doing it we need to make sure that Quandl’s software is installed and linked to our Matlab console. In order to download Quandl Matlab package just visit this website and follow the installation procedure at the top of the page. Add proper path in Matlab environment too. The best practice is to register yourself on Quandl. That allows you to rise the daily number of data call from 50 to 500. If you think you will need more data requests per day, from the same webpage you can send your request.

When registered, in your Profile section on Quandi, you will gain an access to your individual Authorization Token, for example:
qtokenYou will need to replace ‘yourauthenticationtoken’ in the Matlab code (see line 35) with your token in order to get the data.

Fetching Stock Data

From this point the road stops to wind. For all 30 stocks belonging to the current composition of Dow Jones Index, we get their Open, High, Low, Close prices (as retrieved from Google Finance) in the following way:

23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
% create an empty array for storing stock data
stockd={};
% scan the Dow Jones tickers and fetch the data from Quandl.com
for i=1:length(dowjc)
    for j=1:length(quandlc)
        if(strcmp(dowjc{i},quandlc{j}))
            fprintf('%4.0f %s\n',i,quandlc{j});
            % fetch the data of the stock from Quandl
            % using recommanded Quandl's command and
            % saving them directly into Matlab's FTS object (fts)
            fts=0;
            [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
                          'authcode','yourauthenticationtoken',...
                          'start_date',date1,'end_date',date2);
            stockd{i}=fts; % entire FTS object in an array's cell
        end
    end
end
% use 'save' command to save all actual variables from memory on the disc,
%  if you need it for multiple code running
% save data

A code in line 35 to 37 is an example how you call Quandi for data and fetch them into FINTS (financial time series object from the Financial Toolbox in Matlab). So easy. If you wish to change the time frequency, apply transformation or get the data not in FINTS but .csv format, all is available for you as a parameter at the level of calling Quandl.get() command (explore all possibilities here).

Having that, you can easily plot the Google Finance data, for instance, of IBM stock straight away:

% diplay High-Low Chart for IBM
highlow(stockd{11})
title('IBM');
ylabel('Stock Price ($)');

to get:
quandl-fig01

or prepare a return-risk plot for your further portfolio optimisation

ret=[];
for i=1:length(stockd)
    % extract the Close Price
    tmp=fts2mat(stockd{i}.Close,0);
    % calculate a vector of daily returns
    r=tmp(2:end)./tmp(1:end-1)-1;
    ret=[ret r];
end
% calculate the annualized mean returns for 30 stocks
mret=mean(ret)*sqrt(250);
% and corresponding standard deviations
msig=std(ret)*sqrt(250);
 
% plot return-risk diagram
p=100*[msig; mret]'
plot(p(:,1),p(:,2),'b.');
xlim([0 30]);
xlabel('Annualized Risk of Stock (%)')
ylabel('Annualized Mean Return (%)');

providing us with
quandl-fig02

Forum: Further Discussions

You can discuss the current post with other quants within a new Quant At Risk Forum (Data Handling and Pre-Processing section).

Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting


Within the evolution of Mathworks’ MATLAB programming environment, finally, in the most recent version labelled 2013a we received a longly awaited line-command facilitation for pulling stock data directly from the Yahoo! servers. What does that mean for quants and algo traders? Honestly, a lot. Now, simply writing a few commands we can have nearly all what we want. However, please keep in mind that Yahoo! data are free therefore not always in one hundred percent their precision remains at the level of the same quality as, e.g. downloaded from Bloomberg resources. Anyway, just for pure backtesting of your models, this step introduces a big leap in dealing with daily stock data. As usual, we have a possibility of getting open, high, low, close, adjusted close prices of stocks supplemented with traded volume and the dates plus values of dividends.

In this post I present a short example how one can retrieve the data of SPY (tracking the performance of S&P500 index) using Yahoo! data in a new Matlab 2013a and I show a simple code how one can test the time period of buying-holding-and-selling SPY (or any other stock paying dividends) to make a profit every time.

The beauty of Yahoo! new feature in Matlab 2013a has been fully described in the official article of Request data from Yahoo! data servers where you can find all details required to build the code into your Matlab programs.

Model for Dividends

It is a well known opinion (based on many years of market observations) that one may expect the drop of stock price within a short timeframe (e.g. a few days) after the day when the stock’s dividends have been announced. And probably every quant, sooner or later, is tempted to verify that hypothesis. It’s your homework. However, today, let’s look at a bit differently defined problem based on the omni-working reversed rule: what goes down, must go up. Let’s consider an exchange traded fund of SPDR S&P 500 ETF Trust labelled in NYSE as SPY.

First, let’s pull out the Yahoo! data of adjusted Close prices of SPY from Jan 1, 2009 up to Aug 27, 2013

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
% Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting
% (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 
close all; clear all; clc;
 
date_from=datenum('Jan 1 2009');
date_to=datenum('Aug 27 2013');
 
stock='SPY';
 
adjClose = fetch(yahoo,stock,'adj close',date_from,date_to);
div = fetch(yahoo,stock,date_from,date_to,'v')
returns=(adjClose(2:end,2)./adjClose(1:end-1,2)-1);
 
% plot adjusted Close price of  and mark days when dividends
% have been announced
plot(adjClose(:,1),adjClose(:,2),'color',[0.6 0.6 0.6])
hold on;
plot(div(:,1),min(adjClose(:,2))+10,'ob');
ylabel('SPY (US$)');
xlabel('Jan 1 2009 to Aug 27 2013');

and visualize them:

spy-1

Having the data ready for backtesting, let’s look for the most profitable period of time of buying-holding-and-selling SPY assuming that we buy SPY one day after the dividends have been announced (at the market price), and we hold for $dt$ days (here, tested to be between 1 and 40 trading days).

23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
% find the most profitable period of holding SPY (long position)
neg=[];
for dt=1:40
 
buy=[]; sell=[];
for i=1:size(div,1)
    % find the dates when the dividends have been announced
    [r,c,v]=find(adjClose(:,1)==div(i,1));
    % mark the corresponding SPY price with blue circle marker
    hold on; plot(adjClose(r,1),adjClose(r,2),'ob');
    % assume you buy long SPY next day at the market price (close price)
    buy=[buy; adjClose(r-1,1) adjClose(r-1,2)];
    % assume you sell SPY in 'dt' days after you bought SPY at the market
    % price (close price)
    sell=[sell; adjClose(r-1-dt,1) adjClose(r-1-dt,2)];
end
 
% calculate profit-and-loss of each trade (excluding transaction costs)
PnL=sell(:,2)./buy(:,2)-1;
% summarize the results
neg=[neg; dt sum(PnL<0) sum(PnL<0)/length(PnL)];
 
end

If we now sort the results according to the percentage of negative returns (column 3 of neg matrix), we will be able to get:

>> sortrows(neg,3)
 
ans =
   18.0000    2.0000    0.1111
   17.0000    3.0000    0.1667
   19.0000    3.0000    0.1667
   24.0000    3.0000    0.1667
    9.0000    4.0000    0.2222
   14.0000    4.0000    0.2222
   20.0000    4.0000    0.2222
   21.0000    4.0000    0.2222
   23.0000    4.0000    0.2222
   25.0000    4.0000    0.2222
   28.0000    4.0000    0.2222
   29.0000    4.0000    0.2222
   13.0000    5.0000    0.2778
   15.0000    5.0000    0.2778
   16.0000    5.0000    0.2778
   22.0000    5.0000    0.2778
   27.0000    5.0000    0.2778
   30.0000    5.0000    0.2778
   31.0000    5.0000    0.2778
   33.0000    5.0000    0.2778
   34.0000    5.0000    0.2778
   35.0000    5.0000    0.2778
   36.0000    5.0000    0.2778
    6.0000    6.0000    0.3333
    8.0000    6.0000    0.3333
   10.0000    6.0000    0.3333
   11.0000    6.0000    0.3333
   12.0000    6.0000    0.3333
   26.0000    6.0000    0.3333
   32.0000    6.0000    0.3333
   37.0000    6.0000    0.3333
   38.0000    6.0000    0.3333
   39.0000    6.0000    0.3333
   40.0000    6.0000    0.3333
    5.0000    7.0000    0.3889
    7.0000    7.0000    0.3889
    1.0000    9.0000    0.5000
    2.0000    9.0000    0.5000
    3.0000    9.0000    0.5000
    4.0000    9.0000    0.5000

what simply indicates at the most optimal period of holding the long position in SPY equal 18 days. We can mark all trades (18 day holding period) in the chart:

spy-2

where the trade open and close prices (according to our model described above) have been marked in the plot by black and red circle markers, respectively. Only 2 out of 18 trades (PnL matrix) occurred to be negative with the loss of 2.63% and 4.26%. The complete distribution of profit and losses from all trades can be obtained in the following way:

47
48
49
50
figure(2);
hist(PnL*100,length(PnL))
ylabel('Number of trades')
xlabel('Return (%)')

returning

spy-3

Let’s make some money!

The above Matlab code delivers a simple application of the newest build-in connectivity with Yahoo! server and the ability to download the stock data of our interest. We have tested the optimal holding period for SPY since the beginning of 2009 till now (global uptrend). The same code can be easily used and/or modified for verification of any period and any stock for which the dividends had been released in the past. Fairly simple approach, though not too frequent in trading, provides us with some extra idea how we can beat the market assuming that the future is going to be/remain more or less the same as the past. So, let’s make some money!

Rebinning of Financial Time-Series

Working with financial time-series, especially in trading and its following analysis of data trends or so on, we wish to rebin (resample) our data into a new time-series which would provide us with some sort of a new information on the average value of the underlying data records characterized by high volatility in time.

The following algorithm in Maltab does the job pretty well:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
% Function rebins any time-series [x,y] where x is a time vector
%   and y stores time-series values from the current (regular or
%   irregular samping). A new time-series sampling is dt.
%
%  Input: x  : time vector [1xN]
%         y  : data vector [1xN]
%         dy : data error vector [1xN]
%         dt : new rebinning time (dt > current sampling)
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
%
% Example: x=1:1:1001; % assume 1001 trading days
%          y=rand(1,1001); % simulate pseudo data
%          dy=0.1*rand(1,1001); % simulate pseudo data errors (if required)
%          dt=50; % rebin data from 1 day sampling down to 50 day intervals
%          [rx,ry]=rebin(x,y,dy,dt);
%      or  [rx,ry]=rebin(x,y,[],dt];
%          plot(x,y,'color',[0.7 0.7 0.7]); % plot original time-series
%          hold on; plot(rx,ry,'ro-'); % plot down-rebined time-series
 
function [rx,ry]=rebin(x,y,dy,dt)
    if(isempty(dy))
        dy=ones(1,length(y));
    end
    rdata=[];
    j=1;
    k=1;
    t2=x(1);
    while(j<=length(x))
        i=j;
        if((x(i)+dt)>x(end))
            break
        else
            t2=t2+dt;
        end
        i=j;
        sa=0;
        wa=0;
        ea=0;
        il=0;
        while(x(i)<t2)
            sa=sa+(y(i)/dy(i)^2);
            wa=wa+(1/dy(i)^2);
            i=i+1;
            il=il+1;
        end
        ry=sa/wa;
        rx=t2-dt;
        if(il>=1)
            rdata=[rdata; rx ry];
        end
        j=j+il;
    end
    rx=rdata(:,1);
    ry=rdata(:,2);
end

Example

As an example we will use the daily rate of returns in trading of AAPL (Apple Inc.) from Jan-1984 to Apr-2011 (download Matlab’s aaplr.mat M-file). Below, the short set of command lines allow us to execute the rebinning process of return series,

clear all; close all; clc;
 
load aaplr.mat
plot(aaplR)
 
x=1:length(aaplR); % 6855 days
y=aaplR;
plot(x,y,'color',[0.7 0.7 0.7]);
 
% rebin the data with dt=25 days step (one trading month)
dt=25;
dy=[]; % we have no information on data errors
 
[rx,ry]=rebin(x,y,dy,dt)
 
% overplot results and display first 500 days
hold on; plot(rx,ry,'ro-');
xlim([0 500]);
ylim([-0.125 0.15]);

and plot both time-series, original and rebinned with a new bin time of 25 days for the first 500 days of trading:

rb

where a red line denotes a new rebinned data time-series with a binning time of 25 trading days. The function (the algorithm) computes a simple weighted mean based on data points falling into an interval of $\langle t,t+dt \rangle$. If we do not specify the input data error vector of $dy$ as in the example above, we should get a simple mean as a consequence.

Trend Identification for FX Traders (Part 2)


In my previous post I provided an introduction to the trading model invention and design. We made use of FX data of AUDUSD pair sampled hourly and splitting data into weekly time-series.

Today, we will work a bit harder over formulation of the very first rules for the model. This step will require an engagement of our creativity in understanding what data are like as well as how we can make a basic set of rules which would help us to perform an attractive time-series classification. Our objective is to invent a method which will be helpful in classification of last week FX pair’s time-series to be either in the downtrend or in the uptrend.

The most naive way of classification of directional information contained in any time-series is its slope: a straight line fit to the data. Let’s use it as our starting point. Instead of fitting all data points for a given week, we find median values for the first and the last 12 data points both in Time $(x1,x2)$ and Pair Ratio $(y1,y2)$ as specified in lines 92 to 94:

65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
% FX time-series analysis
% (c) Quant at Risk, 2012
%
% Part 2: Classification of weekly time-series
 
close all; scrsz = get(0,'ScreenSize');
h=figure('Position',[70 scrsz(4)/2 scrsz(3)/1.1 scrsz(4)/2],'Toolbar','none');
fprintf('\nuptrend/downtrend identification.. ');
% for viewing use the loop
hold off;
set(0,'CurrentFigure',h);
 
% pre-define variables
trend=zeros(nw,1);
slope=zeros(nw,1);
midp={};  % middle points
endp={};  % end points (median based on last 12 points)
 
for i=1:nw  %--- a loop over total number of weeks available
 
    % reading time-series for a current week 
    w=week{i}; 
    x=w(:,1); y=w(:,2);
 
    % plot the time-series
    hold on; plot(x,y,'k');
 
    % linear trend estimation
    x1=median(x(1:12)); x2=median(x(end-11:end));
    y1=median(y(1:12)); y2=median(y(end-11:end));
 
    % define end-point of the time-series and mark it on the plot
    endp{i}=[x2 y2];
    hold on; plot(endp{i}(1),endp{i}(2),'b*');
 
    % find slope
    m=(y2-y1)/(x2-x1);
    slope(i)=m;
    xl=x1:dt:x2;       
    yl=m*xl-m*x2+y2;   % a line of representing the slope
    hold on; plot(xl,yl,'b:');
 
    % find middle point of the line and mark it on the plot
    mx=mean(xl);
    my=mean(yl);
    midp{i}=[mx my];
    hold on; plot(midp{i}(1),midp{i}(2),'bo');

As an example of the code execution, for the first two weeks we plot slopes, mid-points and end-points:

We assume that our classification procedure will be based solely on the information provided for end-points and slopes. The exception we make for the classification of the first two weeks. For the first week the distinction between uptrend and downtrend is made based on the position of the first and last point:

113
114
115
116
117
118
119
120
121
122
123
124
    % Time-Series Classification
 
    if(i==1)
        ybeg=y(1); yend=y(end);
        if(ybeg<yend)
            trend(i)=+1; % uptrend
            hold on; plot(x,y,'k');
        else
            trend(i)=-1; % downtrend
            hold on; plot(x,y,'r');
        end
    end

where we mark the result of classification with a sign $+1$ or $-1$ in a vector element of $trend$, and plot it with black and red color denoting uptrend and downtrend, respectively.

For the second week, our rules of classification are enriched by additional information about the end-point of a current and a previous week:

126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
    if(i==2)
        % week(current-1)
        tmp=week{i-1};
        x1=tmp(:,1); y1=tmp(:,2);
        y1b=y1(1); y1e=y1(end);
        % week(current)
        y0b=y(1); y0e=y(end);
        if(y0e>y1e)
            trend(i)=+1; % uptrend
            hold on; plot(x,y,'k');
        else
            trend(i)=-1; % downtrend
            hold on; plot(x,y,'r');
        end
    end

For weeks number 3 and higher we do our creative research over the data to define a specific set of rules. We allow to take into account the information from two weeks prior to the current one and combine them all together. The following code represents an attractive solution, subject to improvement:

142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
    if(i>2)
        % week(current-2)
        mid2=midp{i-2}(2);
        end2=endp{i-2}(2);
        slp2=slope(i-2);
        % week(current-1)
        mid1=midp{i-1}(2);
        end1=endp{i-1}(2);
        slp1=slope(i-1);
        % week(current)
        mid0=midp{i}(2);
        end0=endp{i}(2);
        slp0=slope(i);
        if((mid0>mid1))                     % up-trend
            if((mid0>mid2)&&(end0>end1))
                trend(i)=+1;
                hold on; plot(x,y,'k');    % strong up-trend
            elseif((mid0>mid2)&&(end0>end2))
                trend(i)=+1;
                hold on; plot(x,y,'k');    % weak up-trend
            elseif((mid0<mid2)&&(end0<end2)&&(slp0<0))
                trend(i)=-1;
                hold on; plot(x,y,'r');    % turns into possible down-trend
            elseif((mid0<mid2)&&(end0<end2)&&(slp0>0))
                trend(i)=+1;
                hold on; plot(x,y,'k');    % turns into possible up-trend
            else
                trend(i)=+1;                
                hold on; plot(x,y,'k');    % turns into possible up-trend
            end
        elseif(mid0<mid1)                  % down-trend
            if((mid0<mid2)&&(end0<end1)&&(end0<end2))
                trend(i)=-1;
                hold on; plot(x,y,'r');    % weak down-trend
            elseif((mid0<mid2)&&(end0<end2)&&(end0>end1))
                trend(i)=+1;
                hold on; plot(x,y,'k');    % possible up-trend
            elseif((mid0<mid2)&&(end0>end2))
                trend(i)=+1;
                hold on; plot(x,y,'k');    % turns into possible up-trend
            elseif((mid0>mid2)&&(end0<end1)&&(end0<end2))
                trend(i)=-1;
                hold on; plot(x,y,'r');
            elseif((mid0>mid2)&&(end0>end2))
                trend(i)=+1;
                hold on; plot(x,y,'k');    % turns into possible up-trend
            elseif((mid0>mid2)&&(end0>end1))
                trend(i)=+1;
                hold on; plot(x,y,'k');    % turns into possible up-trend
            else
                trend(i)=-1;                 
                hold on; plot(x,y,'r');
            end
        end
    end
end

Since one picture is worth millions of lines of code, below we present three examples of our model in action. The last plot corresponds to the latest Global Financial Crisis and shows how weeks in uptrends of 2009 followed these in downtrend a year before.

It is straightforward to note that the performance of our rules works very intuitively and stays in general agreement with the market sentiments.

Accessing a SQL Server database in Matlab and the use of FTS objects


Feeding our models with data for backtesting purposes requires pulling data from the server. If we do our modeling part in Matlab under Linux and the MS SQL Server database is our target, the quickest solution for the establishment of a good connectivity is making use of Java Database Connectivity (JDBC) driver. Firstly, visit the following web page to download and current driver (.tar.gz) and next follow the installation steps provided by Mathworks.

In the following example we will try to connect to our database called ‘XDataBase’ and retrieve stock prices from Feb 2, 1998 of Apple Inc. (NASDAQ ticker: AAPL).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
ticker=['AAPL'];
start=['02-Feb-1998'];
 
% --Connect to your local SQL Server and fetch the data for a ticker
%
% define connection
conn=database('XDataBase','matlab','m4tl4b','com.microsoft.sqlserver.jdbc.SQLServerDriver','jdbc:sqlserver://163.118.2.7:1533;database=XDataBase');
 
% specify return format
setdbprefs('DataReturnFormat','structure');
 
% define your SQL query, as for example:
QueryString = ['SELECT sec.tic, dp.datadate, dp.prchd As PriceH, dp.prcld As PriceL, dp.prccd As PriceC, dp.prccd/dp.ajexdi As PriceCA, dp.cshtrd*dp.ajexdi As Vol FROM  Xpressfeed.dbo.security AS sec INNER JOIN Xpressfeed.dbo.sec_dprc AS dp ON sec.gvkey = dp.gvkey AND sec.iid = dp.iid AND dp.datadate >=''',start,''' WHERE (sec.tic IN (''',ticker,'''))'];
 
% execute connection, fetch data, and assign them to a Matlab variable
cur = exec(conn, QueryString);
myData = fetch(cur);
secData = myData.data;

In the above example, our structure variable of secData is composed of secData.PriceH, secData.PriceL, secData.PriceC, secData.PriceCA, and secData.Vol arrays containing information about AAPL’s High, Low, Close, Adjusted Close prices and Volume traded, respectively.

Making use of Matlab’s Financial Toolbox, one can store all these data into a useful text-like file format. We do it by construction of Financial Time-Series (FTS) object:

20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
% verify the existence of data for a ticker
skip=false;
try
    n=length(secData.PriceC);  % a random test
catch
    skip=true;
end
if(~skip)
    dates=secData.datadate;
    dates=datenum(dates);
    data=[zeros(n,1) secData.PriceH secData.PriceL secData.PriceC secData.PriceCA secData.Vol];
    colheads={'DATES';'OPEN';'HIGH';'LOW';'CLOSE';'CLOSEA';'VOLUME'};
    desc=ticker;
    filename=[ticker,'.dat'];
    % save AAPL's extracted Price and Volume data into AAPL.dat file
    fts2ascii(filename,dates,data,colheads,desc);
 
    % construct FTS object by reading the data from the file
    fts=ascii2fts(filename,1,2);
    data=fts2mat(fts,1);
    % and plot the Adjusted Close Price
    scrsz = get(0,'ScreenSize'); 
    hh=figure('Position',[70 scrsz(4)/2 scrsz(3)/1.1 scrsz(4)/2],'Toolbar','none');
    plot(data(:,1),data(:,5)); 
    xlim([min(data(:,1)) max(data(:,1))]); 
    title(ticker);
    xlabel('Time [days]'); ylabel('Adjusted Close Price [US$]');
end

As the result of execution of code lines 28 to 35, we were able to retrieve AAPL stock data and save them into AAPL.dat file. First ten lines of the file are as follows:

$ head -10 AAPL.txt
AAPL
DATES 	OPEN 	HIGH 	LOW 	CLOSE 	CLOSEA 	VOLUME
02-Feb-1998   0.000000   18.500000   17.375000   17.688000   4.422000   22740000
03-Feb-1998   0.000000   18.625000   17.688000   18.313000   4.578250   14380000
04-Feb-1998   0.000000   18.500000   18.000000   18.250000   4.562500   6078400
05-Feb-1998   0.000000   18.500000   18.000000   18.313000   4.578250   8508000
06-Feb-1998   0.000000   18.688000   18.250000   18.500000   4.625000   7228000
09-Feb-1998   0.000000   19.500000   18.375000   19.188000   4.797000   17668000
10-Feb-1998   0.000000   19.563000   19.063000   19.438000   4.859500   15072000
11-Feb-1998   0.000000   19.500000   18.875000   19.000000   4.750000   7560000

In line 38 we show how to construct FTS object

>> whos fts
  Name         Size             Bytes  Class    Attributes
 
  fts       3366x6             190016  fints

whereas line 39 provides us with a transformation of the object’s data into matrix format that we may use to plot AAPL’s Adjusted Close Prices.

It is of great importance to remind that Financial Toolbox allows us to express a date (e.g. 11-Feb-1998) as a continuous number:

>> datenum('02-Feb-1998')
ans =
      729788

or reversely

>> datestr(ans)
ans =
02-Feb-1998

Extracting Time-Series from Tick-Data


The tick-data provided in the .csv (comma separated values) file format sometimes may be a real problem to handle quickly, especially when the total size starts to count in hundreds of GB. If your goal is to extract a time-series with, say, hourly time resolution only, this article will provide you with some fresh and constructive guidelines how to do it smartly in the Linux environment.

First, let’s have a closer look at the data. Say, we have a collection of 2148 .csv files hosting the FX trading history of AUDUSD pair, covering nearly 10 years between 2000 and 2010. Each file is 7.1 MB large what leaves us with approximately 15 GB of data to process. Having a look into the randomly selected file we can identify the header and data themselves:

$ head -10 audusd_216.csv 
Ticks,TimeStamp,Bid Price,Bid Size,Ask Price,Ask Size
632258349000000015,2004-07-19 11:55:00.000,0.7329,1000000,0.7334,1000000
632258349000000016,2004-07-19 11:55:00.000,0.7329,1000000,0.7334,1000000
632258349000000017,2004-07-19 11:55:00.000,0.7329,1000000,0.7333,1000000
632258349000000018,2004-07-19 11:55:00.000,0.7327,1000000,0.7333,1000000
632258349000000019,2004-07-19 11:55:00.000,0.7327,1000000,0.7333,1000000
632258349000000020,2004-07-19 11:55:00.000,0.7328,1000000,0.7333,1000000
632258349000000021,2004-07-19 11:55:00.000,0.7328,1000000,0.7334,1000000
632258349600000000,2004-07-19 11:56:00.000,0.7328,1000000,0.7334,1000000
632258349600000001,2004-07-19 11:56:00.000,0.7328,1000000,0.7336,1000000

Our aim will be to extract Bid and Ask Price time-series. We will make use of a few Linux standard tools, e.g. sed, awk, supplemented with extra f77 codes. It is also to demonstrate how shell programming can be useful while we have an opportunity to explore the enigmatic syntax of its programs. Generally, we will be writing a shell script, executable for any FX pair name, e.g. gbpnzd, eurjpy, and so on.

In the first step of the script we create a list of all files. This is tricky in Linux as the standard command of ‘ls -lotr’ though returns a desired list but also all details on the file size, attributes, etc. which we do not simply want. Lines 9 and 10 solve the problem,

1
2
3
4
5
6
7
8
9
10
# Extracting Time-Series from Tick-Data .csv files
# (c) Quant at Risk, 2012
#
# Exemplary usage: ./script.src audusd
 
#!/bin/bash
 
echo "..making a sorted list of .csv files"
for i in $1_*.csv; do echo ${i##$1_} $i ${i##.csv}; 
done | sort -n | awk '{print $2}' > $1.lst

and a newly create file of \$1.lst (note: \$1 corresponds in the shell script to the parameter’s name we called the script with, e.g. audusd; therefore \$1.lst physically means audusd.lst) contains the list:

audusd_1.csv
audusd_2.csv
audusd_3.csv
...
audusd_2148.csv

We create one data file from all 2148 pieces by creating and executing an in-line script:

12
13
14
15
16
17
echo "..creating one data file"
awk '{print "cat",$1," &gt;&gt; tmp.lst"}' $1.lst &gt; tmp.cmd
chmod +x tmp.cmd
./tmp.cmd
rm tmp.cmd
mv tmp.lst $1.tmp

Now, \$1.tmp is a 15 GB file and we may wish to remove some unnecessary comments and tokens:

19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
echo "..removing comments"
sed 's/Ticks,TimeStamp,Bid Price,Bid Size,Ask Price,Ask Size//g' $1.tmp > $1.tmp2
rm $1.tmp
 
echo "..removing empty lines"
sed '/^$/d' $1.tmp2 > $1.tmp
rm $1.tmp2
 
echo "..removing token ,"
sed 's/,/ /g' $1.tmp > $1.tmp2
rm $1.tmp
 
echo "..removing token :"
sed 's/:/ /g' $1.tmp2 > $1.tmp
rm $1.tmp2
 
echo "..removing token -"
sed 's/-/ /g' $1.tmp > $1.tmp2
rm $1.tmp
 
echo "..removing column with ticks and ask/bid size"
awk '{print $2,$3,$4,$5,$6,$7,$8,$10}' $1.tmp2 > $1.tmp
rm $1.tmp2

In order to convert a time information into a continuous measure of time, we modify the f77 code for our task as follows:

c Extracting Time-Series from Tick-Data .csv files
c (c) Quant at Risk, 2012
c
c Program name: fx_getmjd.for
c Aim: removes ticks and coverts trade time into MJD time [day]
c Input data format: YYYY MM DD HH MM SS.SSS BID BID_Vol ASK ASK_Vol
 
      implicit double precision (a-z)
      integer y,m,d,hh,mm,jd
      integer*8 bidv,askv
      character zb*50
 
      call getarg(1,zb)
      open(1,file=zb)
      do i=1.d0, 500.d6
        read(1,*,end=1) y,m,d,hh,mm,ss,bid,ask
        jd= d-32075+1461*(y+4800+(m-14)/12)/4+367*(m-2-(m-14)/12*12)
     _    /12-3*((y+4900+(m-14)/12)/100)/4
        mjd=(jd+(hh-12.d0)/24.d0+mm/1440.d0+ss/86400.d0)-2400000.5d0
        mjd=mjd-51544.d0 ! T_0 = 2000.01.01 00:00
        abr=ask/bid
        write(*,2) mjd,bid,ask,abr
      enddo                             
    1 close(1)
    2 format(F15.8,F8.4,F8.4,F12.6)
 
end

and execute it in our script:

43
44
45
echo "..changing a date to MJD"
fx_getmjd $1.tmp > $1.dat
rm $1.tmp

In the aforementioned f77 code, we set a zero time point (MJD=0.00) on Jan 1, 2000 00:00. Since that day, now our time is expressed as a single column measuring time progress in days with fractional parts tracking hours and minutes.

We may split the data into two separate time-series containing Bid and Ask Prices at the tick-data level:

47
48
49
echo "..splitting into bid/ask/abr files"
awk '{print $1,$2}' $1.dat > $1.bid
awk '{print $1,$3}' $1.dat > $1.ask

A quick inspection of both files reveals we deal with nearly $500\times 10^6$ lines! Before we reach our chief aim, i.e. rebinning the series with 1-hour time resolution, there is a need to, unfortunately, separate input into 5 parts, each of maximum $100\times 10^6$ lines. The latter may vary depending of RAM memory size available, and if sufficient, this step can be even skipped. We proceed:

51
52
53
54
55
56
echo "..spliting bid/ask/abr into separate files"
fx_splitdat $1 1
fx_splitdat $1 2
fx_splitdat $1 3
fx_splitdat $1 4
fx_splitdat $1 5

where fx_splitdat.for code is given as follows:

c Extracting Time-Series from Tick-Data .csv files
c (c) Quant at Risk, 2012
c
c Program name: fx_splitdat.for
c Exemplary usage: ./fx_splitdat audusd [1,2,3,4,5]
 
      implicit double precision (a-z)
      integer      nc
      character*6  zb
      character*10 zbask,zbbid
      character*16 zb1ask,zb2ask,zb3ask,zb4ask,zb5ask
      character*16 zb1bid,zb2bid,zb3bid,zb4bid,zb5bid
      character*1  par2,st
 
c zb- name of length equal 6 characters only
      call getarg(1,zb)
      call getarg(2,par2)  ! case
 
      write(st,'(a1)') par2
      read(st,'(i1)') nc
 
      zbask=zb(1:6)//'.ask'
      zbbid=zb(1:6)//'.bid'
 
      zb1ask=zb(1:6)//'.ask.part1'
      zb2ask=zb(1:6)//'.ask.part2'
      zb3ask=zb(1:6)//'.ask.part3'
      zb4ask=zb(1:6)//'.ask.part4'
      zb5ask=zb(1:6)//'.ask.part5'
 
      zb1bid=zb(1:6)//'.bid.part1'
      zb2bid=zb(1:6)//'.bid.part2'
      zb3bid=zb(1:6)//'.bid.part3'
      zb4bid=zb(1:6)//'.bid.part4'
      zb5bid=zb(1:6)//'.bid.part5'
 
      open(11,file=zbask)
      open(12,file=zbbid)
 
      if(nc.eq.1) then
        open(21,file=zb1ask)
        open(22,file=zb1bid)
        do i=1.d0, 100.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=1.0).and.(i<100000001.d0)) then
            write(21,2) mjd_ask,dat_ask
            write(22,2) mjd_bid,dat_bid
          endif
        enddo              
      endif
 
      if(nc.eq.2) then
        open(31,file=zb2ask)
        open(32,file=zb2bid)
        do i=1.d0, 200.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=100000001.d0).and.(i<200000001.d0)) then
            write(31,2) mjd_ask,dat_ask
            write(32,2) mjd_bid,dat_bid
          endif
        enddo              
      endif
 
      if(nc.eq.3) then
        open(41,file=zb3ask)
        open(42,file=zb3bid)
        do i=1.d0, 300.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=200000001.d0).and.(i<300000001.d0)) then
            write(41,2) mjd_ask,dat_ask
            write(42,2) mjd_bid,dat_bid
          endif
        enddo              
      endif
 
      if(nc.eq.4) then
        open(51,file=zb4ask)
        open(52,file=zb4bid)
        do i=1.d0, 400.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=300000001.d0).and.(i<400000001.d0)) then
            write(51,2) mjd_ask,dat_ask
            write(52,2) mjd_bid,dat_bid
          endif
        enddo              
      endif
 
      if(nc.eq.5) then
        open(61,file=zb5ask)
        open(62,file=zb5bid)
        do i=1.d0, 500.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=400000001.d0).and.(i<500000001.d0)) then
            write(61,2) mjd_ask,dat_ask
            write(62,2) mjd_bid,dat_bid
          endif
        enddo              
      endif
 
    1 close(1)
    2 format(F15.8,F8.4)
 
      stop
      end

and compiling it as usual:

f77 fx_splitdat.for -o fx_splitdat

Finally, we can extract the rebinned Bid and Ask Price time-series with bin time of 1 hour, i.e. $dt=0.041666667$ d, making use of the following f77 code:

c Extracting Time-Series from Tick-Data .csv files
c (c) Quant at Risk, 2012
c
c Program name: fx_rebin.for
c Exemplary usage: ./fx_rebin audusd 2
 
      implicit double precision (a-z)
      parameter (dim=100.d6)
 
      double precision f(dim), mjd(dim), step
      character*50 par1, par2, st
 
      call getarg(1,par1)  ! file name
      call getarg(2,par2)  ! bining [d]
 
      write(st,'(a20)') par2
      read(st,'(f20)') step
 
c reading data
      open(1,file=par1)
      do i=1,100.d6
        read(1,*,end=1) 
     _       mjd(i),f(i)
      enddo
    1 close(1)
      n=i-1.d0
 
c main loop
      j=1.d0
      k=1.d0
      t2=0.
      t2=dint(mjd(j))
      do while (j.lt.n)
        i=j
        if ((mjd(i)+step).gt.(mjd(n))) then
          print*
          stop
        else
          t2=t2+step
        endif
        i=j
        il=0.d0
        s=0.d0
	    do while (mjd(i).lt.t2)
	      s=s+f(i)
	      i=i+1.d0
          il=il+1.d0    ! how many points in segment
	    enddo
	    av=s/il
        day=t2-step
        if (il.ge.1.d0) then 
          write(*,3) day,av
        endif
        j=j+il
      enddo
 
    2 format(f30.7,2f30.6)
    3 format(f20.8,f8.4)
 
   10 stop
      end

executed in our script for all five part of both tick-data time-series:

58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
echo "..rebinning with dt = 1 h"
dt=0.041666667
fx_rebin $1.ask.part1 $dt > $1.ask.part1.1h
fx_rebin $1.ask.part2 $dt > $1.ask.part2.1h
fx_rebin $1.ask.part3 $dt > $1.ask.part3.1h
fx_rebin $1.ask.part4 $dt > $1.ask.part4.1h
fx_rebin $1.ask.part5 $dt > $1.ask.part5.1h
fx_rebin $1.bid.part1 $dt > $1.bid.part1.1h
fx_rebin $1.bid.part2 $dt > $1.bid.part2.1h
fx_rebin $1.bid.part3 $dt > $1.bid.part3.1h
fx_rebin $1.bid.part4 $dt > $1.bid.part4.1h
fx_rebin $1.bid.part5 $dt > $1.bid.part5.1h
 
echo "..appending rebinned files"
cat $1.ask.part1.1h $1.ask.part2.1h $1.ask.part3.1h $1.ask.part4.1h $1.ask.part5.1h >$1.ask.1h.tmp
cat $1.bid.part1.1h $1.bid.part2.1h $1.bid.part3.1h $1.bid.part4.1h $1.bid.part5.1h > $1.bid.1h.tmp
rm *part*
 
echo "..removing empty lines in rebinned files"
sed '/^$/d' $1.ask.1h.tmp > $1.ask.1h
rm $1.ask.1h.tmp
sed '/^$/d' $1.bid.1h.tmp > $1.bid.1h
rm $1.bid.1h.tmp
 
echo "..done!"

As the final product we obtain two files, say,

audusd.bid.1h
audusd.ask.1h

of the content:

$ tail -5 audusd.bid.1h 
       3772.62500304   0.9304
       3772.66666970   0.9283
       3772.70833637   0.9266
       3772.75000304   0.9263
       3772.79166970   0.9253

where the first column is time (MJD) with 1-hour resoulution and the second column contains a simple Price average from all tick prices falling into 1-hour time bin. As for dessert, we plot both Bid and Ask Price time-series what accomplishes our efforts:

AUD/USD rebinned time-series with 1-hour resolution

Bank format of numbers in Matlab

The following Matlab function allows for displaying any double-type number using a nice bank format that includes commas separators and two digit precision (without rounding). The number $num$ is converted into a new variable of a string type.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
function [str]=bankformat(num)
    num2=floor(num*1000)/1000;
    r=int32(100*(num-floor(num)));
    str = num2str(num2);
    k=find(str == '.');
    if(isempty(k))
        str=[str,'.00'];
    end
    FIN = min(length(str),find(str == '.')-1);
    for i = FIN-2:-3:2
        str(i+1:end+1) = str(i:end);
        str(i) = ',';
    end
    x=mod(r,10);
    if(x==0)
        str=[str,'0'];
    end
    k=find(str == '.');
    d=length(str)-k;
    if(d>2)
        str=str(1:end-(d-2));
    end
end

Example:

>> n=70149506.5;
>> s=bankformat(n);
>> fprintf('\nInvested capital: £\%1s\n',s)
 
Invested capital: £70,149,506.50
Contact Form Powered By : XYZScripts.com