Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

## Pre-Processing of Asset Price Series for Portfolio Optimization

Portfolio Optimization is a significant component of Matlab's Financial Toolbox. It provides a ready-to-use solution for finding the optimal weights of the assets we consider for trading, derived from their historical performance. From a practical point of view, we can include it in our algorithmic trading strategy and backtest its applicability under different initial conditions. That is the subject of my next post. However, before we can enjoy the view from the peak, we need to climb the mountain first.

In Matlab, the portfolio is created as a dedicated object of the same name. It doesn't read raw stock data; we need to feed that beast. Two major ingredients satisfy the input: a vector of the expected asset returns and a covariance matrix. Matlab helps us to estimate these moments, but first we need to deliver the asset data in a digestible form.

In this post we will see how one can quickly download stock data from the Internet based on our own stock selection, and pre-process it for solving the portfolio optimization problem in Matlab.

Initial Setup for Portfolio Object

Let’s say that at any point in time you have your own list of stocks you wish to buy. For simplicity, let’s also assume that the list contains stocks traded on NYSE or NASDAQ. Since you have been a great fan of this game, you are now almost ready to buy what you jotted down in your ShoppingList.lst. Here is an example of 10 tech stocks:

AAPL AOL BIDU GOOG HPQ IBM INTC MSFT NVDA TXN

They will constitute your portfolio of stocks. The problem of portfolio optimization requires a look back in time in the space of returns obtained in trading by each stock. Based on them the Return Proxy and Risk Proxy can be found.

The return matrix $R$ of dimensions $(N-1)\times M$, where $N$ stands for the number of historical prices (e.g. sampled daily, monthly, etc.) and $M$ for the number of stocks in our portfolio, is required by Matlab as an input. We will see how it works in the next post. For now let’s focus solely on the creation of this matrix.
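To make the shape bookkeeping concrete, here is a minimal Python sketch (NumPy assumed; the price values are invented) of turning an $N\times M$ price matrix into the $(N-1)\times M$ return matrix and estimating the two moments the portfolio object needs:

```python
import numpy as np

# invented close prices: N=5 days (rows) x M=2 stocks (columns)
P = np.array([[100.0, 50.0],
              [101.0, 49.0],
              [102.0, 49.5],
              [101.5, 50.5],
              [103.0, 51.0]])

# simple returns R(i) = P(i+1)/P(i) - 1, giving an (N-1) x M matrix
R = P[1:] / P[:-1] - 1.0

mu = R.mean(axis=0)               # expected-return vector, length M
Sigma = np.cov(R, rowvar=False)   # M x M covariance matrix

print(R.shape, Sigma.shape)       # (4, 2) (2, 2)
```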

In the article Create a Portfolio of Stocks based on Google Finance Data fed by Quandl I discussed Quandl.com as an attractive data provider for US stocks. Here, we will follow this solution making use of Quandl resources to pull out the stock price series for our shopping list. Ultimately, we aim at building a function, here: QuandlForPortfolio, that does the job for us:

```matlab
% Pre-Processing of Asset Price Series for Portfolio Optimization in Matlab
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz

clear all; close all; clc;

% Input Parameters
n=1*365;
tickers='ShoppingList.lst';
qcodes='QuandlStockCodeListUS.xlsx';

[X,Y,R,AssetList] = QuandlForPortfolio(n,tickers,qcodes);
```

We call this function with three input parameters. The first one, $n$, denotes the number of calendar days from today (counting backwards) for which we wish to retrieve the stock data. Usually, 365 days will correspond to about 250-252 trading days. The second parameter is a path/file name to our list of stocks (to be taken into account in the portfolio optimisation process), while the last input defines the path/file name of the file storing the stocks’ tickers and associated Quandl Price Codes (see here for more details).

Feeding the Beast

The QuandlForPortfolio Matlab function is an extended version of the previously discussed solution. It contains an important correcting procedure for the data fetched from the Quandl servers. First, let’s have a closer look at the function itself:

```matlab
 1  % Function assists in fetching Google Finance data from the Quandl.com
 2  %  server for a given list of tickers of stocks traded on NYSE or
 3  %  NASDAQ. Data are retrieved for last 'n' days with daily sampling.
 4  %
 5  % INPUT
 6  %   n       : number of calendar days from 'today' (e.g. 365 would
 7  %             correspond to about 252 business days)
 8  %   tickers : a path/file name of a text file listing tickers
 9  %   qcodes  : a path/file name of Excel workbook (.xlsx) containing a list
10  %             of tickers and Quandl Price Codes in the format of
11  %             [Ticker,Stock Name,Price Code,Ratios Code,In Market?]
12  % OUTPUT
13  %   X0 : [Nx1] column vector with days
14  %   Y0 : [NxM] matrix with Close Prices for M stocks
15  %   R0 : [(N-1)xM] matrix of Returns
16  %   AssetList : a list of tickers (cell array)
17  %
18  % (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
19
20  function [X0,Y0,R0,AssetList0] = QuandlForPortfolio(n,tickers,qcodes)
21  fileID = fopen(tickers);
22  tmp = textscan(fileID, '%s');
23  fclose(fileID);
24  AssetList=tmp{1};  % a list as a cell array
25
26  % Read in the list of tickers and internal Quandl codes
27  %
28  [~,text,~] = xlsread(qcodes);
29  quandlc=text(:,1);     % again, as a list in a cell array
30  quandlcode=text(:,3);  % corresponding Quandl's Price Code
31
32  date1=datestr(today-n,'yyyy-mm-dd');  % from
33  date2=datestr(today,'yyyy-mm-dd');    % to
34
35  % Fetch the data from Quandl.com
36  %
37  QData={};
38  for i=1:length(AssetList)
39      for j=1:length(quandlc)
40          if(strcmp(AssetList{i},quandlc{j}))
41              fprintf('%4.0f %s\n',i,quandlc{j});
42              fts=0;
43              [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
44                  'authcode','x',...
45                  'start_date',date1,'end_date',date2,'collapse','daily');
46              QData{i}=fts;
47          end
48      end
49  end
50
51  % Post-Processing of Fetched Data
52  %
53  % create a list of days across all tickers
54  TMP=[];
55  for i=1:length(QData)
56      tmp=fts2mat(QData{i},1);
57      tmp=tmp(:,1);
58      TMP=[TMP; tmp];
59  end
60  ut=unique(TMP);
61  % use that list to find those days that are not present
62  %  among all data sets
63  TMP=[];
64  for i=1:length(QData)
65      tmp=fts2mat(QData{i},1);
66      tmp=tmp(:,1);
67      TMP=[TMP; setdiff(ut,tmp)];
68  end
69  ut=unique(TMP);
70  % finally, extract Close Prices from FTS object and store them
71  %  in Y0 matrix, plus corresponding days in X0
72  X0=[];
73  Y0=[];
74  for i=1:length(QData)
75      tmp=fts2mat(QData{i},1);
76      cp=[];
77      for j=1:size(tmp,1)
78          [r,~,~]=find(ut==tmp(j,1));
79          if(isempty(r))
80              cp=[cp; tmp(j,5)];  % column 5 corresponds to Close Price
81              if(i<2)
82                  % create a time column vector listing days
83                  %  common among all data sets
84                  X0=[X0; tmp(j,1)];
85              end
86          end
87      end
88      Y0=[Y0 cp];
89  end
90  % transform Close Prices into Returns, R(i)=cp(i)/cp(i-1)-1
91  R0=tick2ret(Y0);
92  AssetList0=AssetList';
93  end
```

The main bottleneck comes from the fact that Matlab’s portfolio object demands an equal number of historical returns ($N-1$) in the matrix $R$ for all $M$ assets. We design the function so that it sets a common timeframe for all stocks on our shopping list. Of course, we assume that all stocks have been traded in the markets for roughly the last $n$ days.

Now, the timeframe of $n$ last days should be understood as a first approximation. We fetch the data from Quandl (numeric date, Open, High, Low, Close, Volume) and save them in the cell array QData (lines #37-49) for each stock separately as FTS objects (Financial Time-Series objects; see Financial Toolbox). However, it may occur that not every stock we fetched displays the same amount of data. That is why we need to investigate for what days and for what stocks we miss the data. We achieve that by scanning each FTS object and creating a unique list of all days for which we have data (lines #54-60).

Next, we loop again over the same data sets but now we compare that list with a list of all dates for each stock individually (lines #63-69), capturing (line #67) those dates that are missing. Their complete list is stored as a vector in line #69. Eventually, given that, we are able to compile the full data set (e.g. Close Prices; here line #80) for all stocks in our portfolio ensuring that we will include only those dates for which we have prices across all $M$ assets (lines #70-91).
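The align-on-common-dates idea described above (collect the union of dates, flag those missing from any series, then drop them everywhere) can be sketched compactly outside Matlab; a minimal Python illustration with invented series, where one series misses day 3:

```python
# each series maps date -> close price; the dates and prices are invented
series = {
    'AAPL': {1: 10.0, 2: 10.5, 3: 10.2, 4: 10.8},
    'IBM':  {1: 20.0, 2: 19.5,          4: 20.4},   # day 3 missing
}

# union of all dates across series
all_days = set()
for prices in series.values():
    all_days |= prices.keys()

# dates missing from at least one series
missing = set()
for prices in series.values():
    missing |= all_days - prices.keys()

# keep only the dates present everywhere, in chronological order
common = sorted(all_days - missing)

# one price row per ticker, restricted to the common dates
Y = [[series[t][d] for d in common] for t in series]

print(common)   # [1, 2, 4]
```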

Beast Unleashed

We test our data pre-processing simply by running the block of code listed above, engaging the QuandlForPortfolio function, and we check the results in Matlab’s command window as follows:

```
>> whos X Y R AssetList
  Name             Size            Bytes  Class     Attributes

  AssetList        1x10             1192  cell
  R              250x10            20000  double
  X              251x1              2008  double
  Y              251x10            20080  double
```

which confirms that the dimensions are as expected.

At this stage, the aforementioned function can be used two-fold. First, we are interested in the portfolio optimisation, looking back at the last $n$ calendar days from the most current one (today). The second usage is handy too. We consider the stocks on our shopping list and fetch, say, their last $n=7\times365$ days of data. If all stocks were traded over the past 7 years, we should be able to collect a rich data set. If not, the function will adjust the beginning and end dates to meet the initial time constraints required for the construction of the $R$ matrix. In the latter case, we can use the 7-year data sample for direct backtesting of algo models utilizing Portfolio Optimization.

Stay tuned as we will rock this land in the next post!

Any Questions?

Share them across QuantCove.com – the official Forum of QuantAtRisk.

Happy backtesting!

Sincerely Yours,
QaR Team

## Create a Portfolio of Stocks based on Google Finance Data fed by Quandl

There is an old saying: Be careful what you wish for. Many quants and traders wished for a long time for better-quality publicly available data. It wasn’t an easy task to accomplish, but with a minimum of effort it has become possible to get everything you truly dreamt of. When I started my experience with daily backtesting of my algo strategies, I initially relied on Yahoo! data: easily accessible, but of low quality. Since the inception of Google Finance, the same dream has been reborn. More precise data, covering more aspects of trading, platforms, markets, instruments. But how to get them all?

Quandl.com provides us with access to an enormous database. Just visit the webpage to find out what I mean by barely putting my thoughts all together in one sentence. In fact, the portal’s accessibility via different programming languages (e.g. Matlab, C++, Python, etc.) makes that database more alive than ever. The space of parameters the quants get access to becomes astronomical in number!

Alleluia! Let’s ride this horse. Enough of watching online charts changing the direction of our investing moods with one-minute time resolution. We were born to be smarter than the stochastic processes taking control over our daily strategies. In this short post, making use of the Matlab R2013a platform, I want to show you how quickly you can construct any portfolio of US stocks taken from the Dow Jones Index universe (as an example). We will use the Google Finance stock data fed by Quandl.com.

Google Finance Stock Data for Model Feeding

Similar to Yahoo! Finance’s resources, Quandl.com has its own system of coding the names of all US stocks. Therefore, we need to know it before taking any further step. The .csv file containing all of the codes for Quandl’s stock data can be obtained via the API for Stock Data webpage. The file contains a header of the form:

Ticker Stock Name Price Code Ratios Code In Market?

linking the Google Finance ticker with Quandl’s code (Price Code). Doing some extra data filtering, I created a pre-processed set of all tickers, excluding those non-active in the markets and removing all invalid entries. You can download it here: QuandlStockCodeListUS.xlsx (an Excel workbook).

The second list we need is a list of the potential stocks we want to deal with in our portfolio. As of today, the current list of the Dow Jones Index’s components can be downloaded here: dowjones.lst. Let’s start with it:

```matlab
 1  % Create a Portfolio from Dow Jones Stocks based on Google Finance
 2  %  Data fed by Quandl.com
 3  %
 4  % (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 5
 6
 7  close all; clear all; clc;
 8
 9  % Read the list of Dow Jones components
10  fileID = fopen('dowjones.lst');
11  tmp = textscan(fileID, '%s');
12  fclose(fileID);
13  dowjc=tmp{1};  % a list as a cell array
```

The guys from Mathworks suggest making a new and good practice of using the textscan command (lines 10-12) instead of dataread for reading data from text files. The latter will be removed from the next Matlab release.

Next, we need to import the current list of tickers for US stocks and their corresponding codes in Quandl’s database. We will make use of my pre-processed data set QuandlStockCodeListUS.xlsx, as mentioned earlier, in the following way:

```matlab
15  % Read in the list of tickers and internal code from Quandl.com
16  [ndata, text, alldata] = xlsread('QuandlStockCodeListUS.xlsx');
17  quandlc=text(:,1);  % again, as a list in a cell array
```

For the simplicity of our example, we will be considering all 30 stocks as a complete sample set of stocks in our portfolio. A portfolio construction is not solely a selection of stocks. It usually links a time period of the past performance of each individual stock. In the following move, we will collect daily stock returns over the last 365 calendar days (corresponding to about 252 trading days).

This framework is pretty handy. Why? If you build or run a live model, most probably you wish to update your database when the US markets are closed. You fetch Quandl.com for data. Now, if your model employs risk management or portfolio optimisation as an integrated building block, it is very likely you are interested in risk(-return) estimation, e.g. by calculating Value-at-Risk (VaR) or similar risk measures. Of course, the choice of the time-frame is up to you, based on your strategy.
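As a toy illustration of the risk-estimation step mentioned above, a one-day historical VaR at the 95% level can be read straight off the empirical distribution of returns; the sample below is simulated, not market data:

```python
import numpy as np

# simulated daily returns standing in for a fetched return series
rng = np.random.default_rng(7)
r = rng.normal(0.0005, 0.01, 1000)

# one-day historical VaR at 95%: the loss threshold exceeded on ~5% of days
var95 = -np.percentile(r, 5)
print(f"1-day 95% VaR: {var95:.2%}")
```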

Having that said,

```matlab
19  % fetch stock data for last 365 days
20  date2=datestr(today,'yyyy-mm-dd')      % to
21  date1=datestr(today-365,'yyyy-mm-dd')  % from
```

which employs Matlab’s improved datestr command with the format as a parameter. It is really useful when fetching data from Quandl.com, as the date format they expect in your code is yyyy-mm-dd. The above two commands return the following values in Matlab:

```
date2 =
2013-11-03
date1 =
2012-11-03
```
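For comparison, the same yyyy-mm-dd stamps can be produced in Python's datetime (the 'today' below is frozen to the article's date rather than taken from the system clock):

```python
from datetime import date, timedelta

today = date(2013, 11, 3)   # frozen to the article's date for reproducibility
date2 = today.strftime('%Y-%m-%d')                            # to
date1 = (today - timedelta(days=365)).strftime('%Y-%m-%d')    # from

print(date2, date1)   # 2013-11-03 2012-11-03
```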

Excellent! Let’s catch some fishes in the pond! Before doing so, we need to make sure that Quandl’s software is installed and linked to our Matlab console. In order to download the Quandl Matlab package, just visit this website and follow the installation procedure at the top of the page. Add the proper path in your Matlab environment too. The best practice is to register yourself on Quandl; that allows you to raise the daily number of data calls from 50 to 500. If you think you will need more data requests per day, you can send a request from the same webpage.

You will need to replace ‘yourauthenticationtoken’ in the Matlab code (see line 35) with your token in order to get the data.

Fetching Stock Data

From this point the road stops winding. For all 30 stocks belonging to the current composition of the Dow Jones Index, we get their Open, High, Low, Close prices (as retrieved from Google Finance) in the following way:

```matlab
23  % create an empty array for storing stock data
24  stockd={};
25  % scan the Dow Jones tickers and fetch the data from Quandl.com
26  for i=1:length(dowjc)
27      for j=1:length(quandlc)
28          if(strcmp(dowjc{i},quandlc{j}))
29              fprintf('%4.0f %s\n',i,quandlc{j});
30              % fetch the data of the stock from Quandl
31              %  using recommended Quandl's command and
32              %  saving them directly into Matlab's FTS object (fts)
33              fts=0;
34              [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
35                  'authcode','yourauthenticationtoken',...
36                  'start_date',date1,'end_date',date2);
37              stockd{i}=fts;  % entire FTS object in an array's cell
38          end
39      end
40  end
41  % use 'save' command to save all actual variables from memory on the disc,
42  %  if you need it for multiple code running
43  % save data
```

The code in lines 34-36 is an example of how you call Quandl for data and fetch them into a FINTS object (the financial time-series object from Matlab’s Financial Toolbox). So easy. If you wish to change the time frequency, apply a transformation, or get the data in .csv format rather than FINTS, everything is available as a parameter of the Quandl.get() command (explore all possibilities here).

Having that, you can easily plot the Google Finance data, for instance, of IBM stock straight away:

```matlab
% display High-Low Chart for IBM
highlow(stockd{11})
title('IBM');
ylabel('Stock Price ($)');
```

to get:

or prepare a return-risk plot for your further portfolio optimisation:

```matlab
ret=[];
for i=1:length(stockd)
    % extract the Close Price
    tmp=fts2mat(stockd{i}.Close,0);
    % calculate a vector of daily returns
    r=tmp(2:end)./tmp(1:end-1)-1;
    ret=[ret r];
end
% calculate the annualized mean returns for 30 stocks
mret=mean(ret)*sqrt(250);
% and corresponding standard deviations
msig=std(ret)*sqrt(250);
% plot return-risk diagram
p=100*[msig; mret]'
plot(p(:,1),p(:,2),'b.');
xlim([0 30]);
xlabel('Annualized Risk of Stock (%)')
ylabel('Annualized Mean Return (%)');
```

providing us with:

Forum: Further Discussions

You can discuss the current post with other quants within the new Quant At Risk Forum (Data Handling and Pre-Processing section).

## Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting

Within the evolution of Mathworks’ MATLAB programming environment, finally, in the most recent version labelled 2013a, we received a long-awaited command-line facility for pulling stock data directly from the Yahoo! servers. What does that mean for quants and algo traders? Honestly, a lot. Now, by writing just a few commands, we can have nearly everything we want. However, please keep in mind that Yahoo! data are free, and therefore their precision does not always remain at the same level of quality as, e.g., data downloaded from Bloomberg resources. Anyway, for pure backtesting of your models, this step introduces a big leap in dealing with daily stock data. As usual, we have the possibility of getting open, high, low, close, and adjusted close prices of stocks, supplemented with traded volume and the dates plus values of dividends.

In this post I present a short example of how one can retrieve the data of SPY (tracking the performance of the S&P 500 index) using Yahoo! data in the new Matlab 2013a, and I show a simple code for testing the time period of buying-holding-and-selling SPY (or any other stock paying dividends) to make a profit every time. The beauty of the new Yahoo! feature in Matlab 2013a has been fully described in the official article Request data from Yahoo! data servers, where you can find all the details required to build the code into your Matlab programs.

Model for Dividends

It is a well-known opinion (based on many years of market observations) that one may expect a drop in a stock’s price within a short timeframe (e.g. a few days) after the day when the stock’s dividends have been announced. And probably every quant, sooner or later, is tempted to verify that hypothesis. It’s your homework. However, today, let’s look at a slightly differently defined problem based on the omni-working reversed rule: what goes down, must go up. Let’s consider the exchange-traded fund SPDR S&P 500 ETF Trust, labelled on NYSE as SPY.

First, let’s pull out the Yahoo! data of adjusted Close prices of SPY from Jan 1, 2009 up to Aug 27, 2013:

```matlab
 1  % Yahoo! Stock Data in Matlab and a Model for Dividend Backtesting
 2  % (c) 2013 QuantAtRisk.com, by Pawel Lachowicz
 3
 4  close all; clear all; clc;
 5
 6  date_from=datenum('Jan 1 2009');
 7  date_to=datenum('Aug 27 2013');
 8
 9  stock='SPY';
10
11  adjClose = fetch(yahoo,stock,'adj close',date_from,date_to);
12  div = fetch(yahoo,stock,date_from,date_to,'v')
13  returns=(adjClose(2:end,2)./adjClose(1:end-1,2)-1);
14
15  % plot adjusted Close price and mark days when dividends
16  %  have been announced
17  plot(adjClose(:,1),adjClose(:,2),'color',[0.6 0.6 0.6])
18  hold on;
19  plot(div(:,1),min(adjClose(:,2))+10,'ob');
20  ylabel('SPY (US$)');
21  xlabel('Jan 1 2009 to Aug 27 2013');
```

and visualize them:

Having the data ready for backtesting, let’s look for the most profitable period of time of buying-holding-and-selling SPY assuming that we buy SPY one day after the dividends have been announced (at the market price), and we hold for $dt$ days (here, tested to be between 1 and 40 trading days).

```matlab
23  % find the most profitable period of holding SPY (long position)
24  neg=[];
25  for dt=1:40
26
27      buy=[]; sell=[];
28      for i=1:size(div,1)
29          % find the dates when the dividends have been announced
30          [r,c,v]=find(adjClose(:,1)==div(i,1));
31          % mark the corresponding SPY price with blue circle marker
32          hold on; plot(adjClose(r,1),adjClose(r,2),'ob');
33          % assume you buy long SPY next day at the market price (close price)
34          buy=[buy; adjClose(r-1,1) adjClose(r-1,2)];
35          % assume you sell SPY in 'dt' days after you bought SPY at the market
36          %  price (close price)
37          sell=[sell; adjClose(r-1-dt,1) adjClose(r-1-dt,2)];
38      end
39
40      % calculate profit-and-loss of each trade (excluding transaction costs)
41      PnL=sell(:,2)./buy(:,2)-1;
42      % summarize the results
43      neg=[neg; dt sum(PnL<0) sum(PnL<0)/length(PnL)];
44
45  end
```

If we now sort the results according to the percentage of negative returns (column 3 of neg matrix), we will be able to get:

```
>> sortrows(neg,3)

ans =
   18.0000    2.0000    0.1111
   17.0000    3.0000    0.1667
   19.0000    3.0000    0.1667
   24.0000    3.0000    0.1667
    9.0000    4.0000    0.2222
   14.0000    4.0000    0.2222
   20.0000    4.0000    0.2222
   21.0000    4.0000    0.2222
   23.0000    4.0000    0.2222
   25.0000    4.0000    0.2222
   28.0000    4.0000    0.2222
   29.0000    4.0000    0.2222
   13.0000    5.0000    0.2778
   15.0000    5.0000    0.2778
   16.0000    5.0000    0.2778
   22.0000    5.0000    0.2778
   27.0000    5.0000    0.2778
   30.0000    5.0000    0.2778
   31.0000    5.0000    0.2778
   33.0000    5.0000    0.2778
   34.0000    5.0000    0.2778
   35.0000    5.0000    0.2778
   36.0000    5.0000    0.2778
    6.0000    6.0000    0.3333
    8.0000    6.0000    0.3333
   10.0000    6.0000    0.3333
   11.0000    6.0000    0.3333
   12.0000    6.0000    0.3333
   26.0000    6.0000    0.3333
   32.0000    6.0000    0.3333
   37.0000    6.0000    0.3333
   38.0000    6.0000    0.3333
   39.0000    6.0000    0.3333
   40.0000    6.0000    0.3333
    5.0000    7.0000    0.3889
    7.0000    7.0000    0.3889
    1.0000    9.0000    0.5000
    2.0000    9.0000    0.5000
    3.0000    9.0000    0.5000
    4.0000    9.0000    0.5000
```

which simply indicates that the optimal period of holding a long position in SPY equals 18 days. We can mark all trades (with the 18-day holding period) in the chart:

where the trade open and close prices (according to our model described above) have been marked in the plot by black and red circle markers, respectively. Only 2 out of 18 trades (see the PnL matrix) turned out to be negative, with losses of 2.63% and 4.26%. The complete distribution of profits and losses from all trades can be obtained in the following way:

```matlab
47  figure(2);
48  hist(PnL*100,length(PnL))
49  ylabel('Number of trades')
50  xlabel('Return (%)')
```

returning

Let’s make some money!

The above Matlab code delivers a simple application of the newest built-in connectivity with the Yahoo! server and the ability to download stock data of our interest. We have tested the optimal holding period for SPY from the beginning of 2009 till now (a global uptrend). The same code can easily be used and/or modified to verify any period and any stock for which dividends have been released in the past. This fairly simple approach, though not too frequent in trading, provides us with an extra idea of how we can beat the market, assuming that the future is going to remain more or less the same as the past. So, let’s make some money!
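The dt-selection rule applied above (pick the holding period with the lowest fraction of losing trades) boils down to a one-line sort over the neg matrix; a minimal Python sketch using a hypothetical subset of the results:

```python
# rows of (dt, number of losing trades, fraction of losing trades);
# a hypothetical subset of the neg matrix
neg = [
    (1, 9, 0.5000),
    (17, 3, 0.1667),
    (18, 2, 0.1111),
    (24, 3, 0.1667),
]

# sort by the loss fraction (third column) and take the best holding period
best = sorted(neg, key=lambda row: row[2])[0]
best_dt = best[0]
print(best_dt)   # 18
```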

## Rebinning of Financial Time-Series

Working with financial time-series, especially in trading and the subsequent analysis of data trends, we often wish to rebin (resample) our data into a new time-series that provides some sort of information on the average value of the underlying records, which are characterized by high volatility in time.

The following algorithm in Matlab does the job pretty well:

```matlab
% Function rebins any time-series [x,y] where x is a time vector
%  and y stores time-series values from the current (regular or
%  irregular sampling). A new time-series sampling is dt.
%
% Input: x  : time vector [1xN]
%        y  : data vector [1xN]
%        dy : data error vector [1xN]
%        dt : new rebinning time (dt > current sampling)
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
%
% Example: x=1:1:1001;          % assume 1001 trading days
%          y=rand(1,1001);      % simulate pseudo data
%          dy=0.1*rand(1,1001); % simulate pseudo data errors (if required)
%          dt=50;  % rebin data from 1 day sampling down to 50 day intervals
%          [rx,ry]=rebin(x,y,dy,dt);  % or [rx,ry]=rebin(x,y,[],dt);
%          plot(x,y,'color',[0.7 0.7 0.7]);  % plot original time-series
%          hold on; plot(rx,ry,'ro-');       % plot down-rebinned time-series

function [rx,ry]=rebin(x,y,dy,dt)
    if(isempty(dy))
        dy=ones(1,length(y));
    end
    rdata=[];
    j=1; k=1;
    t2=x(1);
    while(j<=length(x))
        i=j;
        if((x(i)+dt)>x(end))
            break
        else
            t2=t2+dt;
        end
        i=j;
        sa=0; wa=0; ea=0; il=0;
        while(x(i)<t2)
            sa=sa+(y(i)/dy(i)^2);
            wa=wa+(1/dy(i)^2);
            i=i+1;
            il=il+1;
        end
        ry=sa/wa;
        rx=t2-dt;
        if(il>=1)
            rdata=[rdata; rx ry];
        end
        j=j+il;
    end
    rx=rdata(:,1);
    ry=rdata(:,2);
end
```

Example

As an example we will use the daily rates of return from trading AAPL (Apple Inc.) from Jan 1984 to Apr 2011 (download the Matlab MAT-file aaplr.mat). Below, a short set of commands allows us to execute the rebinning of the return series,

```matlab
clear all; close all; clc;

load aaplr.mat
plot(aaplR)

x=1:length(aaplR);  % 6855 days
y=aaplR;
plot(x,y,'color',[0.7 0.7 0.7]);

% rebin the data with dt=25 days step (one trading month)
dt=25;
dy=[];  % we have no information on data errors

[rx,ry]=rebin(x,y,dy,dt);

% overplot results and display first 500 days
hold on; plot(rx,ry,'ro-');
xlim([0 500]); ylim([-0.125 0.15]);
```

and plot both time-series, original and rebinned with a new bin time of 25 days for the first 500 days of trading:

where a red line denotes the new rebinned time-series with a binning time of 25 trading days. The function (the algorithm) computes a simple weighted mean based on the data points falling into the interval $\langle t,t+dt \rangle$. If we do not specify the input data error vector $dy$, as in the example above, we get a simple (unweighted) mean as a consequence.
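The estimator inside each bin is the inverse-variance weighted mean, $\bar{y} = \left(\sum_i y_i/dy_i^2\right) / \left(\sum_i 1/dy_i^2\right)$; a quick Python check with invented values:

```python
# invented bin contents: values and their errors
y  = [1.0, 2.0, 4.0]
dy = [0.1, 0.2, 0.1]

# inverse-variance weighted mean, as computed inside rebin()
num = sum(v / e**2 for v, e in zip(y, dy))
den = sum(1.0 / e**2 for e in dy)
wmean = num / den

# with unit errors (dy omitted) this reduces to the plain mean
plain = sum(y) / len(y)
print(round(wmean, 4), round(plain, 4))   # 2.4444 2.3333
```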

## Trend Identification for FX Traders (Part 2)

In my previous post I provided an introduction to trading model invention and design. We made use of FX data for the AUDUSD pair sampled hourly, splitting the data into weekly time-series.

Today, we will work a bit harder on formulating the very first rules for the model. This step requires some creativity in understanding what the data look like, as well as building a basic set of rules that helps us perform an attractive time-series classification. Our objective is to invent a method for classifying the last week’s FX pair time-series as being either in a downtrend or in an uptrend.

The most naive way of classifying the directional information contained in any time-series is its slope: a straight line fit to the data. Let’s use it as our starting point. Instead of fitting all data points for a given week, we find median values of the first and the last 12 data points, both in Time $(x1,x2)$ and in Pair Ratio $(y1,y2)$, as specified in lines 92 to 94:

```matlab
 65  % FX time-series analysis
 66  % (c) Quant at Risk, 2012
 67  %
 68  % Part 2: Classification of weekly time-series
 69
 70  close all;
 71  scrsz = get(0,'ScreenSize');
 72  h=figure('Position',[70 scrsz(4)/2 scrsz(3)/1.1 scrsz(4)/2],'Toolbar','none');
 73  fprintf('\nuptrend/downtrend identification.. ');
 74  % for viewing use the loop
 75  hold off;
 76  set(0,'CurrentFigure',h);
 77
 78  % pre-define variables
 79  trend=zeros(nw,1);
 80  slope=zeros(nw,1);
 81  midp={};  % middle points
 82  endp={};  % end points (median based on last 12 points)
 83
 84  for i=1:nw  %--- a loop over total number of weeks available
 85
 86      % reading time-series for a current week
 87      w=week{i};
 88      x=w(:,1);
 89      y=w(:,2);
 90      % plot the time-series
 91      hold on; plot(x,y,'k');
 92      % linear trend estimation
 93      x1=median(x(1:12)); x2=median(x(end-11:end));
 94      y1=median(y(1:12)); y2=median(y(end-11:end));
 95
 96      % define end-point of the time-series and mark it on the plot
 97      endp{i}=[x2 y2];
 98      hold on; plot(endp{i}(1),endp{i}(2),'b*');
 99
100      % find slope
101      m=(y2-y1)/(x2-x1);
102      slope(i)=m;
103      xl=x1:dt:x2;
104      yl=m*xl-m*x2+y2;  % a line representing the slope
105      hold on; plot(xl,yl,'b:');
106
107      % find middle point of the line and mark it on the plot
108      mx=mean(xl);
109      my=mean(yl);
110      midp{i}=[mx my];
111      hold on; plot(midp{i}(1),midp{i}(2),'bo');
```

As an example of the code execution, for the first two weeks we plot slopes, mid-points and end-points:

We assume that our classification procedure will be based solely on the information provided by the end-points and slopes. We make an exception for the classification of the first two weeks. For the first week, the distinction between uptrend and downtrend is made based on the position of the first and last points:

```matlab
113  % Time-Series Classification
114
115  if(i==1)
116      ybeg=y(1); yend=y(end);
117      if(ybeg<yend)
118          trend(i)=+1;  % uptrend
119          hold on; plot(x,y,'k');
120      else
121          trend(i)=-1;  % downtrend
122          hold on; plot(x,y,'r');
123      end
124  end
```

where we mark the result of the classification with a sign of $+1$ or $-1$ in an element of the vector $trend$, and plot it in black or red, denoting uptrend and downtrend, respectively.

For the second week, our rules of classification are enriched by additional information about the end-points of the current and the previous week:

```matlab
126  if(i==2)
127      % week(current-1)
128      tmp=week{i-1};
129      x1=tmp(:,1); y1=tmp(:,2);
130      y1b=y1(1); y1e=y1(end);
131      % week(current)
132      y0b=y(1); y0e=y(end);
133      if(y0e>y1e)
134          trend(i)=+1;  % uptrend
135          hold on; plot(x,y,'k');
136      else
137          trend(i)=-1;  % downtrend
138          hold on; plot(x,y,'r');
139      end
140  end
```

For weeks number 3 and higher, we do our creative research on the data to define a specific set of rules. We allow ourselves to take into account information from the two weeks prior to the current one and combine it all together. The following code represents an attractive solution, subject to improvement:

```matlab
142  if(i>2)
143      % week(current-2)
144      mid2=midp{i-2}(2);
145      end2=endp{i-2}(2);
146      slp2=slope(i-2);
147      % week(current-1)
148      mid1=midp{i-1}(2);
149      end1=endp{i-1}(2);
150      slp1=slope(i-1);
151      % week(current)
152      mid0=midp{i}(2);
153      end0=endp{i}(2);
154      slp0=slope(i);
155      if((mid0>mid1))  % up-trend
156          if((mid0>mid2)&&(end0>end1))
157              trend(i)=+1;
158              hold on; plot(x,y,'k');  % strong up-trend
159          elseif((mid0>mid2)&&(end0>end2))
160              trend(i)=+1;
161              hold on; plot(x,y,'k');  % weak up-trend
162          elseif((mid0<mid2)&&(end0<end2)&&(slp0<0))
163              trend(i)=-1;
164              hold on; plot(x,y,'r');  % turns into possible down-trend
165          elseif((mid0<mid2)&&(end0<end2)&&(slp0>0))
166              trend(i)=+1;
167              hold on; plot(x,y,'k');  % turns into possible up-trend
168          else
169              trend(i)=+1;
170              hold on; plot(x,y,'k');  % turns into possible up-trend
171          end
172      elseif(mid0<mid1)  % down-trend
173          if((mid0<mid2)&&(end0<end1)&&(end0<end2))
174              trend(i)=-1;
175              hold on; plot(x,y,'r');  % weak down-trend
176          elseif((mid0<mid2)&&(end0<end2)&&(end0>end1))
177              trend(i)=+1;
178              hold on; plot(x,y,'k');  % possible up-trend
179          elseif((mid0<mid2)&&(end0>end2))
180              trend(i)=+1;
181              hold on; plot(x,y,'k');  % turns into possible up-trend
182          elseif((mid0>mid2)&&(end0<end1)&&(end0<end2))
183              trend(i)=-1;
184              hold on; plot(x,y,'r');
185          elseif((mid0>mid2)&&(end0>end2))
186              trend(i)=+1;
187              hold on; plot(x,y,'k');  % turns into possible up-trend
188          elseif((mid0>mid2)&&(end0>end1))
189              trend(i)=+1;
190              hold on; plot(x,y,'k');  % turns into possible up-trend
191          else
192              trend(i)=-1;
193              hold on; plot(x,y,'r');
194          end
195      end
196  end
197  end  % of the loop over weeks
```

Since one picture is worth millions of lines of code, below we present three examples of our model in action. The last plot corresponds to the latest Global Financial Crisis and shows how weeks in uptrend in 2009 followed those in downtrend a year before.

It is straightforward to note that the rules behave intuitively and stay in general agreement with market sentiment.

## Accessing a SQL Server database in Matlab and the use of FTS objects

Feeding our models with data for backtesting purposes requires pulling data from the server. If we do our modelling in Matlab under Linux and the MS SQL Server database is our target, the quickest way to establish good connectivity is to make use of the Java Database Connectivity (JDBC) driver. First, visit Microsoft's web page to download the current driver (.tar.gz), and next follow the installation steps provided by MathWorks.

In the following example we will try to connect to our database called ‘XDataBase’ and retrieve the stock prices of Apple Inc. (NASDAQ ticker: AAPL) starting from Feb 2, 1998.

```matlab
ticker=['AAPL'];
start=['02-Feb-1998'];

% --Connect to your local SQL Server and fetch the data for a ticker

% define connection
conn=database('XDataBase','matlab','m4tl4b', ...
     'com.microsoft.sqlserver.jdbc.SQLServerDriver', ...
     'jdbc:sqlserver://163.118.2.7:1533;database=XDataBase');

% specify return format
setdbprefs('DataReturnFormat','structure');

% define your SQL query, as for example:
QueryString = ['SELECT sec.tic, dp.datadate, dp.prchd As PriceH, ', ...
    'dp.prcld As PriceL, dp.prccd As PriceC, ', ...
    'dp.prccd/dp.ajexdi As PriceCA, dp.cshtrd*dp.ajexdi As Vol ', ...
    'FROM Xpressfeed.dbo.security AS sec ', ...
    'INNER JOIN Xpressfeed.dbo.sec_dprc AS dp ', ...
    'ON sec.gvkey = dp.gvkey AND sec.iid = dp.iid ', ...
    'AND dp.datadate >=''',start,''' ', ...
    'WHERE (sec.tic IN (''',ticker,'''))'];

% execute connection, fetch data, and assign them to a Matlab variable
cur = exec(conn, QueryString);
myData = fetch(cur);
secData = myData.data;
```
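For readers working outside Matlab, the same query string can be assembled in any language; the Python sketch below merely mirrors the string concatenation above. The server address, credentials, and the Xpressfeed table/column names come from the example and will differ in your setup:

```python
# Rebuild the SQL query string from the Matlab example above.
# Table and column names (Xpressfeed.dbo.security, sec_dprc, prchd, ...)
# are taken verbatim from the example and are assumptions about your schema.
ticker = "AAPL"
start = "02-Feb-1998"

query = (
    "SELECT sec.tic, dp.datadate, "
    "dp.prchd As PriceH, dp.prcld As PriceL, dp.prccd As PriceC, "
    "dp.prccd/dp.ajexdi As PriceCA, dp.cshtrd*dp.ajexdi As Vol "
    "FROM Xpressfeed.dbo.security AS sec "
    "INNER JOIN Xpressfeed.dbo.sec_dprc AS dp "
    "ON sec.gvkey = dp.gvkey AND sec.iid = dp.iid "
    f"AND dp.datadate >= '{start}' "
    f"WHERE (sec.tic IN ('{ticker}'))"
)
```

In production code, a parameterized query (e.g. via a DB-API driver) would be preferable to string interpolation.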

In the above example, our structure variable secData is composed of the arrays secData.PriceH, secData.PriceL, secData.PriceC, secData.PriceCA, and secData.Vol, containing AAPL’s High, Low, Close, and Adjusted Close prices and Volume traded, respectively.

Making use of Matlab’s Financial Toolbox, one can store all these data in a useful plain-text file format. We do it by constructing a Financial Time-Series (FTS) object:

```matlab
% verify the existence of data for a ticker
skip=false;
try
    n=length(secData.PriceC);  % a random test
catch
    skip=true;
end
if(~skip)
    dates=secData.datadate;
    dates=datenum(dates);
    data=[zeros(n,1) secData.PriceH secData.PriceL secData.PriceC ...
          secData.PriceCA secData.Vol];
    colheads={'DATES';'OPEN';'HIGH';'LOW';'CLOSE';'CLOSEA';'VOLUME'};
    desc=ticker;
    filename=[ticker,'.dat'];
    % save AAPL's extracted Price and Volume data into AAPL.dat file
    fts2ascii(filename,dates,data,colheads,desc);

    % construct FTS object by reading the data from the file
    fts=ascii2fts(filename,1,2);
    data=fts2mat(fts,1);
    % and plot the Adjusted Close Price
    scrsz = get(0,'ScreenSize');
    hh=figure('Position',[70 scrsz(4)/2 scrsz(3)/1.1 scrsz(4)/2], ...
              'Toolbar','none');
    plot(data(:,1),data(:,5));
    xlim([min(data(:,1)) max(data(:,1))]);
    title(ticker); xlabel('Time [days]');
    ylabel('Adjusted Close Price [US$]');
end
```

As a result of the fts2ascii call, we retrieve the AAPL stock data and save them into the AAPL.dat file. The first ten lines of the file are as follows:

```
$ head -10 AAPL.dat
AAPL
DATES        OPEN      HIGH       LOW        CLOSE      CLOSEA    VOLUME
02-Feb-1998  0.000000  18.500000  17.375000  17.688000  4.422000  22740000
03-Feb-1998  0.000000  18.625000  17.688000  18.313000  4.578250  14380000
04-Feb-1998  0.000000  18.500000  18.000000  18.250000  4.562500   6078400
05-Feb-1998  0.000000  18.500000  18.000000  18.313000  4.578250   8508000
06-Feb-1998  0.000000  18.688000  18.250000  18.500000  4.625000   7228000
09-Feb-1998  0.000000  19.500000  18.375000  19.188000  4.797000  17668000
10-Feb-1998  0.000000  19.563000  19.063000  19.438000  4.859500  15072000
11-Feb-1998  0.000000  19.500000  18.875000  19.000000  4.750000   7560000
```

The call to ascii2fts constructs the FTS object

```
>> whos fts
  Name      Size        Bytes   Class   Attributes

  fts       3366x6      190016  fints
```

whereas fts2mat transforms the object’s data into matrix format, which we then use to plot AAPL’s Adjusted Close Prices.

It is worth recalling that the Financial Toolbox allows us to express a date (e.g. 02-Feb-1998) as a continuous number:

```
>> datenum('02-Feb-1998')
ans =
      729788
```

or reversely

```
>> datestr(ans)
ans =
02-Feb-1998
```
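If you ever need to reproduce Matlab’s serial date numbers elsewhere, note that datenum counts days from the year 0, which is Python’s proleptic ordinal shifted by 366. A quick cross-check against the session above (an illustration, not part of the Financial Toolbox):

```python
from datetime import date

def matlab_datenum(d: date) -> int:
    # Matlab's datenum counts days from 00-Jan-0000; Python's
    # date.toordinal() returns 1 for 01-Jan-0001, i.e. 366 days later.
    return d.toordinal() + 366

print(matlab_datenum(date(1998, 2, 2)))  # 729788, as in the Matlab session
```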

## Extracting Time-Series from Tick-Data

Tick-data provided in the .csv (comma-separated values) file format can be a real problem to handle quickly, especially when the total size runs into hundreds of GB. If your goal is to extract a time-series with, say, only hourly time resolution, this article provides some fresh and constructive guidelines on how to do it smartly in the Linux environment.

First, let’s have a closer look at the data. Say we have a collection of 2148 .csv files hosting the FX trading history of the AUDUSD pair, covering nearly 10 years between 2000 and 2010. Each file is about 7.1 MB, which leaves us with approximately 15 GB of data to process. Looking into a randomly selected file, we can identify the header and the data themselves:

```
$ head -10 audusd_216.csv
Ticks,TimeStamp,Bid Price,Bid Size,Ask Price,Ask Size
632258349000000015,2004-07-19 11:55:00.000,0.7329,1000000,0.7334,1000000
632258349000000016,2004-07-19 11:55:00.000,0.7329,1000000,0.7334,1000000
632258349000000017,2004-07-19 11:55:00.000,0.7329,1000000,0.7333,1000000
632258349000000018,2004-07-19 11:55:00.000,0.7327,1000000,0.7333,1000000
632258349000000019,2004-07-19 11:55:00.000,0.7327,1000000,0.7333,1000000
632258349000000020,2004-07-19 11:55:00.000,0.7328,1000000,0.7333,1000000
632258349000000021,2004-07-19 11:55:00.000,0.7328,1000000,0.7334,1000000
632258349600000000,2004-07-19 11:56:00.000,0.7328,1000000,0.7334,1000000
632258349600000001,2004-07-19 11:56:00.000,0.7328,1000000,0.7336,1000000
```

Our aim will be to extract the Bid and Ask Price time-series. We will make use of a few standard Linux tools, e.g. sed and awk, supplemented with extra f77 codes. This also demonstrates how useful shell programming can be, while giving us an opportunity to explore the enigmatic syntax of its programs. We will write a shell script executable for any FX pair name, e.g. gbpnzd, eurjpy, and so on. In the first step, the script creates a list of all files. This is tricky in Linux: the standard command ‘ls -lotr’ returns the desired list, but also all the details on file size, attributes, etc., which we simply do not want. The loop below solves the problem,

```bash
#!/bin/bash
# Extracting Time-Series from Tick-Data .csv files
# (c) Quant at Risk, 2012
#
# Exemplary usage: ./script.src audusd

echo "..making a sorted list of .csv files"
for i in $1_*.csv; do
    echo ${i##$1_} $i
done | sort -n | awk '{print $2}' > $1.lst
```
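As a cross-check of the listing step, the same numeric ordering of file names can be sketched in Python, avoiding the echo/sort/awk pipeline (the file names are the hypothetical audusd_*.csv set from above; this is an illustration, not part of the original script):

```python
def sort_csv_files(names, pair="audusd"):
    """Order pair_<n>.csv files by their numeric suffix <n>,
    so that audusd_10.csv sorts after audusd_2.csv."""
    prefix = pair + "_"
    return sorted(names, key=lambda s: int(s[len(prefix):-len(".csv")]))

files = ["audusd_10.csv", "audusd_2.csv", "audusd_1.csv", "audusd_2148.csv"]
print(sort_csv_files(files))
```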

and the newly created file $1.lst (note: $1 corresponds in the shell script to the parameter the script was called with, e.g. audusd; therefore $1.lst physically means audusd.lst) contains the list:

```
audusd_1.csv
audusd_2.csv
audusd_3.csv
...
audusd_2148.csv
```

We merge all 2148 pieces into one data file by creating and executing an in-line script:

```bash
echo "..creating one data file"
awk '{print "cat",$1," >> tmp.lst"}' $1.lst > tmp.cmd
chmod +x tmp.cmd
./tmp.cmd
rm tmp.cmd
mv tmp.lst $1.tmp
```

Now, $1.tmp is a 15 GB file and we may wish to remove some unnecessary comments and tokens:

```bash
echo "..removing comments"
sed 's/Ticks,TimeStamp,Bid Price,Bid Size,Ask Price,Ask Size//g' $1.tmp > $1.tmp2
rm $1.tmp

echo "..removing empty lines"
sed '/^$/d' $1.tmp2 > $1.tmp
rm $1.tmp2

echo "..removing token ,"
sed 's/,/ /g' $1.tmp > $1.tmp2
rm $1.tmp

echo "..removing token :"
sed 's/:/ /g' $1.tmp2 > $1.tmp
rm $1.tmp2

echo "..removing token -"
sed 's/-/ /g' $1.tmp > $1.tmp2
rm $1.tmp

echo "..removing column with ticks and ask/bid size"
awk '{print $2,$3,$4,$5,$6,$7,$8,$10}' $1.tmp2 > $1.tmp
rm $1.tmp2
```

In order to convert a time information into a continuous measure of time, we modify the f77 code for our task as follows:

```fortran
c     Extracting Time-Series from Tick-Data .csv files
c     (c) Quant at Risk, 2012
c
c     Program name: fx_getmjd.for
c     Aim: removes ticks and converts trade time into MJD time [day]
c     Input data format: YYYY MM DD HH MM SS.SSS BID ASK

      implicit double precision (a-z)
      integer y,m,d,hh,mm,jd
      character zb*50

      call getarg(1,zb)
      open(1,file=zb)
      do i=1.d0, 500.d6
        read(1,*,end=1) y,m,d,hh,mm,ss,bid,ask
        jd= d-32075+1461*(y+4800+(m-14)/12)/4+367*(m-2-(m-14)/12*12)
     _      /12-3*((y+4900+(m-14)/12)/100)/4
        mjd=(jd+(hh-12.d0)/24.d0+mm/1440.d0+ss/86400.d0)-2400000.5d0
        mjd=mjd-51544.d0            ! T_0 = 2000.01.01 00:00
        abr=ask/bid
        write(*,2) mjd,bid,ask,abr
      enddo
    1 close(1)
    2 format(F15.8,F8.4,F8.4,F12.6)

      end
```
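As a sanity check on the conversion, the same "days elapsed since 2000-01-01 00:00" quantity can be reproduced with Python's datetime module (an illustrative cross-check against the first tick of the sample file, not part of the original pipeline):

```python
from datetime import datetime

T0 = datetime(2000, 1, 1)  # the zero point used by fx_getmjd.for

def mjd_since_2000(ts: datetime) -> float:
    """Days (with fractional part) elapsed since 2000-01-01 00:00."""
    delta = ts - T0
    return delta.days + delta.seconds / 86400.0

# First tick of the sample file: 2004-07-19 11:55:00
print(round(mjd_since_2000(datetime(2004, 7, 19, 11, 55)), 8))
```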

and execute it in our script:

```bash
echo "..changing a date to MJD"
fx_getmjd $1.tmp > $1.dat
rm $1.tmp
```

In the aforementioned f77 code, we set the zero time point (MJD=0.00) at Jan 1, 2000 00:00. From that day on, our time is expressed as a single column measuring time progress in days, with fractional parts tracking hours and minutes. We may now split the data into two separate time-series containing Bid and Ask Prices at the tick-data level:

```bash
echo "..splitting into bid/ask/abr files"
awk '{print $1,$2}' $1.dat > $1.bid
awk '{print $1,$3}' $1.dat > $1.ask
```

A quick inspection of both files reveals that we deal with nearly $500\times 10^6$ lines! Before we reach our chief aim, i.e. rebinning the series with a 1-hour time resolution, we unfortunately need to separate the input into 5 parts, each of at most $100\times 10^6$ lines. The latter may vary depending on the RAM available and, if memory is sufficient, this step can even be skipped. We proceed:

```bash
echo "..splitting bid/ask/abr into separate files"
fx_splitdat $1 1
fx_splitdat $1 2
fx_splitdat $1 3
fx_splitdat $1 4
fx_splitdat $1 5
```

where fx_splitdat.for code is given as follows:

```fortran
c     Extracting Time-Series from Tick-Data .csv files
c     (c) Quant at Risk, 2012
c
c     Program name: fx_splitdat.for
c     Exemplary usage: ./fx_splitdat audusd [1,2,3,4,5]

      implicit double precision (a-z)
      integer nc
      character*6 zb
      character*10 zbask,zbbid
      character*16 zb1ask,zb2ask,zb3ask,zb4ask,zb5ask
      character*16 zb1bid,zb2bid,zb3bid,zb4bid,zb5bid
      character*1 par2,st

c     zb - pair name of length equal 6 characters only
      call getarg(1,zb)
      call getarg(2,par2)            ! case

      write(st,'(a1)') par2
      read(st,'(i1)') nc

      zbask=zb(1:6)//'.ask'
      zbbid=zb(1:6)//'.bid'

      zb1ask=zb(1:6)//'.ask.part1'
      zb2ask=zb(1:6)//'.ask.part2'
      zb3ask=zb(1:6)//'.ask.part3'
      zb4ask=zb(1:6)//'.ask.part4'
      zb5ask=zb(1:6)//'.ask.part5'

      zb1bid=zb(1:6)//'.bid.part1'
      zb2bid=zb(1:6)//'.bid.part2'
      zb3bid=zb(1:6)//'.bid.part3'
      zb4bid=zb(1:6)//'.bid.part4'
      zb5bid=zb(1:6)//'.bid.part5'

      open(11,file=zbask)
      open(12,file=zbbid)

      if(nc.eq.1) then
        open(21,file=zb1ask)
        open(22,file=zb1bid)
        do i=1.d0, 100.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=1.d0).and.(i<100000001.d0)) then
            write(21,2) mjd_ask,dat_ask
            write(22,2) mjd_bid,dat_bid
          endif
        enddo
      endif

      if(nc.eq.2) then
        open(31,file=zb2ask)
        open(32,file=zb2bid)
        do i=1.d0, 200.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=100000001.d0).and.(i<200000001.d0)) then
            write(31,2) mjd_ask,dat_ask
            write(32,2) mjd_bid,dat_bid
          endif
        enddo
      endif

      if(nc.eq.3) then
        open(41,file=zb3ask)
        open(42,file=zb3bid)
        do i=1.d0, 300.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=200000001.d0).and.(i<300000001.d0)) then
            write(41,2) mjd_ask,dat_ask
            write(42,2) mjd_bid,dat_bid
          endif
        enddo
      endif

      if(nc.eq.4) then
        open(51,file=zb4ask)
        open(52,file=zb4bid)
        do i=1.d0, 400.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=300000001.d0).and.(i<400000001.d0)) then
            write(51,2) mjd_ask,dat_ask
            write(52,2) mjd_bid,dat_bid
          endif
        enddo
      endif

      if(nc.eq.5) then
        open(61,file=zb5ask)
        open(62,file=zb5bid)
        do i=1.d0, 500.d6
          read(11,*,end=1) mjd_ask,dat_ask
          read(12,*,end=1) mjd_bid,dat_bid
          if((i>=400000001.d0).and.(i<500000001.d0)) then
            write(61,2) mjd_ask,dat_ask
            write(62,2) mjd_bid,dat_bid
          endif
        enddo
      endif

    1 close(11)
      close(12)
    2 format(F15.8,F8.4)

      stop
      end
```

and compiling it as usual:

```bash
f77 fx_splitdat.for -o fx_splitdat
```

Finally, we can extract the rebinned Bid and Ask Price time-series with a bin time of 1 hour, i.e. $dt=0.041666667$ d, making use of the following f77 code:

```fortran
c     Extracting Time-Series from Tick-Data .csv files
c     (c) Quant at Risk, 2012
c
c     Program name: fx_rebin.for
c     Exemplary usage: ./fx_rebin audusd 2

      implicit double precision (a-z)
      parameter (dim=100.d6)

      double precision f(dim), mjd(dim), step
      character*50 par1, par2, st

      call getarg(1,par1)            ! file name
      call getarg(2,par2)            ! binning [d]

      write(st,'(a20)') par2
      read(st,'(f20)') step

c     reading data
      open(1,file=par1)
      do i=1,100.d6
        read(1,*,end=1) mjd(i),f(i)
      enddo
    1 close(1)
      n=i-1.d0

c     main loop
      j=1.d0
      k=1.d0
      t2=dint(mjd(j))
      do while (j.lt.n)
        i=j
        if ((mjd(i)+step).gt.(mjd(n))) then
          print*
          stop
        else
          t2=t2+step
        endif
        i=j
        il=0.d0
        s=0.d0
        do while (mjd(i).lt.t2)
          s=s+f(i)
          i=i+1.d0
          il=il+1.d0                 ! how many points in segment
        enddo
        av=s/il
        day=t2-step
        if (il.ge.1.d0) then
          write(*,3) day,av
        endif
        j=j+il
      enddo

    2 format(f30.7,2f30.6)
    3 format(f20.8,f8.4)

   10 stop
      end
```
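The core of the rebinning — averaging all ticks that fall into each dt-wide bin, with bin edges anchored at the integer part of the first timestamp — can be sketched in Python as follows (a simplified illustration of fx_rebin.for, not a line-by-line port; in particular it does not reproduce the early-stop condition near the end of file):

```python
import math

def rebin(times, values, step=1.0 / 24):
    """Average `values` over consecutive `step`-wide time bins.

    Returns (bin_start_time, mean_value) for every non-empty bin,
    mirroring the segment-averaging loop of the Fortran code.
    """
    out = []
    j = 0
    n = len(times)
    t2 = math.floor(times[0])          # bin edges anchored at whole days
    while j < n:
        t2 += step
        s, count = 0.0, 0
        while j < n and times[j] < t2:  # gather all ticks in [t2-step, t2)
            s += values[j]
            count += 1
            j += 1
        if count >= 1:                  # skip empty bins, as the f77 code does
            out.append((t2 - step, s / count))
    return out
```

For instance, rebin([0.1, 0.2, 0.6, 1.1], [1.0, 2.0, 3.0, 4.0], step=0.5) averages the first two points into one bin and leaves the remaining two in bins of their own.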

executed in our script for all five parts of both tick-data time-series:

```bash
echo "..rebinning with dt = 1 h"
dt=0.041666667
fx_rebin $1.ask.part1 $dt > $1.ask.part1.1h
fx_rebin $1.ask.part2 $dt > $1.ask.part2.1h
fx_rebin $1.ask.part3 $dt > $1.ask.part3.1h
fx_rebin $1.ask.part4 $dt > $1.ask.part4.1h
fx_rebin $1.ask.part5 $dt > $1.ask.part5.1h
fx_rebin $1.bid.part1 $dt > $1.bid.part1.1h
fx_rebin $1.bid.part2 $dt > $1.bid.part2.1h
fx_rebin $1.bid.part3 $dt > $1.bid.part3.1h
fx_rebin $1.bid.part4 $dt > $1.bid.part4.1h
fx_rebin $1.bid.part5 $dt > $1.bid.part5.1h

echo "..appending rebinned files"
cat $1.ask.part1.1h $1.ask.part2.1h $1.ask.part3.1h $1.ask.part4.1h \
    $1.ask.part5.1h > $1.ask.1h.tmp
cat $1.bid.part1.1h $1.bid.part2.1h $1.bid.part3.1h $1.bid.part4.1h \
    $1.bid.part5.1h > $1.bid.1h.tmp
rm *part*

echo "..removing empty lines in rebinned files"
sed '/^$/d' $1.ask.1h.tmp > $1.ask.1h
rm $1.ask.1h.tmp
sed '/^$/d' $1.bid.1h.tmp > $1.bid.1h
rm $1.bid.1h.tmp

echo "..done!"
```

As the final product we obtain two files, say,

```
audusd.bid.1h
audusd.ask.1h
```

of the content:

```
$ tail -5 audusd.bid.1h
3772.62500304  0.9304
3772.66666970  0.9283
3772.70833637  0.9266
3772.75000304  0.9263
3772.79166970  0.9253
```

where the first column is time (MJD) with 1-hour resolution and the second column contains a simple price average of all tick prices falling into each 1-hour time bin. As for dessert, we plot both Bid and Ask Price time-series, which completes our efforts:

## Bank format of numbers in Matlab

The following Matlab function displays any double-type number using a nice bank format that includes comma separators and two-digit precision (without rounding). The number $num$ is converted into a new variable of string type.

```matlab
function [str]=bankformat(num)
    num2=floor(num*1000)/1000;
    r=int32(100*(num-floor(num)));
    str=num2str(num2);
    k=find(str == '.');
    if(isempty(k))
        str=[str,'.00'];
    end
    FIN=min(length(str),find(str == '.')-1);
    for i=FIN-2:-3:2
        str(i+1:end+1)=str(i:end);
        str(i)=',';
    end
    x=mod(r,10);
    if(x==0)
        str=[str,'0'];
    end
    k=find(str == '.');
    d=length(str)-k;
    if(d>2)
        str=str(1:end-(d-2));
    end
end
```
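For comparison, the same bank format — thousands separators plus two-digit truncation without rounding — can be sketched in Python, where grouped formatting is built in. This is an illustration only; decimal is used to avoid floating-point truncation surprises:

```python
from decimal import Decimal, ROUND_DOWN

def bankformat(num):
    """Format a number with comma separators and two digits, truncated
    (not rounded), mirroring the behaviour of the Matlab function."""
    d = Decimal(str(num)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    return f"{d:,}"

print(bankformat(70149506.5))  # 70,149,506.50
```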

Example:

```
>> n=70149506.5;
>> s=bankformat(n);
>> fprintf('\nInvested capital: £%1s\n',s)

Invested capital: £70,149,506.50
```