Pre-Processing of Asset Price Series for Portfolio Optimization

Portfolio optimization is a significant component of Matlab’s Financial Toolbox. It gives us a ready-to-use solution for finding the optimal weights of the assets we consider for trading, derived from their historical performance. From a practical point of view, we can include it in our algorithmic trading strategy and backtest its applicability under different initial conditions. That is the subject of my next post. However, before we can enjoy the view from the peak, we need to climb the mountain first.

In Matlab, the portfolio is created as a dedicated object of the same name. It doesn’t read raw stock data. We need to feed that beast. Two major ingredients satisfy the input: a vector of expected asset returns and a covariance matrix. Matlab helps us estimate these moments, but first we need to deliver the asset data in a digestible form.
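To make the two ingredients concrete, here is a minimal sketch that computes them by hand with base Matlab functions. The return matrix is made-up toy data for three hypothetical assets, and mean/cov are only the plain sample estimators, not necessarily what the portfolio object uses internally.

```matlab
% Toy return matrix (assumption): 4 observations (rows) of 3
% hypothetical assets (columns).
R = [ 0.010 -0.020  0.005;
     -0.005  0.015  0.000;
      0.020  0.010 -0.010;
      0.003 -0.004  0.012];

mu    = mean(R, 1);   % 1x3 row vector of sample mean returns
Sigma = cov(R);       % 3x3 sample covariance matrix (normalised by N-1)
```

Each column of R is one asset, so mu has one entry per asset and Sigma is a square, symmetric matrix of the same order.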

In this post we will see how to quickly download stock data from the Internet for our own stock selection and pre-process it for solving a portfolio optimization problem in Matlab.

Initial Setup for Portfolio Object

Let’s say that at any point in time you have your own list of stocks you wish to buy. For simplicity, let’s also assume that the list contains stocks traded on NYSE or NASDAQ. Since you have been a great fan of this game, you are now almost ready to buy what you jotted down in your ShoppingList.lst. Here is an example of 10 tech stocks:

AAPL   AOL   BIDU   GOOG   HPQ   IBM   INTC   MSFT   NVDA   TXN

They will constitute your portfolio of stocks. Portfolio optimization requires looking back in time at the returns each stock has delivered. From these historical returns the Return Proxy and Risk Proxy can be found.

The return matrix $R$ of dimensions $(N-1)\times M$, where $N$ stands for the number of historical prices (e.g. sampled daily or monthly) and $M$ for the number of stocks in our portfolio, is required by Matlab as an input. We will see how it works in the next post. For now, let’s focus solely on the creation of this matrix.
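As a small illustration of why $R$ has one row fewer than the price matrix, the sketch below turns a toy price matrix into simple returns, R(i)=cp(i)/cp(i-1)-1, the same definition used in the function later in this post. The prices are invented, and the vectorised division is just one way to write it.

```matlab
% Toy price matrix (assumption): N = 4 daily closes (rows) for
% M = 2 hypothetical stocks (columns).
Y = [100 50;
     110 50;
      99 55;
      99 44];

% Simple returns: each row divided by the previous row, minus one.
% The first price has no predecessor, so one row is lost.
R = Y(2:end,:) ./ Y(1:end-1,:) - 1;

[N, M] = size(Y);
fprintf('Y is %dx%d, R is %dx%d\n', N, M, size(R,1), size(R,2));
```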

In the article Create a Portfolio of Stocks based on Google Finance Data fed by Quandl I discussed Quandl.com as an attractive data provider for US stocks. Here, we will follow this solution making use of Quandl resources to pull out the stock price series for our shopping list. Ultimately, we aim at building a function, here: QuandlForPortfolio, that does the job for us:

% Pre-Processing of Asset Price Series for Portfolio Optimization in Matlab
%  (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
 
clear all; close all; clc;
 
% Input Parameters
n=1*365;
tickers='ShoppingList.lst';
qcodes='QuandlStockCodeListUS.xlsx';
 
[X,Y,R,AssetList] = QuandlForPortfolio(n,tickers,qcodes);

We call this function with three input parameters. The first one, $n$, denotes the number of calendar days, counting backwards from today, for which we wish to retrieve the stock data. Usually, 365 calendar days correspond to about 250 to 252 trading days. The second parameter is the path/file name of our list of stocks (those to be taken into account in the portfolio optimisation process), while the last one is the path/file name of the file storing the stocks’ tickers and the associated Quandl Price Codes (see here for more details).
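The rule of thumb relating 365 calendar days to roughly 250 trading days can be checked with a few lines of base Matlab. This is only a back-of-the-envelope sketch: the end date is an arbitrary fixed day chosen for reproducibility, and exchange holidays are not modelled, only weekends.

```matlab
n  = 365;                     % calendar days to look back
t1 = datenum(2013, 6, 28);    % arbitrary fixed "today" (assumption)
d  = (t1 - n + 1):t1;         % serial date numbers of the n-day window

wd        = weekday(d);       % 1 = Sunday, ..., 7 = Saturday
nWeekdays = sum(wd ~= 1 & wd ~= 7);

fprintf('%d calendar days contain %d weekdays\n', n, nWeekdays);
```

Any 365-day window contains 260 or 261 weekdays; removing the nine or so exchange holidays that fall on weekdays in a typical year leaves a figure close to the one quoted above.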

Feeding the Beast

The QuandlForPortfolio Matlab function is an extended version of the previously discussed solution. It contains an important correcting procedure for the data fetched from the Quandl servers. First, let’s have a closer look at the function itself:

% Function assists in fetching Google Finance data from the Quandl.com
%  server for a given list of tickers of stocks traded on NYSE or
%  NASDAQ. Data are retrieved for last 'n' days with daily sampling.
%
% INPUT
%   n       : number of calendar days from 'today' (e.g. 365 would
%             correspond to about 252 business days)
%   tickers : a path/file name of a text file listing tickers
%   qcodes  : a path/file name of Excel workbook (.xlsx) containing a list
%              of tickers and Quandl Price Codes in the format of
%              [Ticker,Stock Name,Price Code,Ratios Code,In Market?]
% OUTPUT
%   X0         : [Nx1] column vector of days (numeric dates)
%   Y0         : [NxM] matrix of Close Prices for M stocks
%   R0         : [(N-1)xM] matrix of Returns
%   AssetList0 : a list of tickers (cell array)
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
 
function [X0,Y0,R0,AssetList0] = QuandlForPortfolio(n,tickers,qcodes)
    fileID = fopen(tickers);
    tmp = textscan(fileID, '%s');
    fclose(fileID);
    AssetList=tmp{1};  % a list as a cell array
 
    % Read in the list of tickers and internal Quandl codes
    %
    [~,text,~] = xlsread(qcodes);
    quandlc=text(:,1); % again, as a list in a cell array
    quandlcode=text(:,3); % corresponding Quandl's Price Code
 
    date1=datestr(today-n,'yyyy-mm-dd'); % from
    date2=datestr(today,'yyyy-mm-dd');   % to
 
    % Fetch the data from Quandl.com
    %
    QData={};
    for i=1:length(AssetList)
        for j=1:length(quandlc)
            if(strcmp(AssetList{i},quandlc{j}))
                fprintf('%4.0f %s\n',i,quandlc{j});
                % 'x' is a placeholder: use your own Quandl auth token
                [fts,~]=Quandl.get(quandlcode{j},'type','fints', ...
                   'authcode','x',...
                   'start_date',date1,'end_date',date2,'collapse','daily');
                QData{i}=fts;
            end
        end
    end
 
    % Post-Processing of Fetched Data
    %
    % create a list of days across all tickers
    TMP=[];
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        tmp=tmp(:,1);
        TMP=[TMP; tmp];
    end
    ut=unique(TMP);
    % use that list to find these days that are not present
    %  among all data sets
    TMP=[];
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        tmp=tmp(:,1);
        TMP=[TMP; setdiff(ut,tmp)];
    end
    ut=unique(TMP);
    % finally, extract Close Prices from FTS object and store them
    %  in Y0 matrix, plus corresponding days in X0
    X0=[];
    Y0=[]; 
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        cp=[];
        for j=1:size(tmp,1)
            [r,~,~]=find(ut==tmp(j,1));
            if(isempty(r))
                cp=[cp; tmp(j,5)]; % column 5 corresponds to Close Price
                if(i<2)
                    % create a time column vector listing days
                    % common among all data sets
                    X0=[X0; tmp(j,1)];
                end
            end
        end
        Y0=[Y0 cp];
    end
    % transform Close Prices into Returns, R(i)=cp(i)/cp(i-1)-1
    R0=tick2ret(Y0);
    AssetList0=AssetList';
end

The main difficulty comes from the fact that Matlab’s portfolio object demands an equal number of historical returns ($N-1$) in the matrix $R$ for all $M$ assets. We design the function so that it sets a common timeframe for all stocks on our shopping list. Of course, we need to make sure that all stocks were traded in the markets over roughly the last $n$ days.

Now, the timeframe of the last $n$ days should be understood as a first approximation. We fetch the data from Quandl (numeric date, Open, High, Low, Close, Volume) and save them in the cell array QData (lines #37-49 of the listing, counting from its first comment line) for each stock separately as FTS objects (Financial Time-Series objects; see Financial Toolbox). However, it may happen that not every stock we fetched has the same number of observations. That is why we need to find out for which days and for which stocks data are missing. We start by scanning each FTS object and building a unique list of all days for which any stock has data (lines #54-60).

Next, we loop over the same data sets again, but now we compare that list with the dates of each stock individually (lines #63-69), capturing (line #67) those dates that are missing. Their complete list is stored as a vector in line #69. Given that list, we can compile the full data set (e.g. Close Prices; here line #80) for all stocks in our portfolio, including only those dates for which we have prices across all $M$ assets (lines #70-91).
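The date-alignment idea can also be tried out on toy data without any Quandl download. The sketch below is not the procedure used in QuandlForPortfolio (which collects the missing dates with unique and setdiff); it swaps in intersect and ismember to reach the same end result, namely prices only on dates shared by every stock. The cell array Q and its numbers are invented, and each series is assumed to have unique dates sorted in ascending order.

```matlab
% Toy data (assumption): each cell holds [numeric date, close]
% for one hypothetical stock; stocks 2 and 3 each miss one day.
Q = { [1 10; 2 11; 3 12; 4 13; 5 14], ...
      [1 20; 3 22; 4 23; 5 24], ...
      [1 30; 2 31; 3 32; 5 34] };

% Dates present in every series: intersect the date columns in turn.
common = Q{1}(:,1);
for i = 2:numel(Q)
    common = intersect(common, Q{i}(:,1));
end

% Keep only the rows on common dates and stack closes column-wise.
X = common(:);
Y = zeros(numel(X), numel(Q));
for i = 1:numel(Q)
    keep   = ismember(Q{i}(:,1), X);
    Y(:,i) = Q{i}(keep, 2);
end
```

Here only days 1, 3 and 5 survive, so X has three entries and Y is a 3x3 matrix with one column per stock.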

Beast Unleashed

We test our data pre-processing simply by running the block of code listed above, which calls the QuandlForPortfolio function, and we check the results in Matlab’s command window as follows:

>> whos X Y R AssetList
  Name             Size            Bytes  Class     Attributes
 
  AssetList        1x10             1192  cell                
  R              250x10            20000  double              
  X              251x1              2008  double              
  Y              251x10            20080  double

which confirms that the dimensions are as expected.
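The same check can be automated rather than read off the whos listing. The sketch below uses synthetic stand-in data with the shapes shown above (random numbers, not real prices), so it runs without the Financial Toolbox or a Quandl connection. With real data, X, Y, R and AssetList would come from QuandlForPortfolio, and the particular set of checks is only a suggestion.

```matlab
% Stand-in data with the same shapes as in the listing above
% (assumption: synthetic numbers, not real prices).
AssetList = {'AAPL','AOL','BIDU','GOOG','HPQ', ...
             'IBM','INTC','MSFT','NVDA','TXN'};
X = (735000:735250)';                 % 251 consecutive numeric dates
Y = 100 + rand(251, 10);              % strictly positive "prices"
R = Y(2:end,:) ./ Y(1:end-1,:) - 1;   % simple returns

[N, M]    = size(Y);
okDates   = isequal(size(X), [N 1]) && all(diff(X) > 0);
okReturns = isequal(size(R), [N-1 M]);
okList    = numel(AssetList) == M;
okFinite  = all(isfinite(Y(:))) && all(isfinite(R(:)));

fprintf('dates:%d returns:%d list:%d finite:%d\n', ...
        okDates, okReturns, okList, okFinite);
```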

At this stage, the function can be used in two ways. In the first, we are interested in portfolio optimisation and we look back over the last $n$ calendar days from the most recent one (today). The second usage is handy too. We take the stocks on our shopping list and fetch, say, their last $n=7\times365$ days of data. If all stocks were traded over the past 7 years, we should be able to collect a rich data set. If not, the function will adjust the beginning and end dates to meet the time constraints required for the construction of the $R$ matrix. In the latter case, we can use the 7-year data sample for direct backtesting of algo models utilizing Portfolio Optimization.

Stay tuned as we will rock this land in the next post!

Any Questions?

Share them across QuantCove.com – the official Forum of QuantAtRisk.
