Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

# Pre-Processing of Asset Price Series for Portfolio Optimization

Portfolio optimization is a significant component of Matlab’s Financial Toolbox. It provides a ready-to-use solution for finding the optimal weights of the assets we consider for trading, derived from their historical performance. From a practical point of view, we can include it in an algorithmic trading strategy and backtest its applicability under different initial conditions. That is the subject of my next post. However, before we can enjoy the view from the peak, we need to climb the mountain first.

In Matlab, the portfolio is created as a dedicated object of the same name. It doesn’t read raw stock data; we need to feed that beast. Two major ingredients make up the input: a vector of expected asset returns and a covariance matrix. Matlab helps us estimate these moments, but first we need to deliver the asset data in a digestible form.
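To make the two ingredients concrete, here is a minimal sketch using only base Matlab functions (`mean` and `cov`). The return values are made up for illustration, with one column per asset; it is not meant to reproduce how the portfolio object estimates the moments internally.

```matlab
% Hypothetical returns: 4 observations (rows) for 2 assets (columns)
R = [ 0.010 -0.004;
     -0.002  0.006;
      0.004  0.001;
      0.000 -0.003];

mu    = mean(R);   % 1xM vector of sample mean returns
Sigma = cov(R);    % MxM sample covariance matrix

disp(size(mu));    % one mean per asset
disp(size(Sigma)); % square matrix, one row/column per asset
```

Both estimates require every column of `R` to have the same number of rows, which is why the price series must be aligned first.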

In this post we will see how to quickly download stock data from the Internet for our own selection of stocks and pre-process them for solving the portfolio optimization problem in Matlab.

## Initial Setup for Portfolio Object

Let’s say that at any point in time you have your own list of stocks you wish to buy. For simplicity, let’s also assume that the list contains stocks traded on NYSE or NASDAQ. Since you have been a great fan of this game, you are almost ready to buy what you jotted down in your ShoppingList.lst. Here is an example of 10 tech stocks:

AAPL AOL BIDU GOOG HPQ IBM INTC MSFT NVDA TXN

They will constitute your portfolio of stocks. The portfolio optimization problem requires a look back in time at the returns each stock delivered. From them, the Return Proxy and the Risk Proxy can be found.

The return matrix $R$ of dimensions $(N-1)\times M$, where $N$ stands for the number of historical prices (e.g. sampled daily or monthly) and $M$ for the number of stocks in our portfolio, is required by Matlab as an input. We will see how it works in the next post. For now, let’s focus solely on the creation of this matrix.
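As a quick illustration of where the $(N-1)\times M$ shape comes from, the sketch below computes simple returns, $R(i)=cp(i)/cp(i-1)-1$, from a small matrix of made-up prices using base Matlab indexing only. The prices and variable names are assumptions for the example.

```matlab
% Hypothetical close prices: N=4 days (rows), M=2 stocks (columns)
Y = [100 50;
     102 49;
     101 51;
     104 52];

[N, M] = size(Y);

% Simple returns: each price divided by the previous one, minus 1
R = Y(2:end,:) ./ Y(1:end-1,:) - 1;

disp(size(R));   % (N-1) rows, M columns
```

One row is lost because the first price has no predecessor to compare with.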

In the article Create a Portfolio of Stocks based on Google Finance Data fed by Quandl I discussed Quandl.com as an attractive data provider for US stocks. Here, we will follow this solution making use of Quandl resources to pull out the stock price series for our shopping list. Ultimately, we aim at building a function, here: QuandlForPortfolio, that does the job for us:

```matlab
% Pre-Processing of Asset Price Series for Portfolio Optimization in Matlab
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz

clear all; close all; clc;

% Input Parameters
n=1*365;
tickers='ShoppingList.lst';
qcodes='QuandlStockCodeListUS.xlsx';

[X,Y,R,AssetList] = QuandlForPortfolio(n,tickers,qcodes);
```

We call this function with three input parameters. The first one, $n$, denotes the number of calendar days counted backwards from today for which we wish to retrieve the stock data. Usually, 365 calendar days correspond to about 250 to 252 trading days. The second parameter is the path/file name of our list of stocks (those to be taken into account in the portfolio optimisation process), while the last input is the path/file name of the file storing the stocks’ tickers and the associated Quandl Price Codes (see here for more details).
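The first two inputs can be tried out in isolation. The sketch below writes a three-ticker stand-in for ShoppingList.lst to a temporary file, reads it back with `textscan`, and turns $n$ into a pair of date strings. It uses `floor(now)` from base Matlab for today's date, an assumption made here so the example does not depend on the Financial Toolbox.

```matlab
% Write a small ticker list to a temporary file (stand-in for ShoppingList.lst)
fname = [tempname '.lst'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'AAPL', 'IBM', 'MSFT');
fclose(fid);

% Read the tickers back as a cell array of strings
fid = fopen(fname);
tmp = textscan(fid, '%s');
fclose(fid);
AssetList = tmp{1};
delete(fname);            % clean up the temporary file

% Translate n calendar days into a [from, to] date window
n  = 365;
t  = floor(now);          % today as a serial date number
date1 = datestr(t - n, 'yyyy-mm-dd');   % from
date2 = datestr(t,     'yyyy-mm-dd');   % to

fprintf('%d tickers, window %s to %s\n', numel(AssetList), date1, date2);
```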

## Feeding the Beast

The QuandlForPortfolio Matlab function is an extended version of the previously discussed solution. It contains an important correcting procedure for the data fetched from the Quandl servers. First, let’s have a closer look at the function itself:

```matlab
% Function assists in fetching Google Finance data from the Quandl.com
% server for a given list of tickers of stocks traded on NYSE or
% NASDAQ. Data are retrieved for last 'n' days with daily sampling.
%
% INPUT
%   n       : number of calendar days from 'today' (e.g. 365 would
%             correspond to about 252 business days)
%   tickers : a path/file name of a text file listing tickers
%   qcodes  : a path/file name of Excel workbook (.xlsx) containing a list
%             of tickers and Quandl Price Codes in the format of
%             [Ticker,Stock Name,Price Code,Ratios Code,In Market?]
% OUTPUT
%   X0        : [Nx1] column vector with days
%   Y0        : [NxM] matrix with Close Prices for M stocks
%   R0        : [(N-1)xM] matrix of Returns
%   AssetList : a list of tickers (cell array)
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz

function [X0,Y0,R0,AssetList0] = QuandlForPortfolio(n,tickers,qcodes)
    fileID = fopen(tickers);
    tmp = textscan(fileID, '%s');
    fclose(fileID);
    AssetList=tmp{1};  % a list as a cell array

    % Read in the list of tickers and internal Quandl codes
    %
    [~,text,~] = xlsread(qcodes);
    quandlc=text(:,1);    % again, as a list in a cell array
    quandlcode=text(:,3); % corresponding Quandl's Price Code

    date1=datestr(today-n,'yyyy-mm-dd'); % from
    date2=datestr(today,'yyyy-mm-dd');   % to

    % Fetch the data from Quandl.com
    %
    QData={};
    for i=1:length(AssetList)
        for j=1:length(quandlc)
            if(strcmp(AssetList{i},quandlc{j}))
                fprintf('%4.0f %s\n',i,quandlc{j});
                fts=0;
                [fts,headers]=Quandl.get(quandlcode{j},'type','fints', ...
                    'authcode','x',...
                    'start_date',date1,'end_date',date2,'collapse','daily');
                QData{i}=fts;
            end
        end
    end

    % Post-Processing of Fetched Data
    %
    % create a list of days across all tickers
    TMP=[];
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        tmp=tmp(:,1);
        TMP=[TMP; tmp];
    end
    ut=unique(TMP);
    % use that list to find these days that are not present
    % among all data sets
    TMP=[];
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        tmp=tmp(:,1);
        TMP=[TMP; setdiff(ut,tmp)];
    end
    ut=unique(TMP);
    % finally, extract Close Prices from FTS object and store them
    % in Y0 matrix, plus corresponding days in X0
    X0=[];
    Y0=[];
    for i=1:length(QData)
        tmp=fts2mat(QData{i},1);
        cp=[];
        for j=1:size(tmp,1)
            [r,~,~]=find(ut==tmp(j,1));
            if(isempty(r))
                cp=[cp; tmp(j,5)]; % column 5 corresponds to Close Price
                if(i<2)
                    % create a time column vector listing days
                    % common among all data sets
                    X0=[X0; tmp(j,1)];
                end
            end
        end
        Y0=[Y0 cp];
    end
    % transform Close Prices into Returns, R(i)=cp(i)/cp(i-1)-1
    R0=tick2ret(Y0);
    AssetList0=AssetList';
end
```

The main difficulty comes from the fact that Matlab’s portfolio object demands an equal number of historical returns ($N-1$) in the matrix $R$ for all $M$ assets. We design the function so that it sets a common timeframe for all stocks on our shopping list. Of course, it is up to us to make sure that all stocks were traded in the markets for roughly the last $n$ days.

Now, the timeframe of the last $n$ days should be understood as a first approximation. We fetch the data from Quandl (numeric date, Open, High, Low, Close, Volume) and save them in the cell array QData (lines #37-49), one FTS object (Financial Time-Series object; see Financial Toolbox) per stock. However, it may happen that not every stock we fetched comes with the same number of data points. That is why we need to find out for which days and for which stocks data are missing. We start by scanning each FTS object and creating a unique list of all days for which we have data (lines #54-60).
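The first scanning step can be reproduced on toy data without any FTS objects. In the sketch below, the three date vectors are hypothetical serial date numbers standing in for the first column returned by `fts2mat`; the code counts the observations per stock and builds the unique list of all days.

```matlab
% Hypothetical trading days (serial date numbers) for three stocks
d1 = [735000; 735001; 735002; 735003];
d2 = [735000; 735002; 735003];   % one day missing in the middle
d3 = [735001; 735002; 735003];   % history starts one day later
dates = {d1, d2, d3};

% Number of observations per stock: unequal lengths signal missing data
nObs = cellfun(@numel, dates);

% Unique list of all days present in at least one series
allDays = unique(vertcat(dates{:}));

disp(nObs);
disp(allDays');
```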

Next, we loop over the same data sets again, but now we compare that list with the list of dates of each stock individually (lines #63-69), capturing (line #67) the dates that are missing. Their complete list is stored as a vector in line #69. Given that, we are able to compile the full data set (e.g. Close Prices; here line #80) for all stocks in our portfolio, making sure we include only those dates for which we have prices across all $M$ assets (lines #70-91).
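The same alignment logic can be sketched on plain matrices. Here each toy series has two columns (date number, close price) with made-up values. The final selection uses `ismember` instead of the inner `find` loop of the function, which is a simplification chosen for this example rather than the original code.

```matlab
% Hypothetical series: column 1 = date number, column 2 = close price
A = [735000 10; 735001 11; 735002 12; 735003 13];
B = [735000 20; 735002 22; 735003 23];
C = [735001 31; 735002 32; 735003 33];
series = {A, B, C};

% Step 1: all days seen in any series
allDays = [];
for i = 1:numel(series)
    allDays = [allDays; series{i}(:,1)];
end
allDays = unique(allDays);

% Step 2: days missing from at least one series
missing = [];
for i = 1:numel(series)
    missing = [missing; setdiff(allDays, series{i}(:,1))];
end
missing = unique(missing);

% Step 3: keep only days present in every series and stack the prices
common = setdiff(allDays, missing);
Y = zeros(numel(common), numel(series));
for i = 1:numel(series)
    keep   = ismember(series{i}(:,1), common);
    Y(:,i) = series{i}(keep, 2);
end

disp([common Y]);   % common days followed by one price column per stock
```

Only the two days shared by all three series survive, so every column of `Y` ends up with the same length.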

## Beast Unleashed

We test our data pre-processing simply by running the block of code listed above, which calls the QuandlForPortfolio function, and we check the results in Matlab’s command window as follows:

```
>> whos X Y R AssetList
  Name            Size      Bytes  Class     Attributes

  AssetList       1x10       1192  cell
  R             250x10      20000  double
  X             251x1        2008  double
  Y             251x10      20080  double
```

which confirms that the dimensions are as expected.

At this stage, the function can be used in two ways. First, when we are interested in portfolio optimisation, we look back at the last $n$ calendar days counted from the most recent one (today). The second usage is handy too. We take the stocks on our shopping list and fetch, say, the last $n=7\times365$ days of data. If all stocks were traded over the past 7 years, we should be able to collect a rich data set. If not, the function will adjust the beginning and end dates to meet the time constraints required for the construction of the $R$ matrix. In the latter case, we can use the 7-year data sample for direct backtesting of algo models utilizing Portfolio Optimization.

Stay tuned as we will rock this land in the next post!

## Any Questions?

Share them across QuantCove.com – the official Forum of QuantAtRisk.


• John

Really great post! I was struggling with Matlab portfolio optimisation and you simply attacked the problem from the most practical side. Thanks!

• Chris3

The way how you explain the problems is stunning. With pleasure I follow your posts. Good job man!