Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

## Quantitative Risk Assessment Model for Investment Options

Working in the superannuation industry in Australia has some great advantages. A nice atmosphere at work, gym sessions with colleagues during lunch time, endless talks about girls after hours. However, as Brian Tracy once said: When you go to work, you work. And this is true. And this is rewarding.

Superannuation is the Australian way of making people rich when they retire. Sadly, barely understood by many, it offers a wide palette of investment options for your super (9-12% of every salary is credited to the super account by the employer; regulated by law). Over 120 super funds across Australia, over 7000 options to choose from. All styles of investments, from cash options, through growth and diversified fixed interest, ending among high growth-high risk options. Trust me, the funds’ portfolio managers do their best to beat the benchmarks, the markets, and competing super funds.

The industry establishes standards. Regulations. Smarter and unified ways of reporting. On 29 June 2010, Australian Prudential Regulation Authority (APRA) issued a letter to superannuation trustees advising that APRA would be providing guidance on the disclosure of superannuation investment risk to fund members. The letter to trustees states that risk should be measured as the likely number of negative annual returns over any 20 year period. And this is where our story begins.

Risk Assessment

A working practice of super funds across Australia revealed that since 2013 the way of reporting risk (following the abovementioned definition) still leaves a lot behind the curtain. It’s not clear how the assessment of an individual investment option’s risk has been derived.

APRA’s guidance states that this classification system should help members ‘readily distinguish the characteristics of each investment strategy’. To achieve this, there needs to be sufficient categories to meaningfully differentiate between various options based on risk.

The funds dare to report the risk score/label according to a seven level classification system to provide sufficient granularity. The Joint ASFA/FSC Working Group’s analysis supports the claim that the number of annual negative periods over any 20 year period is likely to fall in the range of 0 to 7 for the majority of investment options:

Risk Band Risk Label Estimated number of negative annual returns over any 20 year period
1 Very Low Less than 0.5
2 Low 0.5 to less than 1
3 Low to Medium 1 to less than 2
4 Medium 2 to less than 3
5 Medium to High 3 to less than 4
6 High 4 to less than 6
7 Very High 6 or Greater

Having an access to any investment option performance over past years does not solve fully the problem of their risk assessment following the guidance. Firstly, not every option’s life time is long enough to derive meaningful results. Maximal data span on only few occasions reaches 10 years. Secondly, the data have gaps, therefore not for every option we can fetch its monthly performance figures. Thirdly, an access to data requires huge amount of labour of dedicated people who put all grains of sand into one jar and then are willing to sell it (e.g. the research house of SuperRatings in Sydney).

Lastly, knowing the option’s performance over past 3 years the questions arise. How should we estimate its risk score correctly? How many times in the upcoming 20 year period a considered investment option is going to denote negative annual returns? We need a model.

Model

A quantitative idea standing behind a unification of APRA’s risk measure guidance for investment options can find solution in the fundamentals of statistics. Below we will design a way of guessing the answer for the problem we question.

If we recall the concept of the Binomial Distribution, we immediately recognise its potential application. Consider a time-series of monthly returns, $\{ r_i \}$, of the total length of $N$ months. If $N$ is larger than 12, we can calculate
$$m=(N-12)+1$$ times the annual return,
$$R_j = \left[ \prod_{i}^{12} (1+r_i) \right] – 1 \ \ \mbox{for} \ j = 1,…,m$$ where $r_i$ are given as decimals and $j$ denotes specific period of 12 consecutive months. $R_j$ should be understood as a rolling annual return given the statistically justified minimum requirement of $N\ge 31$ months of data for a specific option. For example, if we have 5 years of data (60 months) then we are able to get $m=(60-12)+1=49$ test annual returns based on the uninterrupted data sequences. Here, the word test is crucial.

We build our risk score model in the following way. For any option of data length of $N$ months we construct a vector of $m$ annual returns $R$. We count the total number of negative values and denote it as $k$ out of $m$ trials. The probability of a single year to close up with the negative annual return is given as:
$$p=\frac{k}{m}$$ where $p$ can be assigned as the probability of success (in obtaining the negative annual return). Under the assumption of independent trials, one can find the probability of obtaining exactly $k$ successes out of, in general, $n$ trials making use of Binomial Distribution that is given by the probability mass function:
$$Pr⁡(X=k) = {{n}\choose{k}} p^k (1-p)^{n-k}$$ where obviously
$${{n}\choose{k}} = \frac{n!}{k!(n-k)!} \ \ .$$ From the data analysis of the option performance, we aim at calculating the probability of $k$ negative annual returns in any $n=20$ trials, namely:
$$Pr⁡(X=k) = {{20}\choose{k}} p^k (1-p)^{20-k}$$

The model can be easily coded in Excel/VBA as a macro. The core algorithm of assessing option’s risk score you can grasp below:

If m > 0 Then 'probability of success Dim p As Double p = k / m 'probability for r=1..7 Dim pr As Double, trials As Integer, r As Integer, newton As Double Dim ans As Integer, v0 As Double, v1 As Double v0 = -1 v1 = -1 trials = 20 For r = 1 To 7 newton = Application.Fact(trials) / _ (Application.Fact(r) * Application.Fact(trials - r)) pr = newton * Application.Power(p, r) * Application.Power(1 - p, trials - r) If pr > v0 Then v0 = pr v1 = r End If Next r 'a new risk score If v1 = 1 Then scoreSR = 1 ' "Very Low" Else If v1 = 2 Then scoreSR = 2 ' "Low" Else If v1 = 3 Then scoreSR = 3 ' "Low to Medium" Else If v1 = 4 Then scoreSR = 4 ' "Medium" Else If v1 = 5 Then scoreSR = 5 ' "Medium to High" Else If v1 = 6 Then scoreSR = 6 ' "High" Else If v1 = 7 Then scoreSR = 7 ' "Very High" Else scoreSR = "" End If End If End If End If End If End If End If

The resulting solution may lead us to the reformulation of the risk assessment for individual investment options as presented in the following table:

Risk Band New Risk Label Estimated number of negative annual returns over any 20 year period ($k$)
1 Very Low 0 to 1
2 Low 2
3 Low to Medium 3
4 Medium 4
5 Medium to High 5
6 High 6
7 Very High 7 or Greater

Intuitively, any Cash Option which always denotes positive monthly returns is also ranked (by our model) as an option of a Very Low risk. You can apply this model for any monthly return series since we play we the numbers. This is what quants do best. Number crunching!

Nasty Assumption

The motivation behind the model sleeps in one word you can find in the definition we underlined at the very beginning: Estimated number of negative annual returns over any 20 year period. ‘Any’ allows us to choose any random sequence of 12 consecutive monthly returns and that pays off in the larger number of trials under the assumption of independence.

Na, right. The nasty piece of the model: the independence of individual trials. This is a cornerstone as it comes to Binomial Distribution. As a smart quant you can question its validity within our model. I tackled with this problem for a while and I came to an empirical lemma supporting our line of defence.

The independence of individual trials can be justified as an analogous to the following experiment. Imagine that a sheet of A4-format paper is available. The paper has a colour gradually changing from left to right from black to light grey, and the colour gradient is constant in a vertical direction. We cut the sheet horizontally into a finite number of strips of paper of 1 cm height, however starting every new strip by 1 cm to the right (the concept of a sliding window in time-serie analysis). Any comparison of two adjacent strips of paper indicates a finite common area of the same colour as defined by the gradient: correlation. We put all strips into one jar and churn it. Next we randomly pull strips out of the jar. Each strip is an independent trial out of all strips placed inside the jar as the memory of correlation between two formally adjacent strips has been lost.

Invest smartly and don’t worry. Correlations are everywhere.

## Get the Data of Fund Performance directly into Excel utilizing VBA and Google

Let’s leave the fascinating world of algebra and Matlab for a while and let’s see what exciting we can do from the Excel level. Microsoft Excel has this ‘thanks God’ programming toolkit known as VBA (Visual Basic for Application) that adds a soul of coding into a mortal body of spreadsheets. If you are an advanced programmer and you are asked or forced to deal with solving problems in Excel/VBA, no doubt you will find programming in VBA as a return to the kindergarten. Saying shortly, VBA is all about fun and relax at work.

Great. As I’m working over a special section of QuantAtRisk.com devoted to From Zero to Hero bootcamp for those of You who wish to brush up or master own skill in VBA and Matlab programming in the application to the quantitative finance and risk management, today I want you to jump with me into the open ocean of VBA application for a specific task that I found very helpful at some stage of my work with the data. Therefore, let’s define the problem first.

The problem is about our quick ability to download some data from the webpage of our choice (e.g. regarding a recent fund or investment option performance) directly into Excel, and next, to look for and extract exactly what we would like to save for our further quantitative analysis.

Since VBA codes for solving this problem are easily spread around the Web, it wouldn’t be challenging to dedicate another article treating the same problem in a new frame. And it’s not my style. Because I love to challenge myself in my private life, I even more love to challenge the problem I wish to solve. I don’t need to look too far as the problems I find around challenge me equally.

1. Case Study

We hunt for the data that are displayed by the data provider within a web-frame, and all VBA codes aimed at getting access to that frame simply fail. Let’s consider the example of InvestSMART.com.au webpage. It’s an Australian portal feeding for some funds its data from the Morningstar.com.au data provider. We are looking for the latest investment option performance of, say, OnePath OA PS – OP Australian Shares EF of the APIR code MMF0022AU:

and we want to download into Excel the Total Return numbers for 1 Month, 3 Month, and 1 Year performance. In case when the standard ways of getting this information do not work, for example, utilizing the classical variables of doc.body.innerText or doc.body.outerText as in the epic approach:

Dim iFrm As HTMLIFrame Dim doc As MSHTML.HTMLDocument For iterator = 0 To appIE.Document.all.Length - 1 If TypeName(appIE.Document.all(iterator)) = "HTMLIFrame" Then Set iFrm = appIE.Document.all(iterator) Set doc = iFrm.Document MsgBox doc.body.outerText End If Next

we need to get another way round this problem. One of them is the application of Google Search for the fund performance based on the investment option code.

2. Limit Google Search down to the first listing, download the webpage directly into Excel

That title says everything. We can use Google.com to perform the search for the same investment option among all webpages of InvestSMART as indexed by Google up to date. This solution seems to be more elegant and getting straight to the point. To make this post worth substantially more, imagine, that your task is to repeat the data acquisition process for a long list of options you have in your Excel worksheet:

We not only look for the way of fetching the required performance numbers but also to automate the whole process to work in the background (or overnight). Given the active Excel worksheet constructed as displayed above, we write VBA code (a macro) making use of Google Search engine. We read in the APIR codes one by one from column B and allow VBA to search the Google by the keyword constructed, in general, as: “InvestSMART”+APIRCode:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 'Get Fund Performance directly into Excel utilizing VBA and Google ' '(c) 2013 QuantAtRisk.com, by Pawel Lachowicz Sub CheckAPIRCodes()   WB = ActiveWorkbook.Name WS = ActiveSheet.Name   'Declaration of Variables Dim i, j, k, r As Long Dim cellstr, cellstr2, cellstr3, cellstr4, str1, str2, extractedHTML As String Dim NameB, NameS, MyStr, GoogleSearchPath, SearchString As String Dim d1, d2, d3, d4, d5, d6, APIRCode As Variant Dim Nstr, Mstr, c1, c2, c3, c4, c5, c6 As Long Dim SearchingError As Boolean Dim iStart, iEnd As Integer Dim ie As InternetExplorer Dim RegEx As RegExp, RegMatch As MatchCollection Dim iedoc, pDisp As Object   SearchingError = False Set ie = New InternetExplorer Set RegEx = New RegExp     'Find number of all rows with codes in WS (including header) Dim Nrow As Long Nrow = ActiveSheet.Range("B" & Rows.Count).End(xlUp).Row     '--Main Loop over the list of all APIR codes as specified in column B For r = 1 To Nrow Application.Wait (Now + TimeValue("0:00:05")) SearchingError = False 'Read APIR Code from the list (row number = r) Workbooks(WB).Worksheets(WS).Activate APIRCode = Workbooks(WB).Worksheets(WS).Cells(r, 2).Value   '--GOOGLE Search for APIR Code in InvestSMART Set ie = New InternetExplorer Set RegEx = New RegExp 'Search in Google for 'SearchString' keyword GoogleSearchPath = "https://www.google.com.au/search?q=" 'The keyword SearchString = "InvestSMART+" & APIRCode ie.Navigate GoogleSearchPath & SearchString 'Loop until IE page is full loaded Do Until ie.ReadyState = READYSTATE_COMPLETE Loop 'Set regular expression MyStr = "" MyStr = ie.Document.body.innerText Set RegMatch = Nothing Set RegMatch = RegEx.Execute(MyStr) If RegMatch.Count > 0 Then ie.Navigate RegMatch(0) Do Until ie.ReadyState = READYSTATE_COMPLETE Loop 'show Internet Explorer ie.Visible = True Set iedoc = ie.Document 'Extract the first webpage from the Google search; if failed, 'handle the error extractedHTML = "" On Error GoTo ErrHandler: extractedHTML = iedoc.getElementById("search").innerHTML ErrHandler: ie.Quit Mstr = 0 Mstr = Len(extractedHTML) Resume GoOn: GoOn: If (Mstr = 0) Then SearchingError = True Else SearchingError = False End If If (SearchingError = False) Then 'find the first href as this will be the first link, 'add 1 to encompass the quote iStart = InStr(1, extractedHTML, "href=", vbTextCompare) + Len("href=") + 1 'locate the next quote as this will be the end of the href iEnd = InStr(iStart, extractedHTML, Chr(34), vbTextCompare) 'extract text extractedHTML = Mid(extractedHTML, iStart, iEnd - iStart) Nstr = Len(extractedHTML) iStart = 0 iStart = InStr(1, extractedHTML, "&amp;", vbTextCompare) If (iStart > 0) Then str1 = Left(extractedHTML, iStart) str2 = Right(extractedHTML, Nstr - iStart - 4) extractedHTML = str1 & str2 End If End If str1 = Left(extractedHTML, 29) If (str1 <> "http://www.investsmart.com.au") Then SearchingError = True End If Else 'have the best sex ever End If

In line #29 the code counts the number of elements listed in column B. This solution works only for the lists having no blank cells between the elements.

The practice with the code also reveals that Google protects itself from the robotic-like requests sent from our VBA level if the time of sending a search request to Google is spaced by a few seconds only. Therefore, the smart step to avoid this obstacle is to add a time delay of about 5 seconds (see line #35). The code works perfectly under Windows environment as it uses Internet Explorer, so please don’t be misled by a screenshot taken from Apple’s MacOS. To the best of my knowledge, there are problems in running this code in Apple due to the lack of right socket replacing IE.

The last step is natural. If Google found and limited its first search entry to the webpage of InvestSMART and requested APIR code, we fetch and download the page directly into Excel. The following solution (among many many available on-line) I found to work the best:

103 104 105 106 107 108 109 '---- If InvestSMART with APIR code has been found by Google, 'extract performance numbers If (SearchingError = False) Then 'Download extractedHTML into Excel (new workbook) Application.Workbooks.Open (extractedHTML) NameB = ActiveWorkbook.Name NameS = ActiveSheet.Name

where extractedHTML variable stores the exact web address of the page with the data we are interested in. Check this code separately how does it work with different domains. A little bit of reconnaissance here will help you to discover how this process is powerful for Excel. In short, the extractedHTML webpage is opened in Excel as a new worksheet!

Given that unique experience, we are left with a basic data manipulation based on the data extracted from the Internet:

110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 For i = 1 To 50 Workbooks(NameB).Worksheets(NameS).Activate cellstr = Workbooks(NameB).Worksheets(NameS).Cells(i, 1).Value cellstr2 = Left(cellstr, 16) If (cellstr2 = "Fund Performance") Then cellstr4 = cellstr For j = 1 To 20 Workbooks(NameB).Worksheets(NameS).Activate cellstr = Workbooks(NameB).Worksheets(NameS).Cells(i+1,j).Value If (cellstr = "1 Month") Then c1 = j If (cellstr = "3 Month") Then c2 = j If (cellstr = "1 Year") Then c3 = j Next j For j = i + 1 To i + 5 cellstr = Workbooks(NameB).Worksheets(NameS).Cells(j, 1).Value cellstr2 = Left(cellstr, 12) If (cellstr2 = "Total Return") Then k = j 'read data from InvestSMART '1M d1 = Workbooks(NameB).Worksheets(NameS).Cells(k, c1).Value '3M (Q) d2 = Workbooks(NameB).Worksheets(NameS).Cells(k, c2).Value '1Y d3 = Workbooks(NameB).Worksheets(NameS).Cells(k, c3).Value 'save in a working sheet Workbooks(WB).Worksheets(WS).Activate Workbooks(WB).Worksheets(WS).Cells(r, 3).Value = d1 Workbooks(WB).Worksheets(WS).Cells(r, 4).Value = d2 Workbooks(WB).Worksheets(WS).Cells(r, 5).Value = d3 Workbooks(WB).Worksheets(WS).Cells(r, 6).Value = cellstr4 Workbooks(WB).Save End If Next j End If Next i 'Workbooks(NameB).Close End If 'ie.Quit Set RegEx = Nothing Set ie = Nothing Next r   Workbooks(WB).Worksheets(WS).Activate Range("A1").Select   End Sub

what accomplishes our mutual VBA-Google efforts into the final result of a “Master and his dog” game of throwing and fetching the ball: