Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

Quantitative Risk Assessment Model for Investment Options

Working in the superannuation industry in Australia has some great advantages. A nice atmosphere at work, gym sessions with colleagues during lunch time, endless talks about girls after hours. However, as Brian Tracy once said: When you go to work, you work. And this is true. And this is rewarding.

Superannuation is the Australian way of making people rich when they retire. Sadly, barely understood by many, it offers a wide palette of investment options for your super (9-12% of every salary is credited to the super account by the employer; regulated by law). Over 120 super funds across Australia, over 7000 options to choose from. All styles of investments, from cash options, through growth and diversified fixed interest, ending among high growth-high risk options. Trust me, the funds’ portfolio managers do their best to beat the benchmarks, the markets, and competing super funds.

The industry establishes standards. Regulations. Smarter and unified ways of reporting. On 29 June 2010, Australian Prudential Regulation Authority (APRA) issued a letter to superannuation trustees advising that APRA would be providing guidance on the disclosure of superannuation investment risk to fund members. The letter to trustees states that risk should be measured as the likely number of negative annual returns over any 20 year period. And this is where our story begins.

Risk Assessment

A working practice of super funds across Australia revealed that since 2013 the way of reporting risk (following the abovementioned definition) still leaves a lot behind the curtain. It’s not clear how the assessment of an individual investment option’s risk has been derived.

APRA’s guidance states that this classification system should help members ‘readily distinguish the characteristics of each investment strategy’. To achieve this, there needs to be sufficient categories to meaningfully differentiate between various options based on risk.

The funds dare to report the risk score/label according to a seven level classification system to provide sufficient granularity. The Joint ASFA/FSC Working Group’s analysis supports the claim that the number of annual negative periods over any 20 year period is likely to fall in the range of 0 to 7 for the majority of investment options:

Risk Band Risk Label Estimated number of negative annual returns over any 20 year period
1 Very Low Less than 0.5
2 Low 0.5 to less than 1
3 Low to Medium 1 to less than 2
4 Medium 2 to less than 3
5 Medium to High 3 to less than 4
6 High 4 to less than 6
7 Very High 6 or Greater


Having an access to any investment option performance over past years does not solve fully the problem of their risk assessment following the guidance. Firstly, not every option’s life time is long enough to derive meaningful results. Maximal data span on only few occasions reaches 10 years. Secondly, the data have gaps, therefore not for every option we can fetch its monthly performance figures. Thirdly, an access to data requires huge amount of labour of dedicated people who put all grains of sand into one jar and then are willing to sell it (e.g. the research house of SuperRatings in Sydney).

Lastly, knowing the option’s performance over past 3 years the questions arise. How should we estimate its risk score correctly? How many times in the upcoming 20 year period a considered investment option is going to denote negative annual returns? We need a model.

Model

A quantitative idea standing behind a unification of APRA’s risk measure guidance for investment options can find solution in the fundamentals of statistics. Below we will design a way of guessing the answer for the problem we question.

If we recall the concept of the Binomial Distribution, we immediately recognise its potential application. Consider a time-series of monthly returns, $\{ r_i \}$, of the total length of $N$ months. If $N$ is larger than 12, we can calculate
$$
m=(N-12)+1
$$ times the annual return,
$$
R_j = \left[ \prod_{i}^{12} (1+r_i) \right] – 1 \ \ \mbox{for} \ j = 1,…,m
$$ where $r_i$ are given as decimals and $j$ denotes specific period of 12 consecutive months. $R_j$ should be understood as a rolling annual return given the statistically justified minimum requirement of $N\ge 31$ months of data for a specific option. For example, if we have 5 years of data (60 months) then we are able to get $m=(60-12)+1=49$ test annual returns based on the uninterrupted data sequences. Here, the word test is crucial.

We build our risk score model in the following way. For any option of data length of $N$ months we construct a vector of $m$ annual returns $R$. We count the total number of negative values and denote it as $k$ out of $m$ trials. The probability of a single year to close up with the negative annual return is given as:
$$
p=\frac{k}{m}
$$ where $p$ can be assigned as the probability of success (in obtaining the negative annual return). Under the assumption of independent trials, one can find the probability of obtaining exactly $k$ successes out of, in general, $n$ trials making use of Binomial Distribution that is given by the probability mass function:
$$
Pr⁡(X=k) = {{n}\choose{k}} p^k (1-p)^{n-k}
$$ where obviously
$$
{{n}\choose{k}} = \frac{n!}{k!(n-k)!} \ \ .
$$ From the data analysis of the option performance, we aim at calculating the probability of $k$ negative annual returns in any $n=20$ trials, namely:
$$
Pr⁡(X=k) = {{20}\choose{k}} p^k (1-p)^{20-k}
$$

The model can be easily coded in Excel/VBA as a macro. The core algorithm of assessing option’s risk score you can grasp below:

If m > 0 Then
    'probability of success
    Dim p As Double
    p = k / m
    'probability for r=1..7
    Dim pr As Double, trials As Integer, r As Integer, newton As Double
    Dim ans As Integer, v0 As Double, v1 As Double
    v0 = -1
    v1 = -1
    trials = 20
    For r = 1 To 7
        newton = Application.Fact(trials) / _
                (Application.Fact(r) * Application.Fact(trials - r))
        pr = newton * Application.Power(p, r) * Application.Power(1 - p, trials - r)
        If pr > v0 Then
            v0 = pr
            v1 = r
        End If
    Next r
    'a new risk score
    If v1 = 1 Then
        scoreSR = 1 ' "Very Low"
    Else
        If v1 = 2 Then
            scoreSR = 2 ' "Low"
        Else
            If v1 = 3 Then
                scoreSR = 3 ' "Low to Medium"
            Else
                If v1 = 4 Then
                    scoreSR = 4 ' "Medium"
                Else
                    If v1 = 5 Then
                        scoreSR = 5 ' "Medium to High"
                    Else
                        If v1 = 6 Then
                            scoreSR = 6 ' "High"
                        Else
                            If v1 = 7 Then
                                scoreSR = 7 ' "Very High"
                            Else
                                scoreSR = ""
                            End If
                        End If
                    End If
                End If
            End If
        End If
    End If

The resulting solution may lead us to the reformulation of the risk assessment for individual investment options as presented in the following table:

Risk Band New Risk Label Estimated number of negative annual returns over any 20 year period ($k$)
1 Very Low 0 to 1
2 Low 2
3 Low to Medium 3
4 Medium 4
5 Medium to High 5
6 High 6
7 Very High 7 or Greater

Intuitively, any Cash Option which always denotes positive monthly returns is also ranked (by our model) as an option of a Very Low risk. You can apply this model for any monthly return series since we play we the numbers. This is what quants do best. Number crunching!

Nasty Assumption

The motivation behind the model sleeps in one word you can find in the definition we underlined at the very beginning: Estimated number of negative annual returns over any 20 year period. ‘Any’ allows us to choose any random sequence of 12 consecutive monthly returns and that pays off in the larger number of trials under the assumption of independence.

Na, right. The nasty piece of the model: the independence of individual trials. This is a cornerstone as it comes to Binomial Distribution. As a smart quant you can question its validity within our model. I tackled with this problem for a while and I came to an empirical lemma supporting our line of defence.

The independence of individual trials can be justified as an analogous to the following experiment. Imagine that a sheet of A4-format paper is available. The paper has a colour gradually changing from left to right from black to light grey, and the colour gradient is constant in a vertical direction. We cut the sheet horizontally into a finite number of strips of paper of 1 cm height, however starting every new strip by 1 cm to the right (the concept of a sliding window in time-serie analysis). Any comparison of two adjacent strips of paper indicates a finite common area of the same colour as defined by the gradient: correlation. We put all strips into one jar and churn it. Next we randomly pull strips out of the jar. Each strip is an independent trial out of all strips placed inside the jar as the memory of correlation between two formally adjacent strips has been lost.

Invest smartly and don’t worry. Correlations are everywhere.

Get the Data of Fund Performance directly into Excel utilizing VBA and Google


Let’s leave the fascinating world of algebra and Matlab for a while and let’s see what exciting we can do from the Excel level. Microsoft Excel has this ‘thanks God’ programming toolkit known as VBA (Visual Basic for Application) that adds a soul of coding into a mortal body of spreadsheets. If you are an advanced programmer and you are asked or forced to deal with solving problems in Excel/VBA, no doubt you will find programming in VBA as a return to the kindergarten. Saying shortly, VBA is all about fun and relax at work.

Great. As I’m working over a special section of QuantAtRisk.com devoted to From Zero to Hero bootcamp for those of You who wish to brush up or master own skill in VBA and Matlab programming in the application to the quantitative finance and risk management, today I want you to jump with me into the open ocean of VBA application for a specific task that I found very helpful at some stage of my work with the data. Therefore, let’s define the problem first.

The problem is about our quick ability to download some data from the webpage of our choice (e.g. regarding a recent fund or investment option performance) directly into Excel, and next, to look for and extract exactly what we would like to save for our further quantitative analysis.

Since VBA codes for solving this problem are easily spread around the Web, it wouldn’t be challenging to dedicate another article treating the same problem in a new frame. And it’s not my style. Because I love to challenge myself in my private life, I even more love to challenge the problem I wish to solve. I don’t need to look too far as the problems I find around challenge me equally.

1. Case Study

We hunt for the data that are displayed by the data provider within a web-frame, and all VBA codes aimed at getting access to that frame simply fail. Let’s consider the example of InvestSMART.com.au webpage. It’s an Australian portal feeding for some funds its data from the Morningstar.com.au data provider. We are looking for the latest investment option performance of, say, OnePath OA PS – OP Australian Shares EF of the APIR code MMF0022AU:

Screen Shot 2013-10-21 at 11.15.55 PM

and we want to download into Excel the Total Return numbers for 1 Month, 3 Month, and 1 Year performance. In case when the standard ways of getting this information do not work, for example, utilizing the classical variables of doc.body.innerText or doc.body.outerText as in the epic approach:

Dim iFrm As HTMLIFrame
Dim doc As MSHTML.HTMLDocument
 For iterator = 0 To appIE.Document.all.Length - 1
  If TypeName(appIE.Document.all(iterator)) = "HTMLIFrame" Then
   Set iFrm = appIE.Document.all(iterator)
   Set doc = iFrm.Document
   MsgBox doc.body.outerText
  End If
Next

we need to get another way round this problem. One of them is the application of Google Search for the fund performance based on the investment option code.

2. Limit Google Search down to the first listing, download the webpage directly into Excel

That title says everything. We can use Google.com to perform the search for the same investment option among all webpages of InvestSMART as indexed by Google up to date. This solution seems to be more elegant and getting straight to the point. To make this post worth substantially more, imagine, that your task is to repeat the data acquisition process for a long list of options you have in your Excel worksheet:

Screen Shot 2013-10-21 at 11.48.54 PM

We not only look for the way of fetching the required performance numbers but also to automate the whole process to work in the background (or overnight). Given the active Excel worksheet constructed as displayed above, we write VBA code (a macro) making use of Google Search engine. We read in the APIR codes one by one from column B and allow VBA to search the Google by the keyword constructed, in general, as: “InvestSMART”+APIRCode:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
'Get Fund Performance directly into Excel utilizing VBA and Google
'
'(c) 2013 QuantAtRisk.com, by Pawel Lachowicz

Sub CheckAPIRCodes()
 
WB = ActiveWorkbook.Name
WS = ActiveSheet.Name
 
'Declaration of Variables
Dim i, j, k, r As Long
Dim cellstr, cellstr2, cellstr3, cellstr4, str1, str2, extractedHTML As String
Dim NameB, NameS, MyStr, GoogleSearchPath, SearchString As String
Dim d1, d2, d3, d4, d5, d6, APIRCode As Variant
Dim Nstr, Mstr, c1, c2, c3, c4, c5, c6 As Long
Dim SearchingError As Boolean
Dim iStart, iEnd As Integer
Dim ie As InternetExplorer
Dim RegEx As RegExp, RegMatch As MatchCollection
Dim iedoc, pDisp As Object
 
SearchingError = False
Set ie = New InternetExplorer
Set RegEx = New RegExp
 
 
'Find number of all rows with codes in WS (including header)
Dim Nrow As Long
Nrow = ActiveSheet.Range("B" & Rows.Count).End(xlUp).Row
 
 
'--Main Loop over the list of all APIR codes as specified in column B

For r = 1 To Nrow
    Application.Wait (Now + TimeValue("0:00:05"))
    SearchingError = False
    'Read APIR Code from the list (row number = r)
    Workbooks(WB).Worksheets(WS).Activate
    APIRCode = Workbooks(WB).Worksheets(WS).Cells(r, 2).Value
 
    '--GOOGLE Search for APIR Code in InvestSMART
    Set ie = New InternetExplorer
    Set RegEx = New RegExp
    'Search in Google for 'SearchString' keyword
    GoogleSearchPath = "https://www.google.com.au/search?q="
    'The keyword
    SearchString = "InvestSMART+" & APIRCode
    ie.Navigate GoogleSearchPath & SearchString
    'Loop until IE page is full loaded
    Do Until ie.ReadyState = READYSTATE_COMPLETE
    Loop
    'Set regular expression
    MyStr = ""
    MyStr = ie.Document.body.innerText
    Set RegMatch = Nothing
    Set RegMatch = RegEx.Execute(MyStr)
    If RegMatch.Count > 0 Then
        ie.Navigate RegMatch(0)
        Do Until ie.ReadyState = READYSTATE_COMPLETE
        Loop
            'show Internet Explorer
            ie.Visible = True
            Set iedoc = ie.Document
            'Extract the first webpage from the Google search; if failed,
            'handle the error
            extractedHTML = ""
            On Error GoTo ErrHandler:
            extractedHTML = iedoc.getElementById("search").innerHTML
ErrHandler:
            ie.Quit
            Mstr = 0
            Mstr = Len(extractedHTML)
            Resume GoOn:
GoOn:
            If (Mstr = 0) Then
                SearchingError = True
            Else
                SearchingError = False
            End If
            If (SearchingError = False) Then
                'find the first href as this will be the first link, 
                'add 1 to encompass the quote
                iStart = InStr(1, extractedHTML, "href=", vbTextCompare) 
                         + Len("href=") + 1
                'locate the next quote as this will be the end of the href
                iEnd = InStr(iStart, extractedHTML, Chr(34), vbTextCompare)
                'extract text
                extractedHTML = Mid(extractedHTML, iStart, iEnd - iStart)
                Nstr = Len(extractedHTML)
                iStart = 0
                iStart = InStr(1, extractedHTML, "&", vbTextCompare)
                If (iStart > 0) Then
                    str1 = Left(extractedHTML, iStart)
                    str2 = Right(extractedHTML, Nstr - iStart - 4)
                    extractedHTML = str1 & str2
                End If
            End If
            str1 = Left(extractedHTML, 29)
            If (str1 <> "http://www.investsmart.com.au") Then
                SearchingError = True
            End If
            Else
                 'have the best sex ever
        End If

In line #29 the code counts the number of elements listed in column B. This solution works only for the lists having no blank cells between the elements.

The practice with the code also reveals that Google protects itself from the robotic-like requests sent from our VBA level if the time of sending a search request to Google is spaced by a few seconds only. Therefore, the smart step to avoid this obstacle is to add a time delay of about 5 seconds (see line #35). The code works perfectly under Windows environment as it uses Internet Explorer, so please don’t be misled by a screenshot taken from Apple’s MacOS. To the best of my knowledge, there are problems in running this code in Apple due to the lack of right socket replacing IE.

The last step is natural. If Google found and limited its first search entry to the webpage of InvestSMART and requested APIR code, we fetch and download the page directly into Excel. The following solution (among many many available on-line) I found to work the best:

103
104
105
106
107
108
109
'---- If InvestSMART with APIR code has been found by Google, 
'extract performance numbers
    If (SearchingError = False) Then
        'Download extractedHTML into Excel (new workbook)
        Application.Workbooks.Open (extractedHTML)
        NameB = ActiveWorkbook.Name
        NameS = ActiveSheet.Name

where extractedHTML variable stores the exact web address of the page with the data we are interested in. Check this code separately how does it work with different domains. A little bit of reconnaissance here will help you to discover how this process is powerful for Excel. In short, the extractedHTML webpage is opened in Excel as a new worksheet!

Given that unique experience, we are left with a basic data manipulation based on the data extracted from the Internet:

110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
       For i = 1 To 50
            Workbooks(NameB).Worksheets(NameS).Activate
            cellstr = Workbooks(NameB).Worksheets(NameS).Cells(i, 1).Value
            cellstr2 = Left(cellstr, 16)
            If (cellstr2 = "Fund Performance") Then
                cellstr4 = cellstr
                For j = 1 To 20
                    Workbooks(NameB).Worksheets(NameS).Activate
                    cellstr = Workbooks(NameB).Worksheets(NameS).Cells(i+1,j).Value
                    If (cellstr = "1 Month") Then c1 = j
                    If (cellstr = "3 Month") Then c2 = j
                    If (cellstr = "1 Year") Then c3 = j
                Next j
                For j = i + 1 To i + 5
                    cellstr = Workbooks(NameB).Worksheets(NameS).Cells(j, 1).Value
                    cellstr2 = Left(cellstr, 12)
                    If (cellstr2 = "Total Return") Then
                        k = j
                        'read data from InvestSMART
                        '1M
                        d1 = Workbooks(NameB).Worksheets(NameS).Cells(k, c1).Value
                        '3M (Q)
                        d2 = Workbooks(NameB).Worksheets(NameS).Cells(k, c2).Value
                        '1Y
                        d3 = Workbooks(NameB).Worksheets(NameS).Cells(k, c3).Value
                        'save in a working sheet
                        Workbooks(WB).Worksheets(WS).Activate
                        Workbooks(WB).Worksheets(WS).Cells(r, 3).Value = d1
                        Workbooks(WB).Worksheets(WS).Cells(r, 4).Value = d2
                        Workbooks(WB).Worksheets(WS).Cells(r, 5).Value = d3
                        Workbooks(WB).Worksheets(WS).Cells(r, 6).Value = cellstr4
                        Workbooks(WB).Save
                    End If
                Next j
            End If
        Next i
        'Workbooks(NameB).Close
    End If
    'ie.Quit
    Set RegEx = Nothing
    Set ie = Nothing
Next r
 
Workbooks(WB).Worksheets(WS).Activate
Range("A1").Select
 
End Sub

what accomplishes our mutual VBA-Google efforts into the final result of a “Master and his dog” game of throwing and fetching the ball:

Screen Shot 2013-10-22 at 12.25.44 AM

Contact Form Powered By : XYZScripts.com