Quantitative Analysis, Risk Management, Modelling, Algo Trading, and Big Data Analysis

Non-Linear Cross-Bicorrelations between the Oil Prices and Stock Fundamentals

When we talk about correlations in finance, by default, we assume linear relationships between two time-series “co-moving”. In other words, if one time-series changes its values over a give time period, we seek for a tight correlation reflected within the other time-series. If found, we say they are correlated. But wait a minute! Is the game always about looking for an immediate feedback between two time-series? Is “correlated” always a synonymous of a direct, i.e. a linear response?

Think for a second about the oil prices. If they rise within a quarter, the effect on some companies is not the same. Certain firms can be tightly linked to oil/petroleum prices that affect their product value, distribution, sales, etc. while for other companies the change of crude oil price has a negligible effect, e.g. the online services. Is it so?

As a data scientist, quantitative (business) analysts, or investment researcher you need to know that mathematics and statistics deliver ample of tools you may use in order to derive the best possibilities for two (or more) financial assets that can be somehow related, correlated, or connected. The wording does not need to be precise to reflect the goal of an investigation: correlation, in general.

Correlation can be linear and non-linear. Non-linearity of correlation is somehow counterintuitive. We used to think in a linear way that is why it is so hard to readjust our thinking in a non-linear domain. The changes of the oil prices might have a non-negligble effect on the airlines, causing the air-ticket prices to rise or fall due to recalculated oil/petroleum surcharge. The effect can be not immediate. A rise of the oil prices today can be “detected” with a time delay of days, weeks, or even months, i.e. non-linearly correlated.

In this post analyse both linear and non-linear correlation methods that have been so far applied (and not) in the research over oil price tectonics across the markets. In particular, first we discuss one-factor and multiple linear regression models to turn, in consequence, toward non-linearity possible to be captured by cross-correlation and cross-bicorrelation methods. By developing an extensive but easy-to-follow Python code, we analyse a database of Quandl.com storing over 620 stocks’ financials factors (stock fundamentals). Finally, we show how both non-linear mathematical tools can be applied in order to detect non-linear correlations for different financial assets.

1. Linear Correlations

1.1. Oil Prices versus Stock Markets

Correlations between increasing/decreasing oil prices and stock markets have been a subject of investigation for a number of years. The main interest was initially in search for linear correlations between raw price/return time-series among the stocks and oil benchmarks (e.g. WPI Crude Oil). A justification standing behind such approach to data analysis seemed to be driven by a common sense: an impact of rising oil prices should affect companies dependent on petroleum-based products or services.

The usual presumption states that a decline in oil price is a good news for the economy, especially for net oil importers, e.g. USA or China. An increase in oil prices usually causes the raise of the input costs for most businesses and force consumers to spend more money on gasoline, thereby reducing the corporate earnings of other businesses. The opposite should be true too when the oil prices fall.

The oil prices are determined by the supply-and-damand for petroleum-based products. During economic expansion, prices might rise as a result of increased consumption. They might fall as a result of increased production. That pattern has been observed on many occasions so far.

A quantitative approach to measuring the strength of linear response of the stock market to oil price can be demonstrated using a plain linear regression modeling. If we agree that S&P 500 Index is a proxy for the US stock market, we should be able to capture the relationship. In the following example, we compare WPI Crude Oil prices with S&P 500 Index for 2-year period between Nov 2014 and Nov 2016. First we plot both time-series, and next we apply one-factor linear regression model,
$$
y(t) = \beta x(t) + \epsilon(t) \ \,
$$ in order to fit the data best:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
import numpy as np
import pandas as pd
from scipy import stats
from matplotlib import pyplot as plt
from pandas_datareader import data, wb
 
%matplotlib inline
 
grey = 0.7, 0.7, 0.7
 
import warnings
warnings.filterwarnings('ignore')
 
# downloading S&P 500 Index from Yahoo! Finance
sp500 = web.DataReader("^GSPC", data_source='yahoo',
                  start='2014-11-01', end='2016-11-01')['Adj Close']
 
# WPI Crude Oil price-series
# src: https://fred.stlouisfed.org/series/DCOILWTICO/downloaddata
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d')
wpi = pd.read_csv("DCOILWTICO.csv", parse_dates=['DATE'], date_parser=dateparse)
wpi = wpi[wpi.VALUE != "."]
wpi.VALUE = wpi.VALUE.astype(float)
wpi.index = wpi.DATE
wpi.drop('DATE', axis=1, inplace=True) 
 
# combine both time-series
data = pd.concat([wpi, sp500], axis=1).dropna()  # and remove NaNs, if any
data.columns = ["WPI", "S&P500"]
print(data.head())
              WPI       S&P500
2014-11-03  78.77  2017.810059
2014-11-04  77.15  2012.099976
2014-11-05  78.71  2023.569946
2014-11-06  77.87  2031.209961
2014-11-07  78.71  2031.920044
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
plt.figure(figsize=(8,8))
# plot time-series
ax1 = plt.subplot(2,1,1)
plt.plot(data.WPI, 'k', label="WPI Crude Oil")
plt.legend(loc=3)
plt.ylabel("WPI Crude Oil [U$/barrel]")
 
# fit one-factor linear model
slope, intercept, r2_value, p_value, std_err = 
                  stats.linregress(data.WPI, data["S&P500"])
 
# best fit: model
xline = np.linspace(np.min(data.WPI), np.max(data.WPI), 100)
line = slope*xline + intercept
 
# plot scatter plot and linear model
ax2 = ax1.twinx()
plt.plot(data["S&P500"], 'm', label="S&P500")
plt.ylabel("S&P 500 Index [U$]")
plt.legend(loc="best")
plt.subplot(2,1,2)
plt.plot(data.WPI, data["S&P500"],'.', color=grey)
plt.plot(xline, line,'--')
plt.xlabel("WPI Crude Oil [U$/barrel]")
plt.ylabel("S&P 500 Index [U$]")
plt.title("R$^2$ = %.2f" % r2_value, fontsize=11)

Our Python code generates:
unknown
where we can see that WPI Crude Oil price vs. S&P 500 linear correlation is weak, with $R^2 = 0.36$ only. This is a static picture.

As one might suspect, correlation is time-dependent. Therefore, it is wiser to check a rolling linear correlation for a given data window size. Below, we assume its length to be 50 days:

59
60
61
62
63
64
rollcorr = pd.rolling_corr(wpi, sp500, 50).dropna()
 
plt.figure(figsize=(10,3))
plt.plot(rollcorr)
plt.grid()
plt.ylabel("Linear Correlation (50 day rolling window)", fontsize=8)

unknown-1
In 2008 Andrea Pescatori measured changes in the S&P 500 and oil prices in the same way as demonstrated above. He noted that variables occasionally moved in the same direction at the same time, but even then, the relationship remained weak. He concluded on no correlation at the 95% confidence level.

The lack of correlation might be attributed to factors like: wages, interest rates, industrial metal, plastic, computer technology that can offset changes in energy costs. On the other side, corporations might have become increasingly sophisticated at reading the futures markets and are able, much better, to anticipate the shift in factor prices (e.g. a company should be able to switch production processes to compensate for added fuel costs).

No matter what we analyse, either oil prices vs. market indexes or vs. individual stock trading records, linear correlation method is solely able to measure 1-to-1 response of $y$ to $x$. The lower coefficient of correlation the less valid linear model as a descriptor of true events and mutual relationships under study.

1.2. Multiple Linear Regression Model for Oil Price Changes

In February 2016 Ben S. Bernanke found that a positive correlation of stocks and oil might arise because both are responding to the underlying shift in global demand. Using a simple multiple linear regression model he took and attempt in explaining the daily changes of the oil prices and concluded that in over 90% they could be driven by changes of commodities prices (copper), US dollar (spot price), 10-yr treasury interest rate, and in over 95% by extending the model with an inclusion of daily changes in VIX:
$$
\Delta p_{oil, t} = \beta_1 \Delta p_{copp, t} + \beta_2 \Delta p_{10yr, t} + \beta_3 \Delta p_{USD, t} + \beta_4 \Delta p_{VIX, t} + \beta_0
$$ The premise is that commodity prices, long-term interest rate, and USD are likely to respond to investors’ perceptions of global and US demand for oil. Additionally, the presence of VIX is in line with the following idea: if investors retreat from commodities and/or stocks during periods of high uncertainty/risk aversion, then shocks to volatility may be another reason for the observed tendency of stocks and oil prices moving together.

The model, inspired by James Hamilton (2014) work, aims at measuring the effect of demand shifts on the oil market. For the sake of elegance, we can code it in Python as follows. First, we download to the files all necessary price-series and process them within pandas Dataframes (more on Quantitative Aspects of pandas for Quants you will find in my upcoming book Python for Quants. Volume II.).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
import numpy as np
import pandas as pd
from scipy import stats
import datetime
from datetime import datetime
from matplotlib import pyplot as plt
import statsmodels.api as sm
 
%matplotlib inline
 
grey = 0.7, 0.7, 0.7
 
import warnings
warnings.filterwarnings('ignore')
 
# Download WPI Crude Oil prices
# src: https://fred.stlouisfed.org/series/DCOILWTICO/downloaddata
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d')
wpi = pd.read_csv("DCOILWTICO.csv", parse_dates=['DATE'], date_parser=dateparse)
wpi = wpi[wpi.VALUE != "."]
wpi.VALUE = wpi.VALUE.astype(float)
wpi.index = wpi.DATE
wpi.drop('DATE', axis=1, inplace=True)
 
# Copper Futures 
# src: http://www.investing.com/commodities/copper-historical-data
cop = pd.read_csv("COPPER.csv", index_col=["Date"])
cop.index = pd.to_datetime(pd.Series(cop.index), format='%b %d, %Y')
 
# Nominal interest rate on 10-year Treasury bonds
# src: https://fred.stlouisfed.org/series/DGS10/
tb = pd.read_csv('DGS10.csv')
tb = tb[tb.DGS10 != "."]
tb.DGS10 = tb.DGS10.astype(float)
tb['DATE'] =  pd.to_datetime(tb['DATE'], format='%Y-%m-%d')
tb.index = tb.DATE
tb.drop('DATE', axis=1, inplace=True)
 
# Trade Weighted U.S. Dollar Index
# src: https://fred.stlouisfed.org/series/DTWEXM
usd = pd.read_csv('DTWEXM.csv')
usd = usd[usd.DTWEXM != "."]
usd.DTWEXM = usd.DTWEXM.astype(float)
usd['DATE'] =  pd.to_datetime(usd['DATE'], format='%Y-%m-%d')
usd.index = usd.DATE
usd.drop('DATE', axis=1, inplace=True)
 
# CBOE Volatility Index: VIX© (VIXCLS)
# src: https://fred.stlouisfed.org/series/VIXCLS/downloaddata
vix = pd.read_csv('VIXCLS.csv')
vix = vix[vix.VALUE != "."]
vix.VALUE = vix.VALUE.astype(float)
vix['DATE'] =  pd.to_datetime(vix['DATE'], format='%Y-%m-%d')
vix.index = vix.DATE
vix.drop('DATE', axis=1, inplace=True)

Not every time a plain use of pd.read_csv function works in the way we expect from it, therefore some extra steps are needed to be implemented in order to convert .csv files content to Dataframe, e.g. a correct formatting of date (line #18-19 or #35, 44, 53), price conversion from string to float (lines #21, 34, 43, 52), replacement of index with data included in ‘Date’ column followed by its removal (e.g. lines #22-23).

As a side note, in order to download the copper data, at the website we need manually mark a whole table, paste it into Excel or Mac’s Numbers, trim if required, and save down to .csv file. The problem might appear as dates are given in “Nov 26, 2016″ format. For this case, the following trick in Python saves a lot of time:

from datetime import datetime
expr = 'Nov 26, 2016'
datetime.strptime(expr, '%b %d, %Y')
datetime.datetime(2016, 11, 26, 0, 0)

what justifies the use of %b descriptor (see more on Python’s datetime date formats and parsing here and here).

pandas’ Dataframes are fast and convenient way to concatenate multiple time-series with indexes set to calendar dates. The following step ensures all five price-series to have data points on the same dates (if given), and if any data point (for any time-series) is missing (NaN), the whole row is removed from the end Dataframe of ‘df’ thanks to action of .dropna() function:

57
58
59
60
df = pd.concat([wpi, cop, tb, usd, vix], join='outer', axis=1).dropna()
df.columns = ["WPI", "Copp", "TrB", "USD", "VIX"]
 
print(df.head(10))
              WPI   Copp   TrB      USD    VIX
2007-11-15  93.37  3.082  4.17  72.7522  28.06
2007-11-16  94.81  3.156  4.15  72.5811  25.49
2007-11-19  95.75  2.981  4.07  72.7241  26.01
2007-11-20  99.16  3.026  4.06  72.4344  24.88
2007-11-21  98.57  2.887  4.00  72.3345  26.84
2007-11-23  98.24  2.985  4.01  72.2719  25.61
2007-11-26  97.66  3.019  3.83  72.1312  28.91
2007-11-27  94.39  2.958  3.95  72.5061  26.28
2007-11-28  90.71  3.001  4.03  72.7000  24.11
2007-11-29  90.98  3.063  3.94  72.6812  23.97

Given all price-series and our model in mind, we are more interested in running the regression using daily change in the natural logarithm of crude oil, copper price, 10-yr interest rate, USD spot price, and VIX. The use of log is a common practice in finance and economics (see here why). Contrary to the daily percent change, we derive the log returns in Python in the following way:

np.random.seed(7)
df = pd.DataFrame(100 + np.random.randn(100).cumsum(), columns=['price'])
df['pct_change'] = df.price.pct_change()
df['log_ret'] = np.log(df.price) - np.log(df.price.shift(1))
 
print(df.head())
        price  pct_change   log_ret
0  101.690526         NaN       NaN
1  101.224588   -0.004582 -0.004592
2  101.257408    0.000324  0.000324
3  101.664925    0.004025  0.004016
4  100.876002   -0.007760 -0.007790

Having that, we continue coding our main model:

62
63
64
65
66
67
68
69
70
71
72
73
headers = df.columns
 
dlog = pd.DataFrame()  # log returns Dateframe
 
for j in range(df.shape[1]):
    price = df.ix[:,j]
    dl = np.log(price) - np.log(price.shift(1))
    dlog.loc[:,j] = dl
 
dlog.dropna(inplace=True)
dlog.columns = headers
print(dlog.head())
                 WPI      Copp       TrB       USD       VIX
2007-11-16  0.015305  0.023727 -0.004808 -0.002355 -0.096059
2007-11-19  0.009866 -0.057047 -0.019465  0.001968  0.020195
2007-11-20  0.034994  0.014983 -0.002460 -0.003992 -0.044417
2007-11-21 -0.005968 -0.047024 -0.014889 -0.001380  0.075829
2007-11-23 -0.003353  0.033382  0.002497 -0.000866 -0.046910

and run multiple linear regression model for in-sample data, here, selected for 2 year in-sample period between Nov 2012 and Nov 2014:

75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
dlog0 = dlog.copy()
 
# in-sample data selection
dlog = dlog[(dlog.index > "2012-11-01") & (dlog.index < "2014-11-01")]
 
# define input data
y = dlog.WPI.values
x = [dlog.Copp.values, dlog.TrB.values, dlog.USD.values, dlog.VIX.values ]
 
# Multiple Linear Regression using 'statsmodels' library
def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results
 
# display a summary of the regression
print(reg_m(y, x).summary())
                           OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.105
Model:                            OLS   Adj. R-squared:                  0.098
Method:                 Least Squares   F-statistic:                     14.49
Date:                Tue, 29 Nov 2016   Prob (F-statistic):           3.37e-11
Time:                        11:38:44   Log-Likelihood:                 1507.8
No. Observations:                 499   AIC:                            -3006.
Df Residuals:                     494   BIC:                            -2985.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0303      0.008     -3.800      0.000        -0.046    -0.015
x2            -0.3888      0.178     -2.180      0.030        -0.739    -0.038
x3             0.0365      0.032      1.146      0.252        -0.026     0.099
x4             0.2261      0.052      4.357      0.000         0.124     0.328
const      -3.394e-05      0.001     -0.064      0.949        -0.001     0.001
==============================================================================
Omnibus:                       35.051   Durbin-Watson:                   2.153
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               75.984
Skew:                          -0.392   Prob(JB):                     3.16e-17
Kurtosis:                       4.743   Cond. No.                         338.
==============================================================================
 
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is 
    correctly specified.

A complete model is given by the following parameters:

96
97
par = reg_m(y, x).params
print(par)
[ -3.02699345e-02  -3.88784262e-01   3.64964420e-02   2.26134337e-01
  -3.39418282e-05]

or more formally as:
$$
\Delta p_{oil, t} = 0.2261 \Delta p_{copp, t} + 0.0365 \Delta p_{10yr, t} -0.3888 \Delta p_{USD, t} -0.0303 \Delta p_{VIX, t}
$$ The adjusted $R^2$ is near zero and statistical significance of $\beta_2$ is more than 5%.

In order to use our model for “future” oil price prediction based on price movements in commodity (copper), long-term interest rate, USD spot price and VIX index, we generate out-of-sample data set and construct the estimation in the following way:

99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
# out-of-sample log return Dataframe
dlog_oos = dlog0[dlog0.index >= '2014-11-01']
 
# what is the last WPI Crude Oil price in in-sample
last = df[df.index == "2014-10-31"].values[0][0]  # 80.53 USD/barrel
 
oilreg = pd.DataFrame(columns=["Date", "OilPrice"])  # regressed oil prices
par = reg_m(y, x).params  # parameters from regression
 
for i in range(dlog_oos.shape[0]):
    x1 = dlog_oos.iloc[i]["VIX"]
    x2 = dlog_oos.iloc[i]["USD"]
    x3 = dlog_oos.iloc[i]["TrB"]
    x4 = dlog_oos.iloc[i]["Copp"]
    w = par[0]*x1 + par[1]*x2 + par[2]*x3 + par[3]*x4 + par[4]
    if(i==0):
        oilprice = np.exp(w) * last
    else:
        oilprice = np.exp(w) * oilprice
    oilreg.loc[len(oilreg)] = [dlog_oos.index[i], oilprice]
 
oilreg.index = oilreg.Date
oilreg.drop('Date', axis=1, inplace=True)

where oilreg Dataframe stores time and regressed oil prices. Visualisation of the results

123
124
125
126
127
128
129
130
plt.figure(figsize=(10,5))
plt.plot(df.WPI, color=grey, label="WPI Crude Oil")
plt.plot(oilreg, "r", label="Predicted Oil Price")
plt.xlim(['2012-01-01', '2016-11-01'])  # in-sample & out-of-sample
plt.ylim([20, 120])
plt.ylabel("USD/barrel")
plt.legend(loc=3)
plt.grid()

delivers
unknown
Bernanke’s multiple regression model points at an idea saying: when stock traders respond to a change in oil prices, they do so not necessarily because the oil movement in consequential in itself. Our finding (for different data sets than Bernanke’s) points at lack of strong dependancy on VIX thus on traders activity. Again, as in case of one-factor regression model discussed in Section 1.1, the stability of underlying correlations with all four considered factors remains statistically variable in time.

2. Non-Linear Correlations

So far we have understood how fragile the quantification and interpretation of the linear correlation between two time-series can be. We agreed on its time-dependant character and how data sample size may influence derived results. Linear correlation is a tool that captures 1-to-1 feedback between two data sets. What if a response is lagged? We need to look for some other useful tools good enough for this challenge. In this Section we will analyse two of them.

2.1. Cross-Correlation and Cross-Bicorrelation

If a change of one quantity has a noticeable effect on another one, this can be observed with or without time delay. In the time-series analysis we talk about one time-series being lagged (or led) behind (before) the other one. Since an exisiting lag (lead) is not observed directly (a linear feedback) it introduces a non-linear relationship between two time-series.

A degree of non-linear correlation can be measured by the application of two independent methods, namely, cross-correlations and cross-bicorrelations designed to examine the data dependence between any two (three) values led/lagged by $r$ ($s$) periods. By period one should understand the sampling frequency of the underlying time-series, for example a quarter, i.e. 3 month period.

Let $x(t) \equiv \{ x_t \} = \{ x_1, x_2, …, x_N \}$ for $t=1,…,N$ represents, e.g. the WPI Crude Oil price-series, and $\{ y_t \}$ the time-series of a selected quantity. For two time-series, $x(t)$ and $y(t)$, the cross-correlation is defined as:
$$
C_{xy}(r) = (N-r)^{-1} \sum_{i=1}^{N-r} x(t_i) y(t_i + r)
$$ whereas the cross-bicorrelation is usually assumed as:
$$
C_{xyy}(r,s) = (N-m)^{-1} \sum_{i=1}^{N-m} x(t_i) y(t_i + r) y(t_i + s)
$$ where $m=\max(r,s)$ (Brooks & Hinich 1999). Both formulae measure the inner product between the individual values of two time-series. In case of cross-correlation $C_{xy}(r)$ it is given by $\langle x(t_i) y(t_i + r) \rangle$ where the value of $y(t)$ is lagged by $r$ periods. The summation, similarly as in the concept of the Fourier transform, picks up the maximum power for a unique set of lags $r$ or $(r,s)$ for $C_{xy}(r)$ and $C_{xyy}(r,s)$, respectively. The use of cross-correlation method should return the greatest value of $C_{xy}(r)$ among all examined $r$’s. Here, $1 \le r < N$ in a first approximation. It reveals a direct non-linear correlation between $x(t_i)$ and $y(t_i + r)$ $\forall i$.

The concept of cross-bicorrelation goes one step further and examines the relationship between $x(t_i)$ and $y(t_i + r)$ and $y(t_i + s)$, e.g. how the current oil price at time $t$, $x(t_i)$, affects $y$ lagged by $r$ and $s$ independent time periods. Note that cross-bicorrelation also allows for $C_{xxy}(r,s)$ form, i.e. if we require to look for relationship of the current and lagged (by $r$) values of $x$ time-series and period $y$ time-series lagged by $s$.

2.2. Statistical Tests for Cross-Correlation and Cross-Bicorrelation

The application of both tools for small data samples ($N < 50$) does not guarantee a statistically significant non-linear correlations every single time the computations are done. The need for statistical testing of the results returned by both cross-correlation and cross-bicorrelation methods has beed addressed by Hinich (1996) and Brook & Hinich (1999). For two time-series, with a weak stationarity, the null hypothesis $H_0$ for the test is that two series are independent pure white noise processes against the alternative hypothesis $H_1$ saying that some cross-covariances, $E[x(t_i) y(t_i + r)]$, or cross-bicovariances, $E[x(t_i) y(t_i + r) y(t_i + s)]$, are nonzero. The test statistics, respectively for cross-correlation and cross-bicorrelation, are given by:
$$
H_{xy}(N) = \sum_{r=1}^{L} (N-L) C_{xy}^2(r) = (N-L) \sum_{r=1}^{L} C_{xy}^2(r)
$$ and
$$
H_{xyy}(N) = \sum_{s=-L}^{L} ‘ \sum_{r=1}^{L} (N-m) C_{xyy}^2(r,s) \ \ \ (‘-s \ne -1, 1, 0)
$$ where it has been shown by Hinich (1996) that:
$$
H_{xy}(N) \sim \chi^2_L
$$ and
$$
H_{xyy}(N) \sim \chi^2_{(L-1)(L/2)}
$$ where $\sim$ means are asymptotically distributed with the corresponding degrees for $N \rightarrow \infty$ and $m = \max(r, s)$ as previously. For both $H_{xy}(N)$ and $H_{xyy}(N)$ statistics, the number of lags $L$ is specified as $L = N^b$ where $0 < b < 0.5$. Based on the results of the Monte-Carlo simulations, Hinich & Parrerson (1995) recommended the use of $b=0.4$ in order to maximise the power of test while ensuring a valid approximation to the asymptotic theory. Further down, we compute $H_{xy}(N)$ and $H_{xyy}(N)$ statistics for every cross-correlation and cross-bicorrelation and determine the corresponding significance assuming the confidence level of 90% ($\alpha = 0.1$) if: $$ \mbox{p-value} < \alpha $$ where $$ \mbox{p-value} = 1 - \mbox{Pr} \left[ \chi^2_{\mbox{dof}} < H_{x(y)y} | H_1 \right] \ . $$ Given that, any resultant values of cross-correlations and cross-bicorrelations one can assume as statistically significant if p-value $< \alpha$. A recommended number of lags to be investigated is $L = N^{0.4}$ where $N$ is the length of $x(t)$ and $y(t)$ time-series. Employing floor rounding for $L$ we obtain the following relationship between $N$ and $L$ for the sample sizes of $N \le 50$:

L = []
for N in range(1, 51):
    L.append(np.floor(N**0.4))
 
fig, ax = plt.subplots(figsize=(8,3))
plt.plot(np.arange(1, 51), L, '.')
plt.xlabel("N")
plt.ylabel("L")
ax.margins(x=0.025, y=0.05) # add extra padding
ax.tick_params(axis='y',which='minor',left='off') # without minor yticks
plt.grid(b=True, which='minor', color='k', linestyle=':', alpha=0.3)
plt.minorticks_on()

unknown-2

2.3. Cross-Correlation and Cross-Bicorrelation in Python

Let’s translate cross-correlation to Python language and run a simple test for a random time-series:

# cross-correlation
def Cxy(x, y, r, N):
    z = 0
    for i in range(0, N-r-1):
        z += x[i] * y[i+r]
    z /= (N-r)
    return z
 
# test statistic for cross-correlation
def Hxy(x, y, L, N):
    z = 0
    for r in range(1, L+1):
        z += Cxy(x, y, r, N)**2
    z *= (N - L)   
    return z

where we defined two functions deriving $C_{xy}(r)$ and $H_{xy}(N)$, respectively. Now, let’s consider some simplified time-series of $N=16$, both normalised to be $\sim N(0,1)$:

# sample data
y = np.array([0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
x = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
 
# normalise such that x, y ~ N(0,1)
x = (x - np.mean(x))/np.std(x, ddof=1)
y = (y - np.mean(y))/np.std(y, ddof=1)

As one can see, there is expected strong cross-correlation for lag of $r = 2$ (strongest) with $r = 1$ to be detected too. We verify the overall statistical significance of cross-correlations for given time-series in the following way:

N = len(x)
L = int(np.floor(N**0.4))
 
for r in range(1, L+1):
    print("r =%2g\t Cxy(r=%g) = %.4f" % (r, r, Cxy(x, y, r)))
 
dof = L    
pvalue = 1 - scipy.stats.chi2.cdf(Hxy(x, y, L, N), dof)
alpha = 0.1
 
print("Hxy(N) = %.4f" % Hxy(x, y, L, N))
print("p-value = %.4f" % pvalue)
print("H0 rejected in favour of H1 at %g%% c.l.: %s" % 
      (100*(1-alpha), pvalue<alpha))

confirming both our expectations and significance of detected cross-correlations at 90% confidence level:

r = 1	 Cxy(r=1) = 0.0578
r = 2	 Cxy(r=2) = 0.8358
r = 3	 Cxy(r=3) = -0.3200
Hxy(N) = 10.4553
p-value = 0.0151
H0 rejected in favour of H1 at 90% c.l.: True

For cross-bicorrelation we have:

# cross-bicorrelation
def Cxyy(x, y, r, s, N):
    z = 0
    m = np.max([r, s])
    for i in range(0, N-m-1):
        z += x[i] * y[i+r] * y[i+s]
    z /= (N-m)
    return z
 
# test statistic for cross-bicorrelation
def Hxyy(x, y, L, N):
    z = 0
    for s in range(2, L+1):
        for r in range(1, L+1):
            m = np.max([r, s])
            z += (N-m) * Cxyy(x, y, r, s, N)**2
    return z    
 
 
# sample data
y = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0])
x = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
 
# normalise such that x, y ~ N(0,1)
x = (x - np.mean(x))/np.std(x, ddof=1)
y = (y - np.mean(y))/np.std(y, ddof=1)
 
N = len(x)  # len(x) should be equal to len(y)
L = int(np.floor(N**0.4))
 
for r in range(0, L+1):
    for s in range(r+1, L+1):
        print("r =%2g, s =%2g\t Cxyy(r=%g, s=%g) = %.5f" % 
             (r, s, r, s, Cxyy(x, y, r, s, N)))
 
dof = (L-1)*(L/2)
pvalue = 1 - scipy.stats.chi2.cdf(Hxy(x, y, L, N), dof)
alpha = 0.1
 
print("Hxyy(N) = %.4f" % Hxyy(x, y, L, N))
print("p-value = %.4f" % pvalue)
print("H0 rejected in favour of H1 at %g%% c.l.: %s" % 
      (100*(1-alpha), pvalue<alpha))

where for a new sample data we expect to detect two strongest cross-bicorrelations for pairs of lags $(r,s)$ equal $(2,3)$ and $(1,2)$, respectively. By running the code we obtain indeed that:

r = 0, s = 1	 Cxyy(r=0, s=1) = -0.18168
r = 0, s = 2	 Cxyy(r=0, s=2) = -0.23209
r = 0, s = 3	 Cxyy(r=0, s=3) = 0.05375
r = 1, s = 2	 Cxyy(r=1, s=2) = 0.14724
r = 1, s = 3	 Cxyy(r=1, s=3) = -0.22576
r = 2, s = 3	 Cxyy(r=2, s=3) = 0.18276
Hxyy(N) = 5.3177
p-value = 0.0833
H0 rejected in favour of H1 at 90% c.l.: True

With these two examples we showed both cross-correlation and cross-bicorrelation methods in action. We also pointed at the important fact of an existing dependency between the length of time-series and the expected number of lags to be examined. The longer time-series the higher number of degrees-of-freedom.

Given that state of the play, there exist two independent and appealing ways for the application of cross-correlation and cross-bicorrelation in finance: (1) the first one regards the examination of time-series with a small number of data points (e.g. $N \le 50$); (2) alternatively, if we deal with a long ($N \ge 1000$) time-series, the common practice is to divide them into $k$ chunks (non-overlapping windows) of length at least $N_i$ of $50$ to $100$ where $i=1,…,k$ and compute cross-correlation and cross-bicorrelation metrics for every data window separately. In consequence, within this approach, we gain an opportunity to report on time-period dependent data segment with statistically signifiant (or not) non-linear correlations! (see e.g. Coronado et alii 2015).

In the Section that follows, we solely focus on the application of the first approach. Rewarding in many aspects if you’re open-mined.

2.4. Detecting Non-Linearity between Oil Prices and Stock Fundamentals

Since looking for a direct correlation between crude oil prices and stocks can be regarded as the first approach to the examination of mutual relationships, it’s not sexy; not any more. With cross-correlation and cross-bicorrelation methods you can step up and break the boredom.

The stock fundamentals consist a broad space of parameters that actually deserve to be cross-correlated with the oil prices. It is much more intuitive that any (if any) effect of rising or falling crude oil prices may affect different companies with various time lags. Moreover, for some businesses we might detect a non-linear relationships existing only between oil and company’s e.g. revenues whereas for others, changing oil prices might affect more than one factor. If the patterns is present indeed, with the application of cross-correlation and cross-bicorrelation methods, you gain a brand new tool for data analysis. Let’s have a look at a practical example.

As previously, we will need both the WPI Crude Oil price-series (see Section 1.1) and access to a large database containing company’s fundamentals over a number of quarters or years. For the latter let me use a demo version of Quandl.com‘s Zacks Fundamentals Collection C available here (download Zacks Fundamentals Expanded ZACKS-FE.csv file). It contains 852 rows of data for 30 selected companies traded at US stock markets (NYSE, NASDAQ, etc.). This sample collection covers data back to 2011 i.e. ca. 22 to 23 quarterly reported financials. There is over 620 fundamentals to select from. Their full list you can find here (download the ZFC Definitions csv file). Let’s put it all in a new Python listing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
# Non-Linear Cross-Bicorrelations between the Oil Prices and Stock Fundamentals
# (c) 2016 QuantAtRisk.com, by Pawel Lachowicz
 
import numpy as np
import pandas as pd
import scipy.stats
import datetime
from matplotlib import pyplot as plt
 
%matplotlib inline
 
grey = 0.7, 0.7, 0.7
 
import warnings
warnings.filterwarnings('ignore')
 
 
# WTI Crude Oil price-series
# src: https://fred.stlouisfed.org/series/DCOILWTICO/downloaddata
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d')
wpi = pd.read_csv("DCOILWTICO.csv", parse_dates=['DATE'], date_parser=dateparse)
wpi = wpi[wpi.VALUE != "."]
wpi.VALUE = wpi.VALUE.astype(float)
 
# Zacks Fundamentals Collection C (Fundamentals Expanded)
# src: https://www.quandl.com/databases/ZFC
df = pd.read_csv('ZACKS-FE.csv')
 
col = ['Ticker', 'per_end_date', 'per_code', 'Revenue', "EBIT", 
          "GrossProfit", "CAPEX"]
data = pd.DataFrame(columns=col)  # create an empty DataFrame
 
tickers = ["XOM", "WMT", "BA", "AAPL", "JNJ"]
 
for i in range(df.shape[0]):
    ticker = df.iloc[i]["ticker"]
    pc = df.iloc[i]["per_code"]
    # extract only reported fundamentals for quarters "QR0 and all "QR-"
    # in total 22 to 23 rows of data per stock
    if((pc[0:3] == "QR-") | (pc[0:3] == "QR0")) and (ticker in tickers):
        data.loc[len(data)] = [df.iloc[i]["ticker"],
                               df.iloc[i]["per_end_date"],
                               df.iloc[i]["per_code"],
                               df.iloc[i]["tot_revnu"],
                               df.iloc[i]["ebit"],
                               df.iloc[i]["gross_profit"],
                               df.iloc[i]["cap_expense"]] 
 
print(data.head(25))

where we created a new Dataframe, data, storing Total Revenue, EBIT, Gross Profit (revenue from operations less associated costs), and CAPEX (capital expenditure) for a sample of 5 stocks (XOM, WMT, BA, AAPL, and JNJ) out of 30 available in that demo collection. Our goal here is just to show a path for a possible analysis of all those numbers with, kept in mind, an encouragement to build a bigger tool for stock screening based on detection of non-linear correlations amongst selected fundamentals and oil prices.

First 25 lines of data Dataframe are:

   Ticker per_end_date per_code  Revenue     EBIT  GrossProfit    CAPEX
0    AAPL   2011-03-31    QR-22  24667.0   7874.0      10218.0  -1838.0
1    AAPL   2011-06-30    QR-21  28571.0   9379.0      11922.0  -2615.0
2    AAPL   2011-09-30    QR-20  28270.0   8710.0      11380.0  -4260.0
3    AAPL   2011-12-31    QR-19  46333.0  17340.0      20703.0  -1321.0
4    AAPL   2012-03-31    QR-18  39186.0  15384.0      18564.0  -2778.0
5    AAPL   2012-06-30    QR-17  35023.0  11573.0      14994.0  -4834.0
6    AAPL   2012-09-30    QR-16  35966.0  10944.0      14401.0  -8295.0
7    AAPL   2012-12-31    QR-15  54512.0  17210.0      21060.0  -2317.0
8    AAPL   2013-03-31    QR-14  43603.0  12558.0      16349.0  -4325.0
9    AAPL   2013-06-30    QR-13  35323.0   9201.0      13024.0  -6210.0
10   AAPL   2013-09-30    QR-12  37472.0  10030.0      13871.0  -8165.0
11   AAPL   2013-12-31    QR-11  57594.0  17463.0      21846.0  -1985.0
12   AAPL   2014-03-31    QR-10  45646.0  13593.0      17947.0  -3367.0
13   AAPL   2014-06-30     QR-9  37432.0  10282.0      14735.0  -5745.0
14   AAPL   2014-09-30     QR-8  42123.0  11165.0      16009.0  -9571.0
15   AAPL   2014-12-31     QR-7  74599.0  24246.0      29741.0  -3217.0
16   AAPL   2015-03-31     QR-6  58010.0  18278.0      23656.0  -5586.0
17   AAPL   2015-06-30     QR-5  49605.0  14083.0      19681.0  -7629.0
18   AAPL   2015-09-30     QR-4  51501.0  14623.0      20548.0 -11247.0
19   AAPL   2015-12-31     QR-3  75872.0  24171.0      30423.0  -3612.0
20   AAPL   2016-03-31     QR-2  50557.0  13987.0      19921.0  -5948.0
21   AAPL   2016-06-30     QR-1  42358.0  10105.0      16106.0  -8757.0
22   AAPL   2016-09-30      QR0  46852.0  11761.0      17813.0 -12734.0
23     BA   2011-03-31    QR-22  14910.0   1033.0       2894.0   -417.0
24     BA   2011-06-30    QR-21  16543.0   1563.0       3372.0   -762.0

An exemplary visualisation of both data sets for selected company (here: Exxon Mobil Corporation, XOM, engaged in the exploration and production of crude oil and natural gas, manufacturing of petroleum products, and transportation and sale of crude oil, natural gas and petroleum products) we obtain with:

49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
fig, ax1 = plt.subplots(figsize=(10,5))
plt.plot(wpi.DATE, wpi.VALUE, color=grey, label="WPI Crude Oil (daily)")
plt.grid(True)
plt.legend(loc=1)
plt.ylabel("USD per barrel")
 
selection = ["XOM"]
 
ax2 = ax1.twinx()
plt.ylabel("1e6 USD")
for ticker in selection:
    tmp = data[data.Ticker == ticker]
    lab = ticker
    tmp['per_end_date'] =  pd.to_datetime(tmp['per_end_date'], 
                                                        format='%Y-%m-%d')
    plt.plot(tmp.per_end_date, tmp.Revenue, '.-', label=lab + ": Revenue")
    plt.plot(tmp.per_end_date, tmp.EBIT, '.-g', label=lab + ": EBIT")
    plt.plot(tmp.per_end_date, tmp.GrossProfit, '.-m', label=lab + ": 
                                                            Gross Profit")
    plt.plot(tmp.per_end_date, tmp.CAPEX, '.-r', label=lab + ": CAPEX")
    plt.legend(loc=3)

unknown-2
Every company has different periods of time (per_end_date) when they report their financials. That is why we need to make sure that in the process of comparison of selected fundamentals with oil prices, the oil price is picked up around the date defined by per_end_date. Let me use $\pm 5$ day average around those points:

70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
# cross-correlation
def Cxy(x, y, r, N):
    z = 0
    for i in range(0, N-r-1):
        z += x[i] * y[i+r]
    z /= (N-r)
    return z
 
# test statistic for cross-correlation
def Hxy(x, y, L, N):
    z = 0
    for r in range(1, L+1):
        z += Cxy(x, y, r, N)**2
    z *= (N - L)   
    return z
 
# cross-bicorrelation
def Cxyy(x, y, r, s, N):
    z = 0
    m = np.max([r, s])
    for i in range(0, N-m-1):
        z += x[i] * y[i+r] * y[i+s]
    z /= (N-m)
    return z
 
def Hxyy(x, y, L, N):
    z = 0
    for s in range(2, L+1):
        for r in range(1, L+1):
            m = np.max([r, s])
            z += (N-m) * Cxyy(x, y, r, s, N)**2
    return z
 
 
fig, ax1 = plt.subplots(figsize=(10,5))
plt.plot(wpi.DATE, wpi.VALUE, color=grey, label="WPI Crude Oil (daily)")
plt.grid(True)
plt.legend(loc=1)
plt.ylabel("USD per barrel")
 
selection = ["XOM"]
 
ax2 = ax1.twinx()
plt.ylabel("1e6 USD")
for ticker in selection:
    tmp = data[data.Ticker == ticker]
    lab = ticker
    tmp['per_end_date'] =  pd.to_datetime(tmp['per_end_date'], 
                                               format='%Y-%m-%d')
    plt.plot(tmp.per_end_date, tmp.Revenue, '.-', label=lab + ": Revenue")
    plt.plot(tmp.per_end_date, tmp.EBIT, '.-g', label=lab + ": EBIT")
    plt.plot(tmp.per_end_date, tmp.GrossProfit, '.-m', label=lab + ": 
                                                            Gross Profit")
    plt.plot(tmp.per_end_date, tmp.CAPEX, '.-r', label=lab + ": CAPEX")
    plt.legend(loc=3)
 
    col = ['per_end_date', 'avgWPI']
    wpi2 = pd.DataFrame(columns=col)  # create an empty DataFrame
 
    for i in range(tmp.shape[0]):
        date = tmp.iloc[i]["per_end_date"]
        date0 = date
        date1 = date + datetime.timedelta(days=-5)
        date2 = date + datetime.timedelta(days=+5)
        wpiavg = wpi[(wpi.DATE >= date1) & (wpi.DATE <= date2)]
        avg = np.mean(wpiavg.VALUE)
        wpi2.loc[len(wpi2)] = [date0, avg]
 
    # plot quarterly averaged oil price-series
    plt.sca(ax1)  # set ax1 axis for plotting
    plt.plot(wpi2.per_end_date, wpi2.avgWPI, '.-k')

unknown-3
where a black line denotes quarterly changing WPI Crude Oil prices stored now in a new Dataframe of wpi2.

Let’s leave only XOM in selection list variable. After line #140 the Dataframe of tmp stores XOM’s selected fundamentals data. In what follows we will use both Dateframes (tmp and wpi2) to look for non-linear correlations as the final goal of this investigation:

142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
    # select some factors
    fundamentals = ["Revenue", "EBIT", "GrossProfit", "CAPEX"]
 
    # compute for them cross-correlations and cross-bicorrelations
    # and display final results
    #
    for f in fundamentals:
        print("%s: %s\n" % (ticker, f))
        # input data
        x = wpi2.avgWPI.values
        y = tmp[f].values
 
        # normalised time-series
        x = (x - np.mean(x))/np.std(x, ddof=1)
        y = (y - np.mean(y))/np.std(y, ddof=1)
 
        N = len(x)  # len(x) should be equal to len(y)
        L = int(np.floor(N**0.4))
 
        print("Cross-Correlation")
        for r in range(1, L+1):
            print("  r =%2g\t Cxy(r=%g) = %.4f" % 
                  (r, r, Cxy(x, y, r, N)))
 
        dof = L
        pvalue = 1 - scipy.stats.chi2.cdf(Hxy(x, y, L, N), dof)
        alpha = 0.1
 
        print("  Hxy(N) = %.4f" % Hxy(x, y, L, N))
        print("  p-value = %.4f" % pvalue)
        print("  H0 rejected in favour of H1 at %g%% c.l.: %s" % 
                                      (100*(1-alpha), pvalue<alpha))
 
        print("Cross-Bicorrelation")
        for r in range(0, L+1):
            for s in range(r+1, L+1):
                print("  r =%2g, s =%2g\t Cxyy(r=%g, s=%g) = %.5f" % 
                     (r, s, r, s, Cxyy(x, y, r, s, N)))
 
        dof = (L-1)*(L/2)
        pvalue = 1 - scipy.stats.chi2.cdf(Hxy(x, y, L, N), dof)
 
        print("  Hxyy(N) = %.4f" % Hxyy(x, y, L, N))
        print("  p-value = %.4f" % pvalue)
        print("  H0 rejected in favour of H1 at %g%% c.l.: %s\n" % 
                                       (100*(1-alpha), pvalue<alpha))

Our Python code generates for XOM the following output:

XOM: Revenue
 
Cross-Corelation
  r = 1	 Cxy(r=1) = 0.7869
  r = 2	 Cxy(r=2) = 0.6239
  r = 3	 Cxy(r=3) = 0.4545
  Hxy(N) = 24.3006
  p-value = 0.0000
  H0 rejected in favour of H1 at 90% c.l.: True
Cross-Bicorrelation
  r = 0, s = 1	 Cxyy(r=0, s=1) = -0.44830
  r = 0, s = 2	 Cxyy(r=0, s=2) = -0.30186
  r = 0, s = 3	 Cxyy(r=0, s=3) = -0.19315
  r = 1, s = 2	 Cxyy(r=1, s=2) = -0.36678
  r = 1, s = 3	 Cxyy(r=1, s=3) = -0.22161
  r = 2, s = 3	 Cxyy(r=2, s=3) = -0.24446
  Hxyy(N) = 10.3489
  p-value = 0.0000
  H0 rejected in favour of H1 at 90% c.l.: True
 
XOM: EBIT
 
Cross-Correlation
  r = 1	 Cxy(r=1) = 0.7262
  r = 2	 Cxy(r=2) = 0.5881
  r = 3	 Cxy(r=3) = 0.3991
  Hxy(N) = 20.6504
  p-value = 0.0001
  H0 rejected in favour of H1 at 90% c.l.: True
Cross-Bicorrelation
  r = 0, s = 1	 Cxyy(r=0, s=1) = -0.39049
  r = 0, s = 2	 Cxyy(r=0, s=2) = -0.25963
  r = 0, s = 3	 Cxyy(r=0, s=3) = -0.17480
  r = 1, s = 2	 Cxyy(r=1, s=2) = -0.28698
  r = 1, s = 3	 Cxyy(r=1, s=3) = -0.18030
  r = 2, s = 3	 Cxyy(r=2, s=3) = -0.21284
  Hxyy(N) = 6.8759
  p-value = 0.0001
  H0 rejected in favour of H1 at 90% c.l.: True
 
XOM: GrossProfit
 
Cross-Correlation
  r = 1	 Cxy(r=1) = 0.7296
  r = 2	 Cxy(r=2) = 0.5893
  r = 3	 Cxy(r=3) = 0.3946
  Hxy(N) = 20.7065
  p-value = 0.0001
  H0 rejected in favour of H1 at 90% c.l.: True
Cross-Bicorrelation
  r = 0, s = 1	 Cxyy(r=0, s=1) = -0.38552
  r = 0, s = 2	 Cxyy(r=0, s=2) = -0.24803
  r = 0, s = 3	 Cxyy(r=0, s=3) = -0.17301
  r = 1, s = 2	 Cxyy(r=1, s=2) = -0.28103
  r = 1, s = 3	 Cxyy(r=1, s=3) = -0.17393
  r = 2, s = 3	 Cxyy(r=2, s=3) = -0.19780
  Hxyy(N) = 6.2095
  p-value = 0.0001
  H0 rejected in favour of H1 at 90% c.l.: True
 
XOM: CAPEX
 
Cross-Correlation
  r = 1	 Cxy(r=1) = -0.2693
  r = 2	 Cxy(r=2) = -0.3024
  r = 3	 Cxy(r=3) = -0.2333
  Hxy(N) = 4.3682
  p-value = 0.2244
  H0 rejected in favour of H1 at 90% c.l.: False
Cross-Bicorrelation
  r = 0, s = 1	 Cxyy(r=0, s=1) = 0.00844
  r = 0, s = 2	 Cxyy(r=0, s=2) = -0.10889
  r = 0, s = 3	 Cxyy(r=0, s=3) = -0.14787
  r = 1, s = 2	 Cxyy(r=1, s=2) = -0.10655
  r = 1, s = 3	 Cxyy(r=1, s=3) = -0.16628
  r = 2, s = 3	 Cxyy(r=2, s=3) = -0.08072
  Hxyy(N) = 6.4376
  p-value = 0.2244
  H0 rejected in favour of H1 at 90% c.l.: False

A quick analysis points at detection of significant non-linear correlations (at 90% confidence level) between WPI Crude Oil and Revenue, EBIT, and GrossProfit for lag $r=1$ (based on cross-correlations) and for lags $(r=0, s=1)$ (based on cross-bicorrelations). There is no significant non-linear relationship between XOM’s CAPEX and oil prices.

Interestingly, rerunning the code for Boeing Company (NYSE ticker: BA) reveals no significant cross-(bi)correlations between the same factors and oil. For example, Boeing’s revenue improves year-over-year,

unknown-4

but it appears to have any significant non-linear link to the oil prices whatsoever:

BA: Revenue
 
Cross-Correlation
  r = 1	 Cxy(r=1) = -0.3811
  r = 2	 Cxy(r=2) = -0.3216
  r = 3	 Cxy(r=3) = -0.1494
  Hxy(N) = 5.4200
  p-value = 0.1435
  H0 rejected in favour of H1 at 90% c.l.: False
Cross-Bicorrelation
  r = 0, s = 1	 Cxyy(r=0, s=1) = 0.11202
  r = 0, s = 2	 Cxyy(r=0, s=2) = 0.03588
  r = 0, s = 3	 Cxyy(r=0, s=3) = -0.08083
  r = 1, s = 2	 Cxyy(r=1, s=2) = 0.01526
  r = 1, s = 3	 Cxyy(r=1, s=3) = 0.00616
  r = 2, s = 3	 Cxyy(r=2, s=3) = -0.07252
  Hxyy(N) = 0.3232
  p-value = 0.1435
  H0 rejected in favour of H1 at 90% c.l.: False

The most interesting aspect of cross-correlation and cross-bicorrelation methods applied in oil vs. stock fundamentals research regards new opportunities to discover correlations for stocks that so far have been completely ignored (or not suspected) to have anything in common with oil price movements in the markets. Dare to go this path. I hope you will find new assets to invest in. The information is there. Now you are equipped with new tools. Use them wisely. Retire early.

REFERENCES
    Bernanke B. S., 2016, The relationship between stocks and oil prices
    Brooks C., Hinich M. J., 1999, Cross-correlations and cross-bicorrelations in Sterling
          exchange rates
(pdf)
    Coronado et al., 2015, A study of co-movements between USA and Latin American stock
          markets: a cross-bicorrelations perspective
(pdf)
    Hamilton J., 2014, Oil prices as an indicator of global economic conditions
    Hinich M.J., 1996. Testing for dependence in the input to a linear time series model.
          Journal of Nonparametric Statistics 6, 205–221.
    Hinich M.J., Patterson D.M., 1995, Detecting epochs of transient dependence in white
          noise
, Mimeo, University of Texas at Austin

DOWNLOADS
    COPPER.csv, DCOILWTICO.csv, DGS10.csv, DTWEXM.csv, VIXCLS.csv, ZACKS-FE.csv

  • http://riskmanagementguru.com Risk Management Guru

    Hi Pawel,

    Very interesting and complete. Thanks for this contribution.

Contact Form Powered By : XYZScripts.com