A common task in finance is forecasting values. There are several methods for creating forecasts such as ARIMA, Bayesian Structural Time Series, and simulation. Monte Carlo is a simulation technique based on generating random walks over a period of time.
In this tutorial, we will:
- fetch Natural Gas prices
- Simulate future price movements using Monte Carlo
- Plot the simulated future paths
- Calculate future values using ARIMA and Bayesian Timeseries
We start by importing the modules we will need. We are going to access the price of natural gas from Financial Modeling Prep, a web API. You need an API key from them to pull the data.
# getting historical data for US Natural Gas prices.
# This code calls the API and transforms the result into a DataFrame.
import numpy as np
np.random.seed(3363)
import pandas as pd
from scipy.stats import norm
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
ticker = "NGUSD" #US natural gas
base = 'https://financialmodelingprep.com/api/v3/'
key = YOUR_KEY
target = "{}historical-price-full/{}?apikey={}".format(base, ticker, key)
df = pd.read_json(target)
df = pd.json_normalize(df['historical'])
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# saving the file
df.to_csv('data/NGUSD data.csv')
df.head()
We pull the data and save it as an CSV so we can use it later. Let's plot the data so we can see how the price has changed over time.
#Plot of asset historical closing price
df['adjClose'].plot(figsize=(10, 6), title = "Price of {} from {} to {}".format(ticker, df.index.min(), df.index.max()))
Geometric Brownian Motion
We want to simualte future values for the natrual gas prices based on the historic values. To make this more relaistic, we constrain the amount the price can change in a period to the mean of the historic price change with some random shock. We assume the log of the returns (percent changes) are normally distributed. We also assume the market is efficient.
The formula for the change in price between periods is the price of the stock in t0 multiplied by the expected drift (average change in price) plus an exogenous shock.
pred_end_date = datetime.datetime(2023, 7, 20)
forecast_dates = [d if d.isoweekday() in range(1, 6) else np.nan for d in pd.date_range(df.index.max(), pred_end_date)]
intervals = len(forecast_dates)
iterations = 10000
#Preparing log returns from data
log_returns = np.log(1 + df['adjClose'].pct_change())
#Setting up drift and random component in relation to asset data
u = log_returns.mean()
var = log_returns.var()
drift = u - (0.5 * var)
stdev = log_returns.std()
daily_returns = np.exp(drift + stdev * norm.ppf(np.random.rand(intervals, iterations)))
#Takes last data point as startpoint point for simulation
S0 = df['adjClose'].iloc[-1]
price_list = np.zeros_like(daily_returns)
price_list[0] = S0
#Applies Monte Carlo simulation in asset
for t in range(1, intervals):
price_list[t] = price_list[t - 1] * daily_returns[t]
forecast_df = pd.DataFrame(price_list)
forecast_df.plot(figsize=(10,6), legend=False, title = "{} Simulated Future Paths".format(iterations))
We run the simulation 10,000 times. We could run it more but this is good enough for now. Let's plot the end values. These are the expected values for the price of natural gas in 14 periods (in this case 14 days).
# Plotting with a histogram
x = forecast_df.values[-1]
sigma = np.std(x)
mu = np.mean(x)
num_bins = 15
fig, ax = plt.subplots()
# the histogram of the data
n, bins, patches = ax.hist(x, num_bins, density=1, alpha=.75)
# add a 'best fit' line
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
ax.plot(bins, y, '--')
ax.axvline(np.mean(x), color='r')
ax.axvline(mu+sigma*1.96, color='g', ls='--')
ax.axvline(mu-sigma*1.96, color='g', ls='--')
ax.axvline(S0)
ax.set_xlabel('Predicted Price on {}'.format(pred_end_date))
ax.set_ylabel('Probability density')
ax.set_title(r'Histogram of {ticker}: $\mu={mu:.02f}$, $\sigma={sigma:.02f}$'.format(ticker = ticker, mu=mu, sigma=sigma))
# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()
This histogram shows us the mean expected value and the range of expected values in 2 weeks. Based on this we see that it is possible to have a price of $5 but that is unlikely.
More advanced graphing
The data frame we created for the simulations held each run as a separate column. This is called wide
format which is useful for some analytics tasks, but is not optimal for visualization. We use a technique called pd.melt
to collages the 1000 columns into three. This new format, called long
, is aligned with the principles of tidy data
.
Plotly
Plotly is a high level library for interactive visualization. It is a "declarative" library where we say what we want, not how to produce what we want. Notice that the code for this visualization is much smaller than the code for Matplotlib, and yet, the result is much better.
forecast_df['date'] = [df.index.max()+pd.Timedelta(days=i) for i in forecast_df.index]
forecast_df.set_index('date', inplace=True)
df['Source'] = 'Actual'
forecast_df['Source'] = 'Forecast'
result = forecast_df.append(df[['adjClose', 'Source']], sort=False)
r = result.reset_index()
r = pd.melt(r, id_vars=['Source', 'date'])
Autoregressive integrated moving average (Arima)
ARIMA is a simple way to forecast future values using statistics. StatsModels has a nice ARIMA tool built in and we can use that to predict future values.
from sklearn.metrics import mean_squared_error
from math import sqrt
from statsmodels.tsa.arima.model import ARIMA
# fit ARIMA model
model = ARIMA(df['adjClose'], order=(5,1,0))
model_fit = model.fit()
# summary of fit model
print(model_fit.summary())
# line plot of residuals
residuals = pd.DataFrame(model_fit.resid)
residuals.plot()
plt.show()
# density plot of residuals
residuals.plot(kind='kde')
plt.show()
# summary stats of residuals
print(residuals.describe())
series = df['adjClose']
series.index = df['adjClose'].index
# split into train and test sets
X = series.values
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()
# walk-forward validation
for t in range(len(test)):
model = ARIMA(history, order=(5,1,0))
model_fit = model.fit()
output = model_fit.forecast()
yhat = output[0]
predictions.append(yhat)
obs = test[t]
history.append(obs)
#print('predicted=%f, expected=%f' % (yhat, obs))
# evaluate forecasts
rmse = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
# plot forecasts against actual outcomes
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()
The red line is the predicted values and the blue like (barely visible) are the actual values.
Prophet
Prophet is a Bayesian time series method implemented by Facebook. It allows us to use prior info to predict future values. The graph shows the actual values (dots) and the predicted range (light blue).
! pip install -q prophet
from prophet import Prophet
# data must be formated for Prophet
[X, Y] = df.index, df['adjClose']
p = pd.DataFrame(Y, X)
p.reset_index(inplace=True)
p.rename(columns={'date': 'ds', 'adjClose': 'y'}, inplace=True)
m = Prophet()
m.fit(p)
future = m.make_future_dataframe(periods=5)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
fig1 = m.plot(forecast)