Notes for Forecasting

https://www.otexts.org/fpp/

packages

  • forecast
  • tseries
  • fma : ts data
  • expsmooth: more ts data
  • lmtest: some regression functions

data

beer <- window(ausbeer, start = 1992)

Simple forecasting methods

Average method:

meanf(x, h = 20)
\[y\_{T+h|T} = (y\_1 + ... + y\_T)/T\]

Naive method

naive(x, h = 20)
rwf(x, h= 20)

all future forecasts equal the last observed value:

\[y\_{T+h|T} = y\_T\]

seasonal naive method

snaive(x, h = 20)
\[y\_{T+h|T} = y\_{T+h-km}\]

where $m$ is the seasonal period and $k = \lfloor (h-1)/m \rfloor + 1$ is the number of complete years in the forecast period prior to time $T+h$.

Drift method:

Equivalent to extrapolating a line drawn between first and last observations.

rwf(x, drift = TRUE, h = 20)
\[y\_{T+h|T} = y\_T + h/(T-1)*(y\_T - y\_1)\]

Measures of forecast accuracy

\begin{eqnarray} MAE & = & T^{-1} \sum_{t=1}^{T} |y_t - \hat{y}_{t|t-1}| \\
MSE & = & T^{-1} \sum_{t=1}^{T} (y_t - \hat{y}_{t|t-1})^2 \\
MAPE & = & 100 T^{-1} \sum_{t=1}^{T} |y_t - \hat{y}_{t|t-1}| / |y_t| \end{eqnarray}

MAPE is scale-independent, but it is only sensible when all $y_t$ are well above zero.

Mean absolute scaled error (MASE)
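
Following the definition in fpp (Hyndman & Koehler, 2006), each forecast error is scaled by the in-sample MAE of one-step naive forecasts (seasonal naive for seasonal data), so MASE is comparable across series on different scales:

\[q\_j = \frac{e\_j}{(T-1)^{-1} \sum\_{i=2}^{T} |y\_i - y\_{i-1}|}, \qquad MASE = \mathrm{mean}(|q\_j|)\]

A MASE below 1 means the forecasts beat one-step naive forecasts, on average, over the training data.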

Training set vs test set

80% vs 20%

beer3 <- window(ausbeer,start=1992,end=2005.99)
beer4 <- window(ausbeer,start=2006)
fit1 <- meanf(beer3,h=20)
fit2 <- rwf(beer3,h=20)
accuracy(fit1,beer4)
accuracy(fit2,beer4)

When a test set is supplied as the second argument, accuracy() reports each measure for both the training and the test set; forecast accuracy should be judged on the test-set errors.

Time series graphics

time plot: plot, plot.ts
seasonal: seasonplot(beer, year.labels=TRUE)
seasonal subseries: monthplot
lag plot: lag.plot(beer, lags = 9)
acf: Acf

Autocorrelation

\begin{eqnarray} c_k & = & \frac{1}{T} \sum_{t = k+1}^T (y_t - \bar{y})(y_{t-k} - \bar{y}) \\
r_k & = & c_k / c_0 \end{eqnarray}

Check $r_1, \ldots, r_k$ at the seasonal lags to detect a seasonal pattern (e.g., a large $r_4$ for quarterly data)

Acf(beer)

Slowly decaying ACF indicates trend

ACF peaks at lags 12, 24, 36 indicate seasonality of length 12

White Noise

Example of pigs slaughtered:

  • Difficult to detect pattern in time plot.
  • ACF shows some significant autocorrelation at lags 1, 2, and 3.
  • $r_{12}$ is relatively large although not significant. This may indicate some slight seasonality.

These show the series is not a white noise series.

Dow-Jones naive forecasts revisited

\[\hat{y}\_{t|t-1} = y\_{t-1}\] \[e\_t = y\_t - y\_{t-1}\]

Check the time plot of $e_t$ and its ACF.

Portmanteau tests

Box-Pierce test

\[Q = T \sum\_{k=1}^h r\_k^2\]
  • $h$ is the maximum lag being considered; use $h = 10$ for non-seasonal data and $h = 2m$ for seasonal data
  • $T$ is the number of observations
  • if each $r_k$ is close to zero, then $Q$ will be small
  • if some $r_k$ values are large (positive or negative), then $Q$ will be large

Ljung-Box test

\[Q^* = T(T+2) \sum\_{k=1}^h (T-k)^{-1} r\_k^2\]

better performance, especially in small samples.

If the series is white noise, $Q^*$ has a chi-square distribution with $(h-K)$ degrees of freedom, where $K$ is the number of parameters in the model (for raw data, $K = 0$).

Box.test(res, lag = 10, fitdf = 0)
Box.test(res, lag = 10, fitdf = 0, type = "Lj")

test residuals

beer <- window(ausbeer,start=1992)
fc <- snaive(beer)
res <- residuals(fc)
Acf(res)
Box.test(res, lag=8, fitdf=0, type="Lj")

Time series decomposition

If the variation due to seasonality is not of primary interest, the seasonally adjusted series can be useful. Consequently, employment data (and many other economic series) are usually seasonally adjusted.

\[Y\_t = f(S\_t, T\_t, E\_t)\]

$S_t$: seasonal component

$T_t$: trend-cycle component

$E_t$: remainder (error) component

STL

stl()
decompose()

plot(decompose(hsales))
plot(stl(hsales,s.window="periodic"))
plot(stl(hsales,s.window=15))

fit <- stl(data, t.window=15, s.window="periodic")
plot(fit)

t.window: controls wiggliness of trend
s.window: controls variation in seasonal component

Seasonal adjustment

plot(hsales,col="gray")
fit <- stl(hsales,s.window=15)
hsales.sa <- seasadj(fit)
lines(hsales.sa, col="red")

fit <- stl(elecequip, t.window=15,
s.window="periodic", robust=TRUE)
eeadj <- seasadj(fit)
plot(naive(eeadj), xlab="New orders index")
fcast <- forecast(fit, method="naive")
plot(fcast, ylab="New orders index")

Exponential smoothing

ses(x) : simple exponential smoothing, no trend
holt(x): Holt's method: linear trend
holt(x, exponential = TRUE):  exponential trend method
holt(x, damped = TRUE): damped trend method
holt(x, damped = TRUE, exponential = TRUE)

simple exponential smoothing

random walk forecasts

\[y\_{T+1|T} = y\_T\]

average

\[y\_{T+1|T} = 1/T * \sum\_{t=1}^T y\_t\]

simple exponential smoothing

\[y\_{T+1|T} = \alpha y\_T + \alpha(1-\alpha) y\_{T-1} + \alpha(1-\alpha)^2 y\_{T-2} + \cdots\]

weighted average form

\[y\_{T+1|T} = \alpha y\_T + (1-\alpha) y\_{T|T-1}\]

Exponentially weighted average:

\[y\_{T+1|T} = \sum\_{j=0}^{T-1} \alpha(1-\alpha)^j y\_{T-j} + (1-\alpha)^T l\_0\]

Common to set $l_0 = y_1$
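
A quick R check (a minimal sketch with toy numbers and an arbitrary $\alpha$, not from the text): the recursive update $l_t = \alpha y_t + (1-\alpha) l_{t-1}$ started at $l_0 = y_1$ reproduces the exponentially weighted average above exactly.

y <- c(10, 12, 13, 12, 15, 16, 15, 17)            # toy data
alpha <- 0.3
l0 <- y[1]                                        # common choice l_0 = y_1
l <- l0
for (yt in y) l <- alpha * yt + (1 - alpha) * l   # recursive (component) form
T <- length(y)
w <- alpha * (1 - alpha)^(0:(T - 1))
fc <- sum(w * rev(y)) + (1 - alpha)^T * l0        # weighted-average form
c(recursive = l, weighted = fc)                   # identical one-step forecasts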

optimization

can choose $\alpha$ and $l_0$ by minimizing MSE:

\[MSE = 1/(T-1) \sum\_{t=2}^T (y\_t- y\_{t|t-1})^2\]

There is no closed-form solution, so use numerical optimization.

SES in R

fit1 <- ses(oildata, alpha=0.2,
        initial="simple", h=3)
fit2 <- ses(oildata, alpha=0.6,
initial="simple", h=3)
fit3 <- ses(oildata, h=3)
accuracy(fit1)
accuracy(fit2)
accuracy(fit3)

Non-seasonal trend methods

Holt’s local trend method

\begin{eqnarray} \hat{y}_{t+h|t} & = & l_t + h b_t \\
l_t & = & \alpha y_t + (1 - \alpha)(l_{t-1} + b_{t-1}) \\
b_t & = & \beta^* (l_t - l_{t-1}) + (1 - \beta^*) b_{t-1} \end{eqnarray}

find $\alpha$ and $\beta^*$ by minimizing the MSE

fit1 <- holt(strikes)
plot(fit1$model)
plot(fit1, plot.conf=FALSE)
lines(fitted(fit1), col="red")
fit1$model
fit2 <- ses(strikes)
plot(fit2$model)
plot(fit2, plot.conf=FALSE)
lines(fit1$mean, col="red")
accuracy(fit1)
accuracy(fit2)

Exponential trend method

\begin{eqnarray} \hat{y}_{t+h|t} & = & l_t b_t^h \\
l_t & = & \alpha y_t + (1 - \alpha) l_{t-1} b_{t-1} \\
b_t & = & \beta^* (l_t / l_{t-1}) + (1 - \beta^*) b_{t-1} \end{eqnarray}

$b_t$: relative (multiplicative) growth rate

damped trend

\begin{eqnarray} \hat{y}_{t+h|t} & = & l_t + (\phi + \phi^2 + \cdots + \phi^h) b_t \\
l_t & = & \alpha y_t + (1 - \alpha)(l_{t-1} + \phi b_{t-1}) \\
b_t & = & \beta^* (l_t - l_{t-1}) + (1 - \beta^*) \phi b_{t-1} \end{eqnarray}

$\phi$ dampens the trend so that the forecasts approach a constant as $h$ grows.

Damped trend method often gives better forecasts than linear trend.
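
A short sketch (assuming the strikes series from the fma package, as used above): compare Holt's linear trend with the damped variant over a longer horizon; the damped forecasts level off.

library(forecast)
fit_lin  <- holt(strikes, h = 15)
fit_damp <- holt(strikes, damped = TRUE, h = 15)  # phi is estimated unless fixed, e.g. phi = 0.9
plot(fit_lin, plot.conf = FALSE)
lines(fit_damp$mean, col = "red")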

Holt-Winters

  • capture seasonality

additive method

\begin{eqnarray} \hat{y}_{t+h|t} & = & l_t + h b_t + s_{t-m+h_m^+} \\
l_t & = & \alpha (y_t - s_{t-m}) + (1 - \alpha)(l_{t-1} + b_{t-1}) \\
b_t & = & \beta^* (l_t - l_{t-1}) + (1 - \beta^*) b_{t-1} \\
s_t & = & \gamma (y_t - l_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m} \\
h_m^+ & = & \lfloor (h-1) \bmod m \rfloor + 1 \end{eqnarray}

Multiplicative model

\begin{eqnarray} \hat{y}_{t+h|t} & = & (l_t + h b_t) s_{t-m+h_m^+} \\
l_t & = & \alpha (y_t / s_{t-m}) + (1 - \alpha)(l_{t-1} + b_{t-1}) \\
b_t & = & \beta^* (l_t - l_{t-1}) + (1 - \beta^*) b_{t-1} \\
s_t & = & \gamma (y_t / (l_{t-1} + b_{t-1})) + (1 - \gamma) s_{t-m} \\
h_m^+ & = & \lfloor (h-1) \bmod m \rfloor + 1 \end{eqnarray}

optimize

Parameters to estimate: $\alpha$, $\beta^*$, $\gamma$, $l_0$, $b_0$, $s_0, s_{-1}, \ldots, s_{1-m}$.

Damped Holt-Winters multiplicative method

\begin{eqnarray} \hat{y}_{t+h|t} & = & (l_t + (\phi + \phi^2 + \cdots + \phi^h) b_t) s_{t-m+h_m^+} \\
l_t & = & \alpha (y_t / s_{t-m}) + (1 - \alpha)(l_{t-1} + \phi b_{t-1}) \\
b_t & = & \beta^* (l_t - l_{t-1}) + (1 - \beta^*) \phi b_{t-1} \\
s_t & = & \gamma (y_t / (l_{t-1} + \phi b_{t-1})) + (1 - \gamma) s_{t-m} \\
h_m^+ & = & \lfloor (h-1) \bmod m \rfloor + 1 \end{eqnarray}

ETS(Error, Trend, Seasonal)

Trend \ Seasonal              N       A       M
N (none)                      N,N     N,A     N,M
A (additive)                  A,N     A,A     A,M
Ad (additive damped)          Ad,N    Ad,A    Ad,M
M (multiplicative)            M,N     M,A     M,M
Md (multiplicative damped)    Md,N    Md,A    Md,M

R functions

ses(): (N,N)
holt(): (A, N), (Ad, N), (M,N), (Md,N)
hw(): (A,A), (Ad, A), (A,M), (Ad,M), (M,M), (Md,M)
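
A short sketch (reusing the quarterly beer series defined earlier): fit additive and multiplicative Holt-Winters with hw() and compare training-set accuracy.

fit_add  <- hw(beer, seasonal = "additive", h = 8)
fit_mult <- hw(beer, seasonal = "multiplicative", h = 8)
accuracy(fit_add)
accuracy(fit_mult)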

Exponential smoothing state space models (forecast package)

The methods above produce point forecasts only; they have no underlying stochastic model, so there is no likelihood.

State space models add distributional assumptions, so they can be estimated by maximum likelihood and compared via AIC.

It can be shown that all exponential smoothing methods (including the non-linear ones) give optimal forecasts from innovations state space models; see Hyndman et al. (2008) for a comprehensive treatment.

ETS(A,N,N): simple exponential smoothing with additive errors
ETS(A,A,N): Holt's linear method with additive errors
ETS(A,A,A): additive Holt-Winters method with additive errors
ETS(M,A,M): multiplicative Holt-Winters method with multiplicative errors
ETS(A,Ad,N): damped trend method with additive errors

Innovations state space models

  • All the methods can be written in this state space form.
  • Prediction intervals can be obtained by simulating many future sample paths.
  • For many models, the prediction intervals can be obtained analytically as well.
  • Additive and multiplicative versions give the same point forecasts but different prediction intervals.
  • Estimation is handled via maximizing the likelihood of the data given the model.

AIC

AIC = -2 log(L) + 2p

AICc (corrected for small-sample bias)

AICc = AIC + 2(p+1)(p+2)/(n-p)
  • A difference in AIC values of 2 or less is not regarded as substantial and you may choose the simpler but non-optimal model.

BIC

BIC = -2 log(L) + p*log(n)
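
A minimal sketch: after fitting an ETS model, these criteria can be read off the fitted object (component names aic, aicc and bic, as stored by the forecast package's ets()).

fit <- ets(ausbeer)
c(AIC = fit$aic, AICc = fit$aicc, BIC = fit$bic)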

Procedures

From Hyndman et al (2008):

  • Apply each of the 30 models that is appropriate to the data; estimate parameters and initial values using MLE.
  • Select the best model using the AICc. Produce forecasts using the best model.
  • Obtain prediction intervals using underlying state space model.

Sample R code:

fit <- ets(ausbeer)
fit2 <- ets(ausbeer,model="AAA",damped=FALSE)
fcast1 <- forecast(fit, h=20)
fcast2 <- forecast(fit2, h=20)
ets(y, model="ZZZ", damped=NULL, alpha=NULL,
beta=NULL, gamma=NULL, phi=NULL,
additive.only=FALSE,
lower=c(rep(0.0001,3),0.80),
upper=c(rep(0.9999,3),0.98),
opt.crit=c("lik","amse","mse","sigma"), nmse=3,
bounds=c("both","usual","admissible"),
ic=c("aic","aicc","bic"), restrict=TRUE)

ets() function:

  • Automatically chooses a model by default using the AIC, AICc or BIC.
  • Can handle any combination of trend, seasonality and damping
  • Produces prediction intervals for every model
  • Ensures the parameters are admissible (equivalent to invertible)
  • Produces an object of class ets.

functions

plot(fit)
accuracy(fit)
  • forecast returns forecasts when applied to an ets object (or the output from many other time series models).
  • If you use forecast directly on data, it will select an ETS model automatically and then return forecasts.

Basic Steps in a forecasting task

https://www.otexts.org/fpp/1/6

  1. preliminary analysis: always start by graphing the data (see chapters 2 and 6)
    • consistent pattern ?
    • trend?
    • seasonality?
    • business cycles?
    • outliers?
  2. choose and fit models
    • regression model (chapter 4 and 5)

    • exponential smoothing model (chapter 7)
    • box-jenkins ARIMA model (chapter 8)
    • others like dynamic regression models, neural networks (chapter 9)
  3. use and evaluate model

Plot

  • time plots
    • trend
    • seasonal fixed and known length
    • cycle (chapter 6) unknown length, usually longer than that of seasonality

      first identify the time series patterns in the data, then choose a method that can capture them properly

      seasonplot
      monthplot
      
  • scatterplot

Statistics

  • ACF: check the autocorrelations at seasonal lags (e.g., $r_4$ for quarterly data) for a seasonal pattern

    we expect 95% of the spikes in the ACF to lie within ±2/sqrt(T) where T is the length of the time series. It is common to plot these bounds on a graph of the ACF. If there are one or more large spikes outside these bounds, or if more than 5% of spikes are outside these bounds, then the series is probably not white noise.

forecast package

meanf()
naive(), snaive()
rwf()
croston()
stlf()
ses()
holt(), hw()
splinef()
thetaf()
forecast()

Regression

fit.ex4 <- tslm(austa ~ trend)
f <- forecast(fit.ex4, h=5,level=c(80,95))
plot(f, ylab="International tourist arrivals to Australia (millions)", xlab="t")
lines(fitted(fit.ex4),col="blue")
summary(fit.ex4)
  • outlier: use a dummy variable to remove the effect rather than omit it
  • public holidays: dummy variable
  • Easter: dummy variable for that month
  • Ex post forecasts use the later observed values of the predictors; ex ante forecasts use only information available in advance
  • intervention variables: spike variable or change of slope
  • selecting predictors:
  • it is not recommended to remove a predictor just because it shows no apparent relationship in a scatterplot
  • it is not valid to simply drop all predictors whose p-values are greater than 0.05

      CV(fit)
    
  • scatterplots of residuals against predictors

    if show a pattern, then non linear form may be needed

  • Scatterplot of residuals vs fitted value

    If showing a pattern, there may be heteroscedasticity. Transformation of forecast variable may be needed.

  • test for autocorrelation of the residuals; the Durbin-Watson statistic should be around 2

      dwtest(fit, alt = "two.sided")
      bgtest(fit, 5) # test for autocorrelations up to lag 5
    

forecast() function

  • Takes a time series or time series model as its main argument
  • If first argument is class ts, returns forecasts from automatic ETS algorithm.
  • Methods for objects of class Arima, ar, HoltWinters, StructTS, etc.
  • Output as class forecast.

      forecast(y)
    

Confounding

Confounding is not really a problem for forecasting, as we can still compute forecasts without needing to separate out the effects of the predictors. However, it becomes a problem with scenario forecasting, as the scenarios should take account of the relationships between predictors. It is also a problem if some historical analysis of the contributions of various predictors is required.

Multicollinearity

  1. perfect correlation: not possible to estimate the model
  2. high correlation: estimation becomes difficult and unstable
  3. if the uncertainty is large, statistical tests (e.g., t-tests) are unreliable, and it is not possible to make accurate statements about the contribution of each individual predictor to the forecast

Transformation

If the data show different variation at different levels of the series, then a transformation can be useful.

Logarithms are often useful: changes in the log of a series correspond to relative (percentage) changes on the original scale.

Box-Cox transformation

  • if some $y_t < 0$, no power transformation is possible unless all values are first adjusted by adding a constant
  • if some $y_t = 0$, then we must have $\lambda > 0$

  • A Box-Cox transformation followed by an additive ETS model is often better than an ETS model without transformation.
  • A Box-Cox transformation followed by STL + ETS is often better than an ETS model without transformation.
  • It makes no sense to use a Box-Cox transformation and a non-additive ETS model.

      lambda = BoxCox.lambda(y) : can choose a value for lambda
      plot(BoxCox(y, lambda))
    

Stationarity

  • for all $s$, the distribution of $(y_t, \ldots, y_{t+s})$ does not depend on $t$
  • roughly horizontal
  • constant variance
  • no patterns predictable in the long-term

to identify non-stationary series

  • time plot
  • the ACF of stationary data drops to zero quite quickly
  • the ACF of non-stationary data decreases slowly
  • $r_1$ is often large and positive for non-stationary data

Random walk model

\[y\_t = y\_{t-1} + e\_t\]

Random walk with drift model

\[y\_t = y\_{t-1} + e\_t + c\]

c is the average change between consecutive observations

  • In practice, it is almost never necessary to go beyond second-order differences.

Seasonal differencing

\[y\_{t}^{\prime} = y\_{t} - y\_{t-m}\]

For monthly data, m= 12; for quarterly, m = 4.

plot(diff(data, 12))

automated differencing

ndiffs(x)
nsdiffs(x)

Non-seasonal ARIMA models

Autoregressive model (AR)

\[y\_{t} = \phi\_1 y\_{t-1} + ... + \phi\_p y\_{t-p} + \epsilon\_{t}\]

A multiple regression with lagged values of the series as predictors.

AR(1)

\[y\_{t} = c + \phi\_1 y\_{t-1} + \epsilon\_{t}\]
  • when $\phi_1 = 0$, equivalent to white noise
  • when $\phi_1 = 1$ and $c = 0$, equivalent to a random walk
  • when $\phi_1 = 1$ and $c \neq 0$, equivalent to a random walk with drift
  • when $\phi_1 < 0$, $y_t$ tends to oscillate between positive and negative values

Moving Average (MA) models

\[y\_{t} = c + \epsilon\_{t} + \theta\_1 \epsilon\_{t-1} + ... + \theta\_q \epsilon\_{t-q}\]

ARIMA models

\begin{eqnarray} y_{t} & = & c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} \\
& & {} + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_{t} \end{eqnarray}

  • include lagged values and lagged errors
  • used for a huge range of stationary time series
  • short-term dynamics

ARIMA(p, d, q)

AR: p = order of the autoregressive part
I:  d = degree of first differencing involved
MA: q = order of the moving average part
  • White noise: ARIMA(0,0,0)
  • RW: ARIMA(0,1,0) with no constant
  • RW with drift : ARIMA(0,1,0) with constant
  • AR(p): ARIMA(p,0,0)
  • MA(q): ARIMA(0,0,q)
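
A minimal sketch of these special cases fitted directly with Arima() (the toy series is arbitrary, not from the text):

library(forecast)
y <- ts(cumsum(rnorm(100)))                         # toy series
Arima(y, order = c(0, 0, 0))                        # white noise
Arima(y, order = c(0, 1, 0))                        # random walk
Arima(y, order = c(0, 1, 0), include.drift = TRUE)  # random walk with drift
Arima(y, order = c(2, 0, 0))                        # AR(2)
Arima(y, order = c(0, 0, 1))                        # MA(1)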

Code

> fit <- auto.arima(usconsumption[,1], seasonal=FALSE)
ARIMA(0,0,3) with non-zero mean
Coefficients:
         ma1     ma2     ma3  intercept
      0.2542  0.2260  0.2695     0.7562
s.e.  0.0767  0.0779  0.0692     0.0844
sigma^2 estimated as 0.3856:  log likelihood=-154.73
AIC=319.46   AICc=319.84   BIC=334.96

\[y\_t = 0.756 + e\_t + 0.254 e\_{t-1} + 0.226 e\_{t-2} + 0.269 e\_{t-3}\]

MLE

  • Estimate $c, \phi_1, …, \phi_p, \theta_1, …, \theta_q$
  • MLE is similar to least squares
  • non-linear optimization
  • different software will give different estimates

Step

Hyndman and Khandakar (JSS, 2008):

  • choose d via unit root tests
  • choose p, q by minimising the AICc
  • stepwise search to traverse model space

Step 1: Select the current model (with the smallest AICc) from:

ARIMA(2, d, 2)
ARIMA(0, d, 0)
ARIMA(1, d, 0)
ARIMA(0, d, 1)

Step 2: Consider variations of the current model:

  • vary one of p, q from the current model by ±1
  • vary both p and q from the current model by ±1
  • include/exclude the constant c from the current model

The model with the lowest AICc becomes the new current model.

Repeat Step 2 until no lower AICc can be found.

Prediction intervals

  • Prediction intervals increase in size with forecast horizon.
  • Prediction intervals can be difficult to calculate by hand
  • Calculations assume residuals are uncorrelated and normally distributed.
  • Prediction intervals tend to be too narrow because:
    • the uncertainty in the parameter estimates has not been accounted for;
    • the ARIMA model assumes historical patterns will not change during the forecast period;
    • the ARIMA model assumes uncorrelated future errors.

Backshift notation

  • First difference: $(1 - B) y_t$
  • Double difference: $(1 - B)^2 y_t$
  • dth-order difference: $(1 - B)^d y_t$
  • Seasonal difference: $(1 - B^m) y_t$
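
In R, these differences correspond to diff() (a sketch; y is a placeholder ts object, monthly in the seasonal example):

diff(y)                    # (1 - B) y_t, first difference
diff(y, differences = 2)   # (1 - B)^2 y_t, double difference
diff(y, lag = 12)          # (1 - B^12) y_t, seasonal difference for monthly data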

ARIMA model

ARIMA(1,1,1):

\[(1 - \phi\_1 B)(1 - B) y\_t = c + (1 + \theta\_1 B) \epsilon\_t\]

where $(1 - \phi_1 B)$ is the AR(1) part, $(1 - B)$ the first difference, and $(1 + \theta_1 B)$ the MA(1) part.

Seasonal ARIMA models:

ARIMA(p,d,q)(P,D,Q)[m], where (p,d,q) is the non-seasonal part, (P,D,Q) is the seasonal part, and m is the number of periods per season.

> auto.arima(euretail)
ARIMA(1,1,1)(0,1,1)[4]
Coefficients:
         ar1      ma1     sma1
      0.8828  -0.5208  -0.9704
s.e.  0.1424   0.1755   0.6792
sigma^2 estimated as 0.1411:  log likelihood=-30.19
AIC=68.37   AICc=69.11   BIC=76.68

auto.arima(euretail, stepwise=FALSE,
approximation=FALSE)

fit <- auto.arima(h02, lambda=0)
> fit
ARIMA(2,1,3)(0,1,1)[12]
Box Cox transformation: lambda= 0
Coefficients:
          ar1      ar2     ma1     ma2      ma3     sma1
      -1.0194  -0.8351  0.1717  0.2578  -0.4206  -0.6528
s.e.   0.1648   0.1203  0.2079  0.1177   0.1060   0.0657
sigma^2 estimated as 0.004071:  log likelihood=250.8
AIC=-487.6   AICc=-486.99   BIC=-464.83
  • It is a myth that ARIMA models are more general than exponential smoothing.
  • Linear exponential smoothing models are all special cases of ARIMA models.
  • Non-linear exponential smoothing models have no equivalent ARIMA counterparts.
  • Many ARIMA models have no exponential smoothing counterparts.
  • ETS models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models have one unit root.

Equivalences

  • Simple exponential smoothing
    • Forecasts equivalent to ARIMA(0,1,1).
    • Parameters: $\theta_1 = \alpha - 1$.
  • Holt’s method
    • Forecasts equivalent to ARIMA(0,2,2).
    • Parameters: $\theta_1 = \alpha + \beta - 2$ and $\theta_2 = 1 - \alpha$.
  • Damped Holt’s method
    • Forecasts equivalent to ARIMA(1,1,2).
    • Parameters: $\phi_1 = \phi$, $\theta_1 = \alpha + \phi\beta - 2$, $\theta_2 = (1 - \alpha)\phi$.
  • Holt-Winters’ additive method
    • Forecasts equivalent to ARIMA(0,1,m+1)(0,1,0)m.
    • Parameter restrictions apply because the ARIMA model has m + 1 parameters whereas Holt-Winters uses only three.
  • Holt-Winters’ multiplicative method
    • No ARIMA equivalent.

Dynamic Regression

Fourier

fit <- auto.arima(y, seasonal=FALSE,
xreg=fourier(y, K=3))
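
To forecast, the same harmonics must be supplied for the future periods. A hedged sketch: with the 2014-era forecast package the future regressors come from fourierf(); newer versions use fourier(y, K, h) instead.

fit <- auto.arima(y, seasonal = FALSE, xreg = fourier(y, K = 3))
fc  <- forecast(fit, xreg = fourierf(y, K = 3, h = 24))
plot(fc)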

Forecast a regression model with ARIMA errors

  1. forecast the regression part
  2. forecast the ARIMA part
  3. combine
  • when future values of the predictors are unknown, they need to be forecast first
  • forecasts of macroeconomic variables may be obtained from national statistical offices
  • separate forecasting models may be needed for other explanatory variables
  • some explanatory variables are known into the future (e.g., time, dummies); see the sketch below
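
A sketch along the lines of the fpp consumption example (usconsumption is used earlier in these notes; column 1 is consumption, column 2 is income; future income is assumed equal to its historical mean):

fit   <- auto.arima(usconsumption[, 1], xreg = usconsumption[, 2])
fcast <- forecast(fit, xreg = rep(mean(usconsumption[, 2]), 8), h = 8)
plot(fcast)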

Cross-validation

  • standard cross-validation does not carry over directly to time series, because we cannot use future observations to build a model and then forecast the past; instead, the forecast origin is rolled forward as in the example below

Example

  • Linear model with trend and seasonal dummies applied to log data
  • ARIMA model applied to log data
  • ETS model applied to original data

which model is best

  • set k=48 as minimum training set
  • forecast 12 steps ahead
  • compare MAE for each forecast horizon

Code:

k <- 48
n <- length(a10)
mae1 <- mae2 <- mae3 <- matrix(NA,n-k-12,12)
for(i in 1:(n-k-12))
{
xshort <- window(a10,end=1995+(5+i)/12)
xnext <- window(a10,start=1995+(6+i)/12,end=1996+(5+i)/12)
fit1 <- tslm(xshort ~ trend + season, lambda=0)
fcast1 <- forecast(fit1,h=12)
fit2 <- auto.arima(xshort,D=1, lambda=0)
fcast2 <- forecast(fit2,h=12)
fit3 <- ets(xshort)
fcast3 <- forecast(fit3,h=12)
mae1[i,] <- abs(fcast1[['mean']]-xnext)
mae2[i,] <- abs(fcast2[['mean']]-xnext)
mae3[i,] <- abs(fcast3[['mean']]-xnext)
}
plot(1:12,colMeans(mae1),type="l",col=2,xlab="horizon",ylab="MAE",
ylim=c(0.58,1.0))
lines(1:12,colMeans(mae2),type="l",col=3)
lines(1:12,colMeans(mae3),type="l",col=4)
legend("topleft",legend=c("LM","ARIMA","ETS"),col=2:4,lty=1)

Keep training window of fixed length:

xshort <- window(a10,start=i+1/12,end=1995+(5+i)/12)

Compute one-step forecasts in out-of-sample period:

for(i in 1:(n-k))
{
xshort <- window(a10,end=1995+(5+i)/12)
xlong <- window(a10,start=1995+(6+i)/12)
fit2 <- auto.arima(xshort,D=1, lambda=0)
fit2a <- Arima(xlong,model=fit2)
fit3 <- ets(xshort)
fit3a <- ets(xlong,model=fit3)
mae2a[i,] <- abs(residuals(fit2a))
mae3a[i,] <- abs(residuals(fit3a))
}

TBATS model

  • Trigonometric terms for seasonality
  • Box-Cox transformations for heterogeneity
  • ARMA errors for short-term dynamics
  • Trend (possibly damped)
  • Seasonal (including multiple and non-integer periods)
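
A minimal sketch with tbats() from the forecast package (y is a placeholder for any seasonal series, e.g. one with multiple or complex seasonal periods):

fit <- tbats(y)
fc  <- forecast(fit)
plot(fc)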

check Dropbox/Documents/forecast/12-Cross-validation.pdf for more detail



Published

27 January 2014

Modified

liuminzhao 02/17/2014 09:20:25
