Dates in timeseries models

In [1]:
from __future__ import print_function
import statsmodels.api as sm
import numpy as np
import pandas as pd

Getting started

In [2]:
data = sm.datasets.sunspots.load()

Currently, an annual date series must consist of datetimes that fall on the last day of each year.

In [3]:
from datetime import datetime
dates = sm.tsa.datetools.dates_from_range('1700', length=len(data.endog))
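To make the year-end requirement concrete, here is a minimal sketch that builds such dates with only the standard library. The helper name `year_end_dates` is hypothetical and is not part of statsmodels; it only illustrates the kind of values an annual series is expected to carry.

```python
from datetime import datetime


def year_end_dates(start_year, length):
    """Return `length` datetimes, one per year, each on December 31.

    Hypothetical helper for illustration only; not a statsmodels function.
    """
    return [datetime(start_year + i, 12, 31) for i in range(length)]


# Three annual observations starting in 1700.
example_dates = year_end_dates(1700, 3)
print(example_dates)
```

Each element is a `datetime` on December 31 of consecutive years, which matches the year-end dates shown in the prediction output later in this example.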

Using pandas

Make a pandas Series or DataFrame indexed by the dates

In [4]:
endog = pd.Series(data.endog, index=dates)

Instantiate the model

In [5]:
ar_model = sm.tsa.AR(endog, freq='A')
pandas_ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)

Out-of-sample prediction

In [6]:
pred = pandas_ar_res.predict(start='2005', end='2015')
print(pred)
2005-12-31    20.003291
2006-12-31    24.703985
2007-12-31    20.026132
2008-12-31    23.473641
2009-12-31    30.858588
2010-12-31    61.335478
2011-12-31    87.024730
2012-12-31    91.321296
2013-12-31    79.921663
2014-12-31    60.799550
2015-12-31    40.374892
Freq: A-DEC, dtype: float64

Using explicit dates

In [7]:
ar_model = sm.tsa.AR(data.endog, dates=dates, freq='A')
ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
pred = ar_res.predict(start='2005', end='2015')
print(pred)
[20.0032909  24.70398475 20.02613221 23.47364133 30.8585877  61.33547818
 87.02472963 91.32129641 79.92166302 60.79954991 40.37489151]

This returns a plain NumPy array rather than a pandas Series. Because the model still has date information attached, you can recover the prediction dates from the results object, although in a roundabout way.

In [8]:
print(ar_res.data.predict_dates)
DatetimeIndex(['2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31',
               '2009-12-31', '2010-12-31', '2011-12-31', '2012-12-31',
               '2013-12-31', '2014-12-31', '2015-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

Note: This attribute only exists if predict has been called. It holds the dates associated with the last call to predict.
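If a labeled result is preferred, the array and the dates can be recombined into a pandas Series. The sketch below assumes the first three values and dates are copied by hand from the outputs above rather than taken from a fitted model; with a real results object you would pass `pred` and `ar_res.data.predict_dates` instead. The variable names here are illustrative.

```python
import numpy as np
import pandas as pd

# First three predictions, copied from the printed array above (assumption:
# stand-ins for `pred`, so the snippet runs without fitting a model).
pred_values = np.array([20.0032909, 24.70398475, 20.02613221])

# Matching dates, copied from the printed DatetimeIndex above (stand-ins for
# `ar_res.data.predict_dates`).
pred_dates = pd.to_datetime(['2005-12-31', '2006-12-31', '2007-12-31'])

# Attach the dates as the index so each prediction is labeled.
pred_series = pd.Series(pred_values, index=pred_dates)
print(pred_series)
```

This yields the same kind of date-indexed output that the pandas-based model returns directly, which is one reason passing a Series with a date index to the model is the more convenient route.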