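In many cases, only the time series at the lowest level of the hierarchy (the bottom time series) are available. HierarchicalForecast provides tools to create the time series for every level of the hierarchy and lets you compute prediction intervals for all levels. This notebook shows how.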

!pip install hierarchicalforecast statsforecast
import pandas as pd

# compute base (not yet coherent) forecasts
from statsforecast.models import AutoARIMA
from statsforecast.core import StatsForecast

# obtain hierarchical reconciliation methods, aggregation and plotting utilities
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.utils import aggregate, HierarchicalPlot
from hierarchicalforecast.core import HierarchicalReconciliation

Aggregate the bottom time series

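In this example we use the Tourism dataset from the book "Forecasting: Principles and Practice". The dataset contains only the lowest-level time series, so we need to create the time series for all other levels ourselves.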

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.PeriodIndex(Y_df["ds"], freq='Q').to_timestamp()
Y_df.head()
     Country    Region            State   Purpose          ds           y
0  Australia  Adelaide  South Australia  Business  1998-01-01  135.077690
1  Australia  Adelaide  South Australia  Business  1998-04-01  109.987316
2  Australia  Adelaide  South Australia  Business  1998-07-01  166.034687
3  Australia  Adelaide  South Australia  Business  1998-10-01  127.160464
4  Australia  Adelaide  South Australia  Business  1999-01-01  137.448533
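The date handling above relies on pandas recognising strings such as 1998-Q1 as quarterly periods; the regular expression only inserts the hyphen that labels like 1998 Q1 lack. The minimal sketch below applies the same two steps to a few made-up labels (assumed to follow the Quarter column's "YYYY Qn" layout) so the conversion can be checked in isolation.

```python
import pandas as pd

# Made-up quarter labels in the assumed "YYYY Qn" layout of the Quarter column
raw = pd.Series(['1998 Q1', '1998 Q2', '1998 Q3', '1998 Q4', '1999 Q1'])

# "1998 Q1" -> "1998-Q1", a form that pandas parses as a quarterly period
iso_like = raw.str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)

# Parse as quarterly periods, then map each period to the timestamp of its first day
ds = pd.PeriodIndex(iso_like, freq='Q').to_timestamp()
print(ds)
```

Each quarter ends up anchored at its first day, which matches the freq='QS' used later when forecasting.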

The dataset can be grouped into the following strictly hierarchical structure.

spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'State', 'Region']
]

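With the aggregate function from HierarchicalForecast we obtain the full set of time series. It returns the aggregated data Y_df, the summing matrix S_df, and tags, a dictionary that maps each level of the hierarchy to the ids of its series.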

Y_df, S_df, tags = aggregate(df=Y_df, spec=spec)
Y_df.head()
   unique_id          ds             y
0  Australia  1998-01-01  23182.197269
1  Australia  1998-04-01  20323.380067
2  Australia  1998-07-01  19826.640511
3  Australia  1998-10-01  20830.129891
4  Australia  1999-01-01  22087.353380
S_df.iloc[:5, :5]
                      unique_id  Australia/ACT/Canberra  Australia/New South Wales/Blue Mountains  Australia/New South Wales/Capital Country  Australia/New South Wales/Central Coast
0                     Australia                     1.0                                       1.0                                        1.0                                      1.0
1                 Australia/ACT                     1.0                                       0.0                                        0.0                                      0.0
2     Australia/New South Wales                     0.0                                       1.0                                        1.0                                      1.0
3  Australia/Northern Territory                     0.0                                       0.0                                        0.0                                      0.0
4          Australia/Queensland                     0.0                                       0.0                                        0.0                                      0.0
tags['Country/State']
array(['Australia/ACT', 'Australia/New South Wales',
       'Australia/Northern Territory', 'Australia/Queensland',
       'Australia/South Australia', 'Australia/Tasmania',
       'Australia/Victoria', 'Australia/Western Australia'], dtype=object)
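Conceptually, S_df has one row per series at every level and one column per bottom series, with a 1 wherever that bottom series rolls up into the row's series; multiplying S by the bottom-level values produces every level at once. The sketch below builds such a matrix for a hypothetical three-region subset with plain pandas and NumPy. It only illustrates the idea and is not how aggregate is implemented; the helper ancestors is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of bottom-level ids, using the "Country/State/Region" naming seen above
bottom = ['Australia/ACT/Canberra',
          'Australia/New South Wales/Blue Mountains',
          'Australia/New South Wales/Central Coast']

def ancestors(bottom_id: str) -> list[str]:
    """All prefixes of a '/'-joined path, e.g. 'A/B/C' -> ['A', 'A/B', 'A/B/C']."""
    parts = bottom_id.split('/')
    return ['/'.join(parts[:k]) for k in range(1, len(parts) + 1)]

# Every node of the hierarchy, top level first (sorted by depth, then by name)
nodes = sorted({a for b in bottom for a in ancestors(b)},
               key=lambda s: (s.count('/'), s))

# S[i, j] = 1 when bottom series j rolls up into node i (bottom rows form an identity block)
S = pd.DataFrame(
    [[1.0 if node in ancestors(b) else 0.0 for b in bottom] for node in nodes],
    index=nodes, columns=bottom,
)

# Made-up bottom-level values for one time step; S @ y_bottom gives every level
y_bottom = np.array([10.0, 20.0, 30.0])
y_all = S.to_numpy() @ y_bottom
print(S)
print(y_all)
```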

We can visualize the S matrix and the data with the HierarchicalPlot class as follows.

hplot = HierarchicalPlot(S=S_df, tags=tags)
hplot.plot_summing_matrix()

hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra',
    Y_df=Y_df
)

Split into train/test sets

We use the last two years (8 quarters) of each series as the test set.

Y_test_df = Y_df.groupby('unique_id', as_index=False).tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_train_df.groupby('unique_id').size()
unique_id
Australia                                                 72
Australia/ACT                                             72
Australia/ACT/Canberra                                    72
Australia/New South Wales                                 72
Australia/New South Wales/Blue Mountains                  72
                                                          ..
Australia/Western Australia/Australia's Coral Coast       72
Australia/Western Australia/Australia's Golden Outback    72
Australia/Western Australia/Australia's North West        72
Australia/Western Australia/Australia's South West        72
Australia/Western Australia/Experience Perth              72
Length: 85, dtype: int64
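groupby(...).tail(8) takes the last 8 rows of each series in their current row order, so this split assumes the rows are already sorted by ds within each unique_id. A minimal sketch on a made-up two-series frame shows that the split happens per series and that no training timestamp falls after a test timestamp.

```python
import pandas as pd

# Made-up long-format frame: two series ('A', 'B') with 10 quarterly observations each,
# already sorted by ds within each unique_id (the split below relies on this)
ds = pd.date_range('2015-01-01', periods=10, freq='QS')
df = pd.DataFrame({
    'unique_id': ['A'] * 10 + ['B'] * 10,
    'ds': list(ds) * 2,
    'y': range(20),
})

# Last 8 rows of every series form the test set (original row labels are kept) ...
test = df.groupby('unique_id', as_index=False).tail(8)
# ... and dropping those labels leaves the training set
train = df.drop(test.index)
print(train.groupby('unique_id').size())
```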

Compute the base forecasts

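The following cell computes the **base forecasts** for every time series in Y_train_df with the AutoARIMA model. Note that Y_hat_df contains the forecasts, but they are not coherent. To reconcile the prediction intervals, we first need to compute the (incoherent) base intervals with the level parameter of StatsForecast. We also pass fitted=True so that the in-sample fitted values can be retrieved with forecast_fitted_values(); the reconciliation step uses them.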

fcst = StatsForecast(models=[AutoARIMA(season_length=4)], 
                     freq='QS', n_jobs=-1)
Y_hat_df = fcst.forecast(df=Y_train_df, h=8, fitted=True, level=[80, 90])
Y_fitted_df = fcst.forecast_fitted_values()
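Because AutoARIMA is fitted to each series independently, the forecast for an aggregate series generally differs from the sum of the forecasts of its children. Forecasts for all levels are coherent when y_hat equals S times the bottom-level forecasts. The sketch below checks this on a toy hierarchy Total = A + B with made-up numbers; the function is_coherent is hypothetical and assumes the bottom series occupy the last rows, as they do in S_df above.

```python
import numpy as np

def is_coherent(S: np.ndarray, y_hat: np.ndarray, atol: float = 1e-6) -> bool:
    """True if the forecasts for all levels equal S times the bottom-level forecasts."""
    n_bottom = S.shape[1]
    y_bottom = y_hat[-n_bottom:]  # assumption: bottom series are the last rows
    return bool(np.allclose(S @ y_bottom, y_hat, atol=atol))

# Toy hierarchy (hypothetical): rows are [Total, A, B], columns are the bottom series [A, B]
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

base = np.array([105.0, 40.0, 60.0])      # independent base forecasts: 40 + 60 != 105
coherent = np.array([100.0, 40.0, 60.0])  # Total equals A + B
print(is_coherent(S, base), is_coherent(S, coherent))
```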

Reconcile the forecasts and compute prediction intervals with PERMBU

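The following cell makes the previous forecasts coherent with the HierarchicalReconciliation class. In this example we use BottomUp and MinTrace. To compute prediction intervals, you must pass the level parameter as shown below and also set intervals_method='permbu'.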

reconcilers = [
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_fitted_df,
                          S=S_df, tags=tags,
                          level=[80, 90], intervals_method='permbu')
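Both reconcilers used here can be written as linear projections of the base forecasts. BottomUp keeps the bottom-level base forecasts and sums them through S. MinTrace computes S (S' W^-1 S)^-1 S' W^-1 y_hat for a covariance matrix W of the base-forecast errors: method='ols' corresponds to W = I, while method='mint_shrink' estimates W from the in-sample residuals with a shrinkage estimator, which is why the fitted values are passed as Y_df. A NumPy sketch on a toy hierarchy Total = A + B (made-up numbers, hypothetical function names) shows both projections; it is an illustration of the formulas, not the library's code.

```python
import numpy as np

# Toy hierarchy (hypothetical): rows [Total, A, B], columns are the bottom series [A, B]
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y_hat = np.array([105.0, 40.0, 60.0])  # incoherent base forecasts (made up)

def bottom_up(S, y_hat):
    """Keep the bottom-level base forecasts and sum them up through S."""
    n_bottom = S.shape[1]
    return S @ y_hat[-n_bottom:]

def min_trace(S, y_hat, W):
    """GLS reconciliation: S (S' W^-1 S)^-1 S' W^-1 y_hat."""
    W_inv = np.linalg.inv(W)
    P = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)  # maps all levels -> bottom level
    return S @ P @ y_hat

y_bu = bottom_up(S, y_hat)                        # -> [100., 40., 60.]
y_ols = min_trace(S, y_hat, np.eye(len(y_hat)))   # W = I (OLS) -> approx [103.33, 41.67, 61.67]
print(y_bu, y_ols)
```

With OLS the discrepancy of 5 between the base total (105) and the sum of its children (100) is spread equally over the three nodes, whereas BottomUp simply discards the base total.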

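The dataframe Y_rec_df contains the reconciled forecasts next to the base AutoARIMA forecasts; each reconciliation method adds its own point-forecast and interval columns, such as AutoARIMA/BottomUp and AutoARIMA/BottomUp-lo-90.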

Y_rec_df.head()
   unique_id          ds     AutoARIMA  AutoARIMA-lo-90  AutoARIMA-lo-80  AutoARIMA-hi-80  AutoARIMA-hi-90
0  Australia  2016-01-01  26212.553553     24705.948180     25038.715077     27386.392029     27719.158927
1  Australia  2016-04-01  25033.667125     23337.267588     23711.954696     26355.379554     26730.066662
2  Australia  2016-07-01  24507.027198     22640.028798     23052.396413     25961.657983     26374.025599
3  Australia  2016-10-01  25598.928613     23575.665243     24022.547410     27175.309816     27622.191983
4  Australia  2017-01-01  26982.576796     24669.535238     25180.421285     28784.732308     29295.618354

   AutoARIMA/BottomUp  AutoARIMA/BottomUp-lo-90  AutoARIMA/BottomUp-lo-80  ...
0        24955.501571              24143.056131              24387.230200  ...
1        23421.312868              22762.045247              22904.087197  ...
2        22807.706826              22065.402373              22223.120404  ...
3        23471.845870              22677.593575              22892.328939  ...
4        24668.735931              23760.842072              23964.283124  ...

   AutoARIMA/MinTrace_method-mint_shrink  AutoARIMA/MinTrace_method-mint_shrink-lo-90  AutoARIMA/MinTrace_method-mint_shrink-lo-80  AutoARIMA/MinTrace_method-mint_shrink-hi-80  AutoARIMA/MinTrace_method-mint_shrink-hi-90
0                           25413.657606                                 24705.682710                                 24905.677772                                 25928.334367                                 26050.232961
1                           24058.906411                                 23486.828548                                 23627.152623                                 24659.405484                                 24847.778503
2                           23438.863893                                 22672.658701                                 22888.299153                                 23971.724733                                 24179.548677
3                           24322.049398                                 23619.419712                                 23682.803746                                 24847.299228                                 25028.345572
4                           25520.163549                                 24720.304392                                 24910.106650                                 26170.552678                                 26347.181903

   AutoARIMA/MinTrace_method-ols  AutoARIMA/MinTrace_method-ols-lo-90  AutoARIMA/MinTrace_method-ols-lo-80  AutoARIMA/MinTrace_method-ols-hi-80  AutoARIMA/MinTrace_method-ols-hi-90
0                   26142.818016                         25525.081721                         25656.537995                         26606.345032                         26832.423921
1                   24946.338649                         24297.061230                         24434.805048                         25535.549040                         25640.659918
2                   24407.245003                         23712.841797                         23834.054327                         25027.073615                         25189.869286
3                   25496.855604                         24740.210465                         24923.560783                         26094.250414                         26273.617732
4                   26853.231907                         26045.213677                         26149.753374                         27502.499674                         27733.985566
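The idea behind intervals_method='permbu' is to give independently sampled bottom-level forecasts the joint dependence observed in the in-sample residuals, by permuting the samples so that their ranks match the residual ranks, and then to aggregate the reordered samples bottom-up. The simplified sketch below does this for a single parent with two children on made-up data; rank_reorder is a hypothetical helper illustrating the rank-reordering step, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # in this toy setup, number of samples = number of residual rows

# Made-up in-sample residuals of two child series that tend to move together
shock = rng.normal(size=n)
residuals = np.column_stack([shock + 0.1 * rng.normal(size=n),
                             2.0 * shock + 0.1 * rng.normal(size=n)])

# Independent forecast samples for each child: correct marginals, no dependence yet
samples = np.column_stack([rng.normal(40.0, 1.0, n), rng.normal(60.0, 2.0, n)])

def rank_reorder(samples: np.ndarray, residuals: np.ndarray) -> np.ndarray:
    """Reorder each column of samples so its ranks match those of the residuals."""
    ranks = np.argsort(np.argsort(residuals, axis=0), axis=0)  # 0-based ranks per column
    return np.take_along_axis(np.sort(samples, axis=0), ranks, axis=0)

reordered = rank_reorder(samples, residuals)
parent_indep = samples.sum(axis=1)   # parent samples ignoring dependence
parent_perm = reordered.sum(axis=1)  # parent samples with the residuals' dependence
lo, hi = np.quantile(parent_perm, [0.10, 0.90])  # 80% interval for the parent
print(parent_indep.std(), parent_perm.std(), lo, hi)
```

Each child's marginal distribution is unchanged by the permutation, but because the reordered children now move together, the parent's samples are more spread out than the sum of independent samples, and its interval widens accordingly.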

Plot the forecasts

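We can then plot the probabilistic forecasts with the following functions. First, we merge the observed values and the reconciled forecasts into a single dataframe.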

plot_df = Y_df.merge(Y_rec_df, on=['unique_id', 'ds'], how="outer")
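The outer merge keeps every observed row of Y_df and attaches the reconciled forecasts where unique_id and ds match, leaving NaN elsewhere, so the plotting functions can draw history and forecasts from one frame. A tiny sketch with made-up values for one series:

```python
import pandas as pd

# Made-up observations for three quarters; the last one is also a forecast date
hist = pd.DataFrame({'unique_id': ['Australia'] * 3,
                     'ds': pd.to_datetime(['2015-07-01', '2015-10-01', '2016-01-01']),
                     'y': [1.0, 2.0, 3.0]})
# Made-up forecast for the last quarter only
fcst = pd.DataFrame({'unique_id': ['Australia'],
                     'ds': pd.to_datetime(['2016-01-01']),
                     'AutoARIMA': [2.5]})

# Rows without a forecast keep their y value and get NaN in the forecast column
plot_df = hist.merge(fcst, on=['unique_id', 'ds'], how='outer')
print(plot_df)
```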

Plot a single time series

hplot.plot_series(
    series='Australia',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA', 
            'AutoARIMA/MinTrace_method-ols',
            'AutoARIMA/BottomUp'
           ],
    level=[80]
)

Plot hierarchically linked time series

hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/Western Australia/Experience Perth',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA', 'AutoARIMA/MinTrace_method-ols', 'AutoARIMA/BottomUp'],
    level=[80]
)

# ACT only has Canberra
hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA/MinTrace_method-mint_shrink'],
    level=[80, 90]
)
