Normality

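In many cases, only the time series at the lowest level of the hierarchy (the bottom time series) are available. HierarchicalForecast has tools to create the time series for all levels of the hierarchy, and it also lets you compute prediction intervals for all of them. In this notebook we will see how to do it.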

!pip install hierarchicalforecast statsforecast

import pandas as pd

# compute base forecasts (not coherent)
from statsforecast.models import AutoARIMA
from statsforecast.core import StatsForecast

# hierarchical aggregation, reconciliation methods and plotting utilities
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.utils import aggregate, HierarchicalPlot
from hierarchicalforecast.core import HierarchicalReconciliation

Aggregate bottom time series

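In this example we will use the Tourism dataset from the book Forecasting: Principles and Practice. The dataset only contains the time series at the lowest level, so we need to create the time series for all levels of the hierarchy.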

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.PeriodIndex(Y_df["ds"], freq='Q').to_timestamp()
Y_df.head()

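	Country	Region	State	Purpose	ds	y
0	Australia	Adelaide	South Australia	Business	1998-01-01	135.077690
1	Australia	Adelaide	South Australia	Business	1998-04-01	109.987316
2	Australia	Adelaide	South Australia	Business	1998-07-01	166.034687
3	Australia	Adelaide	South Australia	Business	1998-10-01	127.160464
4	Australia	Adelaide	South Australia	Business	1999-01-01	137.448533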
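The two ds transformations above can be checked on a few hand-written labels. This is a minimal sketch assuming the Quarter column uses the 'YYYY Qn' format that the regular expression targets: the space is replaced by a dash so pandas can parse each label as a quarterly period, and to_timestamp() then maps each quarter to its first day, which is why the output shows dates such as 1998-04-01.

```python
import pandas as pd

# Toy labels in the assumed 'YYYY Qn' format of the Quarter column
quarters = pd.Series(['1998 Q1', '1998 Q2', '1998 Q3'])

# '1998 Q1' -> '1998-Q1', a string pandas can parse as a quarterly period
iso = quarters.str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)

# Each quarterly period becomes the timestamp of its first day
ds = pd.PeriodIndex(iso, freq='Q').to_timestamp()
print(list(ds))
```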

The dataset can be grouped into the following non-strict hierarchical structure.

spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Purpose'], 
    ['Country', 'State', 'Region'], 
    ['Country', 'State', 'Purpose'], 
    ['Country', 'State', 'Region', 'Purpose']
]

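Using the aggregate function from HierarchicalForecast, we can generate:
1. Y_df: the hierarchically structured time series $\mathbf{y}_{[a,b]\tau}$
2. S_df: a DataFrame with the aggregation constraints $S_{[a,b]}$
3. tags: a dictionary mapping each aggregation level to the 'unique_id's that compose it.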
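To make the structure of S_df concrete, here is a minimal pandas sketch of how a summing matrix can be built from bottom-level labels. The helper summing_matrix and the toy labels (AU, ACT, NSW) are hypothetical, and this is not how aggregate is implemented; it only illustrates that each row of S marks with 1s the bottom series that add up to one aggregate. Because State and Purpose both aggregate Country but neither nests inside the other, the toy hierarchy is non-strict, like the Tourism spec.

```python
import numpy as np
import pandas as pd


def summing_matrix(bottom: pd.DataFrame, spec: list) -> pd.DataFrame:
    """Hypothetical helper: 0/1 matrix mapping each bottom series to every aggregate."""
    # Assumes the last spec level uniquely identifies the bottom series
    bottom_ids = bottom[spec[-1]].apply('/'.join, axis=1).tolist()
    rows = {}
    for level in spec:
        # Aggregate id of every bottom row at this level, e.g. 'AU/NSW'
        level_ids = bottom[level].apply('/'.join, axis=1)
        for uid in level_ids.unique():
            rows[uid] = (level_ids == uid).astype(float).to_numpy()
    # Rows: aggregate ids in spec order; columns: bottom ids
    return pd.DataFrame(rows, index=bottom_ids).T


# Toy non-strict hierarchy: State and Purpose both aggregate Country
bottom = pd.DataFrame({
    'Country': ['AU', 'AU', 'AU'],
    'State': ['ACT', 'NSW', 'NSW'],
    'Purpose': ['Business', 'Business', 'Holiday'],
})
toy_spec = [
    ['Country'],
    ['Country', 'State'],
    ['Country', 'Purpose'],
    ['Country', 'State', 'Purpose'],
]
S = summing_matrix(bottom, toy_spec)
y_bottom = np.array([10.0, 20.0, 5.0])
y_all = S.to_numpy() @ y_bottom  # values for every level of the hierarchy
print(pd.Series(y_all, index=S.index))
```

Multiplying S by the bottom-level values reproduces every aggregate, which is what coherence means: with bottom values 10, 20 and 5, the AU row sums to 35 and the AU/NSW row to 25.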

Y_df, S_df, tags = aggregate(df=Y_df, spec=spec)

Y_df.head()

	unique_id	ds	y
0	Australia	1998-01-01	23182.197269
1	Australia	1998-04-01	20323.380067
2	Australia	1998-07-01	19826.640511
3	Australia	1998-10-01	20830.129891
4	Australia	1999-01-01	22087.353380

S_df.iloc[:5, :5]

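	unique_id	Australia/ACT/Canberra/Business	Australia/ACT/Canberra/Holiday	Australia/ACT/Canberra/Other	Australia/ACT/Canberra/Visiting
0	Australia	1.0	1.0	1.0	1.0
1	Australia/ACT	1.0	1.0	1.0	1.0
2	Australia/New South Wales	0.0	0.0	0.0	0.0
3	Australia/Northern Territory	0.0	0.0	0.0	0.0
4	Australia/Queensland	0.0	0.0	0.0	0.0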

tags['Country/Purpose']

array(['Australia/Business', 'Australia/Holiday', 'Australia/Other',
       'Australia/Visiting'], dtype=object)

We can visualize the S matrix and the data using the HierarchicalPlot class as follows.

hplot = HierarchicalPlot(S=S_df, tags=tags)

hplot.plot_summing_matrix()

hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra/Holiday',
    Y_df=Y_df
)

Split train/test sets

We use the final two years (8 quarters) as the test set.

Y_test_df = Y_df.groupby('unique_id', as_index=False).tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)

Y_train_df.groupby('unique_id').size()

unique_id
Australia                                                72
Australia/ACT                                            72
Australia/ACT/Business                                   72
Australia/ACT/Canberra                                   72
Australia/ACT/Canberra/Business                          72
                                                         ..
Australia/Western Australia/Experience Perth/Other       72
Australia/Western Australia/Experience Perth/Visiting    72
Australia/Western Australia/Holiday                      72
Australia/Western Australia/Other                        72
Australia/Western Australia/Visiting                     72
Length: 425, dtype: int64
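The split relies on groupby(...).tail(8) returning the last 8 rows of each series in their current row order, so it assumes each series is sorted by ds (as the aggregated Y_df appears to be). A toy panel with hypothetical values shows the effect: every series keeps its earliest observations for training and loses its last 8 to the test set.

```python
import pandas as pd

# Toy panel: two series with 10 quarterly observations each (hypothetical values)
dates = pd.date_range('2014-01-01', periods=10, freq='QS')
df = pd.DataFrame({
    'unique_id': ['A'] * 10 + ['B'] * 10,
    'ds': list(dates) * 2,
    'y': range(20),
})

# Last 8 rows of each series go to the test set (assumes rows are sorted by ds)
test = df.groupby('unique_id', as_index=False).tail(8)
# The remaining rows, identified by index, form the training set
train = df.drop(test.index)
print(train.groupby('unique_id').size())
```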

Compute base forecasts

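The following cell computes the base forecasts for each time series in Y_train_df using the AutoARIMA model. Note that Y_hat_df contains the forecasts, but they are not coherent. To reconcile the prediction intervals, we first need to compute the (incoherent) base intervals with the level argument of StatsForecast. Setting fitted=True also stores the in-sample fitted values, retrieved with forecast_fitted_values(), which the reconciliation step receives as Y_df.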

fcst = StatsForecast(models=[AutoARIMA(season_length=4)], 
                     freq='QS', n_jobs=-1)
Y_hat_df = fcst.forecast(df=Y_train_df, h=8, fitted=True, level=[80, 90])
Y_fitted_df = fcst.forecast_fitted_values()
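Under a normality assumption, a central level% interval is mean ± z·σ, where z is the standard-normal quantile at 0.5 + level/200. As a sanity check, the sketch below takes the base AutoARIMA mean and 80% bounds from the first row of the Y_rec_df output shown later (Australia, 2016-01-01), recovers the implied σ, and predicts the 90% lower bound. It agrees with the reported AutoARIMA-lo-90 value to within 0.01, which is consistent with symmetric normal intervals. The helper z is illustrative only.

```python
from statistics import NormalDist

# Base AutoARIMA values copied from the first row of the Y_rec_df output
mean = 26212.553553
lo_80, hi_80 = 25038.715077, 27386.392029
lo_90 = 24705.948180


def z(level: float) -> float:
    """Standard-normal quantile for a central `level`% interval."""
    return NormalDist().inv_cdf(0.5 + level / 200)


# Recover the implied forecast standard deviation from the 80% bound ...
sigma = (mean - lo_80) / z(80)
# ... and use it to predict the 90% bound under the same normal assumption
lo_90_pred = mean - z(90) * sigma
print(round(sigma, 2), round(lo_90_pred, 2))
```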

Reconcile forecasts

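The following cell makes the previous forecasts coherent using the HierarchicalReconciliation class. Since the hierarchy is not strict, we cannot use methods such as TopDown or MiddleOut. In this example we use BottomUp and MinTrace. To compute prediction intervals, pass the level argument as shown below; by default, reconcile computes the reconciled intervals under a normality assumption (its intervals_method argument defaults to 'normality').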

reconcilers = [
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_fitted_df, 
                          S=S_df, tags=tags, level=[80, 90])
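The linear algebra behind BottomUp and MinTrace(method='ols') can be sketched with numpy on a toy two-series hierarchy. Both methods produce reconciled forecasts S P ŷ for a method-specific matrix P, and under a normality assumption a reconciled interval uses the standard deviation sqrt(diag(S P W P' S')), where W is the base forecast covariance. This is a sketch of the standard projection formulas, not HierarchicalForecast's internal code; the toy forecasts and the diagonal W are assumed, whereas MinTrace(method='mint_shrink') estimates W from the in-sample residuals, which is why the fitted values are passed as Y_df.

```python
import numpy as np

# Toy hierarchy (assumed): Total = A + B; rows of S are [Total, A, B]
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y_hat = np.array([100.0, 45.0, 50.0])  # incoherent base forecasts: 45 + 50 != 100
W = np.diag([16.0, 4.0, 9.0])          # assumed base forecast variances (independent)

# BottomUp: P keeps only the bottom-level base forecasts
P_bu = np.hstack([np.zeros((2, 1)), np.eye(2)])
# MinTrace with OLS weights: P = (S'S)^{-1} S'
P_ols = np.linalg.solve(S.T @ S, S.T)


def reconcile(P):
    """Reconciled means S P y_hat and normal-interval std devs sqrt(diag(S P W P' S'))."""
    SP = S @ P
    return SP @ y_hat, np.sqrt(np.diag(SP @ W @ SP.T))


mean_bu, sd_bu = reconcile(P_bu)
mean_ols, sd_ols = reconcile(P_ols)
print(mean_bu, sd_bu)
print(mean_ols, sd_ols)
```

BottomUp leaves the bottom series untouched (means 45 and 50, standard deviations 2 and 3) and sums them into the total, while OLS spreads the 5-unit incoherence across all three series; in both cases the reconciled total equals the sum of its children.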

The DataFrame Y_rec_df contains the reconciled forecasts.

Y_rec_df.head()

	unique_id	ds	AutoARIMA	AutoARIMA-lo-90	AutoARIMA-lo-80	AutoARIMA-hi-80	AutoARIMA-hi-90	AutoARIMA/BottomUp	AutoARIMA/BottomUp-lo-90	AutoARIMA/BottomUp-lo-80	…	AutoARIMA/MinTrace_method-mint_shrink	AutoARIMA/MinTrace_method-mint_shrink-lo-90	AutoARIMA/MinTrace_method-mint_shrink-lo-80	AutoARIMA/MinTrace_method-mint_shrink-hi-80	AutoARIMA/MinTrace_method-mint_shrink-hi-90	AutoARIMA/MinTrace_method-ols	AutoARIMA/MinTrace_method-ols-lo-90	AutoARIMA/MinTrace_method-ols-lo-80	AutoARIMA/MinTrace_method-ols-hi-80	AutoARIMA/MinTrace_method-ols-hi-90
0	Australia	2016-01-01	26212.553553	24705.948180	25038.715077	27386.392029	27719.158927	24646.517084	23983.656843	24130.064091	…	25267.797338	24491.630618	24663.064091	25872.530586	26043.964058	26082.753488	25010.876141	25247.623803	26917.883174	27154.630835
1	Australia	2016-04-01	25033.667125	23337.267588	23711.954696	26355.379554	26730.066662	22942.957703	22229.916838	22387.407579	…	23836.804444	23002.620214	23186.868128	24486.740760	24670.988674	24822.102094	23616.734393	23882.966332	25761.237857	26027.469796
2	Australia	2016-07-01	24507.027198	22640.028798	23052.396413	25961.657983	26374.025599	22568.286488	21805.892199	21974.283728	…	23294.240908	22410.719833	22605.864873	23982.616942	24177.761983	24269.578724	22944.380043	23237.079287	25302.078162	25594.777406
3	Australia	2016-10-01	25598.928613	23575.665243	24022.547410	27175.309816	27622.191983	23113.075726	22308.671860	22486.342127	…	24154.484487	23221.706185	23427.730766	24881.238208	25087.262790	25340.549923	23905.434070	24222.410936	26458.688911	26775.665777
4	Australia	2017-01-01	26982.576796	24669.535238	25180.421285	28784.732308	29295.618354	23779.264921	22874.194227	23074.098975	…	25155.001372	24125.268915	24352.707952	25957.294793	26184.733830	26690.200927	25051.352698	25413.328335	27967.073518	28329.049155

Plot forecasts

We can then plot the probabilistic forecasts. First, merge the full data with the reconciled forecasts:

plot_df = Y_df.merge(Y_rec_df, on=['unique_id', 'ds'], how="outer")
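Because the merge uses how="outer", plot_df keeps every (unique_id, ds) pair from either frame: dates before the forecast horizon only have y, and the forecast columns are NaN there, which lets the plots draw history and forecasts on one axis. A toy example with hypothetical values:

```python
import pandas as pd

# Toy stand-ins: three quarters of actuals and one forecast quarter for one series
actuals = pd.DataFrame({
    'unique_id': ['Australia'] * 3,
    'ds': pd.to_datetime(['2015-07-01', '2015-10-01', '2016-01-01']),
    'y': [1.0, 2.0, 3.0],
})
forecasts = pd.DataFrame({
    'unique_id': ['Australia'],
    'ds': pd.to_datetime(['2016-01-01']),
    'AutoARIMA': [2.5],
})

# Outer join on the series id and date keeps rows present in either frame
plot_df = actuals.merge(forecasts, on=['unique_id', 'ds'], how='outer')
print(plot_df)
```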

Plot single time series

hplot.plot_series(
    series='Australia',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA', 'AutoARIMA/MinTrace_method-ols'],
    level=[80]
)

# Since we are plotting a bottom time series,
# the BottomUp forecasts (mean and intervals)
# coincide with the base AutoARIMA forecasts
hplot.plot_series(
    series='Australia/Western Australia/Experience Perth/Visiting',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA', 'AutoARIMA/BottomUp'],
    level=[80]
)

Plot hierarchically linked time series

hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/Western Australia/Experience Perth/Visiting',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA', 'AutoARIMA/MinTrace_method-ols', 'AutoARIMA/BottomUp'],
    level=[80]
)

# ACT only has Canberra
hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra/Other',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA/MinTrace_method-mint_shrink'],
    level=[80, 90]
)
