电力负荷预测

简介

有些时间序列数据是从非常低频的数据生成的。这些数据通常表现出多重季节性。例如，小时数据可能每小时（每 24 个观测值）或每天（每 24 * 7 小时，即每天的小时数，观测值）表现出重复模式。电力负荷就是这种情况。电力负荷可能每小时变化，例如，傍晚时分电力消耗可能预期会增加。电力负荷也会按周变化。也许周末的电力活动会有所增加。

在本示例中，我们将展示如何对时间序列的两种季节性进行建模，以便在短时间内生成准确的预测。我们将使用 PJM 的小时电力负荷数据。原始数据可在此处找到。

库

在本示例中，我们将使用以下库

StatsForecast。使用统计和计量经济学模型进行闪电 ⚡️ 快速预测。包含用于多重季节性的 MSTL 模型。
Prophet。由 Facebook 开发的基准模型。
NeuralProphet。Prophet 的深度学习版本。用作基准。

# !pip install statsforecast "neuralprophet[live]" prophet

使用多重季节性进行预测

电力负荷数据

根据数据集页面，

PJM 互联公司 (PJM) 是美国的一个区域输电组织 (RTO)。它是东部互联电网的一部分，运营着一个为特拉华州、伊利诺伊州、印第安纳州、肯塔基州、马里兰州、密歇根州、新泽西州、北卡罗来纳州、俄亥俄州、宾夕法尼亚州、田纳西州、弗吉尼亚州、西弗吉尼亚州以及哥伦比亚特区全部或部分地区提供服务的输电系统。小时电力消耗数据来自 PJM 网站，单位为兆瓦 (MW)。

让我们看看数据。

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from utilsforecast.plotting import plot_series

pd.plotting.register_matplotlib_converters()
plt.rc("figure", figsize=(10, 8))
plt.rc("font", size=10)

df = pd.read_csv('https://raw.githubusercontent.com/panambY/Hourly_Energy_Consumption/master/data/PJM_Load_hourly.csv')
df.columns = ['ds', 'y']
df.insert(0, 'unique_id', 'PJM_Load_hourly')
df['ds'] = pd.to_datetime(df['ds'])
df = df.sort_values(['unique_id', 'ds']).reset_index(drop=True)
df.tail()

	unique_id	ds	y
32891	PJM_Load_hourly	2001-12-31 20:00:00	36392.0
32892	PJM_Load_hourly	2001-12-31 21:00:00	35082.0
32893	PJM_Load_hourly	2001-12-31 22:00:00	33890.0
32894	PJM_Load_hourly	2001-12-31 23:00:00	32590.0
32895	PJM_Load_hourly	2002-01-01 00:00:00	31569.0

plot_series(df)

我们清楚地观察到时间序列表现出季节性模式。此外，时间序列包含 32,896 个观测值，因此需要使用计算效率很高的方法在生产环境中显示它们。

MSTL 模型

的MSTL (使用 LOESS 的多重季节性趋势分解) 模型，最初由 Kasun Bandara、Rob J Hyndman 和 Christoph Bergmeir 开发，使用局部多项式回归 (LOESS) 将时间序列分解为多重季节性。然后使用自定义的非季节性模型预测趋势，并使用 SeasonalNaive 模型预测每个季节性。

StatsForecast 包含 MSTL 模型的快速实现。此外，还可以计算时间序列的分解。

from statsforecast import StatsForecast
from statsforecast.models import MSTL, AutoARIMA, SeasonalNaive
from statsforecast.utils import AirPassengers as ap

首先，我们必须定义模型参数。如前所述，电力负荷每 24 小时（小时）和每 24 * 7 小时（天）呈现季节性。因此，我们将使用 [24, 24 * 7] 作为 MSTL 模型接收的季节性。我们还必须指定预测趋势的方式。在本例中，我们将使用 AutoARIMA 模型。

mstl = MSTL(
    season_length=[24, 24 * 7], # seasonalities of the time series 
    trend_forecaster=AutoARIMA() # model used to forecast trend
)

模型实例化后，我们必须实例化 StatsForecast 类来创建预测。

sf = StatsForecast(
    models=[mstl], # model used to fit each time series 
    freq='h', # frequency of the data
)

拟合模型

之后，我们只需使用 fit 方法将每个模型拟合到每个时间序列。

sf = sf.fit(df=df)

将时间序列分解为多重季节性

模型拟合后，我们可以使用 StatsForecast 的 fitted_ 属性访问分解结果。此属性存储了每个时间序列的拟合模型的所有相关信息。

在本例中，我们正在为一个时间序列拟合一个模型，因此通过访问 fitted_ 位置 [0, 0]，我们将找到模型的相关信息。MSTL 类生成一个 model_ 属性，其中包含序列的分解方式。

sf.fitted_[0, 0].model_

	数据	趋势	季节性24	季节性168	残差
0	22259.0	25899.808157	-4720.213546	581.308595	498.096794
1	21244.0	25900.349395	-5433.168901	571.780657	205.038849
2	20651.0	25900.875973	-5829.135728	557.142643	22.117112
3	20421.0	25901.387631	-5704.092794	597.696957	-373.991794
4	20713.0	25901.884103	-5023.324375	922.564854	-1088.124582
…	…	…	…	…	…
32891	36392.0	33329.031577	4254.112720	917.258336	-2108.402633
32892	35082.0	33355.083576	3625.077164	721.689136	-2619.849876
32893	33890.0	33381.108409	2571.794472	549.661529	-2612.564409
32894	32590.0	33407.105839	796.356548	361.956280	-1975.418667
32895	31569.0	33433.075723	-1260.860917	279.777069	-882.991876

让我们通过图表查看时间序列的不同分量。

sf.fitted_[0, 0].model_.tail(24 * 28).plot(subplots=True, grid=True)
plt.tight_layout()
plt.show()

我们观察到有一个明显的上升趋势（橙色线）。这个分量将使用 AutoARIMA 模型进行预测。我们还可以观察到，每 24 小时和每 24 * 7 小时都有一个非常清晰的模式。这两个分量将分别使用 SeasonalNaive 模型进行预测。

生成预测

要生成预测，我们只需使用 predict 方法指定预测范围 (h)。此外，为了计算与预测相关的预测区间，我们可以包含参数 level，该参数接收我们想要构建的预测区间的级别列表。在本例中，我们仅计算 90% 的预测区间 (level=[90])。

forecasts = sf.predict(h=24, level=[90])
forecasts.head()

	unique_id	ds	MSTL	MSTL-lo-90	MSTL-hi-90
0	PJM_Load_hourly	2002-01-01 01:00:00	30215.608163	29842.185622	30589.030705
1	PJM_Load_hourly	2002-01-01 02:00:00	29447.209028	28787.123369	30107.294687
2	PJM_Load_hourly	2002-01-01 03:00:00	29132.787603	28221.354454	30044.220751
3	PJM_Load_hourly	2002-01-01 04:00:00	29126.254591	27992.821420	30259.687762
4	PJM_Load_hourly	2002-01-01 05:00:00	29604.608674	28273.428663	30935.788686

让我们通过图表查看我们的预测。

plot_series(df, forecasts, level=[90], max_insample_length=24*7)

在下一节中，我们将绘制不同的模型，因此使用以下函数重用之前的代码很方便。

def plot_forecasts(y_hist, y_true, y_pred, models):
    _, ax = plt.subplots(1, 1, figsize = (20, 7))
    y_true = y_true.merge(y_pred, how='left', on=['unique_id', 'ds'])
    df_plot = pd.concat([y_hist, y_true]).set_index('ds').tail(24 * 7)
    df_plot[['y'] + models].plot(ax=ax, linewidth=2)
    colors = ['orange', 'green', 'red']
    for model, color in zip(models, colors):
        ax.fill_between(df_plot.index, 
                        df_plot[f'{model}-lo-90'], 
                        df_plot[f'{model}-hi-90'],
                        alpha=.35,
                        color=color,
                        label=f'{model}-level-90')
    ax.set_title('PJM Load Hourly', fontsize=22)
    ax.set_ylabel('Electricity Load', fontsize=20)
    ax.set_xlabel('Timestamp [t]', fontsize=20)
    ax.legend(prop={'size': 15})
    ax.grid()

MSTL 模型的性能

划分训练/测试集

为了验证 MSTL 模型的准确性，我们将展示其在未见过数据上的性能。我们将使用一种经典的时间序列技术，即将数据划分为训练集和测试集。我们将最后 24 个观测值（最后一天）作为测试集。因此模型将在 32,872 个观测值上进行训练。

df_test = df.tail(24)
df_train = df.drop(df_test.index)

MSTL 模型

除了 MSTL 模型外，我们还将包含 SeasonalNaive 模型作为基准，以验证 MSTL 模型的附加价值。包含 StatsForecast 模型就像将它们添加到要拟合的模型列表中一样简单。

sf = StatsForecast(
    models=[mstl, SeasonalNaive(season_length=24)], # add SeasonalNaive model to the list
    freq='h'
)

为了测量拟合时间，我们将使用 time 模块。

from time import time

要检索测试集的预测结果，我们只需像之前一样进行拟合和预测即可。

init = time()
sf = sf.fit(df=df_train)
forecasts_test = sf.predict(h=len(df_test), level=[90])
end = time()
forecasts_test.head()

	unique_id	ds	MSTL	MSTL-lo-90	MSTL-hi-90	SeasonalNaive	SeasonalNaive-lo-90	SeasonalNaive-hi-90
0	PJM_Load_hourly	2001-12-31 01:00:00	29158.872180	28785.567875	29532.176486	28326.0	23468.555872	33183.444128
1	PJM_Load_hourly	2001-12-31 02:00:00	28233.452263	27573.789089	28893.115438	27362.0	22504.555872	32219.444128
2	PJM_Load_hourly	2001-12-31 03:00:00	27915.251368	27004.459000	28826.043736	27108.0	22250.555872	31965.444128
3	PJM_Load_hourly	2001-12-31 04:00:00	27969.396560	26836.674164	29102.118956	26865.0	22007.555872	31722.444128
4	PJM_Load_hourly	2001-12-31 05:00:00	28469.805588	27139.306401	29800.304775	26808.0	21950.555872	31665.444128

time_mstl = (end - init) / 60
print(f'MSTL Time: {time_mstl:.2f} minutes')

MSTL Time: 0.46 minutes

然后，我们就可以生成未来 24 小时的预测。现在让我们看看预测结果与实际值的图表比较。

plot_series(df_train, df_test.merge(forecasts_test), level=[90], max_insample_length=24*7)

让我们看看仅由 MSTL 生成的预测结果。

plot_series(df_train, df_test.merge(forecasts_test), level=[90], max_insample_length=24*7, models=['MSTL'])

我们注意到 MSTL 生成的预测非常准确，并且遵循时间序列的行为。现在我们来数值计算模型的准确性。我们将使用以下指标：MAE、MAPE、MASE、RMSE、SMAPE。

from functools import partial

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mape, mase, rmse, smape

eval_df = evaluate(
    df=df_test.merge(forecasts_test),
    train_df=df_train,
    metrics=[partial(mase, seasonality=24), mae, mape, rmse, smape],
    agg_fn='mean',
).set_index('metric').T
eval_df

指标	mase	mae	mape	rmse	smape
MSTL	0.587265	1219.321795	0.036052	1460.223279	0.017577
SeasonalNaive	0.894653	1857.541667	0.056482	2201.384101	0.029343

1 - eval_df.loc['MSTL', 'mase'] / eval_df.loc['SeasonalNaive', 'mase']

0.3435830717111049

我们观察到，在测试集上，MSTL 在 MASE 指标上比 SeasonalNaive 方法提高了约 35%。

与 Prophet 的比较

时间序列预测中最广泛使用的模型之一是 Prophet。该模型以其建模不同季节性（每周、每日、每年）的能力而闻名。我们将使用此模型作为基准，以查看 MSTL 是否为此时间序列增加了价值。

from prophet import Prophet

# create prophet model
prophet = Prophet(interval_width=0.9)
init = time()
prophet.fit(df_train)
# produce forecasts
future = prophet.make_future_dataframe(periods=len(df_test), freq='H', include_history=False)
forecast_prophet = prophet.predict(future)
end = time()
# data wrangling
forecast_prophet = forecast_prophet[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
forecast_prophet.columns = ['ds', 'Prophet', 'Prophet-lo-90', 'Prophet-hi-90']
forecast_prophet.insert(0, 'unique_id', 'PJM_Load_hourly')
forecast_prophet.head()

16:56:47 - cmdstanpy - INFO - Chain [1] start processing
16:57:09 - cmdstanpy - INFO - Chain [1] done processing

	unique_id	ds	Prophet	Prophet-lo-90	Prophet-hi-90
0	PJM_Load_hourly	2001-12-31 01:00:00	25294.246960	20299.105766	30100.467618
1	PJM_Load_hourly	2001-12-31 02:00:00	24000.725423	19285.395144	28777.495372
2	PJM_Load_hourly	2001-12-31 03:00:00	23324.771966	18536.736306	28057.063589
3	PJM_Load_hourly	2001-12-31 04:00:00	23332.519871	18591.879190	28720.461289
4	PJM_Load_hourly	2001-12-31 05:00:00	24107.126827	18934.471254	29116.352931

time_prophet = (end - init) / 60
print(f'Prophet Time: {time_prophet:.2f} minutes')

Prophet Time: 0.41 minutes

times = pd.DataFrame({'model': ['MSTL', 'Prophet'], 'time (mins)': [time_mstl, time_prophet]})
times

	模型	时间 (分钟)
0	MSTL	0.455999
1	Prophet	0.408726

我们观察到 Prophet 执行拟合和预测流程所需的时间大于 MSTL。让我们看看 Prophet 生成的预测结果。

forecasts_test = forecasts_test.merge(forecast_prophet, how='left', on=['unique_id', 'ds'])

plot_series(df_train, forecasts_test, max_insample_length=24*7, level=[90])

我们注意到 Prophet 能够捕捉时间序列的整体行为。然而，在某些情况下，它生成的预测值远低于实际值。它也无法正确调整波谷。

eval_df = evaluate(
    df=df_test.merge(forecasts_test),
    train_df=df_train,
    metrics=[partial(mase, seasonality=24), mae, mape, rmse, smape],
    agg_fn='mean',
).set_index('metric').T
eval_df

指标	mase	mae	mape	rmse	smape
MSTL	0.587265	1219.321795	0.036052	1460.223279	0.017577
SeasonalNaive	0.894653	1857.541667	0.056482	2201.384101	0.029343
Prophet	1.099551	2282.966977	0.073750	2721.817203	0.038633

1 - eval_df.loc['MSTL', 'mase'] / eval_df.loc['Prophet', 'mase']

0.4659047602697266

在准确性方面，Prophet 无法生成比 SeasonalNaive 模型更好的预测结果，然而，MSTL 模型在 MASE 指标上将 Prophet 的预测结果提高了 45%。

与 NeuralProphet 的比较

NeuralProphet 是使用深度学习的 Prophet 版本。该模型也能够处理不同的季节性，因此我们也将它用作基准。

from neuralprophet import NeuralProphet

neuralprophet = NeuralProphet(quantiles=[0.05, 0.95])
init = time()
neuralprophet.fit(df_train.drop(columns='unique_id'))
future = neuralprophet.make_future_dataframe(df=df_train.drop(columns='unique_id'), periods=len(df_test))
forecast_np = neuralprophet.predict(future)
end = time()
forecast_np = forecast_np[['ds', 'yhat1', 'yhat1 5.0%', 'yhat1 95.0%']]
forecast_np.columns = ['ds', 'NeuralProphet', 'NeuralProphet-lo-90', 'NeuralProphet-hi-90']
forecast_np.insert(0, 'unique_id', 'PJM_Load_hourly')
forecast_np.head()

WARNING - (NP.forecaster.fit) - When Global modeling with local normalization, metrics are displayed in normalized scale.
INFO - (NP.df_utils._infer_frequency) - Major frequency h corresponds to 99.973% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as h
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 128
INFO - (NP.config.set_auto_batch_epoch) - Auto-set epochs to 40

Training: |                                                                                                   …

WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (257) is too small than the required number                     for the learning rate finder (262). The results might not be optimal.

Finding best initial lr:   0%|          | 0/262 [00:00<?, ?it/s]

Training: |                                                                                                   …

INFO - (NP.df_utils._infer_frequency) - Major frequency h corresponds to 99.973% of the data.
INFO - (NP.df_utils._infer_frequency) - Defined frequency is equal to major frequency - h
INFO - (NP.df_utils.return_df_in_original_format) - Returning df with no ID column
INFO - (NP.df_utils._infer_frequency) - Major frequency h corresponds to 95.833% of the data.
INFO - (NP.df_utils._infer_frequency) - Defined frequency is equal to major frequency - h
INFO - (NP.df_utils._infer_frequency) - Major frequency h corresponds to 95.833% of the data.
INFO - (NP.df_utils._infer_frequency) - Defined frequency is equal to major frequency - h

Predicting: |                                                                                                 …

INFO - (NP.df_utils.return_df_in_original_format) - Returning df with no ID column

	unique_id	ds	NeuralProphet	NeuralProphet-lo-90	NeuralProphet-hi-90
0	PJM_Load_hourly	2001-12-31 01:00:00	25292.386719	22520.238281	27889.425781
1	PJM_Load_hourly	2001-12-31 02:00:00	24378.796875	21640.460938	27056.906250
2	PJM_Load_hourly	2001-12-31 03:00:00	23852.919922	20978.291016	26583.130859
3	PJM_Load_hourly	2001-12-31 04:00:00	23540.554688	20700.035156	26247.121094
4	PJM_Load_hourly	2001-12-31 05:00:00	24016.589844	21298.316406	26748.933594

time_np = (end - init) / 60
print(f'Prophet Time: {time_np:.2f} minutes')

Prophet Time: 1.98 minutes

times = pd.concat([times, pd.DataFrame({'model': ['NeuralProphet'], 'time (mins)': [time_np]})])
times

	模型	时间 (分钟)
0	MSTL	0.455999
1	Prophet	0.408726
0	NeuralProphet	1.981253

我们观察到 NeuralProphet 需要比 Prophet 和 MSTL 更长的处理时间。

forecasts_test = forecasts_test.merge(forecast_np, how='left', on=['unique_id', 'ds'])

plot_series(df_train, forecasts_test, max_insample_length=24*7, level=[90])

预测图显示，正如预期，NeuralProphet 生成的结果与 Prophet 非常相似。

eval_df = evaluate(
    df=df_test.merge(forecasts_test),
    train_df=df_train,
    metrics=[partial(mase, seasonality=24), mae, mape, rmse, smape],
    agg_fn='mean',
).set_index('metric').T
eval_df

指标	mase	mae	mape	rmse	smape
MSTL	0.587265	1219.321795	0.036052	1460.223279	0.017577
SeasonalNaive	0.894653	1857.541667	0.056482	2201.384101	0.029343
Prophet	1.099551	2282.966977	0.073750	2721.817203	0.038633
NeuralProphet	1.061160	2203.255941	0.071060	2593.708496	0.037108

1 - eval_df.loc['MSTL', 'mase'] / eval_df.loc['NeuralProphet', 'mase']

0.4465818643911057

在数值评估方面，正如预期，NeuralProphet 改进了 Prophet 的结果，然而，MSTL 在 MASE 指标上将 NeuralProphet 的预测结果提高了 44%。

重要提示

通过超参数优化可以改进 NeuralProphet 的性能，但这会显著增加拟合时间。本示例中展示的是其默认版本的性能。

结论

在这篇文章中，我们介绍了 MSTL，该模型最初由 Kasun Bandara、Rob Hyndman 和 Christoph Bergmeir 开发，能够处理具有多重季节性的时间序列。我们还展示了对于 PJM 电力负荷时间序列，它在时间和准确性方面比 Prophet 和 NeuralProphet 模型提供了更好的性能。

参考文献

Bandara, Kasun & Hyndman, Rob & Bergmeir, Christoph. (2021). “MSTL：一种用于具有多重季节性模式的时间序列的季节性趋势分解算法”.

交叉验证检测需求峰值

在本页面

简介
库
使用多重季节性进行预测
电力负荷数据
MSTL 模型
拟合模型
将时间序列分解为多重季节性
生成预测
MSTL 模型的性能
划分训练/测试集
MSTL 模型
与 Prophet 的比较
与 NeuralProphet 的比较
结论
参考文献

入门

教程

操作指南

分布式

实验

模型参考

API 参考

贡献

简介

库

使用多重季节性进行预测

电力负荷数据

MSTL 模型

拟合模型

将时间序列分解为多重季节性

生成预测

MSTL 模型的性能

划分训练/测试集

MSTL 模型

与 Prophet 的比较

与 NeuralProphet 的比较

结论

参考文献

入门

教程

操作指南

分布式

实验

模型参考

API 参考

贡献

​简介

​库

​使用多重季节性进行预测

​电力负荷数据

​MSTL 模型

​拟合模型

​将时间序列分解为多重季节性

​生成预测

​MSTL 模型的性能

​划分训练/测试集

​MSTL 模型

​与 Prophet 的比较

​与 NeuralProphet 的比较

​结论

​参考文献

简介

库

使用多重季节性进行预测

电力负荷数据

MSTL 模型

拟合模型

将时间序列分解为多重季节性

生成预测

MSTL 模型的性能

划分训练/测试集

MSTL 模型

与 Prophet 的比较

与 NeuralProphet 的比较

结论

参考文献