When outliers are present in a dataset, they can distort computed summary statistics such as the mean and standard deviation, biasing the model towards the outlier values and away from the majority of observations. The model is then forced to strike a balance between accurately fitting the outliers and performing well on normal data, which tends to degrade its overall performance on both. Robust regression algorithms address this problem by explicitly accounting for outliers in the dataset.

In this notebook we will show how to fit robust NeuralForecast methods. We will:
- Install NeuralForecast.
- Load the noisy AirPassengers dataset.
- Fit and predict a robustified NeuralForecast.
- Plot and evaluate the predictions.

You can run these experiments on GPU with Google Colab.

1. Installing NeuralForecast

!pip install neuralforecast
import logging
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from random import random
from random import randint
from random import seed

from neuralforecast import NeuralForecast
from neuralforecast.utils import AirPassengersDF

from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import MQLoss, DistributionLoss, HuberMQLoss

from utilsforecast.losses import mape, mqloss
from utilsforecast.evaluation import evaluate
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)

2. Loading Noisy AirPassengers

In this example we will use the classic Box-Cox AirPassengers dataset, which we augment by introducing outliers.

In particular, we will focus on introducing outliers to the target variable, deviating it from the original observations by a multiple of its standard deviation, in this case 2 to 4 times.

# Original Box-Cox AirPassengers 
# as defined in neuralforecast.utils
Y_df = AirPassengersDF.copy() 
plt.plot(Y_df.y)
plt.ylabel('Monthly Passengers')
plt.xlabel('Timestamp [t]')
plt.grid()

# Here we add some artificial outliers to AirPassengers
seed(1)
for i in range(len(Y_df)):
    factor = randint(2, 4)
    if random() > 0.97:
        Y_df.loc[i, "y"] += factor * Y_df["y"].std()

plt.plot(Y_df.y)
plt.ylabel('Monthly Passengers + Noise')
plt.xlabel('Timestamp [t]')
plt.grid()

# Split datasets into train/test 
# Last 12 months for test
Y_train_df = Y_df.groupby('unique_id').head(-12)
Y_test_df = Y_df.groupby('unique_id').tail(12)
Y_test_df
     unique_id          ds      y
132        1.0  1960-01-31  417.0
133        1.0  1960-02-29  391.0
134        1.0  1960-03-31  419.0
135        1.0  1960-04-30  461.0
136        1.0  1960-05-31  472.0
137        1.0  1960-06-30  535.0
138        1.0  1960-07-31  622.0
139        1.0  1960-08-31  606.0
140        1.0  1960-09-30  508.0
141        1.0  1960-10-31  461.0
142        1.0  1960-11-30  390.0
143        1.0  1960-12-31  432.0

3. Fit and Predict Robustified NeuralForecast

Huber MQ Loss

The Huber loss, used in robust regression, is a loss function that is less sensitive to outliers in the data than the squared error loss. The Huber loss function is quadratic for small errors and linear for large errors. Here we will use a slightly modified version for probabilistic predictions. Feel free to play with the δ parameter.
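For intuition, here is a minimal NumPy sketch of the point-wise Huber loss. The huber helper and its delta argument are illustrative only, not part of NeuralForecast; in HuberMQLoss the threshold corresponds to the δ parameter mentioned above.

import numpy as np

def huber(residual, delta=1.0):
    # Quadratic inside |residual| <= delta, linear outside
    small = np.abs(residual) <= delta
    return np.where(small,
                    0.5 * residual ** 2,
                    delta * (np.abs(residual) - 0.5 * delta))

print(huber(np.array([0.5, 5.0])))  # [0.125 4.5]: small errors squared, large errors only linear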

Dropout Regularization

The dropout technique is a regularization method used in neural networks to prevent overfitting. During training, dropout randomly sets a fraction of the input units or neurons in a layer to zero at each update, effectively "dropping" those units. This means the network cannot rely on any individual unit, since it may be dropped out at any time. By preventing units from co-adapting too much, dropout forces the network to learn more robust and generalizable representations.

The dropout method can help us make the network more robust to outliers in the autoregressive features. You can explore it through the dropout_prob_theta parameter, illustrated in the sketch below.
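As a quick standalone illustration (plain PyTorch, independent of NHITS internals), dropout zeroes a random fraction of units during training and rescales the survivors:

import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.6)  # same rate we pass to dropout_prob_theta below
x = torch.ones(10)
print(drop(x))  # ~60% of entries zeroed; survivors scaled by 1/(1-p) = 2.5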

Fit the NeuralForecast Models

Using the NeuralForecast.fit method, you can train a set of models on your dataset. You can define the forecasting horizon (12 in this example) and modify the models' hyperparameters. For example, for NHITS we set a robust Huber loss and increased its dropout regularization.

See the NHITS and MLP model documentation.

horizon = 12
level = [50, 80]

# Try different hyperparameters to improve accuracy.
models = [NHITS(h=horizon,                           # Forecast horizon
                input_size=2 * horizon,              # Length of input sequence
                loss=HuberMQLoss(level=level),    # Robust Huber Loss
                valid_loss=MQLoss(level=level),   # Validation signal
                max_steps=500,                       # Number of steps to train
                dropout_prob_theta=0.6,              # Dropout to robustify vs outlier lag inputs
                #early_stop_patience_steps=2,        # Early stopping regularization patience
                val_check_steps=10,                  # Frequency of validation signal (affects early stopping)
                alias='Huber',
              ),
          NHITS(h=horizon,
                input_size=2 * horizon,
                loss=DistributionLoss(distribution='Normal', 
                                      level=level), # Classic Normal distribution
                valid_loss=MQLoss(level=level),
                max_steps=500,
                #early_stop_patience_steps=2,
                dropout_prob_theta=0.6,
                val_check_steps=10,
                alias='Normal',
              )
          ]
nf = NeuralForecast(models=models, freq='M')
nf.fit(df=Y_train_df)
Y_hat_df = nf.predict()
# By default NeuralForecast produces forecast intervals
# In this case the lo-x and hi-x levels represent the
# low and high bounds of the prediction interval accumulating x% probability
Y_hat_df
    unique_id          ds  Huber-median  Huber-lo-80  Huber-lo-50  Huber-hi-50  Huber-hi-80      Normal  Normal-median  Normal-lo-80  Normal-lo-50  Normal-hi-50  Normal-hi-80
 0        1.0  1960-01-31    412.738525   401.058044   406.131958   420.779266   432.124268  406.459717     416.787842   -124.278656    135.413223    680.997070    904.871765
 1        1.0  1960-02-29    403.913544   384.403534   391.904419   420.288208   469.040375  399.827148     418.305725   -137.291870    103.988327    661.940430    946.699219
 2        1.0  1960-03-31    472.311523   446.644531   460.767334   486.710999   512.552979  380.263947     378.253998   -105.411003    117.415565    647.887695    883.611633
 3        1.0  1960-04-30    460.996674   444.471039   452.971802   467.544189   480.843903  432.131378     442.395844   -104.205200    135.457123    729.306885    974.661743
 4        1.0  1960-05-31    465.534790   452.048889   457.472626   476.141022   490.311005  417.186279     417.956543   -117.399597    150.915833    692.936523    930.934814
 5        1.0  1960-06-30    538.116028   518.049866   527.238159   551.501709   563.818848  444.510834     440.168396    -54.501572    189.301392    703.502014    946.068909
 6        1.0  1960-07-31    613.937866   581.048035   597.368408   629.111450   645.550659  423.707275     431.251526    -97.069489    164.821259    687.764526    942.432251
 7        1.0  1960-08-31    616.188660   581.982300   599.544128   632.137512   643.219543  386.655823     383.755157   -134.702011    139.954285    658.973022    897.393494
 8        1.0  1960-09-30    537.559143   513.477478   526.664856   551.563293   573.146667  388.874817     379.827057   -139.859344    110.772484    673.086182    926.355774
 9        1.0  1960-10-31    471.107605   449.207916   459.288025   486.402985   515.082458  401.483643     412.114990   -185.928085     95.805717    703.490784    970.837830
10        1.0  1960-11-30    412.758423   389.203308   398.727295   431.723602   451.208588  425.829895     425.018799   -172.022018    108.840889    723.424011   1035.656128
11        1.0  1960-12-31    457.254761   438.565582   446.097168   468.809296   483.967865  406.916595     399.852051   -199.963684    110.715050    729.735107    951.728577
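As a quick sanity check on these intervals (a sketch using the columns shown above and the test split defined earlier), we can measure how often the true test values fall inside the Huber model's 80% band:

# Empirical coverage of the Huber 80% prediction interval
coverage_df = Y_hat_df.merge(Y_test_df, on=['unique_id', 'ds'])
inside = coverage_df['y'].between(coverage_df['Huber-lo-80'],
                                  coverage_df['Huber-hi-80'])
print(f'Huber 80% interval empirical coverage: {inside.mean():.0%}')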

4. Plot and Evaluate Predictions

Finally, we plot the forecasts of both models against the real values.

Then we evaluate the accuracy of the NHITS-Huber and NHITS-Normal forecasters.

fig, ax = plt.subplots(1, 1, figsize = (20, 7))
plot_df = pd.concat([Y_train_df, Y_hat_df]).set_index('ds') # Concatenate the train and forecast dataframes
plot_df[['y', 'Huber-median', 'Normal-median']].plot(ax=ax, linewidth=2)

ax.set_title('Noisy AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()

To evaluate the median predictions we use the mean absolute percentage error (MAPE), defined as follows:

$$\mathrm{MAPE}(\mathbf{y}_{\tau}, \hat{\mathbf{y}}_{\tau}) = \mathrm{mean}\left(\frac{|\mathbf{y}_{\tau}-\hat{\mathbf{y}}_{\tau}|}{|\mathbf{y}_{\tau}|}\right)$$
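In plain NumPy the metric reads as follows (mape_np is an illustrative helper; the evaluation below uses utilsforecast.losses.mape, which operates on dataframes):

import numpy as np

def mape_np(y, y_hat):
    # Mean absolute percentage error over arrays of targets and predictions
    return np.mean(np.abs(y - y_hat) / np.abs(y))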

To evaluate the coherent probabilistic predictions we use the continuous ranked probability score (CRPS), defined as follows:

$$\mathrm{CRPS}(\hat{F}_{\tau}, \mathbf{y}_{\tau}) = \int^{1}_{0} \mathrm{QL}(\hat{F}_{\tau}, y_{\tau})_{q} \, dq$$
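In practice the integral is discretized into an average of quantile (pinball) losses over a grid of levels, which is essentially what MQLoss aggregates. A rough NumPy sketch (q_preds is a hypothetical list of per-quantile forecast arrays aligned with q_levels):

import numpy as np

def quantile_loss_np(y, y_hat_q, q):
    # Pinball loss at quantile level q
    diff = y - y_hat_q
    return np.maximum(q * diff, (q - 1.0) * diff)

def crps_approx(y, q_preds, q_levels):
    # Average the pinball losses over the quantile grid to
    # approximate the integral over q
    return np.mean([quantile_loss_np(y, q_preds[i], q).mean()
                    for i, q in enumerate(q_levels)])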

As you can see, the improvements from robust regression show up in both the point and the probabilistic forecasting settings.

df_metrics = Y_hat_df.merge(Y_test_df, on=['ds', 'unique_id'])
df_metrics.rename(columns={'Huber-median': 'Huber'}, inplace=True)

metrics = evaluate(df_metrics,
                   metrics=[mape, mqloss],
                   models=['Huber', 'Normal'],
                   level = [50, 80],
                   agg_fn="mean")

metrics
   metric     Huber     Normal
0    mape  0.034726   0.140207
1  mqloss  5.511535  61.891651
