默认情况下,mlforecast 使用递归策略,即训练一个模型来预测下一个值,如果我们要预测多个值,则逐个进行预测,然后使用模型的预测作为新的目标,重新计算特征并预测下一步。

还有另一种方法,如果我们想预测未来 10 步,则训练 10 个不同的模型,其中每个模型都经过训练来预测特定步骤的值,即一个模型预测下一个值,另一个模型预测未来两步的值,依此类推。这可能非常耗时,但也能提供更好的结果。如果您想使用这种方法,可以在 MLForecast.fit 中指定 max_horizon,这将训练指定数量的模型,并且在您调用 MLForecast.predict 时,每个模型都会预测其对应的范围。

设置

import random
import lightgbm as lgb
import pandas as pd
from datasetsforecast.m4 import M4, M4Info
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import smape

from mlforecast import MLForecast
from mlforecast.lag_transforms import ExponentiallyWeightedMean, RollingMean
from mlforecast.target_transforms import Differences

数据

我们将使用 M4 数据集中的四个随机序列

group = 'Hourly'
await M4.async_download('data', group=group)
df, *_ = M4.load(directory='data', group=group)
df['ds'] = df['ds'].astype('int')
ids = df['unique_id'].unique()
random.seed(0)
sample_ids = random.choices(ids, k=4)
sample_df = df[df['unique_id'].isin(sample_ids)]
info = M4Info[group]
horizon = info.horizon
valid = sample_df.groupby('unique_id').tail(horizon)
train = sample_df.drop(valid.index)
def avg_smape(df):
    """Computes the SMAPE by serie and then averages it across all series."""
    full = df.merge(valid)
    return (
        evaluate(full, metrics=[smape])
        .drop(columns='metric')
        .set_index('unique_id')
        .squeeze()
    )

模型

fcst = MLForecast(
    models=lgb.LGBMRegressor(random_state=0, verbosity=-1),
    freq=1,
    lags=[24 * (i+1) for i in range(7)],
    lag_transforms={
        1: [RollingMean(window_size=24)],
        24: [RollingMean(window_size=24)],
        48: [ExponentiallyWeightedMean(alpha=0.3)],
    },
    num_threads=1,
    target_transforms=[Differences([24])],
)
horizon = 24
# the following will train 24 models, one for each horizon
individual_fcst = fcst.fit(train, max_horizon=horizon)
individual_preds = individual_fcst.predict(horizon)
avg_smape_individual = avg_smape(individual_preds).rename('individual')
# the following will train a single model and use the recursive strategy
recursive_fcst = fcst.fit(train)
recursive_preds = recursive_fcst.predict(horizon)
avg_smape_recursive = avg_smape(recursive_preds).rename('recursive')
# results
print('Average SMAPE per method and serie')
avg_smape_individual.to_frame().join(avg_smape_recursive).applymap('{:.1%}'.format)
Average SMAPE per method and serie
独立递归
唯一 ID
H1960.3%0.3%
H2560.4%0.3%
H38120.9%9.5%
H41311.9%13.6%