import lightgbm as lgb
import pandas as pd
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.utils import generate_daily_series, generate_prices_for_series

Data setup

series = generate_daily_series(
    100, equal_ends=True, n_static_features=2
).rename(columns={'static_1': 'product_id'})
series.head()
  unique_id          ds           y  static_0  product_id
0     id_00  2000-10-05   39.811983        79          45
1     id_00  2000-10-06  103.274013        79          45
2     id_00  2000-10-07  176.574744        79          45
3     id_00  2000-10-08  258.987900        79          45
4     id_00  2000-10-09  344.940404        79          45

Using existing exogenous features

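In mlforecast the required columns are the series identifier, the time, and the target. Any additional columns you have, such as static_0 and product_id here, are treated as static and are replicated when building the features for the next timestamp. You can disable this by passing static_features to MLForecast.preprocess or MLForecast.fit, which keeps as static only the columns you list there. Keep in mind that every feature in the input dataframe is used for training, so you must provide the future values of the exogenous features to MLForecast.predict through the X_df argument.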

Consider the following example. Suppose we have a price catalog for each id and date.

prices_catalog = generate_prices_for_series(series)
prices_catalog.head()
           ds  unique_id     price
0  2000-10-05      id_00  0.548814
1  2000-10-06      id_00  0.715189
2  2000-10-07      id_00  0.602763
3  2000-10-08      id_00  0.544883
4  2000-10-09      id_00  0.423655

Suppose also that you have already merged these prices into your series dataframe.

series_with_prices = series.merge(prices_catalog, how='left')
series_with_prices.head()
  unique_id          ds           y  static_0  product_id     price
0     id_00  2000-10-05   39.811983        79          45  0.548814
1     id_00  2000-10-06  103.274013        79          45  0.715189
2     id_00  2000-10-07  176.574744        79          45  0.602763
3     id_00  2000-10-08  258.987900        79          45  0.544883
4     id_00  2000-10-09  344.940404        79          45  0.423655
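
A left merge keeps every row of the series, so any (unique_id, ds) pair that is absent from the catalog ends up with a NaN price. The following is a minimal sketch of a sanity check on toy data. The names toy_series and toy_prices and all of their values are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Toy stand-ins for `series` and `prices_catalog` (made-up values).
toy_series = pd.DataFrame({
    'unique_id': ['id_00', 'id_00', 'id_01', 'id_01'],
    'ds': pd.to_datetime(['2000-10-05', '2000-10-06', '2000-10-05', '2000-10-06']),
    'y': [1.0, 2.0, 3.0, 4.0],
})
toy_prices = pd.DataFrame({
    'unique_id': ['id_00', 'id_00', 'id_01'],
    'ds': pd.to_datetime(['2000-10-05', '2000-10-06', '2000-10-05']),
    'price': [0.5, 0.7, 0.6],
})

merged = toy_series.merge(
    toy_prices,
    on=['unique_id', 'ds'],
    how='left',
    validate='one_to_one',  # raises MergeError if either side repeats a key
)

# Rows of the series that found no price in the catalog keep NaN.
missing_price = merged[merged['price'].isna()]
n_missing = len(missing_price)
```

Running the same check on the real dataframes before fitting would show whether any price still needs to be filled in or dropped.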

This dataframe is passed to MLForecast.fit (or MLForecast.preprocess). However, since the price is dynamic, we have to tell that method that only static_0 and product_id are static.

fcst = MLForecast(
    models=lgb.LGBMRegressor(n_jobs=1, random_state=0, verbosity=-1),
    freq='D',
    lags=[7],
    lag_transforms={
        1: [ExpandingMean()],
        7: [RollingMean(window_size=14)],
    },
    date_features=['dayofweek', 'month'],
    num_threads=2,
)
fcst.fit(series_with_prices, static_features=['static_0', 'product_id'])
MLForecast(models=[LGBMRegressor], freq=D, lag_features=['lag7', 'expanding_mean_lag1', 'rolling_mean_lag7_window_size14'], date_features=['dayofweek', 'month'], num_threads=2)
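
Deciding which columns to list in static_features comes down to asking whether a column ever changes within a series. The sketch below shows one way to check this with plain pandas. The helper find_static_columns is hypothetical (it is not part of mlforecast) and the toy data is made up.

```python
import pandas as pd


def find_static_columns(df, id_col='unique_id', time_col='ds', target_col='y'):
    """Hypothetical helper: columns that hold a single value within every series."""
    candidates = [c for c in df.columns if c not in (id_col, time_col, target_col)]
    if not candidates:
        return []
    # Count the distinct values of each candidate column inside each series.
    n_unique = df.groupby(id_col)[candidates].nunique(dropna=False)
    return [c for c in candidates if (n_unique[c] <= 1).all()]


# Made-up data with the same column layout as series_with_prices.
toy = pd.DataFrame({
    'unique_id': ['id_00', 'id_00', 'id_01', 'id_01'],
    'ds': pd.to_datetime(['2000-10-05', '2000-10-06', '2000-10-05', '2000-10-06']),
    'y': [1.0, 2.0, 3.0, 4.0],
    'static_0': [79, 79, 12, 12],
    'product_id': [45, 45, 7, 7],
    'price': [0.5, 0.7, 0.6, 0.4],
})

static_cols = find_static_columns(toy)
```

A column that merely happens to be constant in a short sample would also be reported, so the result is a starting point for static_features rather than a replacement for knowing the data.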

The features used for training are stored in MLForecast.ts.features_order_. As you can see, price was used for training.

fcst.ts.features_order_
['static_0',
 'product_id',
 'price',
 'lag7',
 'expanding_mean_lag1',
 'rolling_mean_lag7_window_size14',
 'dayofweek',
 'month']

So, to update the price at each time step, we just call MLForecast.predict with our forecasting horizon and pass the price catalog through X_df.

preds = fcst.predict(h=7, X_df=prices_catalog)
preds.head()
  unique_id          ds  LGBMRegressor
0     id_00  2001-05-15     418.930093
1     id_00  2001-05-16     499.487368
2     id_00  2001-05-17      20.321885
3     id_00  2001-05-18     102.310778
4     id_00  2001-05-19     185.340281
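
Because the future values of every dynamic feature must be available for the whole horizon, it can help to check X_df before calling predict. The following is a rough sketch under stated assumptions: the helper missing_future_rows is hypothetical (not part of mlforecast), the data is made up, and the date arithmetic assumes a regular frequency such as daily, where the last training timestamp lies on the frequency grid.

```python
import pandas as pd


def missing_future_rows(train, X_df, h, freq='D', id_col='unique_id', time_col='ds'):
    """Hypothetical helper: (id, date) pairs within the horizon that X_df lacks."""
    last_dates = train.groupby(id_col)[time_col].max()
    rows = []
    for uid, last in last_dates.items():
        # The h timestamps that follow the last observed one for this series.
        future = pd.date_range(start=last, periods=h + 1, freq=freq)[1:]
        rows.extend((uid, ts) for ts in future)
    expected = pd.DataFrame(rows, columns=[id_col, time_col])
    check = expected.merge(
        X_df[[id_col, time_col]], on=[id_col, time_col], how='left', indicator=True
    )
    # 'left_only' marks expected pairs that were not found in X_df.
    gaps = check.loc[check['_merge'] == 'left_only', [id_col, time_col]]
    return gaps.reset_index(drop=True)


# Made-up training data ending on 2000-10-06 for both series.
toy_train = pd.DataFrame({
    'unique_id': ['id_00', 'id_00', 'id_01', 'id_01'],
    'ds': pd.to_datetime(['2000-10-05', '2000-10-06', '2000-10-05', '2000-10-06']),
    'y': [1.0, 2.0, 3.0, 4.0],
})
# Made-up future prices; id_01 has no price for 2000-10-08.
toy_future = pd.DataFrame({
    'unique_id': ['id_00', 'id_00', 'id_01'],
    'ds': pd.to_datetime(['2000-10-07', '2000-10-08', '2000-10-07']),
    'price': [0.5, 0.7, 0.6],
})

gaps = missing_future_rows(toy_train, toy_future, h=2)
```

An empty result means every series has a feature row for each of the h future timestamps.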

Generating exogenous features

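Nixtla provides some utilities to generate exogenous features for both training and forecasting, such as statsforecast's mstl_decomposition or the transform_exog function. There is also utilsforecast's fourier function, which we demonstrate in this section.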

from sklearn.linear_model import LinearRegression
from utilsforecast.feature_engineering import fourier

Suppose you start with data like the one shown above, which includes some static features.

series.head()
  unique_id          ds           y  static_0  product_id
0     id_00  2000-10-05   39.811983        79          45
1     id_00  2000-10-06  103.274013        79          45
2     id_00  2000-10-07  176.574744        79          45
3     id_00  2000-10-08  258.987900        79          45
4     id_00  2000-10-09  344.940404        79          45

Now we would like to add some Fourier terms to model the seasonality. We can do that as follows:

transformed_df, future_df = fourier(series, freq='D', season_length=7, k=2, h=7)

This provides an extended training dataset.

transformed_df.head()
  unique_id          ds           y  static_0  product_id     sin1_7     sin2_7     cos1_7     cos2_7
0     id_00  2000-10-05   39.811983        79          45   0.781832   0.974928   0.623490  -0.222521
1     id_00  2000-10-06  103.274013        79          45   0.974928  -0.433884  -0.222521  -0.900969
2     id_00  2000-10-07  176.574744        79          45   0.433884  -0.781831  -0.900969   0.623490
3     id_00  2000-10-08  258.987900        79          45  -0.433884   0.781832  -0.900969   0.623490
4     id_00  2000-10-09  344.940404        79          45  -0.974928   0.433884  -0.222521  -0.900969

Along with the future values of the features.

future_df.head()
  unique_id          ds     sin1_7     sin2_7     cos1_7     cos2_7
0     id_00  2001-05-15  -0.781828  -0.974930   0.623494  -0.222511
1     id_00  2001-05-16   0.000006   0.000011   1.000000   1.000000
2     id_00  2001-05-17   0.781835   0.974925   0.623485  -0.222533
3     id_00  2001-05-18   0.974927  -0.433895  -0.222527  -0.900963
4     id_00  2001-05-19   0.433878  -0.781823  -0.900972   0.623500
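
To make the columns less opaque, here is a small sketch of the standard Fourier terms sin(2πjt/s) and cos(2πjt/s) for harmonics j = 1..k and season length s. The helper fourier_terms is hypothetical. The assumption that t is the 1-based position of a row within its series is inferred from the values in the tables above, not taken from the utilsforecast source.

```python
import math


def fourier_terms(t, season_length, k):
    """Hypothetical helper: sin/cos pairs for harmonics 1..k at time index t."""
    terms = {}
    # Sine terms first, then cosine terms, mirroring the column order above.
    for j in range(1, k + 1):
        angle = 2 * math.pi * j * t / season_length
        terms[f'sin{j}_{season_length}'] = math.sin(angle)
    for j in range(1, k + 1):
        angle = 2 * math.pi * j * t / season_length
        terms[f'cos{j}_{season_length}'] = math.cos(angle)
    return terms


# First training row of a series (assumed t=1), weekly season, two harmonics.
first = fourier_terms(t=1, season_length=7, k=2)
```

With t=1 the sketch agrees with the first row of transformed_df to about five decimal places. The future rows simply continue the same index past the end of the training data, which is why they can be computed ahead of time.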

We can now train using only these features (and the static ones).

fcst2 = MLForecast(models=LinearRegression(), freq='D')
fcst2.fit(transformed_df, static_features=['static_0', 'product_id'])
MLForecast(models=[LinearRegression], freq=D, lag_features=[], date_features=[], num_threads=1)

And provide the future values to the predict method.

fcst2.predict(h=7, X_df=future_df).head()
  unique_id          ds  LinearRegression
0     id_00  2001-05-15        275.822342
1     id_00  2001-05-16        262.258117
2     id_00  2001-05-17        238.195850
3     id_00  2001-05-18        240.997814
4     id_00  2001-05-19        262.247123