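The MLForecast class allows you to compute lag transformations on your target. However, sometimes you also want to compute transformations on your dynamic exogenous features. This guide shows how to do that.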

Data setup

from mlforecast.utils import generate_series, generate_prices_for_series
series = generate_series(10, equal_ends=True)
prices = generate_prices_for_series(series)
prices.head(2)
   ds          unique_id  price
0  2000-10-05  0          0.548814
1  2000-10-06  0          0.715189

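Suppose you have some series along with a price for each id and date, and you want to compute forecasts for the next 7 days. Because the price is a dynamic feature, you have to provide its future values through the X_df argument of MLForecast.predict.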

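You may want to use more than the price itself, for example its lag 7 and the expanding mean of its lag 1. In that case you can compute these features before training, merge them with your series, and then provide their future values through X_df. Consider the following example.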

Computing the transformations

from mlforecast.lag_transforms import ExpandingMean

from mlforecast.feature_engineering import transform_exog
transformed_prices = transform_exog(prices, lags=[7], lag_transforms={1: [ExpandingMean()]})
transformed_prices.head(10)
   ds          unique_id  price     price_lag7  price_expanding_mean_lag1
0  2000-10-05  0          0.548814  NaN         NaN
1  2000-10-06  0          0.715189  NaN         0.548814
2  2000-10-07  0          0.602763  NaN         0.632001
3  2000-10-08  0          0.544883  NaN         0.622255
4  2000-10-09  0          0.423655  NaN         0.602912
5  2000-10-10  0          0.645894  NaN         0.567061
6  2000-10-11  0          0.437587  NaN         0.580200
7  2000-10-12  0          0.891773  0.548814    0.559827
8  2000-10-13  0          0.963663  0.715189    0.601320
9  2000-10-14  0          0.383442  0.602763    0.641580
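The two new columns can be reproduced by hand for a single series. The following pure-Python sketch assumes the rows are already sorted by ds; transform_exog applies the equivalent logic per unique_id. The helpers `lag` and `expanding_mean_lag1` are hypothetical names for illustration, not mlforecast functions, and plain lists stand in for DataFrame columns.

```python
import math


def lag(values, n):
    """Shift values forward by n >= 1 steps, padding the start with NaN."""
    if n >= len(values):
        return [math.nan] * len(values)
    return [math.nan] * n + values[:-n]


def expanding_mean_lag1(values):
    """Mean of all observations strictly before each position (NaN for the first)."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        out.append(total / i if i > 0 else math.nan)  # mean of values[:i]
        total += v
    return out


# First ten prices of series 0, copied from the table above
prices = [0.548814, 0.715189, 0.602763, 0.544883, 0.423655,
          0.645894, 0.437587, 0.891773, 0.963663, 0.383442]
price_lag7 = lag(prices, 7)
price_expanding_mean_lag1 = expanding_mean_lag1(prices)
```

Up to the rounding of the displayed prices, these lists match the price_lag7 and price_expanding_mean_lag1 columns shown above.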

You can now merge this with your original series:

series_with_prices = series.merge(transformed_prices, on=['unique_id', 'ds'])
series_with_prices.head(10)
   unique_id  ds          y         price     price_lag7  price_expanding_mean_lag1
0  0          2000-10-05  0.322947  0.548814  NaN         NaN
1  0          2000-10-06  1.218794  0.715189  NaN         0.548814
2  0          2000-10-07  2.445887  0.602763  NaN         0.632001
3  0          2000-10-08  3.481831  0.544883  NaN         0.622255
4  0          2000-10-09  4.191721  0.423655  NaN         0.602912
5  0          2000-10-10  5.395863  0.645894  NaN         0.567061
6  0          2000-10-11  6.264447  0.437587  NaN         0.580200
7  0          2000-10-12  0.284022  0.891773  0.548814    0.559827
8  0          2000-10-13  1.462798  0.963663  0.715189    0.601320
9  0          2000-10-14  2.035518  0.383442  0.602763    0.641580
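The merge above is an inner join on (unique_id, ds), because `how='inner'` is the default of pandas `DataFrame.merge`. As a result, price rows dated after the end of each series are not carried into the training data, but they remain in transformed_prices for later use as X_df. Below is a minimal stdlib sketch of that join, using lists of dicts in place of DataFrames. `merge_on_keys` is a hypothetical helper, and it assumes the keys are unique in the right-hand rows.

```python
def merge_on_keys(left, right, keys=('unique_id', 'ds')):
    """Inner-join two lists of dict rows on the key columns."""
    # Index the right-hand rows by their key tuple (keys assumed unique there)
    lookup = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = lookup.get(tuple(row[k] for k in keys))
        if match is not None:  # inner join: rows without a partner are dropped
            merged.append({**row, **match})
    return merged


series = [
    {'unique_id': 0, 'ds': '2000-10-05', 'y': 0.322947},
    {'unique_id': 0, 'ds': '2000-10-06', 'y': 1.218794},
]
prices = [
    {'unique_id': 0, 'ds': '2000-10-05', 'price': 0.548814},
    {'unique_id': 0, 'ds': '2000-10-06', 'price': 0.715189},
    # Stands in for a date past the end of the series: no match, so it is dropped
    {'unique_id': 0, 'ds': '2000-10-07', 'price': 0.602763},
]
series_with_prices = merge_on_keys(series, prices)
```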

You can then define your forecast object. Note that you can still compute lag features based on the target, as you normally would.

from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast
fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lags=[1],
    date_features=['dayofweek'],
)
fcst.preprocess(series_with_prices, static_features=[], dropna=True).head()
   unique_id  ds          y         price     price_lag7  price_expanding_mean_lag1  lag1      dayofweek
1  0          2000-10-06  1.218794  0.715189  NaN         0.548814                   0.322947  4
2  0          2000-10-07  2.445887  0.602763  NaN         0.632001                   1.218794  5
3  0          2000-10-08  3.481831  0.544883  NaN         0.622255                   2.445887  6
4  0          2000-10-09  4.191721  0.423655  NaN         0.602912                   3.481831  0
5  0          2000-10-10  5.395863  0.645894  NaN         0.567061                   4.191721  1
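The two target-side features in this table can also be checked by hand. lag1 is the previous y of the same series. dayofweek uses the Monday=0 convention, which matches Python's `date.weekday()`; 2000-10-06 was a Friday, hence 4. The sketch below works on one sorted series, uses the hypothetical helper `add_lag1_and_dayofweek`, and uses None to mark the missing first lag.

```python
from datetime import date


def add_lag1_and_dayofweek(rows):
    """Add lag1 (previous y) and dayofweek (Monday=0) to one series sorted by ds."""
    out = []
    prev_y = None  # the first row has no previous observation
    for row in rows:
        out.append({**row, 'lag1': prev_y, 'dayofweek': row['ds'].weekday()})
        prev_y = row['y']
    return out


rows = [
    {'ds': date(2000, 10, 5), 'y': 0.322947},
    {'ds': date(2000, 10, 6), 'y': 1.218794},
    {'ds': date(2000, 10, 7), 'y': 2.445887},
]
features = add_lag1_and_dayofweek(rows)
# With dropna=True, the first row (lag1 is None) is removed,
# which is why the table above starts at index 1.
```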

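Note that the dropna argument only considers the null values produced by the target-based lag features (here lag1). Nulls in the exogenous columns, such as price_lag7 above, are kept. If you want to drop every row that contains a null value, you have to do that on your original series: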

series_with_prices2 = series_with_prices.dropna()
fcst.preprocess(series_with_prices2, dropna=True, static_features=[]).head()
    unique_id  ds          y         price     price_lag7  price_expanding_mean_lag1  lag1      dayofweek
8   0          2000-10-13  1.462798  0.963663  0.715189    0.601320                   0.284022  4
9   0          2000-10-14  2.035518  0.383442  0.602763    0.641580                   1.462798  5
10  0          2000-10-15  3.043565  0.791725  0.544883    0.615766                   2.035518  6
11  0          2000-10-16  4.010109  0.528895  0.423655    0.631763                   3.043565  0
12  0          2000-10-17  5.416310  0.568045  0.645894    0.623190                   4.010109  1
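The two kinds of null handling can be contrasted with plain rows. In this sketch, `dropna_target_lags` mimics the behavior described above for preprocess(dropna=True) by checking only the target lag columns. `dropna_all` mirrors calling `.dropna()` on the series beforehand. Both helpers are hypothetical and treat both None and float NaN as missing.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def dropna_target_lags(rows, lag_cols):
    """Drop a row only if one of the target lag columns is missing."""
    return [r for r in rows if not any(is_missing(r[c]) for c in lag_cols)]


def dropna_all(rows):
    """Drop a row if any of its columns is missing."""
    return [r for r in rows if not any(is_missing(v) for v in r.values())]


rows = [
    {'y': 1.0, 'price_lag7': math.nan, 'lag1': None},  # both kinds of null
    {'y': 2.0, 'price_lag7': math.nan, 'lag1': 1.0},   # only an exogenous null
    {'y': 3.0, 'price_lag7': 0.5, 'lag1': 2.0},        # complete row
]
kept_by_target_lags = dropna_target_lags(rows, ['lag1'])  # keeps rows 1 and 2
kept_by_all = dropna_all(rows)                            # keeps row 2 only
```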

You can now train the model.

fcst.fit(series_with_prices2, static_features=[])
MLForecast(models=[LinearRegression], freq=D, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)

Then predict using the prices. Note that you can provide the dataframe with the full history; mlforecast will select the dates it needs for the forecast horizon.

fcst.predict(1, X_df=transformed_prices).head()
   unique_id  ds          LinearRegression
0  0          2001-05-15  3.803967
1  1          2001-05-15  3.512489
2  2          2001-05-15  3.170019
3  3          2001-05-15  4.307121
4  4          2001-05-15  3.018758
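Conceptually, this filtering keeps only the X_df rows that, for each series, fall within h periods after that series' last training date. The sketch below assumes daily frequency (freq='D') and `datetime.date` values. `select_future_exog` is a hypothetical helper that illustrates the idea; it is not mlforecast's internal implementation.

```python
from datetime import date, timedelta


def select_future_exog(x_rows, last_train_ds, h):
    """Keep rows whose ds lies in the h days after their series' last training date.

    x_rows: dicts with 'unique_id' and 'ds' (datetime.date);
    last_train_ds: mapping unique_id -> last training date. Daily frequency assumed.
    """
    selected = []
    for row in x_rows:
        last = last_train_ds.get(row['unique_id'])
        if last is not None and last < row['ds'] <= last + timedelta(days=h):
            selected.append(row)
    return selected


# A "full history" X_df for one series: 2001-05-14 through 2001-05-22
x_rows = [{'unique_id': 0, 'ds': date(2001, 5, 14) + timedelta(days=i), 'price': float(i)}
          for i in range(9)]
horizon_rows = select_future_exog(x_rows, {0: date(2001, 5, 14)}, h=1)
```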

In this example we have prices for the next 7 days, so if you try to forecast a longer horizon you'll get an error.

from fastcore.test import test_fail
test_fail(lambda: fcst.predict(8, X_df=transformed_prices), contains='Found missing inputs in X_df')
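The error above comes from a coverage requirement: every (unique_id, ds) pair in the forecast horizon needs a row in X_df. The sketch below is a hypothetical version of such a check. It assumes daily frequency and returns the missing pairs instead of raising, and `missing_exog_inputs` is an illustrative name, not an mlforecast function. With 7 future days available, h=7 finds nothing missing, while h=8 finds one missing date per series.

```python
from datetime import date, timedelta


def missing_exog_inputs(x_rows, last_train_ds, h):
    """List the (unique_id, ds) pairs in the horizon that have no row in x_rows."""
    available = {(row['unique_id'], row['ds']) for row in x_rows}
    return [
        (uid, last + timedelta(days=step))
        for uid, last in last_train_ds.items()
        for step in range(1, h + 1)
        if (uid, last + timedelta(days=step)) not in available
    ]


last_train_ds = {0: date(2001, 5, 14), 1: date(2001, 5, 14)}
# Seven future days of exogenous values per series: 2001-05-15 through 2001-05-21
x_rows = [{'unique_id': uid, 'ds': date(2001, 5, 14) + timedelta(days=i)}
          for uid in (0, 1) for i in range(1, 8)]
ok = missing_exog_inputs(x_rows, last_train_ds, 7)        # []
too_long = missing_exog_inputs(x_rows, last_train_ds, 8)  # 2001-05-22 missing for both
```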