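mlforecast lets you define transformations on the lags and use them as features. These are provided through the lag_transforms argument, a dict whose keys are the lags and whose values are lists of the transformations to apply to that lag.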

Data setup

import numpy as np

from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
data = generate_daily_series(10)

Built-in transformations

The built-in lag transformations are in the mlforecast.lag_transforms module.

from mlforecast.lag_transforms import RollingMean, ExpandingStd
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [ExpandingStd()],
        7: [RollingMean(window_size=7, min_samples=1), RollingMean(window_size=14)]
    },
)

Once you have defined your transformations, you can see what they look like with MLForecast.preprocess.

fcst.preprocess(data).head(2)
   unique_id          ds         y  expanding_std_lag1  rolling_mean_lag7_window_size7_min_samples1  rolling_mean_lag7_window_size14
20      id_0  2000-01-21  6.319961            1.956363                                     3.234486                         3.283064
21      id_0  2000-01-22  0.071677            2.028545                                     3.256055                         3.291068
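
To make the semantics of a lagged rolling mean concrete, here is a minimal sketch in plain Python. It is not mlforecast's implementation; the helpers lag and rolling_mean are hypothetical names introduced only for illustration. It shows that the series is shifted by the lag first and the window statistic is computed afterwards, so a feature with lag n and window w (and the default min_samples) is first defined at row n + w - 1. That is consistent with the output above, where the first complete row has index 20 for lag 7 and window 14.

```python
import math

def lag(values, n):
    """Shift a series forward by n steps, padding the start with NaN (assumes n < len(values))."""
    return [math.nan] * n + list(values[:len(values) - n])

def rolling_mean(values, window_size, min_samples=None):
    """Mean over the trailing window; NaN until min_samples valid values are available."""
    if min_samples is None:
        min_samples = window_size
    out = []
    for i in range(len(values)):
        # Keep only the non-NaN values inside the trailing window.
        window = [v for v in values[max(0, i - window_size + 1): i + 1] if not math.isnan(v)]
        out.append(sum(window) / len(window) if len(window) >= min_samples else math.nan)
    return out

y = [float(v) for v in range(1, 11)]  # toy series 1.0, 2.0, ..., 10.0

# Lag 2, window 3: first defined at index 2 + 3 - 1 = 4.
strict = rolling_mean(lag(y, 2), window_size=3)
# With min_samples=1 the feature is defined as soon as one lagged value exists (index 2).
relaxed = rolling_mean(lag(y, 2), window_size=3, min_samples=1)

print(strict[4], relaxed[2], relaxed[3])  # 2.0 1.0 1.5
```

Lowering min_samples trades fewer missing rows at the start of each series for early values that are averaged over fewer observations.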

Extending the built-in transformations

You can compose the built-in transformations with the Combine class, which takes two transformations and an operator.

import operator

from mlforecast.lag_transforms import Combine
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            Combine(
                RollingMean(window_size=7),
                RollingMean(window_size=14),
                operator.truediv,
            )
        ],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
   unique_id          ds         y  rolling_mean_lag1_window_size7  rolling_mean_lag1_window_size14  rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14
14      id_0  2000-01-15  0.435006                        3.234486                         3.283064                                                                0.985204
15      id_0  2000-01-16  1.489309                        3.256055                         3.291068                                                                0.989361
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag1_window_size14'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14']
)

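If you want one of the transformations in Combine to be applied to a different lag, you can use the Offset class, which first applies the offset and then the transformation.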

from mlforecast.lag_transforms import Offset
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            Combine(
                RollingMean(window_size=7),
                Offset(RollingMean(window_size=7), n=1),
                operator.truediv,
            )
        ],
        2: [RollingMean(window_size=7)]
    },
)
prep = fcst.preprocess(data)
prep.head(2)
  unique_id          ds         y  rolling_mean_lag1_window_size7  rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7  rolling_mean_lag2_window_size7
8      id_0  2000-01-09  1.462798                        3.326081                                                               0.998331                        3.331641
9      id_0  2000-01-10  2.035518                        3.360938                                                               1.010480                        3.326081
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag2_window_size7'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7']
)
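
The equivalence checked above (an offset of 1 on top of lag 1 behaves like the same transformation on lag 2) can be illustrated with a small plain-Python sketch. This is only a conceptual model under the assumption that an offset is an additional shift applied before the transformation; the lag and rolling_mean helpers are hypothetical and are not part of mlforecast.

```python
import math

def lag(values, n):
    """Shift a series forward by n steps, padding the start with NaN (assumes n < len(values))."""
    return [math.nan] * n + list(values[:len(values) - n])

def rolling_mean(values, window_size):
    """Mean over the trailing window; NaN until the window is full of valid values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1): i + 1]
        full = len(window) == window_size and not any(math.isnan(v) for v in window)
        out.append(sum(window) / window_size if full else math.nan)
    return out

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
base_lag, offset, window = 1, 1, 2

# Extra shift applied on top of the base lag, then the transformation.
offset_feature = rolling_mean(lag(lag(y, base_lag), offset), window)
# The same transformation declared directly on lag 2.
direct_feature = rolling_mean(lag(y, base_lag + offset), window)
# Ratio between the lag-1 feature and its offset counterpart, as Combine would produce.
ratio = [a / b for a, b in zip(rolling_mean(lag(y, base_lag), window), offset_feature)]

print(offset_feature[3:], direct_feature[3:])  # [3.0, 5.0, 7.0] [3.0, 5.0, 7.0]
```

A ratio like this compares the most recent window against the window one step earlier, which is why the combined feature is only defined once both inputs are.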

numba-based transformations

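The window-ops package provides transformations defined as numba JIT-compiled functions. We use numba because it makes them very fast and can bypass Python's GIL, which allows running them concurrently with multithreading.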

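The main advantage of using these transformations is that they are very easy to implement. However, they can be very slow when their values need to be updated in the prediction step, because the function has to be called again on the complete history and only the last value is kept. If performance is a concern, you should try to use the built-in transformations, or set keep_last_n in MLForecast.preprocess or MLForecast.fit to the minimum number of samples that your transformations require.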
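
The reasoning behind keep_last_n can be sketched in plain Python. This is a conceptual model, not mlforecast's update logic: next_feature_value is a hypothetical helper, and diff_over_previous here is a plain-Python stand-in for a numba-compiled transformation. The point is that recomputing over a truncated history gives the same last value as recomputing over the full history, as long as the truncated history still contains every sample the transformation needs.

```python
def diff_over_previous(x, offset=1):
    """Difference between each value and the value `offset` steps before it."""
    return [float("nan")] * offset + [x[i] - x[i - offset] for i in range(offset, len(x))]

def next_feature_value(history, lag, keep_last_n=None):
    """Recompute the transformation for the next step and keep only the last value."""
    if keep_last_n is not None:
        history = history[-keep_last_n:]  # only the most recent samples are stored
    # For the next step, the lagged series ends `lag - 1` samples before the end of the history.
    lagged = history[: len(history) - lag + 1]
    return diff_over_previous(lagged)[-1]

history = [1.0, 4.0, 9.0, 16.0, 25.0]

full = next_feature_value(history, lag=2)                     # uses the whole history
enough = next_feature_value(history, lag=2, keep_last_n=3)    # lag 2 + offset 1 needs 3 samples
too_few = next_feature_value(history, lag=2, keep_last_n=2)   # not enough samples

print(full, enough, too_few)  # 7.0 7.0 nan
```

In this toy case three samples are enough because the feature at lag 2 with an offset of 1 looks back three steps; keeping fewer silently produces a missing value, so the minimum should be derived from the largest lag plus whatever each transformation looks back.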

from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.shift import shift_array
@njit
def ratio_over_previous(x, offset=1):
    """Computes the ratio between the current value and its `offset` lag"""
    return x / shift_array(x, offset=offset)

@njit
def diff_over_previous(x, offset=1):
    """Computes the difference between the current value and its `offset` lag"""
    return x - shift_array(x, offset=offset)

If your function takes more arguments than the input array, you can provide a tuple such as (func, arg1, arg2, ...).

fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1, 2, 3],
    lag_transforms={
        1: [expanding_mean, ratio_over_previous, (ratio_over_previous, 2)],  # the second ratio sets offset=2
        2: [diff_over_previous],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
  unique_id          ds         y      lag1      lag2      lag3  expanding_mean_lag1  ratio_over_previous_lag1  ratio_over_previous_lag1_offset2  diff_over_previous_lag2
3      id_0  2000-01-04  3.481831  2.445887  1.218794  0.322947             1.329209                  2.006809                          7.573645                 0.895847
4      id_0  2000-01-05  4.191721  3.481831  2.445887  1.218794             1.867365                  1.423546                          2.856785                 1.227093

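As you can see, the name of the function is used as the transformation name, followed by the _lag suffix and the lag number. If the function has other arguments and they are not set to their default values, they are included as well, as is done with offset=2 here.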

np.testing.assert_allclose(prep['lag1'] / prep['lag2'], prep['ratio_over_previous_lag1'])
np.testing.assert_allclose(prep['lag1'] / prep['lag3'], prep['ratio_over_previous_lag1_offset2'])
np.testing.assert_allclose(prep['lag2'] - prep['lag3'], prep['diff_over_previous_lag2'])
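
The naming rule described above can be sketched with a small helper. build_feature_name is a hypothetical function written for illustration, assuming the name is built from the function name, the lag, and any extra positional arguments that differ from their defaults; mlforecast's internal logic may differ in details.

```python
import inspect

def build_feature_name(func, lag, *args):
    """Hypothetical helper that mirrors the naming rule described in this section."""
    name = f"{func.__name__}_lag{lag}"
    # Skip the first parameter (the input array) and pair the rest with the supplied args.
    extra_params = list(inspect.signature(func).parameters.values())[1:]
    changed = [
        f"{param.name}{value}"
        for param, value in zip(extra_params, args)
        if param.default != value
    ]
    return name + ("_" + "_".join(changed) if changed else "")

def ratio_over_previous(x, offset=1):
    """Placeholder with the same signature as the transformation defined earlier."""
    return x

print(build_feature_name(ratio_over_previous, 1))     # ratio_over_previous_lag1
print(build_feature_name(ratio_over_previous, 1, 2))  # ratio_over_previous_lag1_offset2
print(build_feature_name(ratio_over_previous, 1, 1))  # ratio_over_previous_lag1
```

Passing an argument equal to its default leaves the name unchanged, which matches the columns shown above: only the tuple with offset 2 produces the _offset2 suffix.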