MLForecast

Data

This example uses only 4 series from the M4 dataset. If you want to run it yourself on all of the series, you can refer to this notebook.

import random
import tempfile

import lightgbm as lgb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd  # used below for read_parquet, read_csv and pd.testing
import xgboost as xgb
from sklearn.linear_model import LinearRegression
from utilsforecast.feature_engineering import time_features
from utilsforecast.plotting import plot_series

from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, ExponentiallyWeightedMean, RollingMean
from mlforecast.lgb_cv import LightGBMCV
from mlforecast.target_transforms import Differences, LocalStandardScaler
from mlforecast.utils import PredictionIntervals, generate_daily_series

df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet')
ids = df['unique_id'].unique()
random.seed(0)
sample_ids = random.choices(ids, k=4)
sample_df = df[df['unique_id'].isin(sample_ids)]
sample_df

	unique_id	ds	y
86796	H196	1	11.8
86797	H196	2	11.4
86798	H196	3	11.1
86799	H196	4	10.8
86800	H196	5	10.6
…	…	…	…
325235	H413	1004	99.0
325236	H413	1005	88.0
325237	H413	1006	47.0
325238	H413	1007	41.0
325239	H413	1008	34.0

We now split this data into train and validation sets.

horizon = 48
valid = sample_df.groupby('unique_id').tail(horizon)
train = sample_df.drop(valid.index)
train.shape, valid.shape

((3840, 3), (192, 3))

 MLForecast (models:Union[sklearn.base.BaseEstimator,
             List[sklearn.base.BaseEstimator],
             Dict[str,sklearn.base.BaseEstimator]],
             freq:Union[int,str],
             lags:Optional[Iterable[int]]=None,
             lag_transforms:Optional[Dict[int,List[Union[Callable,
             Tuple[Callable,Any]]]]]=None,
             date_features:Optional[Iterable[Union[str,Callable]]]=None,
             num_threads:int=1,
             target_transforms:Optional[List[Union[
             mlforecast.target_transforms.BaseTargetTransform,
             mlforecast.target_transforms._BaseGroupedArrayTargetTransform]]]=None,
             lag_transforms_namer:Optional[Callable]=None)

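Forecasting pipeline

	Type	Default	Details
models	Union		Models that will be trained and used to compute the forecasts.
freq	Union		Pandas offset, pandas offset alias (e.g. 'D', 'W-THU'), or an integer denoting the frequency of the series.
lags	Optional	None	Lags of the target to use as features.
lag_transforms	Optional	None	Mapping of target lags to their transformations.
date_features	Optional	None	Features computed from the dates. Can be pandas date attributes or functions that take the dates as input.
num_threads	int	1	Number of threads to use when computing the features.
target_transforms	Optional	None	Transformations that are applied to the target before computing the features and reverted after the forecasting step.
lag_transforms_namer	Optional	None	Function that takes a transformation (either a function or a class), a lag and extra arguments, and produces a name.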
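
To make the target_transforms description more concrete, here is a minimal pure-Python sketch of what a seasonal difference does conceptually: the target is differenced before the features are computed, and forecasts produced on the differenced scale are restored by adding back the value one season earlier. The helper names `seasonal_diff` and `restore_forecasts` are hypothetical, the numbers are made up, and this is not the library's implementation of `Differences`.

```python
from typing import List, Optional


def seasonal_diff(y: List[float], season: int) -> List[Optional[float]]:
    """Return y[t] - y[t - season]; the first `season` entries have no predecessor."""
    return [None if t < season else y[t] - y[t - season] for t in range(len(y))]


def restore_forecasts(history: List[float], diff_preds: List[float], season: int) -> List[float]:
    """Undo the seasonal difference for forecasts that continue `history`."""
    extended = list(history)
    for pred in diff_preds:
        # The value one season back may itself be an earlier restored forecast.
        extended.append(extended[-season] + pred)
    return extended[len(history):]


y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
diffed = seasonal_diff(y, season=2)
restored = restore_forecasts(y, diff_preds=[1.0, 1.0, 1.0], season=2)
print(diffed)    # [None, None, 1.0, 1.0, 1.0, 1.0]
print(restored)  # [13.0, 15.0, 14.0]
```

The leading `None` entries are the rows that a seasonal difference leaves undefined, which is one reason the first rows of each series can disappear from the training data.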

The MLForecast object encapsulates feature engineering, model training and forecasting.

fcst = MLForecast(
    models=lgb.LGBMRegressor(random_state=0, verbosity=-1),
    freq=1,
    lags=[24 * (i+1) for i in range(7)],
    lag_transforms={
        48: [ExponentiallyWeightedMean(alpha=0.3)],
    },
    num_threads=1,
    target_transforms=[Differences([24])],
)
fcst

MLForecast(models=[LGBMRegressor], freq=1, lag_features=['lag24', 'lag48', 'lag72', 'lag96', 'lag120', 'lag144', 'lag168', 'exponentially_weighted_mean_lag48_alpha0.3'], date_features=[], num_threads=1)
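
The feature names in the representation above (`lag24`, `lag48`, ...) each refer to the target shifted by that many steps within a series. The following is a simplified single-series sketch of that idea in plain Python; `lag_features` is a hypothetical helper written for illustration, not part of mlforecast.

```python
from typing import Dict, List, Optional


def lag_features(y: List[float], lags: List[int]) -> Dict[str, List[Optional[float]]]:
    """Build one column per lag; row t holds y[t - lag], or None when unavailable."""
    return {
        f"lag{lag}": [y[t - lag] if t >= lag else None for t in range(len(y))]
        for lag in lags
    }


y = [1.0, 2.0, 3.0, 4.0, 5.0]
features = lag_features(y, lags=[1, 3])
for name, column in features.items():
    print(name, column)
# lag1 [None, 1.0, 2.0, 3.0, 4.0]
# lag3 [None, None, None, 1.0, 2.0]
```

Rows where a lag is unavailable contain missing values; these are the rows that `dropna=True` removes during fitting.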

Once we have this setup, we can compute the features and fit the model.

MLForecast.fit

 MLForecast.fit
                 (df:Union[pandas.core.frame.DataFrame,
                 polars.dataframe.frame.DataFrame],
                 id_col:str='unique_id',
                 time_col:str='ds', target_col:str='y',
                 static_features:Optional[List[str]]=None,
                 dropna:bool=True, keep_last_n:Optional[int]=None,
                 max_horizon:Optional[int]=None,
                 prediction_intervals:Optional[mlforecast.utils.PredictionIntervals]=None,
                 fitted:bool=False, as_numpy:bool=False,
                 weight_col:Optional[str]=None)

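Apply the feature engineering and train the models.

	Type	Default	Details
df	Union		Series data in long format.
id_col	str	unique_id	Column that identifies each series.
time_col	str	ds	Column that identifies each timestep; its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting. If `None`, all columns (except id_col and time_col) are considered static.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it.
max_horizon	Optional	None	Train this many models, where each model predicts a specific horizon.
prediction_intervals	Optional	None	Configuration to calibrate prediction intervals (Conformal Prediction).
fitted	bool	False	Save in-sample predictions.
as_numpy	bool	False	Cast features to a numpy array.
weight_col	Optional	None	Column that contains the sample weights.
Returns	MLForecast		Forecast object with the series values and the trained models.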

fcst = MLForecast(
    models=lgb.LGBMRegressor(random_state=0, verbosity=-1),
    freq=1,
    lags=[24 * (i+1) for i in range(7)],
    lag_transforms={
        48: [ExponentiallyWeightedMean(alpha=0.3)],
    },
    num_threads=1,
    target_transforms=[Differences([24])],
)

train2 = train.copy()
train2['weight'] = np.random.default_rng(seed=0).random(train2.shape[0])
fcst.fit(train2, weight_col='weight', as_numpy=True).predict(5)

	unique_id	ds	LGBMRegressor
0	H196	961	16.079737
1	H196	962	15.679737
2	H196	963	15.279737
3	H196	964	14.979737
4	H196	965	14.679737
5	H256	961	13.279737
6	H256	962	12.679737
7	H256	963	12.379737
8	H256	964	12.079737
9	H256	965	11.879737
10	H381	961	56.939977
11	H381	962	40.314608
12	H381	963	33.859013
13	H381	964	15.498139
14	H381	965	25.722674
15	H413	961	25.131194
16	H413	962	19.177421
17	H413	963	21.250829
18	H413	964	18.743132
19	H413	965	16.027263

fcst.cross_validation(train2, n_windows=2, h=5, weight_col='weight', as_numpy=True)

	unique_id	ds	cutoff	y	LGBMRegressor
0	H196	951	950	24.4	24.288850
1	H196	952	950	24.3	24.188850
2	H196	953	950	23.8	23.688850
3	H196	954	950	22.8	22.688850
4	H196	955	950	21.2	21.088850
5	H256	951	950	19.5	19.688850
6	H256	952	950	19.4	19.488850
7	H256	953	950	18.9	19.088850
8	H256	954	950	18.3	18.388850
9	H256	955	950	17.0	17.088850
10	H381	951	950	182.0	208.327270
11	H381	952	950	222.0	247.768326
12	H381	953	950	288.0	277.965997
13	H381	954	950	264.0	321.532857
14	H381	955	950	191.0	206.316903
15	H413	951	950	77.0	60.972692
16	H413	952	950	91.0	54.936494
17	H413	953	950	76.0	73.949203
18	H413	954	950	68.0	67.087417
19	H413	955	950	68.0	75.896022
20	H196	956	955	19.3	19.287891
21	H196	957	955	18.2	18.187891
22	H196	958	955	17.5	17.487891
23	H196	959	955	16.9	16.887891
24	H196	960	955	16.5	16.487891
25	H256	956	955	15.5	15.687891
26	H256	957	955	14.7	14.787891
27	H256	958	955	14.1	14.287891
28	H256	959	955	13.6	13.787891
29	H256	960	955	13.2	13.387891
30	H381	956	955	130.0	124.117828
31	H381	957	955	113.0	119.180350
32	H381	958	955	94.0	105.356552
33	H381	959	955	192.0	127.095338
34	H381	960	955	87.0	119.875754
35	H413	956	955	59.0	67.993133
36	H413	957	955	58.0	69.869815
37	H413	958	955	53.0	34.717960
38	H413	959	955	38.0	47.665581
39	H413	960	955	46.0	45.940137

fcst.fit(train, fitted=True);

MLForecast.save

 MLForecast.save (path:Union[str,pathlib.Path])

Save the forecast object.

	Type	Details
path	Union	Directory where the artifacts will be stored.
Returns	None

MLForecast.load

 MLForecast.load (path:Union[str,pathlib.Path])

Load a forecast object.

	Type	Details
path	Union	Directory with the saved artifacts.
Returns	MLForecast

MLForecast.update

 MLForecast.update
                    (df:Union[pandas.core.frame.DataFrame,
                    polars.dataframe.frame.DataFrame])

Update the values of the stored series.

	Type	Details
df	Union	Dataframe with new observations.
Returns	None

MLForecast.make_future_dataframe

 MLForecast.make_future_dataframe (h:int)

Create a dataframe with all ids and future times in the forecasting horizon.

	Type	Details
h	int	Number of periods to predict.
Returns	Union	DataFrame with the expected ids and future times.

expected_future = fcst.make_future_dataframe(h=1)
expected_future

	unique_id	ds
0	H196	961
1	H256	961
2	H381	961
3	H413	961

MLForecast.get_missing_future

 MLForecast.get_missing_future (h:int, X_df:~DFType)

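Get the id and time combinations that are missing from `X_df`.

	Type	Details
h	int	Number of periods to predict.
X_df	DFType	Dataframe with the future exogenous features. Should have the id column and the time column.
Returns	DFType	DataFrame with the expected ids and future times that are missing in `X_df`.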

missing_future = fcst.get_missing_future(h=1, X_df=expected_future.head(2))
pd.testing.assert_frame_equal(
    missing_future,
    expected_future.tail(2).reset_index(drop=True)
)
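
As a rough illustration of what the two methods above compute for an integer time index, the sketch below builds the expected (id, time) grid and then filters out the pairs that were provided. The function names `future_grid` and `missing_future` are hypothetical and operate on plain tuples rather than dataframes; the last training time of 960 is taken from the example data.

```python
from typing import Dict, List, Set, Tuple


def future_grid(last_times: Dict[str, int], h: int, freq: int = 1) -> List[Tuple[str, int]]:
    """All (id, time) pairs for the next `h` steps after each series' last time."""
    return [
        (uid, last + step * freq)
        for uid, last in sorted(last_times.items())
        for step in range(1, h + 1)
    ]


def missing_future(
    expected: List[Tuple[str, int]], provided: Set[Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Expected pairs that do not appear in the provided future features."""
    return [pair for pair in expected if pair not in provided]


# Last training timestamp of each sampled series (960 in the example above).
last_times = {"H196": 960, "H256": 960, "H381": 960, "H413": 960}
expected = future_grid(last_times, h=1)
provided = set(expected[:2])  # mimics X_df=expected_future.head(2)
missing = missing_future(expected, provided)
print(expected)  # [('H196', 961), ('H256', 961), ('H381', 961), ('H413', 961)]
print(missing)   # [('H381', 961), ('H413', 961)]
```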

MLForecast.forecast_fitted_values

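 MLForecast.forecast_fitted_values
                                    (level:Optional[List[Union[int,float]]]=None)

Access the in-sample predictions.

	Type	Default	Details
level	Optional	None	Confidence levels between 0 and 100 for the prediction intervals.
Returns	Union		Dataframe with the predictions for the training set.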

fcst.forecast_fitted_values()

	unique_id	ds	y	LGBMRegressor
0	H196	193	12.7	12.671271
1	H196	194	12.3	12.271271
2	H196	195	11.9	11.871271
3	H196	196	11.7	11.671271
4	H196	197	11.4	11.471271
…	…	…	…	…
3067	H413	956	59.0	68.280574
3068	H413	957	58.0	70.427570
3069	H413	958	53.0	44.767965
3070	H413	959	38.0	48.691257
3071	H413	960	46.0	46.652238

fcst.forecast_fitted_values(level=[90])

	unique_id	ds	y	LGBMRegressor	LGBMRegressor-lo-90	LGBMRegressor-hi-90
0	H196	193	12.7	12.671271	12.540634	12.801909
1	H196	194	12.3	12.271271	12.140634	12.401909
2	H196	195	11.9	11.871271	11.740634	12.001909
3	H196	196	11.7	11.671271	11.540634	11.801909
4	H196	197	11.4	11.471271	11.340634	11.601909
…	…	…	…	…	…	…
3067	H413	956	59.0	68.280574	58.846640	77.714509
3068	H413	957	58.0	70.427570	60.993636	79.861504
3069	H413	958	53.0	44.767965	35.334031	54.201899
3070	H413	959	38.0	48.691257	39.257323	58.125191
3071	H413	960	46.0	46.652238	37.218304	56.086172
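
The fitted values above start at ds 193 rather than 1, and their index ends at 3071. A back-of-the-envelope check reproduces both numbers, under the assumption that the dropped rows come only from the seasonal difference plus the largest lag (the exponentially weighted mean on lag 48 needs fewer rows than lag 168):

```python
n_series = 4
rows_per_series = 3840 // n_series  # train.shape[0] is 3840, i.e. 960 rows per series
season = 24                         # Differences([24]) leaves the first 24 values undefined
max_lag = max(24 * (i + 1) for i in range(7))  # largest lag feature: 168

dropped_per_series = season + max_lag
first_ds = dropped_per_series + 1   # ds starts at 1 in this dataset
total_rows = n_series * (rows_per_series - dropped_per_series)

print(dropped_per_series, first_ds, total_rows)  # 192 193 3072
```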

Once this has run, we are ready to compute our predictions.

MLForecast.predict

 MLForecast.predict (h:int,
                     before_predict_callback:Optional[Callable]=None,
                     after_predict_callback:Optional[Callable]=None,
                     new_df:Optional[~DFType]=None,
                     level:Optional[List[Union[int,float]]]=None,
                     X_df:Optional[~DFType]=None,
                     ids:Optional[List[str]]=None)

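Compute the predictions for the next `h` steps.

	Type	Default	Details
h	int		Number of periods to predict.
before_predict_callback	Optional	None	Function to call on the features before computing the predictions. It receives the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.
after_predict_callback	Optional	None	Function to call on the predictions before updating the targets. It receives a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.
new_df	Optional	None	Series data of new observations for which forecasts are to be generated. This dataframe should have the same structure as the one used to fit the model, including any features and time series data. If `new_df` is not None, the method generates forecasts for the new observations.
level	Optional	None	Confidence levels between 0 and 100 for the prediction intervals.
X_df	Optional	None	Dataframe with the future exogenous features. Should have the id column and the time column.
ids	Optional	None	List with a subset of the ids seen during training for which the forecasts should be computed.
Returns	DFType		Predictions for each series and timestep, with one column per model.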
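
The two callback descriptions imply a loop in which each prediction is fed back as if it were an observation before the next step is computed. The sketch below illustrates that loop for a single series, assuming one model applied recursively. It uses plain lists and floats instead of the dataframes and pandas Series that the real callbacks receive, and `recursive_forecast` and `mean_model` are hypothetical names, not mlforecast functions.

```python
from typing import Callable, List, Optional


def recursive_forecast(
    history: List[float],
    h: int,
    model: Callable[[List[float]], float],
    lags: List[int],
    before_predict: Optional[Callable[[List[float]], List[float]]] = None,
    after_predict: Optional[Callable[[float], float]] = None,
) -> List[float]:
    """Forecast h steps, feeding each prediction back in as if it were observed."""
    values = list(history)
    preds = []
    for _ in range(h):
        features = [values[-lag] for lag in lags]  # lag features for the next step
        if before_predict is not None:
            features = before_predict(features)    # hook applied to the features
        pred = model(features)
        if after_predict is not None:
            pred = after_predict(pred)             # hook applied before the target is updated
        values.append(pred)
        preds.append(pred)
    return preds


def mean_model(feats: List[float]) -> float:
    """Toy stand-in for a trained regressor: the average of its lag features."""
    return sum(feats) / len(feats)


plain = recursive_forecast([1.0, 3.0], h=3, model=mean_model, lags=[1, 2])
capped = recursive_forecast(
    [1.0, 3.0], h=3, model=mean_model, lags=[1, 2],
    after_predict=lambda p: min(p, 2.0),
)
print(plain)   # [2.0, 2.5, 2.25]
print(capped)  # [2.0, 2.0, 2.0]
```

Because the capped prediction is what gets appended to the history, the cap also changes the features of every later step, which is why the third value differs between the two runs.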

predictions = fcst.predict(horizon)

We can look at a few of the results.

results = valid.merge(predictions, on=['unique_id', 'ds'])
fig = plot_series(forecasts_df=results)

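Prediction intervals

With MLForecast, you can generate prediction intervals using Conformal Prediction. To configure Conformal Prediction, pass an instance of the PredictionIntervals class to the prediction_intervals argument of the fit method. The class takes three parameters: n_windows, h and method.

n_windows is the number of cross-validation windows used to calibrate the intervals.
h is the forecast horizon.
method can be conformal_distribution or conformal_error. conformal_distribution (the default) builds forecast paths from the cross-validation errors and computes quantiles over those paths, whereas conformal_error computes quantiles of the errors to produce the prediction intervals. This strategy adjusts the intervals for each horizon step, so the width differs from step to step. Note that at least 2 cross-validation windows must be used.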
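
The idea behind the conformal_error method can be sketched in a few lines: for each horizon step, take a quantile of the absolute cross-validation errors and add and subtract it from the point forecast. The code below is a simplified illustration with made-up errors. The `quantile` and `conformal_error_intervals` helpers are hypothetical, and the library's exact quantile computation may differ.

```python
from typing import List, Tuple


def quantile(values: List[float], q: float) -> float:
    """Linearly interpolated quantile for q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def conformal_error_intervals(
    cv_abs_errors: List[List[float]],  # one row per CV window, one column per horizon step
    forecasts: List[float],
    level: float,
) -> List[Tuple[float, float]]:
    """Per-step intervals: forecast +/- the `level` quantile of that step's absolute errors."""
    if len(cv_abs_errors) < 2:
        raise ValueError("at least 2 cross-validation windows are required")
    intervals = []
    for step, forecast in enumerate(forecasts):
        step_errors = [window[step] for window in cv_abs_errors]
        width = quantile(step_errors, level / 100)
        intervals.append((forecast - width, forecast + width))
    return intervals


# Made-up absolute errors from 3 windows over a 2-step horizon.
cv_abs_errors = [[1.0, 2.0], [3.0, 6.0], [2.0, 4.0]]
intervals = conformal_error_intervals(cv_abs_errors, forecasts=[10.0, 20.0], level=50)
print(intervals)  # [(8.0, 12.0), (16.0, 24.0)]
```

Since the quantile is taken separately per step, the second step gets a wider interval here than the first, which mirrors the per-step widths described above.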

fcst.fit(
    train,
    prediction_intervals=PredictionIntervals(n_windows=3, h=48)
);

After that, pass the desired confidence levels to the predict method through the level argument. Levels must lie between 0 and 100.

predictions_w_intervals = fcst.predict(48, level=[50, 80, 95])
predictions_w_intervals.head()

	unique_id	ds	LGBMRegressor	LGBMRegressor-lo-95	LGBMRegressor-lo-80	LGBMRegressor-lo-50	LGBMRegressor-hi-50	LGBMRegressor-hi-80	LGBMRegressor-hi-95
0	H196	961	16.071271	15.958042	15.971271	16.005091	16.137452	16.171271	16.184501
1	H196	962	15.671271	15.553632	15.553632	15.578632	15.763911	15.788911	15.788911
2	H196	963	15.271271	15.153632	15.153632	15.162452	15.380091	15.388911	15.388911
3	H196	964	14.971271	14.858042	14.871271	14.905091	15.037452	15.071271	15.084501
4	H196	965	14.671271	14.553632	14.553632	14.562452	14.780091	14.788911	14.788911

Let's explore the generated intervals.

results = valid.merge(predictions_w_intervals, on=['unique_id', 'ds'])
fig = plot_series(forecasts_df=results, level=[50, 80, 95])

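If you want to reduce the computation time and produce intervals with the same width for the whole forecast horizon, simply pass h=1 to the PredictionIntervals class. The caveat of this strategy is that in some cases the variance of the absolute residuals may be small (even zero), so the intervals may be too narrow.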

fcst.fit(
    train,  
    prediction_intervals=PredictionIntervals(n_windows=3, h=1)
);

predictions_w_intervals_ws_1 = fcst.predict(48, level=[80, 90, 95])

Let's explore the generated intervals.

results = valid.merge(predictions_w_intervals_ws_1, on=['unique_id', 'ds'])
fig = plot_series(forecasts_df=results, level=[90])

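Forecast using a pre-trained model

MLForecast allows you to use a pre-trained model to generate forecasts for a new dataset. Pass a pandas dataframe containing the new observations as the new_df argument when calling the predict method. The dataframe should have the same structure as the one used to fit the model, including any features and time series data. The method then uses the pre-trained model to generate forecasts for the new observations, so you can apply it to a new dataset without retraining.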

ercot_df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/ERCOT-clean.csv')
# we have to convert the ds column to integers
# since MLForecast was trained with that structure
ercot_df['ds'] = np.arange(1, len(ercot_df) + 1)
# use the `new_df` argument to pass the ercot dataset 
ercot_fcsts = fcst.predict(horizon, new_df=ercot_df)
fig = plot_series(ercot_df, ercot_fcsts, max_insample_length=48 * 2)

If you want to see the data that will be used to train the models, you can call MLForecast.preprocess.

MLForecast.preprocess

 MLForecast.preprocess (df:~DFType, id_col:str='unique_id',
                        time_col:str='ds', target_col:str='y',
                        static_features:Optional[List[str]]=None,
                        dropna:bool=True, keep_last_n:Optional[int]=None,
                        max_horizon:Optional[int]=None,
                        return_X_y:bool=False, as_numpy:bool=False,
                        weight_col:Optional[str]=None)

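Add the features to `df`.

	Type	Default	Details
df	DFType		Series data in long format.
id_col	str	unique_id	Column that identifies each series.
time_col	str	ds	Column that identifies each timestep; its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it.
max_horizon	Optional	None	Train this many models, where each model predicts a specific horizon.
return_X_y	bool	False	Return a tuple with the features and the target. If False, a single dataframe is returned.
as_numpy	bool	False	Cast features to a numpy array. Only works with `return_X_y=True`.
weight_col	Optional	None	Column that contains the sample weights.
Returns	Union		`df` plus the added features and target(s).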

prep_df = fcst.preprocess(train)
prep_df

	unique_id	ds	y	lag24	lag48	lag72	lag96	lag120	lag144	lag168	exponentially_weighted_mean_lag48_alpha0.3
86988	H196	193	0.1	0.0	0.0	0.0	0.3	0.1	0.1	0.3	0.002810
86989	H196	194	0.1	-0.1	0.1	0.0	0.3	0.1	0.1	0.3	0.031967
86990	H196	195	0.1	-0.1	0.1	0.0	0.3	0.1	0.2	0.1	0.052377
86991	H196	196	0.1	0.0	0.0	0.0	0.3	0.2	0.1	0.2	0.036664
86992	H196	197	0.0	0.0	0.0	0.1	0.2	0.2	0.1	0.2	0.025665
…	…	…	…	…	…	…	…	…	…	…	…
325187	H413	956	0.0	10.0	1.0	6.0	-53.0	44.0	-21.0	21.0	7.963225
325188	H413	957	9.0	10.0	10.0	-7.0	-46.0	27.0	-19.0	24.0	8.574257
325189	H413	958	16.0	8.0	5.0	-9.0	-36.0	32.0	-13.0	8.0	7.501980
325190	H413	959	-3.0	17.0	-7.0	2.0	-31.0	22.0	5.0	-2.0	3.151386
325191	H413	960	15.0	11.0	-6.0	-5.0	-17.0	22.0	-18.0	10.0	0.405970
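
The last column of the table above can be checked by hand. Assuming the exponentially weighted mean follows the usual recursion (alpha times the new value plus one minus alpha times the previous mean), the displayed values for H196 are reproduced from the `lag48` column up to rounding. `ewm_update` is a hypothetical helper written for this check, and the recursion is seeded with the first displayed value because the earlier history is not shown.

```python
from typing import List


def ewm_update(previous: float, new_value: float, alpha: float) -> float:
    """One step of an exponentially weighted mean: alpha * new + (1 - alpha) * previous."""
    return alpha * new_value + (1 - alpha) * previous


# First rows of H196 from the table above.
lag48: List[float] = [0.0, 0.1, 0.1, 0.0, 0.0]
ewm_shown: List[float] = [0.002810, 0.031967, 0.052377, 0.036664, 0.025665]

recomputed = [ewm_shown[0]]  # seed with the first displayed value
for value in lag48[1:]:
    recomputed.append(ewm_update(recomputed[-1], value, alpha=0.3))

for shown, ours in zip(ewm_shown, recomputed):
    print(f"{shown:.6f} {ours:.6f}")
```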

If we do this, we then have to call MLForecast.fit_models, since preprocess only stores the series information.

MLForecast.fit_models

 MLForecast.fit_models (X:Union[pandas.core.frame.DataFrame,
                        polars.dataframe.frame.DataFrame,numpy.ndarray],
                        y:numpy.ndarray)

Manually train the models. Use this if you called MLForecast.preprocess beforehand.

	Type	Details
X	Union	Features.
y	ndarray	Target.
Returns	MLForecast	Forecast object with the trained models.

X, y = prep_df.drop(columns=['unique_id', 'ds', 'y']), prep_df['y']
fcst.fit_models(X, y)

MLForecast(models=[LGBMRegressor], freq=1, lag_features=['lag24', 'lag48', 'lag72', 'lag96', 'lag120', 'lag144', 'lag168', 'exponentially_weighted_mean_lag48_alpha0.3'], date_features=[], num_threads=1)

predictions2 = fcst.predict(horizon)
pd.testing.assert_frame_equal(predictions, predictions2)

MLForecast.cross_validation

 MLForecast.cross_validation (df:~DFType, n_windows:int, h:int,
                              id_col:str='unique_id', time_col:str='ds',
                              target_col:str='y',
                              step_size:Optional[int]=None,
                              static_features:Optional[List[str]]=None,
                              dropna:bool=True,
                              keep_last_n:Optional[int]=None,
                              refit:Union[bool,int]=True,
                              max_horizon:Optional[int]=None,
                              before_predict_callback:Optional[Callable]=None,
                              after_predict_callback:Optional[Callable]=None,
                              prediction_intervals:Optional[mlforecast.utils.PredictionIntervals]=None,
                              level:Optional[List[Union[int,float]]]=None,
                              input_size:Optional[int]=None,
                              fitted:bool=False, as_numpy:bool=False,
                              weight_col:Optional[str]=None)

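Perform time series cross validation. Creates n_windows splits where each window has h test periods, trains the models, computes the predictions and merges the actuals.

	Type	Default	Details
df	DFType		Series data in long format.
n_windows	int		Number of windows to evaluate.
h	int		Forecast horizon.
id_col	str	unique_id	Column that identifies each series.
time_col	str	ds	Column that identifies each timestep; its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
step_size	Optional	None	Step size between each cross validation window. If None, it is equal to `h`.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it.
refit	Union	True	Retrain the models for each cross validation window. If False, the models are trained at the beginning and then used to predict each window. If a positive int, the models are retrained every `refit` windows.
max_horizon	Optional	None
before_predict_callback	Optional	None	Function to call on the features before computing the predictions. It receives the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.
after_predict_callback	Optional	None	Function to call on the predictions before updating the targets. It receives a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.
prediction_intervals	Optional	None	Configuration to calibrate prediction intervals (Conformal Prediction).
level	Optional	None	Confidence levels between 0 and 100 for the prediction intervals.
input_size	Optional	None	Maximum training samples per series in each window. If None, an expanding window is used.
fitted	bool	False	Store the in-sample predictions.
as_numpy	bool	False	Cast features to a numpy array.
weight_col	Optional	None	Column that contains the sample weights.
Returns	DFType		Predictions for each window, with the series id, timestamp, last training date, target value and the predictions from each model.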

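If we would like to know how good our forecast will be for a specific model and set of features, we can perform cross validation. Cross validation splits our data in two parts: the first is used for training and the second for validation. Since the data is time dependent, we usually take the last x observations as the validation set.

This process is implemented in MLForecast.cross_validation, which takes our data and performs the process described above n_windows times, where each window has h validation samples. For example, if we have 100 samples and want to perform 2 backtests of size 14 each, the splits are as follows:

Train: 1 to 72. Validation: 73 to 86.
Train: 1 to 86. Validation: 87 to 100.

You can control the distance between consecutive cross validation windows with the step_size argument. For example, if we have 100 samples and want to perform 2 backtests of size 14 each, moving one step ahead in each fold (step_size=1), the splits are as follows:

Train: 1 to 85. Validation: 86 to 99.
Train: 1 to 86. Validation: 87 to 100.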
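
The two sets of splits above can be reproduced with a short function. This is a sketch of the windowing rule as described in the text, using 1-indexed inclusive positions. `backtest_splits` here is a hypothetical helper written for illustration, not a call into the library.

```python
from typing import List, Optional, Tuple

Split = Tuple[Tuple[int, int], Tuple[int, int]]


def backtest_splits(
    n_samples: int, n_windows: int, h: int, step_size: Optional[int] = None
) -> List[Split]:
    """Return ((train_start, train_end), (valid_start, valid_end)) for each window."""
    step = h if step_size is None else step_size
    splits = []
    for i in range(n_windows):
        # The last window ends at n_samples; earlier windows are shifted back by `step`.
        valid_end = n_samples - (n_windows - 1 - i) * step
        valid_start = valid_end - h + 1
        splits.append(((1, valid_start - 1), (valid_start, valid_end)))
    return splits


default_splits = backtest_splits(100, n_windows=2, h=14)
step_one_splits = backtest_splits(100, n_windows=2, h=14, step_size=1)
print(default_splits)   # [((1, 72), (73, 86)), ((1, 86), (87, 100))]
print(step_one_splits)  # [((1, 85), (86, 99)), ((1, 86), (87, 100))]
```

Applied to the 960 training rows per series used in this page, the same rule with n_windows=2 and h=5 gives training ends of 950 and 955, which agree with the cutoff column of the earlier cross_validation output.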

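You can also perform cross validation without refitting your models for each window by setting refit=False. This lets you evaluate the performance of your models over multiple windows without retraining them each time.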

fcst = MLForecast(
    models=lgb.LGBMRegressor(random_state=0, verbosity=-1),
    freq=1,
    lags=[24 * (i+1) for i in range(7)],
    lag_transforms={
        1: [RollingMean(window_size=24)],
        24: [RollingMean(window_size=24)],
        48: [ExponentiallyWeightedMean(alpha=0.3)],
    },
    num_threads=1,
    target_transforms=[Differences([24])],
)
cv_results = fcst.cross_validation(
    train,
    n_windows=2,
    h=horizon,
    step_size=horizon,
    fitted=True,
)
cv_results

	unique_id	ds	cutoff	y	LGBMRegressor
0	H196	865	864	15.5	15.373393
1	H196	866	864	15.1	14.973393
2	H196	867	864	14.8	14.673393
3	H196	868	864	14.4	14.373393
4	H196	869	864	14.2	14.073393
…	…	…	…	…	…
379	H413	956	912	59.0	64.284167
380	H413	957	912	58.0	64.830429
381	H413	958	912	53.0	40.726851
382	H413	959	912	38.0	42.739657
383	H413	960	912	46.0	52.802769

Since we set fitted=True, we can also access the predictions for the training sets through the cross_validation_fitted_values method.

fcst.cross_validation_fitted_values()

	unique_id	ds	fold	y	LGBMRegressor
0	H196	193	0	12.7	12.673393
1	H196	194	0	12.3	12.273393
2	H196	195	0	11.9	11.873393
3	H196	196	0	11.7	11.673393
4	H196	197	0	11.4	11.473393
…	…	…	…	…	…
5563	H413	908	1	49.0	50.620196
5564	H413	909	1	39.0	35.972331
5565	H413	910	1	29.0	29.359678
5566	H413	911	1	24.0	25.784563
5567	H413	912	1	20.0	23.168413

We can also compute prediction intervals by passing a configuration to prediction_intervals and the desired confidence levels through the level argument.

cv_results_intervals = fcst.cross_validation(
    train,
    n_windows=2,
    h=horizon,
    step_size=horizon,
    prediction_intervals=PredictionIntervals(h=horizon),
    level=[80, 90]
)
cv_results_intervals

	unique_id	ds	cutoff	y	LGBMRegressor	LGBMRegressor-lo-90	LGBMRegressor-lo-80	LGBMRegressor-hi-80	LGBMRegressor-hi-90
0	H196	865	864	15.5	15.373393	15.311379	15.316528	15.430258	15.435407
1	H196	866	864	15.1	14.973393	14.940556	14.940556	15.006230	15.006230
2	H196	867	864	14.8	14.673393	14.606230	14.606230	14.740556	14.740556
3	H196	868	864	14.4	14.373393	14.306230	14.306230	14.440556	14.440556
4	H196	869	864	14.2	14.073393	14.006230	14.006230	14.140556	14.140556
…	…	…	…	…	…	…	…	…	…
379	H413	956	912	59.0	64.284167	29.890099	34.371545	94.196788	98.678234
380	H413	957	912	58.0	64.830429	56.874572	57.827689	71.833169	72.786285
381	H413	958	912	53.0	40.726851	35.296195	35.846206	45.607495	46.157506
382	H413	959	912	38.0	42.739657	35.292153	35.807640	49.671674	50.187161
383	H413	960	912	46.0	52.802769	42.465597	43.895670	61.709869	63.139941

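The refit argument controls whether the models are retrained in every window. It can be either:

A boolean: True retrains in every window and False trains only in the first one.
A positive integer: the models are trained in the first window and then every refit windows.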
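
The rule above can be written as a small function that lists the windows in which training happens. This is a sketch of the described behaviour rather than the library's code, and `windows_that_refit` is a hypothetical name.

```python
from typing import List, Union


def windows_that_refit(n_windows: int, refit: Union[bool, int]) -> List[int]:
    """Indices of the cross-validation windows in which the models are trained."""
    every = int(refit)  # True -> 1 (every window), False -> 0 (first window only)
    return [i for i in range(n_windows) if i == 0 or (every > 0 and i % every == 0)]


schedule = {str(refit): windows_that_refit(4, refit) for refit in (True, False, 2)}
for refit, windows in schedule.items():
    print(refit, windows)
# True [0, 1, 2, 3]
# False [0]
# 2 [0, 2]
```

With 4 windows this yields 4, 1 and 2 trained models respectively.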

fcst = MLForecast(
    models=LinearRegression(),
    freq=1,
    lags=[1, 24],
)
for refit, expected_models in zip([True, False, 2], [4, 1, 2]):
    fcst.cross_validation(
        train,
        n_windows=4,
        h=horizon,
        refit=refit,
    )
    assert len(fcst.cv_models_) == expected_models

fig = plot_series(forecasts_df=cv_results.drop(columns='cutoff'))

fig = plot_series(forecasts_df=cv_results_intervals.drop(columns='cutoff'), level=[90])

MLForecast.from_cv

 MLForecast.from_cv (cv:mlforecast.lgb_cv.LightGBMCV)

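Once you have found a set of features and parameters that work for your problem, you can build a forecast object from it using MLForecast.from_cv, which takes the trained LightGBMCV object and builds an MLForecast object that uses the same features and parameters. You can then call fit and predict as you normally would.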

cv = LightGBMCV(
    freq=1,
    lags=[24 * (i+1) for i in range(7)],
    lag_transforms={
        48: [ExponentiallyWeightedMean(alpha=0.3)],
    },
    num_threads=1,
    target_transforms=[Differences([24])]
)
hist = cv.fit(
    train,
    n_windows=2,
    h=horizon,
    params={'verbosity': -1},
)

[10] mape: 0.118569
[20] mape: 0.111506
[30] mape: 0.107314
[40] mape: 0.106089
[50] mape: 0.106630
Early stopping at round 50
Using best iteration: 40

fcst = MLForecast.from_cv(cv)
assert cv.best_iteration_ == fcst.models['LGBMRegressor'].n_estimators
