Cross-validation

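One of the main challenges in time series forecasting is the inherent uncertainty and variability over time, so it is essential to validate the accuracy and reliability of the models you use. Cross-validation is a robust model validation technique that is well suited to this task: it estimates how a model is expected to perform on unseen data, so you can check that forecasts are reliable before deploying them in real-world scenarios.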

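TimeGPT provides a cross_validation method designed to streamline the validation of time series models. It lets practitioners test forecasts rigorously against historical data, assess their effectiveness, and tune the model for better performance. This tutorial walks you through performing cross-validation with the NixtlaClient class.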

1. Import packages

First, we install and import the required packages, then initialize a NixtlaClient instance.

import pandas as pd
from nixtla import NixtlaClient

from IPython.display import display

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)

👍 Use an Azure AI endpoint

To use an Azure AI endpoint, remember to also set the base_url argument:

nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")

2. Load data

Let's look at an example using the Peyton Manning dataset.

pm_df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv')

3. Cross-validation

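The cross_validation method of the NixtlaClient class performs systematic validation of time series forecasts. It takes a dataframe of time-ordered data and applies a rolling-window scheme to evaluate the model over different time periods, which shows how reliable and stable the model is over time. The animation below shows how TimeGPT performs cross-validation.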

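Key parameters include freq, the frequency of the data, which is inferred automatically if not specified. The id_col, time_col, and target_col parameters name the columns that hold each series' identifier, timestamps, and target values. You can customize the procedure with n_windows, the number of separate time windows on which the model is evaluated, and step_size, the gap between consecutive windows. If step_size is not specified, it defaults to the forecast horizon h.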
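To make the interaction of h, n_windows, and step_size concrete, here is a minimal sketch of how rolling-window cutoffs could be laid out for a daily series. It assumes the windows are anchored at the end of the series and that the newest window ends on the final timestamp. The helper name rolling_cutoffs and the end date 2016-01-20 are illustrative assumptions, not part of the Nixtla API.

```python
from datetime import date, timedelta
from typing import List, Optional


def rolling_cutoffs(last_date: date, h: int, n_windows: int,
                    step_size: Optional[int] = None) -> List[date]:
    """Return the cutoff date of each validation window, oldest first.

    Hypothetical helper for daily data: each window forecasts the h days
    after its cutoff, and the newest window ends on last_date.
    """
    if step_size is None:
        step_size = h  # mirrors the documented default: step_size falls back to h
    newest_cutoff = last_date - timedelta(days=h)
    return [
        newest_cutoff - timedelta(days=step_size * i)
        for i in reversed(range(n_windows))
    ]


# Assumed final timestamp of the series (illustrative).
cutoffs = rolling_cutoffs(date(2016, 1, 20), h=7, n_windows=5)
for c in cutoffs:
    print(c)
```

Under these assumptions, h=7 with five windows gives cutoffs one week apart, the earliest on 2015-12-16. Passing a step_size smaller than h would make consecutive windows overlap.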

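The method also allows model refinement through finetune_steps, the number of iterations used to fine-tune the model on new data. Data preprocessing is controlled by clean_ex_first, which determines whether exogenous signals are cleaned before forecasting. In addition, the date_features parameter supports feature engineering from the timestamps: it can generate common date-related features automatically or accept custom functions that create tailored features. The date_features_to_one_hot parameter converts categorical date features into a format suitable for machine learning models.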

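During execution, cross_validation evaluates the model's forecast accuracy in each window, giving a robust view of how performance varies over time and of potential overfitting. This detailed evaluation helps ensure that the forecasts are not only accurate but also consistent across different time contexts.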

timegpt_cv_df = nixtla_client.cross_validation(
    pm_df, 
    h=7, 
    n_windows=5, 
    freq='D',
)
timegpt_cv_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...

	ds	cutoff	y	TimeGPT
0	2015-12-17	2015-12-16	7.591862	7.939553
1	2015-12-18	2015-12-16	7.528869	7.887512
2	2015-12-19	2015-12-16	7.171657	7.766617
3	2015-12-20	2015-12-16	7.891331	7.931502
4	2015-12-21	2015-12-16	8.360071	8.312632
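One way to turn this output into a per-window accuracy score is to compute an error metric for each cutoff. The sketch below uses mean absolute error (MAE) on the five rows shown above, with plain Python rather than pandas. The helper name mae_by_cutoff is an illustrative assumption, and MAE is only one of several reasonable metrics.

```python
from collections import defaultdict

# First five rows of the cross-validation output shown above:
# (ds, cutoff, y, TimeGPT)
rows = [
    ("2015-12-17", "2015-12-16", 7.591862, 7.939553),
    ("2015-12-18", "2015-12-16", 7.528869, 7.887512),
    ("2015-12-19", "2015-12-16", 7.171657, 7.766617),
    ("2015-12-20", "2015-12-16", 7.891331, 7.931502),
    ("2015-12-21", "2015-12-16", 8.360071, 8.312632),
]


def mae_by_cutoff(rows):
    """Mean absolute error of the forecast, grouped by cutoff."""
    errors = defaultdict(list)
    for _ds, cutoff, y, y_hat in rows:
        errors[cutoff].append(abs(y - y_hat))
    return {cutoff: sum(errs) / len(errs) for cutoff, errs in errors.items()}


scores = mae_by_cutoff(rows)
for cutoff, mae in scores.items():
    print(f"cutoff={cutoff}  MAE={mae:.4f}")
```

On the full dataframe the same grouping would yield one score per window, which makes it easy to see whether accuracy drifts from one cutoff to the next.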

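📘 Models available in Azure AI

If you are using an Azure AI endpoint, make sure to set model="azureai":

nixtla_client.cross_validation(..., model="azureai")

For the public API, two models are supported: timegpt-1 and timegpt-1-long-horizon.

timegpt-1 is used by default. For when and how to use timegpt-1-long-horizon, see the tutorial on long-horizon forecasting.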

cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100), 
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
    )
    display(fig)

4. Cross-validation with prediction intervals

Prediction intervals can also be generated during cross-validation. To do so, we only need to pass the level argument.

timegpt_cv_df = nixtla_client.cross_validation(
    pm_df, 
    h=7, 
    n_windows=5, 
    freq='D',
    level=[80, 90],
)
timegpt_cv_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...

	ds	cutoff	y	TimeGPT	TimeGPT-hi-80	TimeGPT-hi-90	TimeGPT-lo-80	TimeGPT-lo-90
0	2015-12-17	2015-12-16	7.591862	7.939553	8.201465	8.314956	7.677642	7.564151
1	2015-12-18	2015-12-16	7.528869	7.887512	8.175414	8.207470	7.599609	7.567553
2	2015-12-19	2015-12-16	7.171657	7.766617	8.267363	8.386674	7.265871	7.146560
3	2015-12-20	2015-12-16	7.891331	7.931502	8.205929	8.369983	7.657075	7.493020
4	2015-12-21	2015-12-16	8.360071	8.312632	9.184893	9.625794	7.440371	6.999469
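A simple check on interval quality is empirical coverage: the share of observed values that fall inside each interval. The sketch below computes it for the five rows shown above. The helper name coverage is an illustrative assumption, and five rows are far too few to judge calibration, so treat this purely as a demonstration of the calculation.

```python
# First five rows of the output above, reordered as:
# (y, lo-80, hi-80, lo-90, hi-90)
rows = [
    (7.591862, 7.677642, 8.201465, 7.564151, 8.314956),
    (7.528869, 7.599609, 8.175414, 7.567553, 8.207470),
    (7.171657, 7.265871, 8.267363, 7.146560, 8.386674),
    (7.891331, 7.657075, 8.205929, 7.493020, 8.369983),
    (8.360071, 7.440371, 9.184893, 6.999469, 9.625794),
]


def coverage(rows, lo_idx, hi_idx):
    """Fraction of rows whose observed value lies within [lo, hi]."""
    hits = sum(1 for r in rows if r[lo_idx] <= r[0] <= r[hi_idx])
    return hits / len(rows)


cov_80 = coverage(rows, 1, 2)
cov_90 = coverage(rows, 3, 4)
print(f"80% interval coverage: {cov_80:.2f}")
print(f"90% interval coverage: {cov_90:.2f}")
```

Computed over all windows, coverage close to the nominal level suggests the intervals are reasonably calibrated for this series.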


cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100), 
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        level=[80, 90],
        models=['TimeGPT']
    )
    display(fig)

5. Cross-validation with exogenous variables

Time features

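Exogenous variables can be included when running cross-validation. Here we use the date_features parameter to create an indicator for each month. The model then uses these features when forecasting during cross-validation.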

timegpt_cv_df = nixtla_client.cross_validation(
    pm_df, 
    h=7, 
    n_windows=5,  
    freq='D',
    level=[80, 90],
    date_features=['month'],
    date_features_to_one_hot=True,
)
timegpt_cv_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using the following exogenous features: ['month_1.0', 'month_2.0', 'month_3.0', 'month_4.0', 'month_5.0', 'month_6.0', 'month_7.0', 'month_8.0', 'month_9.0', 'month_10.0', 'month_11.0', 'month_12.0']
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...

	ds	cutoff	y	TimeGPT	TimeGPT-hi-80	TimeGPT-hi-90	TimeGPT-lo-80	TimeGPT-lo-90
0	2015-12-17	2015-12-16	7.591862	8.426320	8.721996	8.824101	8.130644	8.028540
1	2015-12-18	2015-12-16	7.528869	8.049962	8.452083	8.658603	7.647842	7.441321
2	2015-12-19	2015-12-16	7.171657	7.509098	7.984788	8.138017	7.033409	6.880180
3	2015-12-20	2015-12-16	7.891331	7.739536	8.306914	8.641355	7.172158	6.837718
4	2015-12-21	2015-12-16	8.360071	8.027471	8.722828	9.152306	7.332113	6.902636
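The log above lists twelve month indicator features. As a rough illustration of what one-hot encoding a month means, the sketch below builds those indicators for a single date. The helper month_one_hot is hypothetical; the feature names are copied from the log, and how the client constructs them internally is not shown here.

```python
from datetime import date


def month_one_hot(d: date) -> dict:
    """One-hot encode the calendar month of d into twelve 0/1 indicators.

    The 'month_1.0' ... 'month_12.0' names mirror the log output above.
    """
    return {f"month_{m}.0": int(d.month == m) for m in range(1, 13)}


features = month_one_hot(date(2015, 12, 17))
active = [name for name, flag in features.items() if flag == 1]
print(active)
```

Exactly one indicator is set for any date, so the model sees the month as a categorical signal rather than as an ordered number.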

cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100), 
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        level=[80, 90],
        models=['TimeGPT']
    )
    display(fig)

Dynamic features

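You can also pass dynamic exogenous variables to give TimeGPT more information about the data. Simply add the exogenous regressors as columns after the target column.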

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity.csv')
X_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/exogenous-vars-electricity.csv')
df = Y_df.merge(X_df)
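To see which columns would be treated as exogenous regressors, it can help to separate them from the identifier, time, and target columns. The sketch below does this on a list of column names. The helper exogenous_columns is hypothetical, the default names unique_id, ds, and y are assumptions about the merged frame, and the remaining names mirror the features reported in the log output later in this section.

```python
from typing import List, Sequence


def exogenous_columns(columns: Sequence[str], id_col: str = "unique_id",
                      time_col: str = "ds", target_col: str = "y") -> List[str]:
    """Return every column that is not the id, time, or target column."""
    reserved = {id_col, time_col, target_col}
    return [c for c in columns if c not in reserved]


# Column names assumed for the merged electricity frame (illustrative).
merged_columns = ["unique_id", "ds", "y", "Exogenous1", "Exogenous2"] + [
    f"day_{i}" for i in range(7)
]
exog = exogenous_columns(merged_columns)
print(exog)
```

Running a check like this on df.columns before calling cross_validation is a quick way to confirm that no unintended column is passed along as a regressor.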

Now let's cross-validate TimeGPT taking this information into account.

timegpt_cv_df_x = nixtla_client.cross_validation(
    df.groupby('unique_id').tail(100 * 48), 
    h=48, 
    n_windows=2,
    level=[80, 90]
)
cutoffs = timegpt_cv_df_x.query('unique_id == "BE"')['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7), 
        timegpt_cv_df_x.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT'],
        level=[80, 90],
    )
    display(fig)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: h
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using the following exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...


6. Cross-validation with different TimeGPT instances

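You can also use the model argument to run cross-validation with different TimeGPT instances. Here we compare the base model with the model for long-horizon forecasting.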

timegpt_cv_df_x_long_horizon = nixtla_client.cross_validation(
    df.groupby('unique_id').tail(100 * 48), 
    h=48, 
    n_windows=2,
    level=[80, 90],
    model='timegpt-1-long-horizon',
)
timegpt_cv_df_x_long_horizon.columns = timegpt_cv_df_x_long_horizon.columns.str.replace('TimeGPT', 'TimeGPT-LongHorizon')
timegpt_cv_df_x_models = timegpt_cv_df_x_long_horizon.merge(timegpt_cv_df_x)
cutoffs = timegpt_cv_df_x_models.query('unique_id == "BE"')['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7), 
        timegpt_cv_df_x_models.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT', 'TimeGPT-LongHorizon'],
        level=[80, 90],
    )
    display(fig)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: h
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using the following exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
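The rename-and-merge step in the code above works because the string replacement touches every column that contains the model name, including the interval columns, and leaves the key columns untouched. The sketch below reproduces that logic on plain column-name lists. The helper rename_model_columns is hypothetical and the column list is abbreviated for illustration.

```python
def rename_model_columns(columns, old="TimeGPT", new="TimeGPT-LongHorizon"):
    """Replace the model name in every column label that contains it."""
    return [c.replace(old, new) for c in columns]


# Abbreviated column layout of a cross-validation result with level=[80].
base_cols = ["unique_id", "ds", "cutoff", "y",
             "TimeGPT", "TimeGPT-lo-80", "TimeGPT-hi-80"]
long_cols = rename_model_columns(base_cols)

# Columns present in both frames; a merge without an explicit key joins on these.
shared = [c for c in base_cols if c in long_cols]
print(long_cols)
print(shared)
```

Because only unique_id, ds, cutoff, and y remain in common, the merged frame holds both models' forecasts side by side for each series, timestamp, and cutoff.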

