Data Requirements

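TimeGPT accepts pandas and polars DataFrames in long format with the following required columns:

ds (timestamp): a timestamp in the format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS.
y (numeric): the target variable to forecast.

(Optionally, you can also pass a DataFrame without a ds column but with a DatetimeIndex.)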
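A minimal sketch of both input shapes in plain pandas, using the first air passengers values from the example below. Converting the DatetimeIndex into a ds column is shown only to make the equivalence explicit; per the note above, TimeGPT accepts the indexed form directly.

```python
import pandas as pd

# Long format: one row per timestamp, with the required ds and y columns.
long_df = pd.DataFrame({
    "ds": pd.to_datetime(["1949-01-01", "1949-02-01", "1949-03-01"]),
    "y": [112, 118, 132],
})

# Alternative input: no ds column, timestamps stored in a DatetimeIndex.
indexed_df = pd.DataFrame(
    {"y": [112, 118, 132]},
    index=pd.DatetimeIndex(["1949-01-01", "1949-02-01", "1949-03-01"]),
)

# Naming the index "ds" and moving it into a column recovers the long format.
converted = indexed_df.rename_axis("ds").reset_index()
print(converted.equals(long_df))  # True
```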

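TimeGPT also supports distributed DataFrames such as Dask, Spark, and Ray.

You can also include exogenous features in the DataFrame as additional columns. For more information, see the exogenous variables tutorial.

Below is an example of a valid input DataFrame for TimeGPT.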

import pandas as pd 

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()

	timestamp	value
0	1949-01-01	112
1	1949-02-01	118
2	1949-03-01	132
3	1949-04-01	129
4	1949-05-01	121

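Note that in this example, the ds column is named timestamp and the y column is named value. You have two options:

Rename the columns to ds and y, respectively, or
Keep the current column names and specify them with the time_col and target_col parameters when calling any method of the NixtlaClient class.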
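A minimal sketch of the first option with pandas' DataFrame.rename, using the rows shown above. The second option is only indicated in a comment, since calling the API requires a key.

```python
import pandas as pd

# First rows of the air passengers data, with its original column names.
df = pd.DataFrame({
    "timestamp": ["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01", "1949-05-01"],
    "value": [112, 118, 132, 129, 121],
})

# Option 1: rename to the default names ds and y.
renamed = df.rename(columns={"timestamp": "ds", "value": "y"})
print(renamed.columns.tolist())  # ['ds', 'y']

# Option 2: keep the names and pass them explicitly instead, e.g.
# nixtla_client.forecast(df=df, h=12, time_col="timestamp", target_col="value")
```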

For example, to use the forecast method of the NixtlaClient class, instantiate the class and then specify the column names as follows.

from nixtla import NixtlaClient

# Replace the placeholder with the API key provided by Nixtla.
nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)

fcst = nixtla_client.forecast(df=df, h=12, time_col='timestamp', target_col='value')
fcst.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

	timestamp	TimeGPT
0	1961-01-01	437.83792
1	1961-02-01	426.06270
2	1961-03-01	463.11655
3	1961-04-01	478.24450
4	1961-05-01	505.64648

In this example, NixtlaClient infers the frequency, but you can specify it explicitly with the freq parameter.
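To see which frequency alias you might pass, pandas' own pd.infer_freq gives a quick check. This is only a sketch: it uses pandas' inference, which is not necessarily the logic NixtlaClient applies, although for this monthly data it yields the same "MS" (month start) that appears in the log above.

```python
import pandas as pd

timestamps = pd.to_datetime(
    ["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01", "1949-05-01"]
)

# pandas needs at least three timestamps to infer a frequency.
freq = pd.infer_freq(timestamps)
print(freq)  # MS

# The alias can then be passed explicitly, e.g.
# nixtla_client.forecast(df=df, h=12, freq=freq, time_col="timestamp", target_col="value")
```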

To learn how to instantiate the NixtlaClient class, see the TimeGPT Quickstart.

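Multiple Series

If you are working with multiple time series, make sure each series has a unique identifier. You can name this column unique_id or specify its name with the id_col parameter when calling any method of the NixtlaClient class. This column should be of type string, integer, or category.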
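A minimal sketch of assembling several series into one long-format DataFrame with pd.concat. The BE values come from the example below; the FR series and its values are hypothetical toy data.

```python
import pandas as pd

hours = pd.date_range("2016-10-22", periods=3, freq=pd.offsets.Hour())
be = pd.DataFrame({"ds": hours, "y": [70.00, 37.10, 37.10]})
fr = pd.DataFrame({"ds": hours, "y": [50.0, 48.0, 45.0]})  # hypothetical values

# Stack the series in long format, tagging each row with its identifier.
df = pd.concat(
    [be.assign(unique_id="BE"), fr.assign(unique_id="FR")],
    ignore_index=True,
)[["unique_id", "ds", "y"]]

# The identifier may be a string, integer, or category; category saves memory.
df["unique_id"] = df["unique_id"].astype("category")

# Each (unique_id, ds) pair should appear only once.
assert not df.duplicated(["unique_id", "ds"]).any()
print(df.groupby("unique_id", observed=True).size().to_dict())  # {'BE': 3, 'FR': 3}
```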

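In this example, we have five series representing hourly electricity prices in five different markets. The columns already have the default names, so there is no need to specify the id_col, time_col, or target_col parameters. If your column names differ, specify these parameters as needed.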

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
df.head()

	unique_id	ds	y
0	BE	2016-10-22 00:00:00	70.00
1	BE	2016-10-22 01:00:00	37.10
2	BE	2016-10-22 02:00:00	37.10
3	BE	2016-10-22 03:00:00	44.75
4	BE	2016-10-22 04:00:00	37.10

fcst = nixtla_client.forecast(df=df, h=24) # use id_col, time_col and target_col here if needed. 
fcst.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: h
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

	unique_id	ds	TimeGPT
0	BE	2016-12-31 00:00:00	45.190582
1	BE	2016-12-31 01:00:00	43.244987
2	BE	2016-12-31 02:00:00	41.958897
3	BE	2016-12-31 03:00:00	39.796680
4	BE	2016-12-31 04:00:00	39.204865

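When working with a large number of time series, consider using a distributed computing framework to process the data efficiently. TimeGPT supports frameworks such as Spark, Dask, and Ray.

Exogenous Variables

TimeGPT also accepts exogenous variables. You can add them to your DataFrame as extra columns after the y column.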

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()

	unique_id	ds	y	Exogenous1	Exogenous2	day_5
0	BE	2016-10-22 00:00:00	70.00	57253.0	49593.0	1.0
1	BE	2016-10-22 01:00:00	37.10	51887.0	46073.0	1.0
2	BE	2016-10-22 02:00:00	37.10	51896.0	44927.0	1.0
3	BE	2016-10-22 03:00:00	44.75	48428.0	44483.0	1.0
4	BE	2016-10-22 04:00:00	37.10	46721.0	44338.0	1.0
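The day_* columns (the log further down lists day_0 through day_6) look like day-of-week indicators: 2016-10-22 was a Saturday, which pandas numbers as 5, and day_5 is 1.0 in those rows. Assuming that interpretation, which the source does not state, such columns could be built and appended after y like this:

```python
import pandas as pd

df = pd.DataFrame({
    "unique_id": "BE",
    "ds": pd.date_range("2016-10-22", periods=4, freq=pd.offsets.Hour()),
    "y": [70.00, 37.10, 37.10, 44.75],
})

# Assumption: day_k is a one-hot flag for pandas' dayofweek (Monday=0 ... Sunday=6).
# A Categorical with all seven categories guarantees day_0..day_6 always exist.
dow = pd.Categorical(df["ds"].dt.dayofweek, categories=range(7))
dummies = pd.get_dummies(dow, prefix="day", dtype=float)

# Exogenous columns go after y.
df = pd.concat([df, dummies], axis=1)
print(df.columns.tolist())
```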

When using exogenous variables, you also need to provide their future values.
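X_df needs one row per series for each of the h future timestamps. A minimal sketch of generating those timestamps with pandas, assuming hourly data; the helper make_future_frame and the history values are hypothetical, and the exogenous values themselves must come from your own source (known calendars, external forecasts, and so on).

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def make_future_frame(df: pd.DataFrame, h: int, freq) -> pd.DataFrame:
    """Hypothetical helper: the h timestamps following each series' last ds."""
    offset = to_offset(freq)
    rows = []
    for uid, last_ds in df.groupby("unique_id")["ds"].max().items():
        future = pd.date_range(last_ds + offset, periods=h, freq=offset)
        rows.append(pd.DataFrame({"unique_id": uid, "ds": future}))
    return pd.concat(rows, ignore_index=True)


history = pd.DataFrame({
    "unique_id": ["BE"] * 3,
    "ds": pd.date_range("2016-12-30 21:00", periods=3, freq=pd.offsets.Hour()),
    "y": [40.0, 42.0, 41.0],  # hypothetical values
})

future = make_future_frame(history, h=24, freq=pd.offsets.Hour())
# Exogenous columns (e.g. Exogenous1, Exogenous2) would be added here before
# passing the result as X_df.
print(future["ds"].iloc[0], len(future))  # 2016-12-31 00:00:00 24
```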

future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
future_ex_vars_df.head()

	unique_id	ds	Exogenous1	Exogenous2	day_5
0	BE	2016-12-31 00:00:00	70318.0	64108.0	1.0
1	BE	2016-12-31 01:00:00	67898.0	62492.0	1.0
2	BE	2016-12-31 02:00:00	68379.0	61571.0	1.0
3	BE	2016-12-31 03:00:00	64972.0	60381.0	1.0
4	BE	2016-12-31 04:00:00	62900.0	60298.0	1.0

fcst = nixtla_client.forecast(df=df, X_df=future_ex_vars_df, h=24)
fcst.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: h
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using future exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

	unique_id	ds	TimeGPT
0	BE	2016-12-31 00:00:00	51.632830
1	BE	2016-12-31 01:00:00	45.750877
2	BE	2016-12-31 02:00:00	39.650543
3	BE	2016-12-31 03:00:00	34.000072
4	BE	2016-12-31 04:00:00	33.785370

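To learn more about using exogenous variables with TimeGPT, see the Exogenous Variables tutorial.

Important Considerations

When using TimeGPT, the data cannot contain missing values. This means that, for each series, there should be no gaps in the timestamps and no missing values in the target variable.

For more information, see the tutorial on handling missing values in TimeGPT.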
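Before calling TimeGPT, you can check for both kinds of gaps with pandas. A minimal sketch, assuming each series has a regular frequency; the helper count_missing is hypothetical and only detects problems, leaving the filling strategy to the tutorial mentioned above.

```python
import pandas as pd


def count_missing(df: pd.DataFrame, freq) -> pd.DataFrame:
    """Hypothetical helper: per series, count missing timestamps and missing y values."""
    out = {}
    for uid, g in df.groupby("unique_id"):
        # Every timestamp the series should have between its first and last ds.
        full = pd.date_range(g["ds"].min(), g["ds"].max(), freq=freq)
        out[uid] = {
            "missing_timestamps": len(full.difference(g["ds"])),
            "missing_y": int(g["y"].isna().sum()),
        }
    return pd.DataFrame.from_dict(out, orient="index")


df = pd.DataFrame({
    "unique_id": ["BE"] * 4,
    # 02:00 is absent, and the 03:00 value is missing.
    "ds": pd.to_datetime(["2016-10-22 00:00", "2016-10-22 01:00",
                          "2016-10-22 03:00", "2016-10-22 04:00"]),
    "y": [70.00, 37.10, None, 37.10],
})

print(count_missing(df, freq=pd.offsets.Hour()))
```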

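Minimum Data Requirements (for AzureAI)

TimeGPT currently supports any amount of data for generating point forecasts. That is, regardless of frequency, the minimum size per series needed to call nixtla_client.forecast(df=df, h=h, freq=freq) and get a result is one.

For Azure AI, when using the level, finetune_steps, X_df (exogenous variables), or add_history parameters, the API requires a minimum number of data points that depends on the frequency. The minimum sizes for each frequency are: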

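Frequency	Minimum size (observations per series)
Hourly and subhourly (e.g., "H", "min", "15T")	1008
Daily ("D")	300
Weekly (e.g., "W-MON", ..., "W-SUN")	64
Monthly and other frequencies (e.g., "M", "MS", "Y")	48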
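A minimal sketch of checking series lengths against these thresholds before a call. The helpers freq_group and too_short are hypothetical, not part of the nixtla package, and the mapping from pandas frequency aliases to table rows (via offset classes) is a simplification made for this sketch; for example, business-day data falls into "other" here.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Minimum sizes per series from the table above (Azure AI only, and only when
# level, finetune_steps, X_df, or add_history is used).
MIN_SIZE = {"hourly": 1008, "daily": 300, "weekly": 64, "other": 48}


def freq_group(freq: str) -> str:
    """Hypothetical helper: map a pandas frequency alias to a row of the table."""
    offset = to_offset(freq)
    if isinstance(offset, pd.offsets.Day):
        return "daily"
    if isinstance(offset, pd.offsets.Week):
        return "weekly"
    if isinstance(offset, pd.offsets.Tick):  # hours, minutes, seconds, ...
        return "hourly"
    return "other"  # months, quarters, years, ...


def too_short(df: pd.DataFrame, freq: str) -> pd.Series:
    """Hypothetical helper: series whose length is below the required minimum."""
    minimum = MIN_SIZE[freq_group(freq)]
    sizes = df.groupby("unique_id").size()
    return sizes[sizes < minimum]


df = pd.concat(
    [
        pd.DataFrame({"unique_id": "A", "ds": pd.date_range("2020-01-01", periods=300, freq="D"), "y": 1.0}),
        pd.DataFrame({"unique_id": "B", "ds": pd.date_range("2020-01-01", periods=299, freq="D"), "y": 1.0}),
    ],
    ignore_index=True,
)
print(too_short(df, "D").to_dict())  # {'B': 299}
```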

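For cross-validation, you also need to account for the forecast horizon (h), the number of windows (n_windows), and the gap between windows (step_size). The minimum number of observations per series is then given by the following relationship:

Minimum size described above + h + step_size + (n_windows - 1)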
