Introduction to Online (Real-Time) Anomaly Detection

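In this notebook, we introduce the detect_anomalies_online method. You will learn how to get started with this new endpoint quickly and how it differs from the historical anomaly detection endpoint. New features include:

* More flexible control over the anomaly detection process
* Univariate and multivariate anomaly detection
* Anomaly detection on streaming data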

import pandas as pd
from nixtla import NixtlaClient
import matplotlib.pyplot as plt

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)

👍 Use an Azure AI endpoint

To use an Azure AI endpoint, set the base_url parameter:

nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")

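1. Dataset

In this notebook, we use a minute-level time series dataset that monitors server usage. It is a good example of a streaming-data scenario, since the task is to detect server failures or downtime.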

df = pd.read_csv('https://datasets-nixtla.s3.us-east-1.amazonaws.com/machine-1-1.csv', parse_dates=['ts'])
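Because detection is later run with a minute-level frequency, it can help to confirm that the timestamps really are evenly spaced before calling the API. The following is a minimal sketch on a synthetic stand-in frame; the helper name `is_regular` is hypothetical, and the column names `unique_id`, `ts`, and `y` are assumed to match the dataset.

```python
import pandas as pd

def is_regular(df, time_col="ts", id_col="unique_id", step=pd.Timedelta(minutes=1)):
    """Return True if every series advances by exactly `step` between rows."""
    for _, group in df.groupby(id_col):
        diffs = group[time_col].sort_values().diff().dropna()
        if not (diffs == step).all():
            return False
    return True

# Synthetic stand-in for the real dataset (assumed column names).
demo = pd.DataFrame({
    "unique_id": "series_a",
    "ts": pd.date_range("2020-02-01 00:00", periods=5, freq="min"),
    "y": [0.50, 0.51, 0.49, 0.50, 0.52],
})

regular = is_regular(demo)               # evenly spaced minutes
gappy = is_regular(demo.drop(index=2))   # one missing minute
```

If the check fails on real data, the gaps would need to be filled or the series resampled before detection.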

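We observe that the time series is stable at first; in the last 20 steps, however, a spike appears, indicating anomalous behavior. Our goal is to catch this abnormal jump as soon as it appears. Let's see how TimeGPT's real-time anomaly detection performs in this scenario!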

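2. Detect anomalies in real time

The detect_anomalies_online method uses TimeGPT's forecasting capability to detect anomalies in a time series. It uses the forecast error to decide which steps are anomalous, so you can specify and tune parameters just as you would with the forecast method. The function returns a dataframe containing an anomaly flag and an anomaly score (whose absolute value quantifies how abnormal the value is).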
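To make the idea of error-based detection concrete, here is a minimal sketch of flagging points whose forecast error is unusually large compared with past errors. It is not TimeGPT's internal procedure: the function name `flag_by_forecast_error`, the percentile rule, and the synthetic numbers are all illustrative assumptions.

```python
import numpy as np

def flag_by_forecast_error(actual, forecast, calibration_errors, level=99):
    """Flag points whose absolute error exceeds the `level`-th percentile
    of the absolute errors seen on past (calibration) data."""
    threshold = np.percentile(np.abs(calibration_errors), level)
    errors = np.asarray(actual) - np.asarray(forecast)
    return np.abs(errors) > threshold, errors

# Synthetic past errors from a well-behaved period (assumed, not real data).
rng = np.random.default_rng(0)
calibration_errors = rng.normal(0, 0.01, size=500)

actual = np.array([0.55, 0.54, 0.04])      # last value drops sharply
forecast = np.array([0.55, 0.545, 0.57])   # forecasts stay near the usual level

flags, errors = flag_by_forecast_error(actual, forecast, calibration_errors)
```

Only the last point is flagged, because its error is far outside the range of errors seen during the stable period. Raising or lowering `level` makes the rule more or less strict.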

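To run real-time anomaly detection, set the following parameters:

df: A pandas DataFrame containing the time series data.
time_col: The column that identifies the datestamp.
target_col: The variable to forecast.
h: The forecast horizon, i.e. the number of steps ahead to forecast.
freq: The frequency of the time series, in Pandas format.
level: The percentile of the score distribution at which the threshold is set, controlling how strictly anomalies are flagged. Defaults to 99%.
detection_size: The number of steps at the end of the time series to analyze for anomalies.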

anomaly_online = nixtla_client.detect_anomalies_online(
    df,
    time_col='ts',                  
    target_col='y',                 
    freq='min',                     # Specify the frequency of the data
    h=10,                           # Specify the forecast horizon
    level=99,                       # Set the confidence level for anomaly detection
    detection_size=100              # How many steps you want for analyzing anomalies
)
anomaly_online.tail()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...

	unique_id	ts	y	TimeGPT	anomaly	anomaly_score	TimeGPT-hi-99	TimeGPT-lo-99
95	machine-1-1_y_29	2020-02-01 22:11:00	0.606017	0.544625	True	18.463266	0.553161	0.536090
96	machine-1-1_y_29	2020-02-01 22:12:00	0.044413	0.570869	True	-158.933850	0.579404	0.562333
97	machine-1-1_y_29	2020-02-01 22:13:00	0.038682	0.560303	True	-157.474880	0.568839	0.551767
98	machine-1-1_y_29	2020-02-01 22:14:00	0.024355	0.521797	True	-150.178240	0.530333	0.513261
99	machine-1-1_y_29	2020-02-01 22:15:00	0.044413	0.467860	True	-127.848560	0.476396	0.459325
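Once the result is available, the flagged rows can be pulled out with ordinary pandas filtering. The sketch below rebuilds the five rows shown above by hand so it runs without an API call; in the rows shown, a negative score coincides with an actual value below the forecast, and treating the sign that way in general is an assumption.

```python
import pandas as pd

# The five rows from the output above, re-entered manually.
result = pd.DataFrame({
    "ts": pd.to_datetime([
        "2020-02-01 22:11:00", "2020-02-01 22:12:00", "2020-02-01 22:13:00",
        "2020-02-01 22:14:00", "2020-02-01 22:15:00",
    ]),
    "y": [0.606017, 0.044413, 0.038682, 0.024355, 0.044413],
    "TimeGPT": [0.544625, 0.570869, 0.560303, 0.521797, 0.467860],
    "anomaly": [True, True, True, True, True],
    "anomaly_score": [18.463266, -158.933850, -157.474880, -150.178240, -127.848560],
})

# Keep only flagged rows, then find the most extreme one by absolute score.
flagged = result[result["anomaly"]]
most_extreme = flagged.loc[flagged["anomaly_score"].abs().idxmax()]

# Rows where the observed value fell below the forecast.
drops = flagged[flagged["anomaly_score"] < 0]
```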

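📘 In this example, we use a detection size of 100 to demonstrate the anomaly detection process. In practice, using a smaller detection size and running detection more frequently improves granularity and lets you identify anomalies sooner after they occur.

As the plot shows, both anomalous periods were detected as soon as they appeared. For more ways to improve detection accuracy and customize anomaly detection, read our other tutorials on online anomaly detection.

For an in-depth analysis of the detect_anomalies_online method, see the tutorial (coming soon).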
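One way to picture "a smaller detection size, run more often" is a loop that re-runs detection on a trailing window each time a few new rows arrive. The sketch below is only an illustration: `stream_detect` and `stub_detector` are hypothetical names, and the stub uses a simple median rule in place of a real call to detect_anomalies_online so that the example runs offline.

```python
import pandas as pd

def stream_detect(df, detect_fn, window=200, detection_size=5):
    """Re-run `detect_fn` on a trailing window every `detection_size` new rows
    and collect the rows it flags."""
    alerts = []
    for end in range(window, len(df) + 1, detection_size):
        chunk = df.iloc[end - window:end]
        result = detect_fn(chunk, detection_size)
        flagged = result[result["anomaly"]]
        if not flagged.empty:
            alerts.append(flagged)
    return pd.concat(alerts, ignore_index=True) if alerts else pd.DataFrame()

def stub_detector(chunk, detection_size):
    """Stand-in detector (assumption): flag recent rows far from the history median."""
    history = chunk.iloc[:-detection_size]
    recent = chunk.iloc[-detection_size:].copy()
    recent["anomaly"] = (recent["y"] - history["y"].median()).abs() > 0.3
    return recent

# Synthetic stream: stable at 0.5, then a drop in the last five minutes.
demo = pd.DataFrame({
    "ts": pd.date_range("2020-02-01", periods=60, freq="min"),
    "y": [0.5] * 55 + [0.05] * 5,
})

alerts = stream_detect(demo, stub_detector, window=20, detection_size=5)
```

In real use, `detect_fn` would presumably wrap the client call with the parameters from the example above, and the window would need to contain enough history for the model to forecast from.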
