OpenAI

Auto-instrument the OpenAI SDK. Every chat.completions.create call inside an active run emits an llm_call event with model, token usage, and latency. Cost is computed server-side from the rate card.

Install / enable

Python:

pip install "agentping-sdk @ git+https://github.com/agent-ping/agent-ping-python.git" openai

import agentping
from openai import OpenAI

agentping.init()
agentping.instrument_openai()

client = OpenAI()

with agentping.run("support-triage", customer_id="acme-corp") as r:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "classify this ticket"}],
    )
    r.score("confidence", 0.88)

instrument_openai() patches both openai.OpenAI and openai.AsyncOpenAI. Idempotent; safe to call once at module load. The same wrapper covers Azure OpenAI (AzureOpenAI); the event's provider is openai and model carries your deployment name.

TypeScript:

npm install github:agent-ping/agent-ping-typescript openai

import OpenAI from "openai";
import * as agentping from "@agentping/sdk";

agentping.init({ apiKey: process.env.AGENTPING_API_KEY });

const run = agentping.run("answer-bot");
const openai = agentping.instrumentOpenAI(new OpenAI(), { run });

const reply = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "hi" }],
});

await run.finish({ status: "success" });

instrumentOpenAI returns a wrapped client whose return types match the original; reply.choices[0].message.content works as before.

What's captured

Model, input tokens, output tokens, and latency on every chat.completions.create. Prompt-cache hits (prompt_tokens_details.cached_tokens) are split out into cached_input_tokens so they price at the lower rate. Function-call exchanges stay a single llm_call with the final usage; record the tool selection separately for clearer traces:

r.event("tool_call", {
    "tool": reply.choices[0].message.tool_calls[0].function.name,
    "args": reply.choices[0].message.tool_calls[0].function.arguments,
})

The Responses API (responses.create) is not auto-instrumented. Wrap it in a run and emit the event yourself from the returned usage block.

Streaming

TypeScript: automatic. Pass stream: true and the wrapper accumulates token counts across chunks, emitting one llm_call when the stream drains.

Python: streaming (chat.completions.create(stream=True)) is not auto-captured. Drain the stream with stream_options={"include_usage": True}, then emit one event:

with agentping.run("answer-bot") as r:
    started = time.perf_counter()
    usage = None
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        stream=True,
        stream_options={"include_usage": True},
        messages=[{"role": "user", "content": "explain briefly"}],
    )
    for chunk in stream:
        if chunk.usage:
            usage = chunk.usage

    r.event("llm_call", {
        "provider": "openai",
        "model": "gpt-4o-mini",
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "latency_ms": int((time.perf_counter() - started) * 1000),
    })

Source / notes

Python: agentping.instrument_openai() in agentping-sdk-python
TypeScript: instrumentOpenAI in agentping-sdk-typescript

The rate card carries the current GPT model family; unknown models still land, with cost shown as null in the unpriced models report.