OpenAI
Auto-instrument the OpenAI SDK. Every chat.completions.create call inside an active run emits an llm_call event with model, token usage, and latency. Cost is computed server-side from the rate card.
Install / enable
Python:
pip install agentping-sdk openai
import agentping
from openai import OpenAI
agentping.init()
agentping.instrument_openai()
client = OpenAI()
with agentping.run("support-triage", customer_id="acme-corp") as r:
reply = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "classify this ticket"}],
)
r.score("confidence", 0.88)
instrument_openai() patches both openai.OpenAI and openai.AsyncOpenAI. Idempotent; safe to call once at module load. The same wrapper covers Azure OpenAI (AzureOpenAI); the event's provider is openai and model carries your deployment name.
TypeScript:
npm install @agentping/sdk openai
import OpenAI from "openai";
import * as agentping from "@agentping/sdk";
agentping.init({ apiKey: process.env.AGENTPING_API_KEY });
const run = agentping.run("answer-bot");
const openai = agentping.instrumentOpenAI(new OpenAI(), { run });
const reply = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "hi" }],
});
await run.finish({ status: "success" });
instrumentOpenAI returns a wrapped client whose return types match the original; reply.choices[0].message.content works as before.
What's captured
Model, input tokens, output tokens, and latency on every chat.completions.create. Prompt-cache hits (prompt_tokens_details.cached_tokens) are split out into cached_input_tokens so they price at the lower rate. Function-call exchanges stay a single llm_call with the final usage; record the tool selection separately for clearer traces:
r.event("tool_call", {
"tool": reply.choices[0].message.tool_calls[0].function.name,
"args": reply.choices[0].message.tool_calls[0].function.arguments,
})
The Responses API (responses.create) is not auto-instrumented. Wrap it in a run and emit the event yourself from the returned usage block.
Streaming
TypeScript: automatic. Pass stream: true and the wrapper accumulates token counts across chunks, emitting one llm_call when the stream drains.
Python: streaming (chat.completions.create(stream=True)) is not auto-captured. Drain the stream with stream_options={"include_usage": True}, then emit one event:
with agentping.run("answer-bot") as r:
started = time.perf_counter()
usage = None
stream = client.chat.completions.create(
model="gpt-4o-mini",
stream=True,
stream_options={"include_usage": True},
messages=[{"role": "user", "content": "explain briefly"}],
)
for chunk in stream:
if chunk.usage:
usage = chunk.usage
r.event("llm_call", {
"provider": "openai",
"model": "gpt-4o-mini",
"input_tokens": usage.prompt_tokens,
"output_tokens": usage.completion_tokens,
"latency_ms": int((time.perf_counter() - started) * 1000),
})
Source / notes
- Python:
agentping.instrument_openai()in agentping-sdk-python - TypeScript:
instrumentOpenAIin agentping-sdk-typescript
The rate card carries the current GPT model family; unknown models still land, with cost shown as null in the unpriced models report.