LlamaIndex

LlamaIndex invokes every registered BaseCallbackHandler on each LLM call, embedding call, retriever query, and exception in a pipeline. AgentPing ships a native handler that listens to all four; registering it once covers every provider that LlamaIndex talks to.

Python

pip install 'agentping-sdk[llamaindex]' llama-index llama-index-llms-openai

import agentping
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.llms.openai import OpenAI

agentping.init()
Settings.callback_manager = CallbackManager([agentping.AgentPingLlamaIndexHandler()])

with agentping.run("docs-qa"):
    docs = SimpleDirectoryReader("./docs").load_data()
    index = VectorStoreIndex.from_documents(docs)
    answer = index.as_query_engine().query("what changed last quarter?")

The handler attaches once, when Settings.callback_manager is assigned. Inside with agentping.run(...), every LLM call, embedding call, and retrieve step emits a corresponding event on the active run. Outside a run, the handler stays inert.

What it emits

LlamaIndex event	AgentPing event	Data
`CBEventType.LLM`	`llm_call`	provider, model, input/output tokens, latency
`CBEventType.EMBEDDING`	`llm_call`	provider, model, `kind: "embedding"`, input tokens
`CBEventType.RETRIEVE`	`retrieve`	node count
`CBEventType.EXCEPTION`	`error`	exception message (truncated to 500 chars)

Provider is inferred from the model name: claude* → anthropic, gpt* / o1* / o3* / text-embedding-* → openai, gemini* → gemini, mistral* / mixtral* → mistral, command* / embed-* → cohere. Anything else lands as provider llamaindex and can be re-attributed via rate card overrides.
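The prefix rules in the paragraph above can be read as a simple lookup. The following is a minimal sketch of that rule set, not the SDK's code: the function name `infer_provider` and the table layout are hypothetical, and case-insensitive matching is an assumption.

```python
# Prefix rules copied from the paragraph above; the function name and
# table layout are hypothetical, not AgentPing's actual code.
PROVIDER_PREFIXES = [
    (("claude",), "anthropic"),
    (("gpt", "o1", "o3", "text-embedding-"), "openai"),
    (("gemini",), "gemini"),
    (("mistral", "mixtral"), "mistral"),
    (("command", "embed-"), "cohere"),
]


def infer_provider(model_name: str) -> str:
    """Return the provider for a model name, or "llamaindex" if no prefix matches."""
    name = model_name.lower()  # assumption: matching is case-insensitive
    for prefixes, provider in PROVIDER_PREFIXES:
        if name.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return provider
    return "llamaindex"


print(infer_provider("gpt-4o-mini"))
print(infer_provider("llama-3-70b"))
```

Under these rules a self-hosted or fine-tuned model with an unrecognised name falls through to llamaindex, which is the case the rate card overrides are meant to handle.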

Attaching per-pipeline instead of globally

For finer control, pass the callback manager to a specific query engine instead of assigning Settings.callback_manager:

import agentping
from llama_index.core.callbacks import CallbackManager

handler = agentping.AgentPingLlamaIndexHandler()
with agentping.run("docs-qa"):
    # `index` is the VectorStoreIndex built in the example above.
    manager = CallbackManager([handler])
    query_engine = index.as_query_engine(callback_manager=manager)
    answer = query_engine.query("what changed?")

Workflows

For LlamaIndex Workflows, the same handler works. Attach it via Settings.callback_manager or pass it into the workflow's Context:

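import agentping
from llama_index.core.callbacks import CallbackManager
from llama_index.core.workflow import Context

# `workflow` is an existing Workflow instance; run this inside an async function.
ctx = Context(workflow, callback_manager=CallbackManager([agentping.AgentPingLlamaIndexHandler()]))
result = await workflow.run(ctx=ctx, input="process this")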

Multi-step traces

A typical RAG query fires several events in sequence: embed → retrieve → LLM. Each lands as a discrete event on the run, so the dashboard shows the full pipeline:

run rag-pipeline
├── llm_call (text-embedding-3-small, embedding, 487 in)
├── retrieve (node_count=8)
└── llm_call (gpt-4o-mini, 1842 in / 213 out)

Cache attribution

When the underlying LLM reports cached prompt tokens (OpenAI's prompt_tokens_details.cached_tokens), the handler records cached_input_tokens alongside input_tokens. The rate card splits cached vs. uncached input at billing time.
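To make the split concrete, here is a minimal sketch of reading the cached count from an OpenAI-style usage payload and pricing the two portions separately. The function names and the per-million rates are hypothetical. The sketch also assumes that input_tokens includes the cached tokens, as OpenAI's prompt_tokens does; how AgentPing's rate card actually computes this is not shown here.

```python
def split_input_tokens(usage: dict) -> dict:
    """Extract input and cached-input token counts from an OpenAI-style usage payload."""
    input_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}  # field may be absent or None
    cached = details.get("cached_tokens", 0)
    return {"input_tokens": input_tokens, "cached_input_tokens": cached}


def input_cost(tokens: dict, rate_per_million: float, cached_rate_per_million: float) -> float:
    """Price uncached and cached input separately.

    Assumption: cached_input_tokens is a subset of input_tokens.
    """
    uncached = tokens["input_tokens"] - tokens["cached_input_tokens"]
    total = (uncached * rate_per_million
             + tokens["cached_input_tokens"] * cached_rate_per_million)
    return total / 1_000_000


usage = {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 1024}}
tokens = split_input_tokens(usage)
# Made-up rates, for illustration only.
cost = input_cost(tokens, rate_per_million=1.0, cached_rate_per_million=0.5)
```

Keeping both counts on the event, rather than a pre-computed cost, means a later change to the cached rate can be applied to historical runs.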