LlamaIndex
LlamaIndex's callback system notifies every registered BaseCallbackHandler of each LLM call, embedding call, retriever query, and exception in a pipeline. AgentPing ships a native handler that listens to all of them, so one line of setup covers every provider that LlamaIndex talks to.
Python
pip install 'agentping-sdk[llamaindex]' llama-index llama-index-llms-openai
import agentping
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.llms.openai import OpenAI
agentping.init()
Settings.callback_manager = CallbackManager([agentping.AgentPingLlamaIndexHandler()])
with agentping.run("docs-qa"):
    docs = SimpleDirectoryReader("./docs").load_data()
    index = VectorStoreIndex.from_documents(docs)
    answer = index.as_query_engine().query("what changed last quarter?")
The handler attaches once, when Settings.callback_manager is assigned. Inside with agentping.run(...), every LLM call, embedding call, and retrieve step emits a corresponding event on the active run. Outside a run, the handler stays inert.
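One way to picture the "inert outside a run" behavior is a context variable that holds the active run. The sketch below is illustrative only and is not AgentPing's implementation; `run`, `on_event`, and `_active_run` are hypothetical names chosen for the example.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical holder for the currently active run (None means "no run").
_active_run = contextvars.ContextVar("active_run", default=None)


@contextmanager
def run(name):
    """Open a run, collect events while it is active, then restore the previous state."""
    events = []
    token = _active_run.set({"name": name, "events": events})
    try:
        yield events
    finally:
        _active_run.reset(token)


def on_event(event_type, payload):
    """Record an event on the active run; do nothing when no run is active."""
    current = _active_run.get()
    if current is None:
        return False  # inert: the event is dropped
    current["events"].append((event_type, payload))
    return True
```

Under this assumption a handler can stay registered for the life of the process, and the decision to record or drop an event is made per call.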
What it emits
| LlamaIndex event | AgentPing event | Data |
|---|---|---|
| CBEventType.LLM | llm_call | provider, model, input/output tokens, latency |
| CBEventType.EMBEDDING | llm_call | provider, model, kind: "embedding", input tokens |
| CBEventType.RETRIEVE | retrieve | node count |
| CBEventType.EXCEPTION | error | exception message (truncated to 500 chars) |
Provider is inferred from the model name: claude* → anthropic, gpt* / o1* / o3* / text-embedding-* → openai, gemini* → gemini, mistral* / mixtral* → mistral, command* / embed-* → cohere. Anything else lands as provider llamaindex and can be re-attributed via rate card overrides.
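The prefix rules above can be sketched as a small lookup. This is a minimal illustration rather than the SDK's code: `infer_provider` and `PROVIDER_PREFIXES` are hypothetical names, and case-insensitive matching is an assumption.

```python
# Prefix table built from the mapping described above.
# Hypothetical names; this is not part of the AgentPing SDK.
PROVIDER_PREFIXES = [
    ("anthropic", ("claude",)),
    ("openai", ("gpt", "o1", "o3", "text-embedding-")),
    ("gemini", ("gemini",)),
    ("mistral", ("mistral", "mixtral")),
    ("cohere", ("command", "embed-")),
]


def infer_provider(model_name: str) -> str:
    """Return the provider whose prefix matches the model name, else 'llamaindex'."""
    name = model_name.strip().lower()  # assumption: matching ignores case
    for provider, prefixes in PROVIDER_PREFIXES:
        # str.startswith accepts a tuple and matches any of its entries.
        if name.startswith(prefixes):
            return provider
    return "llamaindex"  # fallback provider named in the text
```

Because text-embedding-* and embed-* share no common prefix, the order of the table does not affect the result for the patterns listed here.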
Attaching per-pipeline instead of globally
For finer control, scope the callback manager to a specific query engine rather than Settings.callback_manager:
from llama_index.core.callbacks import CallbackManager
handler = agentping.AgentPingLlamaIndexHandler()
with agentping.run("docs-qa"):
    manager = CallbackManager([handler])
    query_engine = index.as_query_engine(callback_manager=manager)
    answer = query_engine.query("what changed?")
Workflows
For LlamaIndex Workflows, the same handler works. Attach it via Settings.callback_manager or pass it into the workflow's Context:
from llama_index.core.workflow import Context
ctx = Context(workflow, callback_manager=CallbackManager([agentping.AgentPingLlamaIndexHandler()]))
result = await workflow.run(ctx=ctx, input="process this")
Multi-step traces
A typical RAG query fires several events in sequence: embed → retrieve → LLM. Each lands as a discrete event on the run, so the dashboard shows the full pipeline:
run rag-pipeline
├── llm_call (text-embedding-3-small, embedding, 487 in)
├── retrieve (node_count=8)
└── llm_call (gpt-4o-mini, 1842 in / 213 out)
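To make the shape of such a trace concrete, the sketch below renders the same tree from a list of event records. The dictionary fields (`type`, `model`, `kind`, `input_tokens`, `output_tokens`, `node_count`) and the `render_trace` helper are assumptions made for this example, not the SDK's event schema.

```python
# Hypothetical event records mirroring the trace shown above.
EVENTS = [
    {"type": "llm_call", "model": "text-embedding-3-small",
     "kind": "embedding", "input_tokens": 487},
    {"type": "retrieve", "node_count": 8},
    {"type": "llm_call", "model": "gpt-4o-mini",
     "input_tokens": 1842, "output_tokens": 213},
]


def describe(event):
    """Build the parenthesised summary for one event."""
    if event["type"] == "retrieve":
        return f"node_count={event['node_count']}"
    if event.get("kind") == "embedding":
        return f"{event['model']}, embedding, {event['input_tokens']} in"
    return (f"{event['model']}, {event['input_tokens']} in / "
            f"{event['output_tokens']} out")


def render_trace(run_name, events):
    """Render events in arrival order as a one-level tree under the run."""
    lines = [f"run {run_name}"]
    for i, event in enumerate(events):
        branch = "└──" if i == len(events) - 1 else "├──"
        lines.append(f"{branch} {event['type']} ({describe(event)})")
    return "\n".join(lines)


print(render_trace("rag-pipeline", EVENTS))
```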
Cache attribution
When the underlying LLM reports cached prompt tokens (OpenAI's prompt_tokens_details.cached_tokens), the handler records cached_input_tokens alongside input_tokens. The rate card splits cached vs. uncached input at billing time.
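The cached/uncached split can be sketched as follows. This is an assumption-laden illustration: `RateCard`, `llm_call_cost`, and the prices are hypothetical, and the sketch assumes input_tokens is the total prompt size with the cached portion included (as OpenAI reports prompt_tokens).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    # Hypothetical prices in USD per one million tokens.
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def llm_call_cost(input_tokens, cached_input_tokens, output_tokens, card):
    """Bill cached and uncached input tokens at separate rates."""
    # Assumption: input_tokens already includes the cached tokens.
    cached = min(cached_input_tokens, input_tokens)
    uncached = input_tokens - cached
    total = (
        uncached * card.input_per_mtok
        + cached * card.cached_input_per_mtok
        + output_tokens * card.output_per_mtok
    )
    return total / 1_000_000


# Example with made-up rates: 600 uncached + 400 cached input, 100 output.
card = RateCard(input_per_mtok=1.0, cached_input_per_mtok=0.5, output_per_mtok=2.0)
cost = llm_call_cost(1000, 400, 100, card)
```

Keeping both token counts on the event means the split can be recomputed if the rates change later.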