Shared AI Services Gateway

One API for LLM completions, knowledge retrieval, document processing, and notifications. Multi-client billing, model switching, and MCP-compatible discovery — built for agents.

MCP Compatible · A2A Ready · Workers AI · Multi-Model · Usage Metered · ZAR Billing

How the layers fit together

Requests flow from any client through auth and metering, to the service layer, and out to LLM inference. Every call is tracked, costed, and auditable.

Clients
Calling Applications & Agents
Any HTTP client, MCP agent, or A2A peer with an API key
demo.imbila.ai · claude.imbila.ai · baobab.imbila.ai · MCP Agent · A2A Peer · External App
Gateway
API Key Auth + Rate Limiting + Metering
Every request authenticated, rate-limited, and tagged with client_id for billing
Bearer Auth · Rate Limit · Usage Log · Budget Check
Services
Shared Endpoints (MCP Tools)
Each service exposed as both REST and MCP tool, with consistent metering
/v1/completions · /v1/knowledge/query · /v1/documents/summarise · /v1/notify · /v1/usage/report
Inference
Model Router → Workers AI
Automatic model selection by capability or explicit choice. Cost-optimised routing.
Llama 3.1 8B · Llama 3.3 70B · Mistral 7B · Gemma 7B · Qwen 1.5 14B
Observability
Usage Tracking & Billing (D1)
Every call logged with tokens, cost, latency, client. Queryable via /v1/usage/report
Per-client spend · Per-model cost · Per-endpoint volume · Audit trail
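The metering arithmetic behind those figures is simple: each call's ZAR cost is its input and output tokens billed against the per-1K rates in the pricing table below. A minimal sketch (`costZAR` is a hypothetical helper, not the gateway's internal code; rates mirror the published table):

```javascript
// Per-1K-token ZAR rates, mirroring the published pricing table.
const RATES = {
  "llama-3.1-8b":  { in: 0.04, out: 0.06 },
  "llama-3.3-70b": { in: 0.16, out: 0.22 },
  "mistral-7b":    { in: 0.04, out: 0.06 },
  "gemma-7b":      { in: 0.04, out: 0.06 },
  "qwen-1.5-14b":  { in: 0.08, out: 0.11 },
};

// Cost of one call in ZAR: input and output tokens billed separately per 1,000.
function costZAR(model, tokensIn, tokensOut) {
  const r = RATES[model];
  if (!r) throw new Error(`unknown model: ${model}`);
  return (tokensIn / 1000) * r.in + (tokensOut / 1000) * r.out;
}
```

For example, a llama-3.1-8b call with 1,000 input and 500 output tokens meters at R0.04 + R0.03 = R0.07.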

Five services, one gateway

Each endpoint is also discoverable as an MCP tool via /mcp/manifest.
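Discovery amounts to fetching the manifest once and indexing its tools by name. A minimal sketch (`indexTools` is a hypothetical helper; the manifest shape matches what /mcp/manifest returns):

```javascript
// Build a name → description index from an MCP manifest object.
function indexTools(manifest) {
  const tools = {};
  for (const tool of manifest.tools) {
    tools[tool.name] = tool.description;
  }
  return tools;
}

// In practice the manifest would be fetched first, e.g.:
//   const manifest = await (await fetch("https://api.imbila.ai/mcp/manifest", {
//     headers: { Authorization: "Bearer sk-imbila-..." },
//   })).json();
```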

POST
/v1/completions
LLM completion with model switching. Request by model name or capability (fast, smart, cheap). Returns response with token count and ZAR cost.
Compute
POST
/v1/knowledge/query
Search the knowledge base for articles, explainers, and technical content. Ranked results with relevance scoring.
Knowledge
POST
/v1/documents/summarise
Summarise text or fetch a URL and summarise its content. Supports bullet, paragraph, and structured output.
Compute
POST
/v1/notify
Send email notification via Resend. Templates for transactional, alert, and digest formats.
Integration
GET
/v1/usage/report
Query usage stats grouped by client, model, endpoint, or day. Token counts and ZAR costs.
Platform
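A usage report query might be composed like this. A sketch only: the `group`, `from`, and `to` query parameter names are assumptions based on the grouping options described above, not a confirmed API surface:

```javascript
// Build a /v1/usage/report URL for a given grouping and date range.
// Parameter names (group, from, to) are assumed, not documented.
function usageReportURL(base, { group, from, to }) {
  const url = new URL("/v1/usage/report", base);
  if (group) url.searchParams.set("group", group);
  if (from) url.searchParams.set("from", from);
  if (to) url.searchParams.set("to", to);
  return url.toString();
}
```

Usage: `usageReportURL("https://api.imbila.ai", { group: "client", from: "2025-01-01", to: "2025-01-31" })` yields a per-client report URL for January.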

API Playground

Send a real request and see the response with full usage metering.

// Response will appear here.
// Select an endpoint, fill in parameters, and click Send Request.

// Example curl:
curl -X POST https://api.imbila.ai/v1/completions \
  -H "Authorization: Bearer sk-imbila-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Usage Dashboard

Real-time view of API activity, cost, and performance across all clients.

Total Requests
--
all time
Total Tokens
--
in + out
Total Spend
--
ZAR
Avg Latency
--
ms

Recent API Calls

Time · Client · Endpoint · Model · Tokens · Cost · Latency · Status
Enter your API key and click refresh to load live data, or send a request from the playground.

Available models & pricing

Request by model ID or by capability. The gateway picks the cheapest model matching your capability requirement.

Model | Provider | Cost/1K In (ZAR) | Cost/1K Out (ZAR) | Max Tokens | Capabilities
llama-3.1-8b | Meta | R0.04 | R0.06 | 2,048 | chat, code, summarise, fast, cheap
llama-3.3-70b | Meta | R0.16 | R0.22 | 4,096 | chat, code, summarise, smart
mistral-7b | Mistral | R0.04 | R0.06 | 2,048 | chat, code, fast, cheap
gemma-7b | Google | R0.04 | R0.06 | 2,048 | chat, summarise, fast, cheap
qwen-1.5-14b | Alibaba | R0.08 | R0.11 | 2,048 | chat, code, smart
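The cheapest-match routing described above can be sketched as a scan over this table. A sketch only: tie-breaking by table order and summing the two rates as the cost metric are assumptions, not the gateway's actual heuristics:

```javascript
// Model catalogue mirroring the pricing table (ZAR per 1K tokens).
const MODELS = [
  { id: "llama-3.1-8b",  inRate: 0.04, outRate: 0.06, caps: ["chat", "code", "summarise", "fast", "cheap"] },
  { id: "llama-3.3-70b", inRate: 0.16, outRate: 0.22, caps: ["chat", "code", "summarise", "smart"] },
  { id: "mistral-7b",    inRate: 0.04, outRate: 0.06, caps: ["chat", "code", "fast", "cheap"] },
  { id: "gemma-7b",      inRate: 0.04, outRate: 0.06, caps: ["chat", "summarise", "fast", "cheap"] },
  { id: "qwen-1.5-14b",  inRate: 0.08, outRate: 0.11, caps: ["chat", "code", "smart"] },
];

// Pick the cheapest model advertising the capability; on a cost tie,
// table order wins (an assumption about the real router's tie-break).
function routeByCapability(capability) {
  const matches = MODELS.filter((m) => m.caps.includes(capability));
  if (matches.length === 0) throw new Error(`no model offers: ${capability}`);
  return matches.reduce((best, m) =>
    m.inRate + m.outRate < best.inRate + best.outRate ? m : best
  ).id;
}
```

Under these assumptions, `routeByCapability("fast")` resolves to `"llama-3.1-8b"`: three models tie on price and the first in table order wins.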

MCP + A2A Integration

Any MCP-compatible agent can discover these services automatically.

// MCP manifest endpoint
GET https://api.imbila.ai/mcp/manifest

// Returns tool definitions that any MCP client can consume:
{
  "name": "imbila-api",
  "tools": [
    { "name": "completions", "description": "LLM completion with model switching..." },
    { "name": "knowledge_query", "description": "Search knowledge base..." },
    { "name": "summarise", "description": "Summarise text or URL..." },
    { "name": "notify", "description": "Send email notification..." },
    { "name": "usage_report", "description": "Query usage statistics..." }
  ],
  "auth": { "type": "bearer" }
}

// A2A Agent Card (for agent-to-agent discovery)
{
  "name": "Imbila API Gateway",
  "url": "https://api.imbila.ai",
  "capabilities": ["completions", "knowledge", "summarise", "notify"],
  "protocols": ["MCP", "A2A", "REST"]
}
// Example: Agent calls completions via MCP
const result = await mcpClient.callTool("completions", {
  messages: [{ role: "user", content: "Explain MCP in one paragraph" }],
  capability: "smart"
});

// The gateway:
// 1. Authenticates via API key
// 2. Resolves "smart" → llama-3.3-70b
// 3. Runs inference on Workers AI
// 4. Logs: client_id, model, tokens, cost, latency
// 5. Returns response with usage metadata