Shared AI Services Gateway

One API for LLM completions, knowledge retrieval, document processing, and notifications. Multi-client billing, model switching, and MCP-compatible discovery — built for agents.

MCP Compatible · A2A Ready · Workers AI · Multi-Model · Usage Metered · ZAR Billing

How the layers fit together

Requests flow from any client through auth and metering, to the service layer, and out to LLM inference. Every call is tracked, costed, and auditable.

Clients
Calling Applications & Agents
Any HTTP client, MCP agent, or A2A peer with an API key
demo.imbila.ai · claude.imbila.ai · baobab.imbila.ai · MCP Agent · A2A Peer · External App
Gateway
API Key Auth + Rate Limiting + Metering
Every request authenticated, rate-limited, and tagged with client_id for billing
Bearer Auth · Rate Limit · Usage Log · Budget Check
Services
Shared Endpoints (MCP Tools)
Each service exposed as both REST and MCP tool, with consistent metering
/v1/completions · /v1/knowledge/query · /v1/documents/summarise · /v1/notify · /v1/usage/report
Inference
Model Router → Workers AI
Automatic model selection by capability or explicit choice. Cost-optimised routing.
Llama 3.1 8B · Llama 3.3 70B · Mistral 7B · Gemma 7B · Qwen 1.5 14B
Observability
Usage Tracking & Billing (D1)
Every call logged with tokens, cost, latency, client. Queryable via /v1/usage/report
Per-client spend · Per-model cost · Per-endpoint volume · Audit trail
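The metering arithmetic behind those figures is simple: each call's ZAR cost is its input and output tokens billed against the per-1K rates in the pricing table below. A minimal sketch (`costZAR` is a hypothetical helper, not the gateway's internal code; rates mirror the published table):

```javascript
// Per-1K-token ZAR rates, mirroring the published pricing table.
const RATES = {
  "llama-3.1-8b":  { in: 0.04, out: 0.06 },
  "llama-3.3-70b": { in: 0.16, out: 0.22 },
  "mistral-7b":    { in: 0.04, out: 0.06 },
  "gemma-7b":      { in: 0.04, out: 0.06 },
  "qwen-1.5-14b":  { in: 0.08, out: 0.11 },
};

// Cost of one call in ZAR: input and output tokens billed separately per 1,000.
function costZAR(model, tokensIn, tokensOut) {
  const r = RATES[model];
  if (!r) throw new Error(`unknown model: ${model}`);
  return (tokensIn / 1000) * r.in + (tokensOut / 1000) * r.out;
}
```

For example, a llama-3.1-8b call with 1,000 input and 500 output tokens meters at R0.04 + R0.03 = R0.07.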

Five services, one gateway

Each endpoint is also discoverable as an MCP tool via /mcp/manifest.
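Discovery amounts to fetching the manifest once and indexing its tools by name. A minimal sketch (`indexTools` is a hypothetical helper; the manifest shape matches what /mcp/manifest returns):

```javascript
// Build a name → description index from an MCP manifest object.
function indexTools(manifest) {
  const tools = {};
  for (const tool of manifest.tools) {
    tools[tool.name] = tool.description;
  }
  return tools;
}

// In practice the manifest would be fetched first, e.g.:
//   const manifest = await (await fetch("https://api.imbila.ai/mcp/manifest", {
//     headers: { Authorization: "Bearer sk-imbila-..." },
//   })).json();
```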

POST
/v1/completions
LLM completion with model switching. Request by model name or capability (fast, smart, cheap). Returns response with token count and ZAR cost.
Compute
POST
/v1/knowledge/query
Search the knowledge base for articles, explainers, and technical content. Ranked results with relevance scoring.
Knowledge
POST
/v1/documents/summarise
Summarise text or fetch a URL and summarise its content. Supports bullet, paragraph, and structured output.
Compute
POST
/v1/notify
Send email notification via Resend. Templates for transactional, alert, and digest formats.
Integration
GET
/v1/usage/report
Query usage stats grouped by client, model, endpoint, or day. Token counts and ZAR costs.
Platform
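A usage report query might be composed like this. A sketch only: the `group`, `from`, and `to` query parameter names are assumptions based on the grouping options described above, not a confirmed API surface:

```javascript
// Build a /v1/usage/report URL for a given grouping and date range.
// Parameter names (group, from, to) are assumed, not documented.
function usageReportURL(base, { group, from, to }) {
  const url = new URL("/v1/usage/report", base);
  if (group) url.searchParams.set("group", group);
  if (from) url.searchParams.set("from", from);
  if (to) url.searchParams.set("to", to);
  return url.toString();
}
```

Usage: `usageReportURL("https://api.imbila.ai", { group: "client", from: "2025-01-01", to: "2025-01-31" })` yields a per-client report URL for January.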

API Playground

Send a real request and see the response with full usage metering.

// Response will appear here.
// Select an endpoint, fill in parameters, and click Send Request.

// Example curl:
curl -X POST https://api.imbila.ai/v1/completions \
  -H "Authorization: Bearer sk-imbila-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Usage Dashboard

Real-time view of API activity, cost, and performance across all clients.

Total Requests
--
all time
Total Tokens
--
in + out
Total Spend
--
ZAR
Avg Latency
--
ms

Recent API Calls

Time · Client · Endpoint · Model · Tokens · Cost · Latency · Status
Enter your API key and click refresh to load live data, or send a request from the playground.

Available models & pricing

Request by model ID or by capability. The gateway picks the cheapest model matching your capability requirement.

Model | Provider | Cost/1K In (ZAR) | Cost/1K Out (ZAR) | Max Tokens | Capabilities
llama-3.1-8b | Meta | R0.04 | R0.06 | 2,048 | chat, code, summarise, fast, cheap
llama-3.3-70b | Meta | R0.16 | R0.22 | 4,096 | chat, code, summarise, smart
mistral-7b | Mistral | R0.04 | R0.06 | 2,048 | chat, code, fast, cheap
gemma-7b | Google | R0.04 | R0.06 | 2,048 | chat, summarise, fast, cheap
qwen-1.5-14b | Alibaba | R0.08 | R0.11 | 2,048 | chat, code, smart
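The cheapest-match routing described above can be sketched as a scan over this table. A sketch only: tie-breaking by table order and summing the two rates as the cost metric are assumptions, not the gateway's actual heuristics:

```javascript
// Model catalogue mirroring the pricing table (ZAR per 1K tokens).
const MODELS = [
  { id: "llama-3.1-8b",  inRate: 0.04, outRate: 0.06, caps: ["chat", "code", "summarise", "fast", "cheap"] },
  { id: "llama-3.3-70b", inRate: 0.16, outRate: 0.22, caps: ["chat", "code", "summarise", "smart"] },
  { id: "mistral-7b",    inRate: 0.04, outRate: 0.06, caps: ["chat", "code", "fast", "cheap"] },
  { id: "gemma-7b",      inRate: 0.04, outRate: 0.06, caps: ["chat", "summarise", "fast", "cheap"] },
  { id: "qwen-1.5-14b",  inRate: 0.08, outRate: 0.11, caps: ["chat", "code", "smart"] },
];

// Pick the cheapest model advertising the capability; on a cost tie,
// table order wins (an assumption about the real router's tie-break).
function routeByCapability(capability) {
  const matches = MODELS.filter((m) => m.caps.includes(capability));
  if (matches.length === 0) throw new Error(`no model offers: ${capability}`);
  return matches.reduce((best, m) =>
    m.inRate + m.outRate < best.inRate + best.outRate ? m : best
  ).id;
}
```

Under these assumptions, `routeByCapability("fast")` resolves to `"llama-3.1-8b"`: three models tie on price and the first in table order wins.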

MCP + A2A Integration

Any MCP-compatible agent can discover these services automatically.

// MCP manifest endpoint
GET https://api.imbila.ai/mcp/manifest

// Returns tool definitions that any MCP client can consume:
{
  "name": "imbila-api",
  "tools": [
    { "name": "completions", "description": "LLM completion with model switching..." },
    { "name": "knowledge_query", "description": "Search knowledge base..." },
    { "name": "summarise", "description": "Summarise text or URL..." },
    { "name": "notify", "description": "Send email notification..." },
    { "name": "usage_report", "description": "Query usage statistics..." }
  ],
  "auth": { "type": "bearer" }
}

// A2A Agent Card (for agent-to-agent discovery)
{
  "name": "Imbila API Gateway",
  "url": "https://api.imbila.ai",
  "capabilities": ["completions", "knowledge", "summarise", "notify"],
  "protocols": ["MCP", "A2A", "REST"]
}
// Example: Agent calls completions via MCP
const result = await mcpClient.callTool("completions", {
  messages: [{ role: "user", content: "Explain MCP in one paragraph" }],
  capability: "smart"
});

// The gateway:
// 1. Authenticates via API key
// 2. Resolves "smart" → llama-3.3-70b
// 3. Runs inference on Workers AI
// 4. Logs: client_id, model, tokens, cost, latency
// 5. Returns response with usage metadata