AI inference

Concept

Apps that use AI features call Greenlight’s inference gateway under /ai/*, not a model vendor directly. The platform routes the call to the configured provider, applies the org’s model aliases, enforces budget controls, and writes every prompt and response to the audit log.

This is the same pattern as the data broker, applied to model APIs: the app never holds a model vendor’s API key, and IT controls which models are available and how they’re billed.

Why a gateway

Model APIs have the same problems as any other upstream credential: the keys end up in code, the bills become opaque, the audit trail lives in whatever the vendor’s console gives you, and switching vendors means changing app code. A gateway puts all four problems behind one surface: the app doesn’t see the key, IT sees one bill, the audit trail is in the same audit log as everything else, and the org can swap providers without app changes.

The gateway also makes it possible to enforce things vendor APIs don’t: per-app rate limits, per-model budgets, prompt-pattern policies, and a kill switch for runaway loops.

Model aliases

Apps don’t call models by vendor name. They call models by alias — a logical name the organization defines and binds to a specific provider/model combination.

| Alias | Bound to (example) |
| --- | --- |
| `summarize-fast` | A fast, low-cost chat model |
| `summarize-deep` | A high-quality reasoning model |
| `embedding-default` | An embedding model |
| `classifier` | A small classification model |

IT defines the aliases and binds each to a specific provider and model in the dashboard; the names and bindings above are only examples. Changing a binding doesn’t change a line of app code — re-pointing an alias from one provider’s model to another’s is a single dashboard update, and apps consuming it pick up the new model on their next request.
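
To make the indirection concrete, here is a minimal sketch of alias resolution. It assumes a simple in-memory registry; the `AliasBinding` shape, the `resolveAlias` and `rebindAlias` helpers, and the provider and model strings are all hypothetical and are not Greenlight's actual data model.

```typescript
// Hypothetical shape of an alias binding; field names are illustrative only.
interface AliasBinding {
  provider: string; // which enabled provider serves the alias
  model: string; // provider-specific model identifier
}

// Example registry. Provider and model values are placeholders.
const bindings = new Map<string, AliasBinding>([
  ["summarize-fast", { provider: "provider-a", model: "fast-chat-model" }],
  ["embedding-default", { provider: "provider-a", model: "embedding-model" }],
]);

// Resolve an alias to its current binding, failing loudly on unknown aliases.
function resolveAlias(alias: string): AliasBinding {
  const binding = bindings.get(alias);
  if (binding === undefined) {
    throw new Error(`Unknown model alias: ${alias}`);
  }
  return binding;
}

// Re-pointing an alias updates the registry; callers keep using the same name.
function rebindAlias(alias: string, next: AliasBinding): void {
  bindings.set(alias, next);
}
```

Because callers only ever pass the alias string, a call to `rebindAlias` changes what the next `resolveAlias` returns without touching any caller.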

Providers

Greenlight’s inference gateway routes to whichever providers IT enables for the organization. Common providers:

Azure AI Foundry (the default target for Azure installs)
OpenAI direct
Anthropic direct
Amazon Bedrock
On-prem LiteLLM or vLLM endpoints

Each provider’s credential lives in Key Vault. The gateway swaps the credential in at request time, just like the data broker does for HTTP integrations.
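
The credential swap can be pictured roughly as follows. This is a sketch under stated assumptions, not the gateway's implementation: `SecretStore`, `swapCredential`, and the `provider/<name>` secret naming are hypothetical, and the `Bearer` scheme is assumed for simplicity even though some providers use other header names.

```typescript
// Hypothetical types; none of these names come from Greenlight.
type HeaderMap = Record<string, string>;

interface SecretStore {
  // Returns the stored secret, or undefined if none is configured.
  getSecret(name: string): string | undefined;
}

// Build the outbound headers: drop the app's own Authorization header
// and attach the provider credential fetched at request time.
function swapCredential(
  incoming: HeaderMap,
  provider: string,
  store: SecretStore,
): HeaderMap {
  const secret = store.getSecret(`provider/${provider}`);
  if (secret === undefined) {
    throw new Error(`No credential configured for provider: ${provider}`);
  }
  const outbound: HeaderMap = {};
  for (const name of Object.keys(incoming)) {
    // Header names are case-insensitive, so compare in lower case.
    if (name.toLowerCase() !== "authorization") {
      outbound[name] = incoming[name];
    }
  }
  // Assumption: the provider accepts a Bearer token.
  outbound["Authorization"] = `Bearer ${secret}`;
  return outbound;
}
```

The app's key never reaches the provider, and the provider's key never reaches the app.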

What an app sends

The wire format is OpenAI-compatible. Apps that already speak the OpenAI API change one base URL and one header:

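```typescript
// Call the gateway instead of the vendor: only the base URL and key change.
const response = await fetch(
  `${process.env.GREENLIGHT_AI_BASE_URL}/v1/chat/completions`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Greenlight-issued key scoped to this app, not a vendor key.
      Authorization: `Bearer ${process.env.GREENLIGHT_AI_KEY}`,
    },
    body: JSON.stringify({
      // The model identifier is an alias, not a vendor model name.
      model: 'summarize-fast',
      messages: [{ role: 'user', content: 'Summarize this in two sentences.' }],
    }),
  }
);
```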

No vendor API key and no vendor SDK — GREENLIGHT_AI_KEY is a Greenlight-issued key scoped to your app (not the model vendor’s), and the model identifier is the alias.
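
Since the wire format is OpenAI-compatible, the reply can be read the same way as an OpenAI chat completion. The sketch below separates request construction from response parsing so both can be checked without a network call; `buildChatRequest` and `firstReply` are hypothetical helper names, and the `ChatCompletion` interface covers only the subset of the OpenAI response shape that the helper reads.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Subset of the OpenAI chat completion response shape.
interface ChatCompletion {
  choices: { message: { role: string; content: string | null } }[];
}

// Build the URL and fetch options for a chat completion against the gateway.
function buildChatRequest(
  baseUrl: string,
  key: string,
  alias: string,
  messages: ChatMessage[],
) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${key}`,
      },
      body: JSON.stringify({ model: alias, messages }),
    },
  };
}

// Extract the text of the first choice, failing if there is none.
function firstReply(completion: ChatCompletion): string {
  const choice = completion.choices[0];
  if (choice === undefined || choice.message.content === null) {
    throw new Error("Completion contained no text reply");
  }
  return choice.message.content;
}
```

In an app, the pieces would typically be combined as `const { url, init } = buildChatRequest(...)`, then `fetch(url, init)`, then `firstReply(await res.json())`.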

Budgets and rate limits

Per-org and per-app budgets are configured in the dashboard. The gateway tracks spend in real time and rejects any call that would exceed a cap, returning a structured error the app can handle. Per-model rate limits prevent a single misbehaving loop from exhausting the org’s monthly quota.
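
The budget check described above might look something like this on the gateway side. It is only an illustration: the `Budget`, `BudgetError`, and `Decision` shapes, the `budget_exceeded` code, and the use of a pre-call cost estimate in USD are assumptions, since the actual error format is not specified here.

```typescript
interface Budget {
  capUsd: number; // configured cap
  spentUsd: number; // spend tracked so far
}

// Hypothetical structured error; the real gateway's format may differ.
interface BudgetError {
  code: "budget_exceeded";
  scope: "org" | "app";
  capUsd: number;
  spentUsd: number;
}

interface Decision {
  allowed: boolean;
  error?: BudgetError; // present only when the call is rejected
}

// Reject the call if its estimated cost would push either budget past its cap.
function checkBudgets(estimatedCostUsd: number, org: Budget, app: Budget): Decision {
  const scopes: ["org" | "app", Budget][] = [
    ["org", org],
    ["app", app],
  ];
  for (const [scope, budget] of scopes) {
    if (budget.spentUsd + estimatedCostUsd > budget.capUsd) {
      return {
        allowed: false,
        error: {
          code: "budget_exceeded",
          scope,
          capUsd: budget.capUsd,
          spentUsd: budget.spentUsd,
        },
      };
    }
  }
  return { allowed: true };
}
```

An app receiving such an error could degrade gracefully, for example by skipping the AI feature and telling the user, rather than retrying in a loop.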

A runaway-loop kill switch — automatic suspension of an app that issues a configurable threshold of calls in a short window — is available as a policy option.
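
One way to picture the kill switch is a sliding-window counter per app. The sketch below is an assumption about how such a policy could work, not a description of Greenlight's implementation; `RunawayLoopGuard`, `maxCalls`, and `windowMs` are hypothetical names, and it suspends an app once it makes more than `maxCalls` calls inside the window.

```typescript
class RunawayLoopGuard {
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly calls = new Map<string, number[]>();
  private readonly suspended = new Set<string>();

  constructor(maxCalls: number, windowMs: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Record a call at time nowMs. Returns false if the app is already
  // suspended or becomes suspended by this call.
  recordCall(appId: string, nowMs: number): boolean {
    if (this.suspended.has(appId)) {
      return false;
    }
    // Keep only timestamps that still fall inside the window.
    const cutoff = nowMs - this.windowMs;
    const recent = (this.calls.get(appId) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.calls.set(appId, recent);
    if (recent.length > this.maxCalls) {
      this.suspended.add(appId);
      return false;
    }
    return true;
  }

  isSuspended(appId: string): boolean {
    return this.suspended.has(appId);
  }
}
```

Passing the timestamp in explicitly keeps the guard deterministic and easy to test. In this sketch a suspended app stays suspended until something outside the guard clears it, which mirrors the idea of a suspension rather than a temporary throttle.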

Data brokering: The same pattern, applied to HTTP integrations.

Governance & policy: Where budget and rate-limit policies live.

MCP tools reference: The full set of tools the agent uses with the gateway.