GemmaKit

Documentation

A narrow runtime needs narrow docs. This page walks you through install, the supported subset of the Chat Completions API, the Swift client, and the licensing flow — and stops there.

Install

GemmaKit is currently distributed as a private source package and local server during the Pro build-out. Build the server from this repository and point your app or OpenAI-compatible client at the local endpoint.

cargo run --release -- serve \
  --model /path/to/gemma-4-e2b-it-text-only-4bit-overshow-runtime \
  --model-id gemma-4-e2b-it \
  --port 11436 \
  --api-key local

Run the server

The server binds to 127.0.0.1 by default. Pass a converted Gemma 4 text runtime path, a model id, and an optional local API key at start.

export GEMMAKIT_API_KEY="local"
cargo run --release -- serve \
  --model /path/to/gemma-4-e2b-it-text-only-4bit-overshow-runtime \
  --model-id gemma-4-e2b-it \
  --port 11436 \
  --api-key "$GEMMAKIT_API_KEY" \
  --cors-origin http://localhost:3000
Boundary. The runtime serves chat completions only. There is no /v1/responses, no /v1/embeddings, no /v1/images, and no /v1/audio. Calls to those paths return 404.

Chat completions

POST /v1/chat/completions openai-compatible · subset

Request body

fieldtypenotes
modelstringThe local model identifier loaded at server start.
messagesarrayRoles: system, developer, user, assistant. developer maps to the runtime system role. Content may be a string or text content parts.
streambooleanIf true, response is SSE with one delta per chunk.
stream_optionsobjectSupports include_usage. When true, streaming emits one final usage chunk before [DONE].
temperaturenumber0–2. Default 1.
top_pnumber0–1. Default 1.
max_tokensnumberCap on generated tokens.
stopstring · arrayUp to 4 stop sequences.

Streaming format

When stream: true, the server writes SSE chunks shaped like the OpenAI Chat Completions stream. The terminal chunk is a literal [DONE] sentinel. If stream_options.include_usage is true, GemmaKit emits one extra usage chunk before that sentinel.

data: {"choices":[{"delta":{"content":"Buffer"}}]}

data: {"choices":[{"delta":{"content":"ed"}}]}

data: {"choices":[{"delta":{"content":" responses"}}]}

data: [DONE]

Bearer auth

Authentication is handled by an optional local bearer token. Start the server with --api-key <token> or set GEMMAKIT_API_KEY, then pass the token from same-device clients in the Authorization header. The local API key is separate from the Pro organisation key.

CORS

CORS is configurable via --cors-origin or GEMMAKIT_CORS_ORIGIN. The local server defaults to * for browser-based local clients; pass --cors-origin none to disable cross-origin responses, or pass one explicit origin.

Swift client

The Swift client wraps the local server with async sequence APIs for streamed and buffered completions.

  • GemmaKitClientConstruct with base URL and local API key.
  • generate(...)Buffered helper. Returns the assistant text.
  • chatCompletion(...)Buffered. Returns the OpenAI-shaped response object.
  • stream(...)AsyncSequence of text deltas.
  • streamChunks(...)AsyncSequence of raw chunks, including usage when requested.

Errors

statuscodemeaning
401unauthorizedMissing or wrong bearer token.
402licence_inactivePro licence is expired, inactive, revoked, or outside entitlement.
404not_foundEndpoint outside the supported subset.
400unsupported_featureRequest field outside the supported Chat Completions subset.
500internal_errorUnexpected local runtime or server failure.

Licensing

Pro org keys mint signed local licence certificates with optional app-id binding. Activation, refresh, and gated generation activity update the active-device ledger for the billing period. Prompts, completions, local documents, model artefacts, and embeddings are not sent to the licence service.

See Licensing for the full flow.

Model files

GemmaKit does not auto-download models. Converted model artefacts are distributed under their own terms; place the gemma-4-e2b-it-text-only-4bit-overshow-runtime runtime folder where your app can read it and pass its path at server start.