GemmaKit

Documentation

A narrow runtime needs narrow docs. This page walks you through install, the supported subset of the Chat Completions API, the Swift, TypeScript, Dart, Rust, and Cordova client paths, the licensing flow, and approved model acquisition.

Install

GemmaKit is currently distributed as a private source package and local server during the Pro build-out. Build the server from this repository and point your app or OpenAI-compatible client at the local endpoint.

cargo run --release -- serve \
  --model /path/to/gemma-4-e2b-it-text-only-4bit-overshow-runtime \
  --model-id gemma-4-e2b-it \
  --port 11436 \
  --api-key local

Run the server

The server binds to 127.0.0.1 by default. Pass a converted Gemma 4 text runtime path, a model id, and an optional local API key at start.

export GEMMAKIT_API_KEY="local"
cargo run --release -- serve \
  --model /path/to/gemma-4-e2b-it-text-only-4bit-overshow-runtime \
  --model-id gemma-4-e2b-it \
  --port 11436 \
  --api-key "$GEMMAKIT_API_KEY" \
  --cors-origin http://localhost:3000

Boundary. The runtime serves chat completions only. There is no /v1/responses, no /v1/embeddings, no /v1/images, and no /v1/audio. Calls to those paths return 404.

Chat completions

POST /v1/chat/completions · OpenAI-compatible · subset

Request body

  field            type             notes
  model            string           The local model identifier loaded at server start.
  messages         array            Roles: system, developer, user, assistant. developer maps to the runtime system role. Content may be a string or text content parts.
  stream           boolean          If true, the response is SSE with one delta per chunk.
  stream_options   object           Supports include_usage. When true, streaming emits one final usage chunk before [DONE].
  temperature      number           0–2. Default 1.
  top_p            number           0–1. Default 1.
  max_tokens       number           Cap on generated tokens.
  stop             string · array   Up to 4 stop sequences.
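
The fields above can be captured in a typed request builder. This is a sketch in TypeScript, not part of any shipped client; the model id matches the serve example earlier, and the ChatRequest/ChatMessage names are illustrative.

```typescript
// Shape a request restricted to the supported Chat Completions subset.
// Only fields from the table above are included; nothing else should be sent.
interface ChatMessage {
  role: "system" | "developer" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream?: boolean;
  stream_options?: { include_usage?: boolean };
  temperature?: number;     // 0–2, default 1
  top_p?: number;           // 0–1, default 1
  max_tokens?: number;      // cap on generated tokens
  stop?: string | string[]; // up to 4 stop sequences
}

function buildRequest(prompt: string, model = "gemma-4-e2b-it"): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: prompt },
    ],
    temperature: 0.7,
    max_tokens: 256,
  };
}
```

Anything outside this shape is rejected with a 400 unsupported_feature error rather than silently ignored.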

Streaming format

When stream: true, the server writes SSE chunks shaped like the OpenAI Chat Completions stream. The terminal chunk is a literal [DONE] sentinel. If stream_options.include_usage is true, GemmaKit emits one extra usage chunk before that sentinel.

data: {"choices":[{"delta":{"content":"Buffer"}}]}

data: {"choices":[{"delta":{"content":"ed"}}]}

data: {"choices":[{"delta":{"content":" responses"}}]}

data: [DONE]
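
A consumer of this stream splits on SSE lines, parses each data payload, and stops at the sentinel. A minimal sketch in TypeScript (the collectDeltas helper is illustrative, not a shipped API):

```typescript
// Concatenate text deltas from SSE lines shaped like the stream above.
// Stops at the literal [DONE] sentinel.
function collectDeltas(sseLines: string[]): string {
  let text = "";
  for (const line of sseLines) {
    if (!line.startsWith("data: ")) continue; // skip blank lines
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;          // terminal sentinel
    const chunk = JSON.parse(payload);
    // A usage-only chunk (stream_options.include_usage) carries no delta.
    const delta = chunk.choices?.[0]?.delta?.content;
    if (typeof delta === "string") text += delta;
  }
  return text;
}
```

Fed the four example lines above, this returns "Buffered responses".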

Bearer auth

Authentication is handled by an optional local bearer token. Start the server with --api-key <token> or set GEMMAKIT_API_KEY, then pass the token from same-device clients in the Authorization header. The local API key is separate from the Pro organisation key.
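
In practice that means attaching one header to every request. A sketch, assuming the token mirrors the --api-key value passed to serve (the authedInit helper is illustrative):

```typescript
// Build fetch options that carry the local bearer token.
// Omit the Authorization header entirely if the server was started without a key.
function authedInit(apiKey: string, body: unknown): RequestInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  };
}
```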

CORS

CORS is configurable via --cors-origin or GEMMAKIT_CORS_ORIGIN. The local server defaults to * for browser-based local clients; pass --cors-origin none to disable cross-origin responses, or pass one explicit origin.

Framework clients

All client surfaces talk to the same local /v1/chat/completions endpoint. Swift, TypeScript, Dart, Rust, and Cordova clients are thin wrappers around the supported OpenAI-compatible subset; other frameworks can use direct HTTP or an OpenAI-compatible SDK pointed at http://127.0.0.1:11436/v1.

  target                 surface                   notes
  Swift and Apple apps   GemmaKitCore              Typed requests, responses, streaming helpers, and Pro-only server/model helpers when using the private package.
  React Native           @gemmakit/client          Uses standard fetch and ReadableStream. The host app still starts or locates the local GemmaKit runtime.
  Flutter                gemmakit                  Dart client for buffered generation, streamed text deltas, full chunks, and OpenAI-shaped errors.
  Electron               @gemmakit/client          Works from main or renderer processes, or through any OpenAI-compatible JavaScript SDK.
  Ionic and Capacitor    HTTP · JS                 Use the JavaScript client in webviews that provide fetch and streaming APIs. Set a narrow CORS origin for packaged apps.
  Cordova                cordova-plugin-gemmakit   JavaScript-only plugin shell that exposes @gemmakit/client under cordova.plugins.gemmakit.
  Tauri                  gemmakit-client           Call from the webview with @gemmakit/client or from the Rust backend with the Rust crate.
  Rust apps              gemmakit-client           Rust client for desktop CLIs and any Rust process that talks to a running gemmakit serve.
  Node and CLIs          HTTP · JS                 Use @gemmakit/client, the official OpenAI JavaScript SDK, or direct HTTP calls.
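
For frameworks without a dedicated client, the direct-HTTP path is a single POST. A hedged sketch in TypeScript: the base URL and port match the serve example earlier, and the generate helper is illustrative rather than a shipped API.

```typescript
// Direct HTTP against the local GemmaKit endpoint. Any OpenAI-compatible
// SDK pointed at BASE_URL works the same way.
const BASE_URL = "http://127.0.0.1:11436/v1";

function chatCompletionsUrl(base: string = BASE_URL): string {
  return `${base}/chat/completions`;
}

async function generate(prompt: string, apiKey = "local"): Promise<string> {
  const res = await fetch(chatCompletionsUrl(), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gemma-4-e2b-it",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`GemmaKit error ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // buffered assistant text
}
```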

Swift client

The Swift client wraps the local server with async sequence APIs for streamed and buffered completions.

  • GemmaKitClient · Construct with base URL and local API key.
  • generate(...) · Buffered helper. Returns the assistant text.
  • chatCompletion(...) · Buffered. Returns the OpenAI-shaped response object.
  • stream(...) · AsyncSequence of text deltas.
  • streamChunks(...) · AsyncSequence of raw chunks, including usage when requested.

Errors

  status   code                  meaning
  401      unauthorized          Missing or wrong bearer token.
  402      licence_inactive      Pro licence is expired, inactive, revoked, or outside entitlement.
  404      not_found             Endpoint outside the supported subset.
  400      unsupported_feature   Request field outside the supported Chat Completions subset.
  500      internal_error        Unexpected local runtime or server failure.
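
Error bodies follow the OpenAI error shape, so a caller can pull the code out of { error: { code, message } }. A sketch (the parseError helper and fallback behaviour are assumptions, not shipped API):

```typescript
// Map a response status and body to the error table above.
// Falls back to internal_error when the body is not OpenAI-shaped JSON.
interface GemmaKitError {
  status: number;
  code: string;
  message: string;
}

function parseError(status: number, body: string): GemmaKitError {
  try {
    const parsed = JSON.parse(body);
    return {
      status,
      code: parsed.error?.code ?? "internal_error",
      message: parsed.error?.message ?? "unknown error",
    };
  } catch {
    return { status, code: "internal_error", message: body };
  }
}
```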

Licensing

Pro org keys mint signed local licence certificates with optional app-id binding. Activation, refresh, and gated generation activity update the active-device ledger for the billing period. Prompts, completions, local documents, model artefacts, and embeddings are not sent to the licence service.

See Licensing for the full flow.

Model files

GemmaKit can validate signed model manifests, discover installed model packs, install approved entitlement-gated downloads, and promote verified artefacts into the local model store. The inference server still starts only from a ready local model folder, so acquisition happens before gemmakit serve loads the model.