GemmaKit turns Gemma 4 E2B into an Apple Silicon-optimised local text runtime for apps that need private inference without a cloud dependency. The 4-bit MLX repack is 2.63 GB on disk, runs 902.7 MiB leaner than the source checkpoint, and has passed a 500-generation structured-output validation run at 100% parseability. It sits behind an OpenAI-compatible Chat Completions API with Swift, TypeScript, Dart, Rust, Cordova, and webview integration paths.
OpenAI-style clients need no SDK rewrite, and thin GemmaKit clients are available where they make app code cleaner. Responses can be streamed or buffered, with optional local bearer-token auth.
# Stream a chat completion against the local server
curl http://127.0.0.1:11436/v1/chat/completions \
  -H "Authorization: Bearer $GEMMAKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-e2b-it",
    "stream": true,
    "stream_options": { "include_usage": true },
    "messages": [
      { "role": "system", "content": "You are concise." },
      { "role": "user", "content": "Summarise the changelog above." }
    ]
  }'
GemmaKit is the local server plus small client surfaces. Native, cross-platform, Rust-backed, and webview apps all talk to the same loopback Chat Completions endpoint.
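The curl call above maps directly onto a standard fetch request. The sketch below builds the same request in TypeScript without any GemmaKit package; the port and model id are copied from the curl example, while `BASE_URL` and `buildChatRequest` are illustrative names, not part of a published API.

```typescript
// Hypothetical helper: builds the same request as the curl example.
// Port and model id are taken from that example; adjust them for your setup.
const BASE_URL = "http://127.0.0.1:11436/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  messages: ChatMessage[],
  apiKey?: string,
  stream: boolean = true,
): BuiltRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Bearer auth is optional on the local server, so send it only when a key is set.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;

  const payload: Record<string, unknown> = { model: "gemma-4-e2b-it", stream, messages };
  // Ask for a usage chunk only when streaming, as in the curl example.
  if (stream) payload["stream_options"] = { include_usage: true };

  return {
    url: `${BASE_URL}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify(payload) },
  };
}
```

With the local server running, `const { url, init } = buildChatRequest(messages, key)` followed by `await fetch(url, init)` should be equivalent to the curl call.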
In Swift and Apple apps, use GemmaKitCore for typed requests, streamed chunks, and OpenAI-compatible response models. Use the Pro package when the app also manages the local server process and approved model acquisition.
@gemmakit/client uses standard fetch and ReadableStream, so React Native apps can call the same local endpoint once the host app has started the GemmaKit runtime.
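For readers who want to see what the fetch and ReadableStream approach involves, here is a rough sketch that does not use the client package. It assumes the local server follows the OpenAI streaming convention: `data: {json}` events separated by blank lines, text in `choices[0].delta.content`, and a closing `data: [DONE]`. The names `parseSseBuffer` and `streamDeltas` are illustrative and do not describe the internals of @gemmakit/client.

```typescript
interface ParsedSse {
  deltas: string[]; // text fragments found in complete events
  done: boolean;    // true once `data: [DONE]` has been seen
  rest: string;     // trailing partial event to prepend to the next read
}

// Splits buffered SSE text into complete events and extracts text deltas.
function parseSseBuffer(buffer: string): ParsedSse {
  const events = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = events.pop() ?? ""; // the last piece may be an incomplete event
  const deltas: string[] = [];
  let done = false;
  for (const event of events) {
    for (const line of event.split("\n")) {
      if (!line.startsWith("data:")) continue; // skip comments and other fields
      const data = line.slice(5).trim();
      if (data === "[DONE]") {
        done = true;
        continue;
      }
      const chunk = JSON.parse(data);
      // A usage-only final chunk has an empty `choices` array and is skipped here.
      const text = chunk.choices?.[0]?.delta?.content;
      if (typeof text === "string") deltas.push(text);
    }
  }
  return { deltas, done, rest };
}

// Reads a fetch response body and yields text deltas as they arrive.
async function* streamDeltas(body: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let pending = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    pending += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(pending);
    pending = parsed.rest;
    yield* parsed.deltas;
    if (parsed.done) break;
  }
}
```

Keeping the parser separate from the reader makes the chunk-boundary handling easy to test, because network reads can split an event at any byte.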
The Dart gemmakit package mirrors the Swift and TypeScript scope: buffered generation, streamed text deltas, full chunks, and OpenAI-shaped errors against the local server.
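An OpenAI-shaped error is a JSON body of the form `{ "error": { "message", "type", "param", "code" } }`. The sketch below shows one way a client might normalise such a body, written in TypeScript for consistency with the other examples here. `ApiFailure` and `toFailure` are hypothetical names, and the exact error types and codes the local server returns are not specified in this document.

```typescript
interface ApiFailure {
  status: number;
  message: string;
  type: string | null;
  code: string | null;
}

// Converts an HTTP status and raw body text into a uniform failure value.
function toFailure(status: number, bodyText: string): ApiFailure {
  try {
    const parsed = JSON.parse(bodyText) as
      | { error?: { message?: unknown; type?: unknown; code?: unknown } }
      | null;
    const err = parsed?.error;
    if (err && typeof err.message === "string") {
      return {
        status,
        message: err.message,
        type: typeof err.type === "string" ? err.type : null,
        code: typeof err.code === "string" ? err.code : null,
      };
    }
  } catch {
    // The body was not JSON; fall through to the generic failure below.
  }
  return { status, message: `HTTP ${status}`, type: null, code: null };
}
```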
Electron main and renderer processes can use @gemmakit/client or any OpenAI-compatible JavaScript SDK pointed at 127.0.0.1.
Webview apps integrate over local HTTP with the JavaScript client when fetch and streams are available. Cordova projects can use cordova-plugin-gemmakit for a plugin install path.
Tauri apps can call GemmaKit from the webview through @gemmakit/client or from the Rust side through gemmakit-client. The server still stays on loopback by default.
Use @gemmakit/client, the official OpenAI JavaScript SDK, or direct HTTP calls for local developer tools, build assistants, and same-machine command-line workflows.
Python, Go, Kotlin, and other HTTP clients can use GemmaKit by changing only the base URL and staying inside the supported Chat Completions subset.
A 0.7B-parameter, 4-bit text runtime built for Apple Silicon apps: 2.63 GB on disk, 1,234 text tensors kept, 1,415 unused audio and vision tensors removed, and 100% parseability across a 500-generation structured-output validation run.
Prompts, completions, local documents, model artefacts, and embeddings are not sent to the licence service. The runtime binds to 127.0.0.1 by default, and only the licence channel reaches the network.
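Because the runtime binds to 127.0.0.1 by default, an app can add a client-side guard that refuses to send prompts to any non-loopback address. This is a minimal sketch of such a check; `isLoopbackUrl` is a hypothetical helper, not a GemmaKit API, and it accepts only http URLs on 127.0.0.0/8, `localhost`, or `::1`.

```typescript
// Returns true only for http URLs whose host is a loopback address.
function isLoopbackUrl(candidate: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(candidate);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "http:") return false;
  const host = parsed.hostname;
  return (
    host === "localhost" ||
    host === "[::1]" || // WHATWG URL keeps the brackets on IPv6 hostnames
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host)
  );
}
```

Calling this on the configured base URL before each request keeps a misconfigured endpoint from silently sending prompts off the device.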
Six components. One binary. Swift, TypeScript, Dart, Rust, and Cordova clients. The rest is your app.
A converted Gemma text model packaged behind a local HTTP server. Binds to 127.0.0.1 by default and never opens external ports.
The Chat Completions endpoint, with the same JSON request and SSE response shape your existing client already speaks. No Responses API.
Small Swift, TypeScript, Dart, Rust, and Cordova surfaces for issuing chat completions, handling streamed deltas, and keeping app code typed without changing the local API contract.
Optional local bearer-token enforcement and configurable CORS for React Native, Electron, Ionic, Capacitor, Cordova, Tauri, and sibling local web tools.
Pro organisation keys, optional app-id binding, signed local licence certificates, and active-device reporting — without sending prompts or completions.
Text in, text out. No images, audio, embeddings, tool calls, retrieval, or stored completions — those are intentionally outside the boundary.
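One way to stay inside that boundary is to check a request before sending it. The sketch below flags OpenAI Chat Completions fields that fall outside a text-only subset: `tools`, `tool_choice`, `functions`, `audio`, `store: true`, and non-text content parts. The field names are standard OpenAI request fields, but the list is an assumption about what the local server rejects, and `findUnsupported` is a hypothetical helper.

```typescript
// Returns a list of request features that fall outside a text-only subset.
function findUnsupported(request: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of ["tools", "tool_choice", "functions", "audio"]) {
    if (field in request) problems.push(field);
  }
  // Stored completions are opt-in, so only an explicit `true` is flagged.
  if (request["store"] === true) problems.push("store");

  const messages: unknown[] = Array.isArray(request["messages"]) ? request["messages"] : [];
  messages.forEach((message, index) => {
    if (typeof message !== "object" || message === null) return;
    const content = (message as { content?: unknown }).content;
    if (!Array.isArray(content)) return; // plain string content is text
    for (const part of content) {
      const type = (part as { type?: unknown } | null)?.type;
      // Image and audio parts appear as non-"text" content part types.
      if (type !== "text") problems.push(`messages[${index}].content: ${String(type)}`);
    }
  });
  return problems;
}
```

An empty result means the request uses only fields from the assumed subset; anything else can be reported to the developer before the call is made.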