The in-browser AI stack

Create web apps that run AI directly in your users' browsers.

Increase user privacy and decrease your inference costs

Why Offload?

Logos of various open-source large language models (LLMs) and small language models (SLMs)

Many people are concerned about data privacy when using AI features, as these typically send their data to third-party inference APIs.

With the Offload SDK, your users can opt in to local AI execution, without any extra effort on your part.

This increases user privacy and also reduces your infrastructure and inference costs, since a significant share of the computation happens directly on the user's device.

If you build AI applications or agents for healthcare, legal, finance, document processing, or any field that processes sensitive user information, Offload is for you.

Features

Offload SDK supported and planned features

  • Text generation
  • Text streaming
  • Structured object generation
  • Automatic GPU detection and API fallback
  • Dynamic model serving depending on device resources
  • Prompt customization per model
  • Prompt version control
  • In-browser RAG pipeline
  • Custom fine-tuned model support
  • Advanced Analytics
Status legend: supported, in development, planned

The Offload widget

When you integrate Offload, our widget automatically appears for users whose devices have enough resources to perform inference locally.
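
How Offload decides that a device has enough resources is not described on this page. As a rough illustration of what such a check can involve, the sketch below uses the standard WebGPU entry point `navigator.gpu.requestAdapter()` to test whether a browser exposes a usable GPU. The helper name `canRunLocally` and its decision rule are assumptions made for this example; they are not part of the Offload SDK.

```javascript
// Hypothetical helper, not part of the Offload SDK: reports whether the
// browser exposes a usable WebGPU adapter.
async function canRunLocally(nav = globalThis.navigator) {
    // Browsers without WebGPU support do not define navigator.gpu.
    if (!nav || !nav.gpu) {
        return false;
    }
    try {
        // requestAdapter() resolves to null when no suitable GPU is found.
        const adapter = await nav.gpu.requestAdapter();
        return Boolean(adapter);
    } catch {
        // Treat any failure as "not capable" so the app can use a remote API.
        return false;
    }
}
```

A real check would likely also consider available memory and model size; this sketch only covers GPU availability.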

Easy to add to any project

Offload replaces any SDK you are currently using - just change the inference calls.
AI tasks are processed on the user's device when possible, with automatic fallback to any API you configure in the dashboard.
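
The local-first behavior with an API fallback can be pictured as a simple try-then-fall-back pattern. The sketch below is illustrative only: `runWithFallback`, `runLocal`, and `runRemote` are hypothetical names chosen for this example, and Offload performs the fallback automatically, so you would not normally write this yourself.

```javascript
// Hypothetical illustration of local-first inference with an API fallback.
// runLocal and runRemote are caller-supplied async functions.
async function runWithFallback(runLocal, runRemote, input) {
    try {
        const text = await runLocal(input);
        return { text, source: "local" };
    } catch (localError) {
        // Local execution failed or is unavailable: use the remote API.
        const text = await runRemote(input);
        return { text, source: "api" };
    }
}
```

Returning the `source` alongside the text makes it easy to log how often requests stay on the device.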

How to install

<!-- Include the Offload library on your app -->
<script src="//unpkg.com/offload-ai" defer></script>

Add the library either with a CDN script tag or by importing it from npm.

How to run inference

// Configure the Offload instance once in your app
Offload.config({
    appUuid: "your-app-uuid-from-dashboard",
    promptUuids: {
        user_text: "your-prompt-uuid-from-dashboard"
    }
});

// Run inference. You can use streams, force JSON output, etc.
const { text } = await Offload.offload({
    promptKey: "user_text",
});
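
If several parts of your app run prompts, one way to make sure the configuration step runs only once is a small wrapper like the one below. This is a sketch under assumptions: `createOffloadRunner` is a hypothetical helper, and it relies only on the two calls shown above (`config` and `offload` with a `promptKey`).

```javascript
// Hypothetical wrapper: configures the client lazily, exactly once, and
// returns the generated text for a prompt key.
function createOffloadRunner(client, config) {
    let configured = false;
    return async function run(promptKey) {
        if (!configured) {
            client.config(config);
            configured = true;
        }
        const { text } = await client.offload({ promptKey });
        return text;
    };
}
```

In a browser, `client` would be the global `Offload` object and `config` the same object passed to `Offload.config` above.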

And you are done!

Frequently Asked Questions


Start offloading right now!

Get Started for free!