Public Preview · Machine Learning Edition

Productionise your model. Keep your earnings.

A cross-platform desktop app that turns a trained model into a paid API endpoint. No payment gateway integration, no business registration required in compliant jurisdictions, no platform between you and your customers.

Download the preview ↓

Windows · Linux · Stablecoin settlement (USDC / EURC / XLM)

Three things you control

The platforms you already know solve real problems. ApiCharge solves a different one: ownership.

— Own it

Your endpoint

Pick a serving framework — vLLM, ComfyUI, TorchServe, HuggingFace Diffusers, TensorFlow Serving — pick a GPU, deploy. Your model runs on infrastructure you chose, on a domain you control.

— Price it

Your pricing

Set per-call, per-second, per-byte, or credit-based tiers per service. Change pricing whenever you like — existing customers keep the QoS they paid for, cryptographically. No platform commission on your inference revenue.

— Get paid

Your wallet

Customers pay in USDC, EURC, or XLM, directly into a Stellar wallet you own. Under MiCA in the EU and the GENIUS Act in the US, this is infrastructure, not a payment processor. No merchant account, no gateway integration.

Watch a demo

A short walkthrough of deploying a model or LoRA, exposing inputs as a paid API, and getting paid — built for our first cohort of focused creators.

~7 min · LoRA & model deployment · Pricing & payout walkthrough

Model to endpoint in four steps

The desktop app handles the messy bits — provisioning, container deployment, SSH, certificates — so you stay in modeling mode.

// 01

Bring a model

Pull from HuggingFace or Civitai, or import local files. Metadata travels with the model.

// 02

Pick a GPU

Browse live Vast.ai spot offers — filter by VRAM, region, reliability, hourly price. Or bring your own VM.

// 03

Set the price

Per-call, per-second, per-byte, credit-based — multiple tiers per service. Configure once, change anytime.

// 04

Publish

One click. The container boots, gets a TLS cert, registers with the marketplace. Customers can pay.

Import models from HuggingFace Hub · Civitai · local filesystem

Frameworks supported out of the box

Pre-baked templates — no Dockerfile required. Bring your own works too.

vLLM · ComfyUI · TorchServe · HuggingFace Diffusers · TensorFlow Serving · Custom Docker

Your model's inputs, exposed as a clean API

A trained model is only useful if customers can call it. ApiCharge maps the inputs that already exist in your ComfyUI workflow or HuggingFace Diffusers config straight to typed API parameters — pick what you want exposed, name it, and you're done. No request schema to hand-roll, no glue code to maintain.

Each exposed parameter can be priced independently — gate higher resolutions or longer prompts behind a premium tier, keep the basic call cheap.

comfyui: workflow nodes · diffusers: config fields → typed API params

// parameter mapping

ComfyUI · KSampler.steps → api param · steps : int

ComfyUI · CLIPText.text → api param · prompt : string

Diffusers · width / height → api param · resolution : enum

Diffusers · guidance_scale → api param · guidance : float
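To make the mapping idea concrete, here is a minimal sketch of how an exposed parameter could be written back into a ComfyUI workflow. It assumes the workflow is in ComfyUI's API JSON shape (node id → `class_type` plus `inputs`). `ParamMapping`, `MAPPINGS`, and `apply_request` are hypothetical names for illustration, not ApiCharge's actual implementation, and the node class names are copied from the rows above.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamMapping:
    api_name: str    # name the customer sees in the API
    py_type: type    # type enforced on incoming values
    node_class: str  # ComfyUI node class_type to target
    input_name: str  # input field on that node


# Hypothetical mapping table mirroring two of the rows above.
MAPPINGS = [
    ParamMapping("steps", int, "KSampler", "steps"),
    ParamMapping("prompt", str, "CLIPText", "text"),
]


def apply_request(workflow: dict, request: dict) -> dict:
    """Return a copy of the workflow with API parameters written into node inputs."""
    by_name = {m.api_name: m for m in MAPPINGS}
    patched = deepcopy(workflow)  # never mutate the stored workflow
    for name, value in request.items():
        mapping = by_name.get(name)
        if mapping is None:
            raise KeyError(f"parameter not exposed: {name}")
        if not isinstance(value, mapping.py_type):
            raise TypeError(f"{name} must be {mapping.py_type.__name__}")
        for node in patched.values():
            if node["class_type"] == mapping.node_class:
                node["inputs"][mapping.input_name] = value
    return patched


# Illustrative two-node workflow; real workflows have many more nodes.
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"steps": 20, "seed": 1}},
    "6": {"class_type": "CLIPText", "inputs": {"text": "placeholder"}},
}
patched = apply_request(workflow, {"steps": 30, "prompt": "a red fox"})
```

Anything not listed in the mapping table is rejected, which is what keeps unexposed inputs such as `seed` out of the customer's reach in this sketch.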

Pricing that's actually flexible

Define multiple tiers per service. Each tier carries its own rate-limiting strategy, cryptographically enforced at the proxy. Customers buy access tokens that lock in the QoS they paid for — your future pricing changes can never degrade their experience.

Per-call · per-second · per-byte · credit pricing
Multiple tiers per service — free preview to burst
Stablecoin settlement: USDC, EURC, XLM
Sub-cent transaction fees · settle in 2–3 seconds
Existing customers' QoS is locked — past promises honoured
Adjust pricing without redeploying the service
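One way to picture "locked QoS" is a signed token that carries the tier terms the customer bought, so the proxy can check them without consulting the current price list. The sketch below is an illustration under loud assumptions: it uses an HMAC-SHA256 signature from Python's standard library as a stand-in, and the token layout, field names, and `SECRET` are hypothetical. ApiCharge's real token format and signature scheme are not described on this page.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-only-secret"  # assumption: a signing key held by the proxy


def issue_token(terms: dict) -> str:
    """Serialize the purchased terms and append a signature over them."""
    payload = json.dumps(terms, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def verify_token(token: str) -> dict:
    """Return the terms if the signature matches, otherwise raise ValueError."""
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        raise ValueError("signature mismatch")
    return json.loads(payload)


# Hypothetical terms bought at purchase time.
terms = {"service": "image-gen", "tier": "burst", "calls_per_minute": 60}
token = issue_token(terms)
locked = verify_token(token)  # the proxy enforces these, not today's price list
```

Because the terms live inside the signed payload, editing the tier in the app later changes what new tokens say, while tokens already sold still verify with their original terms.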

Screenshot: ApiCharge desktop app pricing configuration, showing per-service tiers, time-based and credit-based pricing, and live server cost from Vast.ai

Two services on one server, priced independently. Server cost $0.35/hr from Vast.ai shown alongside your tiers — total transparency, configured directly inside the app.
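Seeing server cost next to your tiers makes break-even arithmetic easy. The sketch below models a tier with per-call, per-second, and per-byte rates and works out how many calls per hour cover the $0.35/hr server from the caption. The `Tier` class, the function names, and the $0.02 per-call price are hypothetical examples, not ApiCharge defaults. `Decimal` is used so the money maths stays exact.

```python
import math
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Tier:
    name: str
    per_call: Decimal = Decimal("0")
    per_second: Decimal = Decimal("0")
    per_byte: Decimal = Decimal("0")


def charge(tier: Tier, calls: int = 0, seconds: Decimal = Decimal("0"), nbytes: int = 0) -> Decimal:
    """Total owed for the given usage under one tier."""
    return tier.per_call * calls + tier.per_second * seconds + tier.per_byte * nbytes


def break_even_calls_per_hour(server_cost_per_hour: Decimal, price_per_call: Decimal) -> int:
    """Smallest whole number of calls per hour that covers the server cost."""
    return math.ceil(server_cost_per_hour / price_per_call)


standard = Tier("standard", per_call=Decimal("0.02"))  # hypothetical price
revenue = charge(standard, calls=100)                  # Decimal("2.00")
needed = break_even_calls_per_hour(Decimal("0.35"), standard.per_call)  # 18
```

At this example price, 17 calls an hour bring in $0.34 and fall just short, so the eighteenth call is the one that covers the server.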

Scale-to-zero clusters with infer.host

When traffic comes, capacity comes. When it stops, costs stop. Run a single instance for a side-project, or autoscale across dozens of GPUs — same dialog.

Min instances: 0 · scale-to-zero
Max instances: N · you choose
Idle timeout: 10m default
GPU filter: model · VRAM · region

Idle: 0 GPUs · $0/hr

// request →

Autoscale: control plane finds cheapest matching offer

~2–5 min cold start

Live: N GPUs · per-worker TLS · direct client traffic

The control plane is metadata-only. Inference traffic goes directly to your worker — never through us. Snapshots flow Vast→R2→worker via presigned URLs we can't read.
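The scaling behaviour described above can be sketched as two small decisions: how many instances to run, and which offer to rent. This is a rough illustration only. `Offer`, `cheapest_match`, and `desired_instances` are hypothetical names, the offer fields are assumed rather than taken from the Vast.ai API, and the 600-second timeout mirrors the 10m default above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    offer_id: str
    gpu_model: str
    vram_gb: int
    region: str
    price_per_hour: float


def cheapest_match(offers: Iterable[Offer], gpu_model: str,
                   min_vram_gb: int, regions: set) -> Optional[Offer]:
    """Apply the GPU filter (model, VRAM, region), then take the lowest hourly price."""
    matches = [
        o for o in offers
        if o.gpu_model == gpu_model and o.vram_gb >= min_vram_gb and o.region in regions
    ]
    return min(matches, key=lambda o: o.price_per_hour, default=None)


def desired_instances(current: int, pending: int, idle_seconds: int,
                      min_instances: int = 0, max_instances: int = 1,
                      idle_timeout_s: int = 600) -> int:
    """Wake on traffic, drop to the minimum once idle past the timeout."""
    if pending > 0:
        target = max(current, 1)   # first request wakes a cluster at zero
    elif idle_seconds >= idle_timeout_s:
        target = min_instances     # idle long enough: scale down
    else:
        target = current           # still inside the idle window
    return max(min_instances, min(target, max_instances))


# Made-up offers for illustration.
offers = [
    Offer("a", "RTX 4090", 24, "EU", 0.42),
    Offer("b", "RTX 4090", 24, "EU", 0.35),
    Offer("c", "RTX 4090", 24, "US", 0.30),
    Offer("d", "RTX 3090", 24, "EU", 0.20),
]
pick = cheapest_match(offers, "RTX 4090", 24, {"EU"})  # offer "b"
```

A production autoscaler would also weigh queue depth and reliability, which this sketch leaves out to keep the idle → autoscale → live path readable.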

An alternative path

Hosting platforms and model hubs have solved real problems for a lot of developers — discovery, onboarding, a pre-built audience. None of that is going away, and we're not asking you to leave.

But if you'd rather own the endpoint, set the price, and have customers pay you directly — without onboarding to a payment processor, registering a business in your jurisdiction, or letting a curation team decide what you're allowed to ship — ApiCharge is built for exactly that.

No platform commission taken from your inference revenue.
No takedowns from a curation team. Your endpoint, your rules.
QoS your customers paid for is cryptographically locked — pricing changes never break past commitments.
Models stay where you put them. Snapshots travel via presigned URLs the control plane can't read.

Get the preview

It's a beta — rough edges included — because we want feedback from people who'll actually use this. Log issues, file feature requests, tell us what's missing.

Windows x64 · Intel/AMD
Windows ARM64
Linux x64 · AppImage
Linux ARM64 · AppImage

Issues: github.com/StreamCharge/ApiCharge/issues