Open models inference

We run the best open coding models on our own GPUs, and we host the cloud agents that use them. Plug us into the harness you already use, or open a wired-up cloud agent from any device. Open stack. No lock-in.

Use it with the tools you already use

OpenCodeOpenCodeClaude CodeClaude CodeπPiZedZedCursorCursorCopilotCopilotand more
Your tools
Any harness, any device
Umans inference
Open models
Our GPUs
Cloud agent
live
Always-on. Reach from anywhere.

Why teams choose Umans

Infrastructure we own

We run open inference on our own GPUs and host the cloud agents that use them. Mostly in Europe, with US capacity for US customers.

Open models, by design

Open weights you can audit. The model that wrote your code is the same one anyone can inspect and run elsewhere.

Open stack, no lock-in

Plug our endpoint into the harness you already use with BYOK. Mix our inference with proprietary models. Walk away whenever.

Reach your agent from anywhere

Cloud agents are always-on and reachable from your laptop or phone. Same session, any device.

How it works

Two ways to use Umans

Local

Use Umans from your tool

Three ways to plug in. Pick the one that fits your harness.

Pick Umans in your tool

We're on models.dev. OpenCode, Zed and other modern tools list Umans as a provider out of the box.

Provider
Umans
OpenCodeZedCursorand more
Runumans claude

Install the umans CLI, then `umans claude` powers Claude Code with our inference. Switch back to `claude` anytime.

$ umans claude
Claude Code, powered by Umans.
Set base URL and key

Two env vars in any OpenAI- or Anthropic-compatible tool.

BASE_URL=https://api.code.umans.ai/v1
API_KEY=umans_••••••••••••
Remote

Run your agent in our cloud

One click. The agent runs on our infra, already wired up to our inference.

  • Close your laptop

    The agent keeps working. Come back later, the session is still there.

  • Open it from your phone

    Same session, any device. Walk in and out of the work.

  • Sandboxed in its own box

    The agent does its thing on isolated infra. Your local setup stays clean.

Pricing at a glance

Popular
Pro
For individuals
Hands-on coding
  • 200 effective requests per rolling 5h window
  • Up to 3 concurrent requests
  • Full open model lineup
Get Pro
Max
For power users
Long sessions, heavy use
  • Unlimited tokens
  • Up to 4 concurrent sessions
  • Full open model lineup
Get Max
Orgs
For teams and automations
Service keys + role management
  • Pay per token via service keys
  • Centralized usage and billing
  • Role management for seats
Contact us

Plans for individuals. Service keys for automations.

Early release

Cloud agents

Spin up a coding agent on our infra in one click. It lands wired up to our inference, with the umans CLI installed and configured. Reach it from your laptop or phone. Same session, any device.

What you can do with one

Refactor while you sleep

Set the goal, close your laptop. The agent keeps working through the night.

Triage from anywhere

Open the agent on your phone. Walk through issues, ship the small fix.

Sandbox bold ideas

Spin up a fresh box, try a wild change, throw it away. Your local stays clean.