PRICING

Stop paying for tokens you do not need.

Train a smaller open model on your workflow. When it matches or beats Claude Opus or GPT on that task, host your model and pay GPU-hours.

Start training

GPU HOURS

Our rates:

GPU$/hr$/sec
NVIDIA H200$4.55$0.001264
NVIDIA H100$3.24$0.000899
AMD MI300X$3.25$0.000903
NVIDIA A100 80GB$1.43$0.000397
NVIDIA A100 40GB$3.47$0.000964
NVIDIA L40S$1.03$0.000285
Start here

Launch

Start training specialized open models and pay only for the GPU time you use.

$0+ credits for compute
Get started
  • Credit top-ups from the dashboard
  • Training jobs & experiment traces
  • Provider GPU rates plus markup
  • Community support
Dedicated

Enterprise

For orgs that need private clusters, security review, and committed capacity.

Customannual contract
  • Reserved GPU pools
  • Private VPC deploys
  • Volume provider pricing
  • Solution engineering

WHY?

Save massive.

Let's say you make 1,000,000 monthly requests with 1,500 input and 500 output tokens each. That is 2.0B monthly tokens. A batched Qwen3-8B endpoint on 1x H100 uses about 122 GPU-hours, or $395/month at our current H100 rate.

Model replacedToken billVeri hostedAnnual savings
Claude Opus$20,000/mo$395/mo$235k/yr
GPT frontier model$22,500/mo$395/mo$265k/yr

WHY IT SAVES

You train for the narrow capability you need, then stop renting everything else.

Frontier models are priced for general intelligence across every domain. A fine-tuned Qwen-class model can be optimized for one workflow, deployed behind an OpenAI-compatible endpoint, and scaled with transparent GPU-hour economics.