Simple, Transparent Pricing

Pay only for what you use. No hidden fees. No egress charges. Cancel anytime.

Up to 60% cheaper

than RunPod on the same GPU models

Multi-Region Infrastructure

GPUs in APAC and beyond. Low latency for developers in India, SEA, LatAm, and Europe.

60-second deploy

Browser-based. No SSH, no setup.

| GPU Model | Stock | CloudGPU | Vast.ai | RunPod |
|---|---|---|---|---|
| RTX 3090 24G (supplied via P2P agents) | Low stock | $0.20/hr | $0.22/hr | $0.22/hr |
| RTX 4090 24G | Low stock | $0.32/hr | $0.32/hr | $0.34/hr |
| RTX 4090D 48G (China-spec 4090, 48 GB VRAM) | Low stock | $0.59/hr | N/A | N/A |
| 5090 32G | Low stock | $0.49/hr | N/A | N/A |
| A100 40G | Low stock | $0.89/hr | $0.80/hr | $1.89/hr |
| L20 48G (China-spec Ada, big VRAM) | Low stock | $0.89/hr | N/A | $1.20/hr |
| L40 48G | Low stock | $0.79/hr | N/A | $1.19/hr |
| L40S 48G | Low stock | $1.19/hr | $1.65/hr | $1.89/hr |
| H20 96G (China-spec Hopper, huge HBM3) | Low stock | $1.59/hr | N/A | N/A |
| Ascend 910B 64G (NPU: CANN / MindSpore / PyTorch-NPU only) | Low stock | $0.85/hr | N/A | N/A |
| A800 80G (China-spec A100) | Low stock | $1.39/hr | N/A | $2.17/hr |
| H800 80G (China-spec H100, NVLink capped at 400 GB/s) | Low stock | $3.49/hr | N/A | $3.99/hr |

Vast.ai / RunPod rates observed April 2026 for equivalent SKU. "N/A" = competitor does not offer that GPU in their catalog.
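To make the comparison concrete, the per-GPU gap against RunPod can be worked out directly from the listed rates. The sketch below is illustrative only: the `RATES` dictionary and the `savings_pct` helper are names made up for this example, and the figures are simply the April 2026 rates copied from the table.

```python
# Hourly rates in USD, copied from the table above as (CloudGPU, RunPod).
# None marks a GPU that RunPod does not list ("N/A").
RATES = {
    "RTX 3090 24G": (0.20, 0.22),
    "RTX 4090 24G": (0.32, 0.34),
    "RTX 4090D 48G": (0.59, None),
    "5090 32G": (0.49, None),
    "A100 40G": (0.89, 1.89),
    "L20 48G": (0.89, 1.20),
    "L40 48G": (0.79, 1.19),
    "L40S 48G": (1.19, 1.89),
    "H20 96G": (1.59, None),
    "Ascend 910B 64G": (0.85, None),
    "A800 80G": (1.39, 2.17),
    "H800 80G": (3.49, 3.99),
}


def savings_pct(ours, theirs):
    """Percent saved relative to the competitor rate, or None if not listed."""
    if theirs is None:
        return None
    return (theirs - ours) / theirs * 100


def savings_table(rates=RATES):
    """Map each GPU name to its percent saving versus RunPod."""
    return {gpu: savings_pct(ours, theirs) for gpu, (ours, theirs) in rates.items()}


if __name__ == "__main__":
    for gpu, pct in savings_table().items():
        label = "no RunPod listing" if pct is None else f"{pct:.0f}% cheaper"
        print(f"{gpu:<16} {label}")
```

On these figures the widest gap is the A100 40G (about 53%), and the narrowest is the RTX 4090 24G (about 6%).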

Performance & Workload Fit

Price alone doesn't tell you which GPU is right. L20, H20, and Ascend 910B are China-spec parts that most US clouds don't carry, and their VRAM and bandwidth profiles are worth comparing.

| GPU | Arch | VRAM | FP16 TFLOPS | Mem BW | Best For |
|---|---|---|---|---|---|
| RTX 3090 24G (supplied via P2P agents) | Ampere | 24 GB | 36 | 936 GB/s | LLM inference ≤ 7B, Stable Diffusion, cheap dev box |
| RTX 4090 24G | Ada | 24 GB | 73 | 1008 GB/s | Fine-tuning 7B-13B, Flux / SDXL, fast inference |
| RTX 4090D 48G (China-spec 4090, 48 GB VRAM) | Ada | 48 GB | 73 | 1008 GB/s | Fine-tuning 13B-30B at higher batch sizes, 70B quantized inference |
| 5090 32G | Blackwell | 32 GB | 104 | 1792 GB/s | Latest consumer flagship: Blackwell features, SD/Flux speed |
| A100 40G | Ampere | 40 GB | 312 | 1555 GB/s | LLM training / fine-tune 13B-30B, research |
| L20 48G (China-spec Ada, big VRAM) | Ada | 48 GB | 119 | 864 GB/s | 32B inference, multi-model serving, VRAM-heavy workloads |
| L40 48G | Ada | 48 GB | 90 | 864 GB/s | 32B inference, visualization + compute combined |
| L40S 48G | Ada | 48 GB | 183 | 864 GB/s | High-end inference + light training, better than L40 |
| H20 96G (China-spec Hopper, huge HBM3) | Hopper | 96 GB | 148 | 4000 GB/s | 70B-120B inference, long-context, MoE models |
| Ascend 910B 64G (NPU: CANN / MindSpore / PyTorch-NPU only) | Ascend | 64 GB | 320 | 1200 GB/s | Domestic-compliant training, MindSpore/PyTorch-NPU workloads |
| A800 80G (China-spec A100) | Ampere | 80 GB | 312 | 2039 GB/s | FP32/FP16 training 30B-70B, large-memory research |
| H800 80G (China-spec H100, NVLink capped at 400 GB/s) | Hopper | 80 GB | 989 | 3350 GB/s | Frontier training, 70B+ fine-tuning, FP8 inference |

VRAM determines model size

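24 GB fits 7B at FP16 or 30B quantized to 4-bit. 48 GB fits 13B at FP16 or 70B quantized to 4-bit. 96 GB comfortably runs 70B at 8-bit; 70B at FP16 needs about 140 GB for weights alone, so it requires more than one card.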
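A quick way to sanity-check sizing rules like these is to multiply parameter count by bytes per parameter. The sketch below is a rough estimate, not a sizing tool: the function names are invented for this example, and the 1.2x `overhead` factor for KV cache and activations is an assumption that varies widely with context length and batch size.

```python
def weight_gb(params_billion, bits_per_param):
    """Approximate weight footprint in GB (1 GB taken as 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion, bits_per_param, vram_gb, overhead=1.2):
    """True if weights plus an assumed overhead margin fit in the given VRAM.

    overhead=1.2 is an illustrative allowance for KV cache and activations,
    not a measured value.
    """
    return weight_gb(params_billion, bits_per_param) * overhead <= vram_gb


if __name__ == "__main__":
    checks = [
        (7, 16, 24),   # 7B at FP16 on a 24 GB card
        (30, 4, 24),   # 30B at 4-bit on a 24 GB card
        (13, 16, 48),  # 13B at FP16 on a 48 GB card
        (70, 4, 48),   # 70B at 4-bit on a 48 GB card
        (70, 8, 96),   # 70B at 8-bit on a 96 GB card
    ]
    for params, bits, vram in checks:
        size = weight_gb(params, bits)
        print(f"{params}B @ {bits}-bit: {size:.0f} GB weights, "
              f"fits in {vram} GB: {fits(params, bits, vram)}")
```

Long contexts or large batches can push the real requirement well above this estimate, so treat a narrow "fits" as a maybe.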

Mem bandwidth = tokens/sec

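For LLM inference at small batch sizes, token generation is usually limited by memory bandwidth rather than TFLOPS, because every generated token requires reading the full set of weights. H20's 4 TB/s beats A100's roughly 1.6 TB/s for serving large models.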
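The bandwidth argument can be turned into a back-of-the-envelope ceiling: if each generated token needs one full pass over the weights, tokens per second cannot exceed bandwidth divided by weight size. The sketch below assumes single-stream (batch size 1) decoding and ignores KV cache reads, kernel overheads, and batching gains; `decode_tokens_per_sec` is a name made up for this example.

```python
def decode_tokens_per_sec(mem_bw_gb_s, params_billion, bits_per_param):
    """Upper bound on single-stream decode speed.

    Assumes each token requires reading all weights once from GPU memory,
    so the ceiling is memory bandwidth divided by weight size.
    """
    weight_gb = params_billion * bits_per_param / 8
    return mem_bw_gb_s / weight_gb


if __name__ == "__main__":
    # Memory bandwidth figures (GB/s) taken from the table above.
    for name, bw in [("A100 40G", 1555), ("H20 96G", 4000)]:
        tps = decode_tokens_per_sec(bw, params_billion=13, bits_per_param=16)
        print(f"{name}: at most ~{tps:.0f} tokens/s for a 13B FP16 model")
```

Measured throughput will land below this ceiling; the ratio between cards is the more useful signal.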

FP16 TFLOPS = training speed

A100's 312 TFLOPS shines for fine-tuning. The 910B's 320 TFLOPS is a paper figure; real-world speed depends on CANN/MindSpore ecosystem maturity.
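For a rough sense of how TFLOPS translates into wall-clock time, a common rule of thumb puts full-parameter transformer training at about 6 FLOPs per parameter per token. The sketch below applies that approximation; the 40% utilization default (`mfu`) is an assumption for illustration, real utilization varies a lot by stack and hardware, and parameter-efficient methods such as LoRA will not follow this estimate.

```python
def training_hours(params_billion, tokens_billion, peak_tflops, mfu=0.4):
    """Estimated wall-clock hours on one GPU for full-parameter training.

    Uses the ~6 * params * tokens FLOPs approximation. mfu is the assumed
    fraction of peak FP16 throughput actually sustained (illustrative only).
    """
    total_flops = 6 * (params_billion * 1e9) * (tokens_billion * 1e9)
    sustained_flops_per_sec = peak_tflops * 1e12 * mfu
    return total_flops / sustained_flops_per_sec / 3600


if __name__ == "__main__":
    # Peak FP16 TFLOPS figures taken from the table above.
    for name, tflops in [("A100 40G", 312), ("H800 80G", 989)]:
        hours = training_hours(params_billion=7, tokens_billion=1, peak_tflops=tflops)
        print(f"{name}: ~{hours:.0f} h for 7B params on 1B tokens")
```

Multiplying the result by the hourly rate gives a first-pass budget, provided the model and optimizer state fit in the card's VRAM at all.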

Cost Calculator

1x GPU · 8 hours per day · 7 days

1x RTX 4090 24G · 56 total hours

$17.92

Save $1.12 vs RunPod

6% cheaper than RunPod
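The figures above follow from simple arithmetic on the listed hourly rates. The sketch below reproduces them; `rental_cost` is a hypothetical helper written for this example, not part of any published API, and it assumes each GPU is billed for every hour it runs. `Decimal` is used so the dollar amounts come out exact.

```python
from decimal import Decimal


def rental_cost(rate_per_hour, gpus, hours_per_day, days):
    """Total cost in USD, assuming every GPU is billed for every hour."""
    total_hours = hours_per_day * days
    return Decimal(rate_per_hour) * gpus * total_hours


# RTX 4090 24G rates from the pricing table.
ours = rental_cost("0.32", gpus=1, hours_per_day=8, days=7)
runpod = rental_cost("0.34", gpus=1, hours_per_day=8, days=7)
saved = runpod - ours
saved_pct = saved / runpod * 100

if __name__ == "__main__":
    print(f"Cost: ${ours}")                      # $17.92
    print(f"RunPod: ${runpod}")                  # $19.04
    print(f"Saved: ${saved} ({saved_pct:.0f}%)")  # $1.12 (6%)
```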

Pricing Plans

On-Demand

Pay as you go. No commitment.

Market Rate

  • Per-hour billing
  • Cancel anytime
  • No minimum spend
  • All GPU models

Monthly (Best Value)

Commit monthly, save 30%.

30% Off

on hourly rates

  • Everything in On-Demand
  • 30% discount on all GPUs
  • Priority provisioning
  • Email support

Prepaid Credits

Buy credits in bulk, get bonus.

$100 → $110

10% bonus credits

  • 10% bonus on all top-ups
  • Credits never expire
  • Use on any GPU model
  • Transferable balance
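
To compare the plans on equal footing, it helps to convert each into an effective hourly rate. The sketch below assumes the listed table prices are the On-Demand rates, that the Monthly discount applies directly to them, and that prepaid bonus credits are spent at the On-Demand rate; whether the Monthly discount and the prepaid bonus can be combined is not stated, so they are treated separately. The function names are made up for this example.

```python
def monthly_rate(on_demand_rate):
    """Hourly rate after the 30% Monthly discount."""
    return on_demand_rate * 0.70


def prepaid_effective_rate(on_demand_rate):
    """Effective hourly rate when $100 paid buys $110 of credit.

    Each dollar paid covers $1.10 of usage, so the real cost per hour
    is the listed rate divided by 1.10 (roughly a 9.1% discount).
    """
    return on_demand_rate / 1.10


if __name__ == "__main__":
    rate = 0.32  # RTX 4090 24G, from the pricing table
    print(f"On-Demand: ${rate:.3f}/hr")
    print(f"Monthly:   ${monthly_rate(rate):.3f}/hr")
    print(f"Prepaid:   ${prepaid_effective_rate(rate):.3f}/hr")
```

Note that a 10% credit bonus works out to slightly less than a 10% price cut, because the bonus is applied to the amount paid rather than to the rate.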

No hidden fees · No egress charges · Per-hour billing · Cancel anytime