
Trusted by ML research teams and developers worldwide
Platform Capabilities
NVIDIA Cloud Partner
NVIDIA H200, H100, and A100 GPUs alongside high-core-count CPUs. NVLink and InfiniBand for fast networking.
Data Centers in Canada and US
Sovereign compute for lower latency and data locality requirements.
Engineer-Level Support
Support requests are handled directly by engineers familiar with AI training, cluster issues, and performance tuning.
Long-Term Savings
Lower your costs by up to 30% when you commit for several months.
Integrated Storage
Local NVMe and scalable network storage built into every deployment.
SOC 2 Compliant
Infrastructure aligned with enterprise security and compliance requirements.
Prices
Per-Minute Billing + Reserved Pricing. Scale up or down instantly with on-demand instances billed by the minute, or lock in lower rates with reserved pricing.
| Type | vCPUs | Memory | Local Storage | GPU VRAM | Price | Interconnect |
|---|---|---|---|---|---|---|
| NVIDIA H200 SXM | 208 | 2048 GB | 6x 3.8TB NVMe | 141 GB | Reserved only | RoCE 3200G |
| NVIDIA H100 SXM | 208 | 1024 GB | 6x 3.8TB NVMe | 80 GB | $2.10 / GPU | IB 3200G |
| NVIDIA A40 | 128 | 512 GB | 2x 3.8TB NVMe | 48 GB | $0.65 / GPU | - |
| NVIDIA A100 SXM | 208 | 1024 GB | 6x 3.8TB NVMe | 80 GB | $1.30 / GPU | IB 1600G |
| NVIDIA A100 SXM | 128 | 1024 GB | 4x 3.8TB NVMe | 40 GB | $1.15 / GPU | IB 800G |
| NVIDIA A100 PCIe | 64 | 512 GB | 2x 3.8TB NVMe | 40 GB | $1.15 / GPU | - |
| NVIDIA A100 MIG | 5 | 55 GB | - | 20 GB | $0.58 / GPU | - |
All GPUs are available as VMs and bare metal.
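As a rough illustration of how per-minute billing interacts with the listed rates, here is a minimal sketch that assumes the prices above are quoted per GPU per hour and prorated by the minute; actual invoicing details may differ.

```python
# Back-of-envelope on-demand cost estimate.
# Assumption (not stated above): listed prices are per GPU per hour,
# prorated by the minute.

def on_demand_cost(price_per_gpu_hour: float, num_gpus: int, minutes: int) -> float:
    """Estimate the cost of a job billed by the minute."""
    return price_per_gpu_hour / 60 * num_gpus * minutes

# Example: 8x H100 SXM at $2.10 per GPU for a 90-minute run.
print(f"${on_demand_cost(2.10, 8, 90):.2f}")          # -> $25.20

# Same run on reserved capacity at an illustrative 30% discount.
print(f"${on_demand_cost(2.10 * 0.70, 8, 90):.2f}")   # -> $17.64
```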
Configurations
Per-minute billing with on-demand and reserved options. All configurations available as bare metal, VM, or model endpoints.
Related GPUs
Compare Denvr GPU options by workload and performance requirements.
| | NVIDIA A100 SXM | NVIDIA H100 SXM | NVIDIA H200 SXM |
|---|---|---|---|
| Optimized For | Distributed training, multi-node scaling | Large model training, high-throughput inference | Extended context, large batch inference |
| VRAM | 80 GB | 80 GB | 141 GB |
| Memory Bandwidth | 2,039 GB/s | 3,350 GB/s | 4,800 GB/s |
| FP64/FP32 | 19.5 TFLOPS | 67 TFLOPS | 67 TFLOPS |
| FP16 | 312 TFLOPS | 1,979 TFLOPS | 1,979 TFLOPS |
| FP8 | - | 3,958 TFLOPS | 3,958 TFLOPS |
| NVLink | 600 GB/s | 900 GB/s | 900 GB/s |
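One way to read the memory-bandwidth row: small-batch decoding is typically memory-bound, so a rough upper bound on per-request token rate is memory bandwidth divided by the bytes read per token (approximately the model weights). The sketch below applies that rule of thumb to the figures above for an assumed 8B-parameter model in FP16; it ignores KV-cache traffic, kernel overhead, and batching, so real throughput will differ.

```python
# Rule-of-thumb upper bound for batch-1, memory-bound decoding:
#   tokens/s <= memory_bandwidth / bytes_read_per_token (~ model weight size).
# Illustrative only; ignores KV cache, activations, and kernel overhead.

def decode_upper_bound(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    weight_gb = params_b * bytes_per_param   # model weights in GB
    return bandwidth_gb_s / weight_gb        # tokens per second, upper bound

# An assumed 8B-parameter model in FP16 (2 bytes per parameter).
for name, bw in [("A100 SXM", 2039), ("H100 SXM", 3350), ("H200 SXM", 4800)]:
    print(f"{name}: ~{decode_upper_bound(bw, 8, 2):.0f} tokens/s per request (upper bound)")
```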
Testimonials
See what customers say about working with Denvr.

Internal Technologies
"We were incredibly impressed by the out-of-the-box ease of use and cross platform support we experienced when running our fine-tuned custom Blockify large language model on Denvr AI Services."
John Hanby IV
Founder and CEO

University of Calgary
"Denvr Dataworks allowed us to proof-run high-level compute with a high-level interconnect, which allowed us to do multi-node training at low cost..."
Yani Ioannou
Assistant Professor, PhD
Inference Comparison
Choose the right GPU for your workload and performance requirements. Use multi-GPU configurations for the largest models.
| Hardware | Optimized For | VRAM | Notes |
|---|---|---|---|
| NVIDIA A100 MIG | Very small models with 0-7B params | 20 GB | Smallest available unit based on GPU hardware partitions. |
| NVIDIA A100 | Cost-effective inference, fine-tuned models | 40 GB | Best TCO for small batch and private models. |
| NVIDIA H100 | Large model inference, high throughput | 80 GB | Best for 70B+ parameter models. Native support for FP8 precision. |
| NVIDIA H200 | Extended context, large batch inference | 141 GB | Higher memory bandwidth for context-heavy workloads. |
Need help selecting hardware? Our solutions engineers can recommend the optimal configuration for your model and workload profile.
View full pricing →
Inference Comparison
Choose the most cost-effective enterprise GPUs.
| GPU | Llama 3 8B | Llama 3 70B | Qwen 2 72B | Llama 4 Maverick |
|---|---|---|---|---|
| NVIDIA A100 SXM 40G | Yes | No | Yes | FP16, 8 GPUs / 4 nodes |
| NVIDIA A100 SXM 80G | Yes | Yes | Yes | FP16, 8 GPUs / 2 nodes |
| NVIDIA H100 | Yes | Yes | Yes | FP8, 8 GPUs / 1 node |
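The fit entries above follow from simple weight-memory arithmetic: parameter count times bytes per parameter, plus headroom for KV cache, activations, and runtime overhead. A minimal sketch of that estimate, assuming an illustrative ~20% overhead factor:

```python
import math

# Rough GPU-count estimate from weight memory alone.
# Assumption: ~20% headroom for KV cache, activations, and runtime overhead.

def min_gpus(params_b: float, bytes_per_param: float, vram_gb: float, overhead: float = 1.2) -> int:
    needed_gb = params_b * bytes_per_param * overhead
    return math.ceil(needed_gb / vram_gb)

print(min_gpus(70, 2, 80))   # Llama 3 70B, FP16, 80 GB GPUs -> 3 (weights alone ~140 GB)
print(min_gpus(70, 1, 80))   # Llama 3 70B, FP8,  80 GB GPUs -> 2
print(min_gpus(8, 2, 40))    # Llama 3 8B,  FP16, 40 GB GPU  -> 1
```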


Interested in deploying NVIDIA H100 GPUs for training, inference, or large-scale AI workloads?
Contact our team to discuss availability, configurations, pricing, and deployment options. We’ll help you determine the right solution to meet your performance and scalability needs.


Infrastructure you can trust at scale
As an NVIDIA Cloud Partner, we build and operate AI clusters following NVIDIA Reference Architectures. Your models and data are protected by strict privacy safeguards and SOC 2 Type 2 security practices.


H100 Use Cases

Multi-GPU Training
H100 scales efficiently and reduces time to train, especially for large distributed runs with heavy communication and big batches.

LLM Inference at Scale
It delivers higher throughput and more stable latency under high concurrency, so you can serve more requests per GPU.

RAG Pipelines
It helps keep end-to-end latency low, so embeddings, reranking, and generation stay responsive as your knowledge base and traffic expand.