
Model Catalog
Deploy from a catalog of leading open-weight models, pre-configured with GPU sizing, context length, and batch settings tuned for performance. Or bring your own custom model.
Custom Models
Deploy any open-source or fine-tuned model using vLLM Server or Ollama Server applications. Full control over serving parameters including quantization, context length, and batch size.
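As a sketch of what that control looks like, here is a hypothetical vLLM launch command for a fine-tuned model. The model path and every flag value are illustrative, not recommendations: --quantization selects a weight-quantization scheme, --max-model-len caps context length, --max-num-seqs bounds concurrent batch size, and --tensor-parallel-size shards the model across GPUs.

```shell
# Illustrative only; tune flags to your model and hardware.
vllm serve ./my-finetuned-llama \
  --quantization awq \
  --max-model-len 16384 \
  --max-num-seqs 64 \
  --tensor-parallel-size 2
```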
Secure by Default
Every endpoint runs on dedicated hardware with end-to-end encryption and zero-trust authentication. Deploy to public or private IPs based on your security requirements.
No Token Limits
Private endpoints are charged by GPU-hour, not by token. No metering on usage, no surprise bills. Run as many tokens as your hardware can serve.
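A back-of-the-envelope sketch of what GPU-hour billing implies for effective token cost. The hourly rate and throughput below are hypothetical illustrations, not quoted prices; the point is that cost per token falls as you push more throughput through the same dedicated GPU.

```python
# Effective $/1M tokens under GPU-hour billing (hypothetical numbers).
def cost_per_million_tokens(gpu_hour_rate: float, tokens_per_second: float) -> float:
    """Effective $ per 1M tokens for a dedicated GPU billed by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_rate / tokens_per_hour * 1_000_000

# e.g. a hypothetical $2.50/hr GPU sustaining 1,500 tok/s across batched requests:
print(round(cost_per_million_tokens(2.50, 1500), 3))  # -> 0.463
```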
Sovereign Infrastructure
Built on infrastructure owned and operated by Denvr in Canadian and US data centers. No foreign jurisdiction exposure. No third-party dependencies.
OpenAI Compatible API
Drop-in compatibility with the OpenAI API specification. Swap your base URL and start running inference with zero code changes. No vendor lock-in, no proprietary SDKs.
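A minimal sketch of that drop-in compatibility using only the Python standard library. The base URL, API key, and model name are placeholders for your deployment's values; the request shape is the standard OpenAI chat-completions format.

```python
# Build an OpenAI-compatible chat request against a private endpoint.
# BASE_URL and the bearer token are placeholders; substitute your own.
import json
import urllib.request

BASE_URL = "https://your-endpoint.example.com"  # hypothetical endpoint

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble a POST request in the OpenAI chat-completions shape."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer YOUR_API_KEY"},
        method="POST",
    )

req = build_chat_request(BASE_URL, "meta-llama/Llama-3.3-70B-Instruct", "Hello")
print(req.full_url)
# Once the endpoint is live, sending it is one more line:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

If you already use the official openai SDK, the same swap applies: pass your endpoint as base_url when constructing the client and leave the rest of your code untouched.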
Secure Model Endpoints
Select a model, choose your hardware, and deploy a private endpoint in minutes. Every endpoint runs on single-tenant infrastructure with dedicated GPUs, encrypted connections, and no shared resources.
From Setup To Live In Minutes
01
Select a Model
Choose from our catalog of leading open-weight foundation models, or bring your own custom model.
02
Choose Your Hardware
Select the GPU that fits your workload. NVIDIA H200/H100/A100, Intel Gaudi 2, and more. Scale from a single GPU to multi-GPU configurations.
03
Deploy
Launch your private endpoint in minutes. Your model, your hardware, your API endpoint. Ready for production.
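For step 02, a rough VRAM rule of thumb helps narrow the GPU choice. This sketch covers model weights only; real usage also depends on KV cache, context length, and batch size, so treat it as a lower bound.

```python
# Rough lower bound on VRAM needed for model weights alone.
def weight_vram_gb(params_billions: float, bits_per_param: int = 16) -> float:
    """Approximate GB of VRAM for weights: params (billions) x bytes per param."""
    return params_billions * bits_per_param / 8

# A 70B model in FP16 needs ~140 GB for weights alone (multi-GPU territory),
# while FP8 halves that to ~70 GB, within a single 80 GB H100:
print(weight_vram_gb(70))     # -> 140.0
print(weight_vram_gb(70, 8))  # -> 70.0
```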

Model Catalog
Launch production endpoints with the most capable open-weight models available. New models are added regularly.

Meta Llama 3.3
Optimized for multilingual dialogue use cases; outperforms many open-source and closed chat models on industry benchmarks.

DeepSeek R1
Reasoning model that uses reinforcement learning to improve problem-solving capabilities across mathematics, coding, and complex reasoning tasks.

OpenAI GPT-OSS
OpenAI's open-weight models for general-purpose natural language understanding and generation tasks.

Qwen3-Coder-Next
State-of-the-art coding agent with ultra-efficient inference using only 3B active parameters.

Gemma 3
State-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Hardware Options
Choose the right GPU for your workload and performance requirements. Use multi-GPU for the largest models available.
NVIDIA H200
Optimized For
Extended context, large batch inference
VRAM
141 GB
Notes
Higher memory bandwidth for context-heavy workloads.
NVIDIA H100
Optimized For
Large model inference, high throughput
VRAM
80 GB
Notes
Best for 70B+ parameter models. Native support for FP8 precision.
Intel Gaudi 2
Optimized For
Cost-effective inference for open source models
VRAM
96 GB
Notes
Near H100 performance with FP8 inference.
NVIDIA A100
Optimized For
Cost-effective inference, fine-tuned models
VRAM
40 GB
Notes
Best TCO for small batch and private models.
NVIDIA A100 MIG
Optimized For
Small models up to 7B parameters
VRAM
20 GB
Notes
Smallest available unit based on GPU hardware partitions.
Need help selecting hardware? Our solutions engineers can recommend the optimal configuration for your model and workload profile.
View full pricing →








