Posted on 30 October, 2025

A new class of AI superchip combining massive compute density, a second-generation Transformer Engine and scale-first interconnects to power trillion-parameter models and real-time inference.
Blackwell is NVIDIA’s data centre GPU architecture designed for the scale and performance demands of frontier AI. Key architectural characteristics include very large multi-die GPUs, a transformer-centric Tensor Core design and fifth-generation NVLink for exceptionally high GPU-to-GPU bandwidth.
Core Principles
- Scale first: Designed to be used in single-rack and multi-rack NVLink domains to support trillion-parameter training and real-time inference.
- Transformer focus: A second-generation Transformer Engine and updated Tensor Core formats optimise throughput for LLM training and inference.
- Coherent memory and chip-to-chip links: Reticle-sized dies connected with high-bandwidth chip-to-chip interfaces to present a single unified GPU.
Technical Highlights
- Transistor count and process: Blackwell GPUs pack around 208 billion transistors and are manufactured on a custom TSMC 4NP process. The largest Blackwell data centre GPUs are built as two reticle-limited dies connected by a high-bandwidth chip-to-chip interconnect.
- Chip-to-chip interconnect: The two dies are joined by a chip-to-chip interconnect capable of approximately 10 terabytes per second of throughput, enabling a single-GPU programming model despite multi-die construction.
- NVLink fifth generation: Per-GPU NVLink bandwidth reaches approximately 1.8 terabytes per second (18 links × 100 GB/s), enabling rack-scale NVLink domains such as the GB200 NVL72 (a quick back-of-envelope of these figures follows this list).
- NVLink-C2C: NVLink chip-to-chip connectivity between Grace CPUs and Blackwell GPUs provides very high-bandwidth CPU↔GPU coherency (900 GB/s per connection in current GB200 designs).
- Second-generation Transformer Engine: New Tensor Core microarchitectures and precision formats (including expanded FP8/FP6/FP4 support and community-defined microscaling formats) target both inference throughput and training efficiency.
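To put the NVLink figures above in context, here is a quick back-of-envelope in Python: per-GPU bandwidth from 18 links at 100 GB/s each, and the aggregate bandwidth of a 72-GPU NVL72 NVLink domain. The link counts and rates are the values quoted in this article; treat the totals as nameplate figures rather than delivered throughput.

```python
# Back-of-envelope NVLink bandwidth from the figures quoted above.
# Nameplate numbers only; delivered throughput depends on topology and workload.
links_per_gpu = 18     # fifth-generation NVLink links per Blackwell GPU
gb_per_link = 100      # GB/s per link (bidirectional)
gpus_in_nvl72 = 72     # GPUs in one GB200 NVL72 NVLink domain

per_gpu_tb_s = links_per_gpu * gb_per_link / 1000
domain_tb_s = per_gpu_tb_s * gpus_in_nvl72

print(f"Per-GPU NVLink bandwidth: {per_gpu_tb_s:.1f} TB/s")         # 1.8 TB/s
print(f"NVL72 aggregate NVLink bandwidth: {domain_tb_s:.1f} TB/s")  # ~130 TB/s
```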

Products and Systems
GB200 Grace Blackwell Superchip
A packaged solution that couples Blackwell GPUs with the Grace CPU and NVLink-C2C coherency. It is a building block for rack-scale systems such as GB200 NVL72.
GB200 NVL72
A rack-scale, liquid-cooled system that connects 36 Grace CPUs and 72 Blackwell GPUs in a single NVLink domain. It is purpose-built for trillion-parameter LLM training and real-time inference at hyperscale.
DGX GB200 and HGX derivatives
Vendor systems using Blackwell to deliver dense GPU compute for enterprises and cloud providers.
Why Blackwell for Large AI Models
Blackwell eases the interconnect and memory bottlenecks that dominate large-model work by offering very high intra-GPU and inter-GPU bandwidth, coherent memory across the NVLink domain and transformer-optimised Tensor Cores. The net result is substantially faster iteration times for large models, lower inference latency and improved energy efficiency per token compared with prior generations on many LLM configurations.
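As an illustration of how the Transformer Engine is typically reached from application code, the sketch below uses NVIDIA's Transformer Engine Python package (transformer_engine.pytorch) to run a linear layer inside an FP8 autocast region. It is a minimal sketch, assuming the transformer_engine package and an FP8-capable GPU are available; the layer size and batch shape are illustrative, not tuned values.

```python
# Minimal sketch: running a layer through Transformer Engine's FP8 path.
# Assumes the transformer_engine package and an FP8-capable GPU are present.
import torch
import transformer_engine.pytorch as te

# A Transformer-Engine-backed linear layer (dimensions are illustrative).
layer = te.Linear(1024, 1024, bias=True).cuda()
inp = torch.randn(32, 1024, device="cuda")

# Inside fp8_autocast, supported GEMMs execute in FP8 on the Tensor Cores,
# with scaling factors managed by the default recipe.
with te.fp8_autocast(enabled=True):
    out = layer(inp)

print(out.shape)  # torch.Size([32, 1024])
```

In practice the same fp8_autocast context wraps full transformer blocks rather than a single layer; the point here is simply that the FP8 path is reached through a library context manager rather than per-kernel changes to model code.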
Typical Blackwell use cases
- Trillion-parameter and multi-trillion-parameter model training.
- Real-time inference for agentic or multi-turn LLM use cases.
- Large-scale multimodal models and foundation model fine-tuning.
- HPC workloads that benefit from very high GPU ↔ GPU bandwidth.
Deployment and Integration Patterns
Blackwell systems are commonly deployed in the following ways:
- Rack-scale NVLink domains (NVL72): For the largest models, where GPUs must operate in a tightly coupled manner.
- DGX and HGX offerings: Vendor-integrated systems optimised for data centre form factors, power and cooling.
- Hybrid fabrics: NVLink within racks for tightly coupled GPU-to-GPU traffic, with Spectrum-X Ethernet or Quantum InfiniBand fabrics handling inter-rack scale-out, storage and multi-tenant east-west traffic as required.
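To make the hybrid-fabric split concrete, the sketch below shows a hedged PyTorch example: NCCL keeps intra-rack GPU-to-GPU traffic on NVLink automatically, while environment variables steer inter-node traffic onto the scale-out fabric. The interface names (mlx5, eth0) and the torchrun launch are assumptions for illustration; substitute the HCA and NIC names from your own fabric.

```python
# Minimal sketch of the hybrid-fabric pattern (assumptions: torchrun launch,
# NCCL backend, ConnectX HCAs named mlx5_*; adjust names to your fabric).
import os
import torch
import torch.distributed as dist

# Steer NCCL's inter-node traffic onto the scale-out fabric; intra-rack
# GPU-to-GPU traffic stays on NVLink automatically when NCCL detects it.
os.environ.setdefault("NCCL_IB_HCA", "mlx5")         # InfiniBand / RoCE HCAs (Quantum or Spectrum-X)
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # TCP bootstrap / fallback interface
os.environ.setdefault("NCCL_DEBUG", "WARN")          # raise to INFO when validating topology

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# All-reduce a small tensor as a fabric smoke test.
x = torch.ones(1, device="cuda")
dist.all_reduce(x)
print(f"rank {dist.get_rank()}: sum across {dist.get_world_size()} ranks = {x.item()}")
dist.destroy_process_group()
```

Launched with, for example, `torchrun --nproc_per_node=8` on each node, this confirms that collectives traverse both the in-rack NVLink domain and the inter-rack fabric.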
Why choose Boston Limited for Blackwell Projects
Boston Limited offers practical expertise across the entire Blackwell deployment lifecycle:
- Architecture advice: Right-sized NVLink domains, memory budgeting and topology guidance for your models.
- Proof of concept and benchmarking: Run model scale-out tests in our labs with your training scripts and datasets to validate throughput and memory behaviour.
- Integration and deployment services: Rack design, liquid cooling integration, firmware and OS stack bring-up and long-term support plans.
- Model performance tuning: Partner engineering sessions to align infrastructure with model parallelism strategies and NVLink topology.
Example Technical Checklist for GB200 NVL72 Projects
- Rack cooling design and CDU capacity planning
- NVLink switch fabric design and cabling
- Power distribution and PDU planning for sustained 24/7 operation
- Storage fabric design with Spectrum or Quantum interconnects
- Firmware, driver and OS validation pass (a minimal smoke-test sketch follows this checklist)
- Operator training and runbook handover
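As one deliberately minimal example of what the firmware, driver and OS validation pass can include, the sketch below checks that PyTorch can see the GPUs, reports their memory and SM counts, and launches a small kernel. It assumes a CUDA-enabled PyTorch install on the node and is a smoke test only, not a full validation suite.

```python
# Minimal driver/stack smoke test (a sketch, not a full validation suite;
# assumes PyTorch with CUDA support is installed on the node).
import torch

assert torch.cuda.is_available(), "CUDA runtime not visible to PyTorch"
print(f"PyTorch {torch.__version__}, CUDA toolkit {torch.version.cuda}")

for idx in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(idx)
    print(f"GPU {idx}: {props.name}, "
          f"{props.total_memory / 2**30:.0f} GiB, "
          f"SM count {props.multi_processor_count}")

# Quick kernel launch to confirm the driver actually executes work.
x = torch.randn(4096, 4096, device="cuda")
torch.cuda.synchronize()
print("matmul checksum:", (x @ x).sum().item())
```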
FAQs
Do I need liquid cooling for Blackwell?
GB200 NVL72 is a liquid-cooled design and relies on liquid cooling to reach its highest sustained performance, but air-cooled DGX/HGX Blackwell options are available for smaller clusters or different power envelopes.
How does Blackwell interact with Ethernet fabrics like Spectrum?
Blackwell uses NVLink for tightly coupled GPU communication inside racks and relies on Ethernet (Spectrum) or InfiniBand (Quantum) for larger multi-rack fabrics, storage connectivity and multi-tenant traffic.
Specifications and product availability change over time; contact Boston Limited for the most recent datasheets and configuration guides.
Speak with our sales team to learn how NVIDIA Blackwell Architecture can propel your business to new heights.
Tags: nvidia, blackwell, blackwell architecture, gpu, nvlink, ai