Posted on 30 October, 2025

A new class of AI superchip combining massive compute density, a second-generation Transformer Engine and scale-first interconnects to power trillion-parameter models and real-time inference.
Blackwell is NVIDIA’s data centre GPU architecture designed for the scale and performance demands of frontier AI. Key architectural characteristics include very large multi-die GPUs, a transformer-centric Tensor Core design and fifth-generation NVLink for exceptionally high GPU-to-GPU bandwidth.
Core Principles
- Scale first: Designed to be used in single-rack and multi-rack NVLink domains to support trillion-parameter training and real-time inference.
- Transformer focus: A second-generation Transformer Engine and updated Tensor Core formats optimise throughput for LLM training and inference.
- Coherent memory and chip-to-chip links: Reticle-sized dies connected with high-bandwidth chip-to-chip interfaces to present a single unified GPU.
Technical Highlights
- Transistor count and process: Blackwell GPUs pack around 208 billion transistors and are manufactured on a custom TSMC 4NP process. The largest Blackwell data centre GPUs are built as two reticle-limited dies connected by a high-bandwidth chip-to-chip interconnect.
- Chip-to-chip interconnect: The two dies are joined by a chip-to-chip interconnect capable of approximately 10 terabytes per second of throughput, enabling a single-GPU programming model despite multi-die construction.
- NVLink fifth generation: Per-GPU NVLink bandwidth reaches approximately 1.8 terabytes per second (18 links × 100 GB/s), enabling rack-scale NVLink domains such as the GB200 NVL72 (a quick back-of-envelope of these figures follows this list).
- NVLink-C2C: NVLink chip-to-chip connectivity between Grace CPUs and Blackwell GPUs provides very high-bandwidth CPU↔GPU coherency (900 GB/s per connection in current GB200 designs).
- Second-generation Transformer Engine: New Tensor Core microarchitectures and precision formats (including expanded FP8/FP6/FP4 support and community-defined microscaling formats) target both inference throughput and training efficiency.
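To put the NVLink figures above in context, here is a quick back-of-envelope in Python: per-GPU bandwidth from 18 links at 100 GB/s each, and the aggregate bandwidth of a 72-GPU NVL72 NVLink domain. The link counts and rates are the values quoted in this article; treat the totals as nameplate figures rather than delivered throughput.

```python
# Back-of-envelope NVLink bandwidth from the figures quoted above.
# Nameplate numbers only; delivered throughput depends on topology and workload.
links_per_gpu = 18     # fifth-generation NVLink links per Blackwell GPU
gb_per_link = 100      # GB/s per link (bidirectional)
gpus_in_nvl72 = 72     # GPUs in one GB200 NVL72 NVLink domain

per_gpu_tb_s = links_per_gpu * gb_per_link / 1000
domain_tb_s = per_gpu_tb_s * gpus_in_nvl72

print(f"Per-GPU NVLink bandwidth: {per_gpu_tb_s:.1f} TB/s")         # 1.8 TB/s
print(f"NVL72 aggregate NVLink bandwidth: {domain_tb_s:.1f} TB/s")  # ~130 TB/s
```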

Products and Systems
GB200 Grace Blackwell Superchip
A packaged solution that couples Blackwell GPUs with the Grace CPU and NVLink-C2C coherency. It is a building block for rack-scale systems such as GB200 NVL72.
GB200 NVL72
A rack-scale, liquid-cooled system that connects 36 Grace CPUs and 72 Blackwell GPUs in a single NVLink domain. It is purpose-built for trillion-parameter LLM training and real-time inference at hyperscale.
DGX GB200 and HGX derivatives
Vendor systems using Blackwell to deliver dense GPU compute for enterprises and cloud providers.
Why Blackwell for Large AI Models
Blackwell eases the interconnect and memory bottlenecks that dominate large-model work by offering very high intra-GPU and inter-GPU bandwidth, coherent memory across the NVLink domain and transformer-optimised Tensor Cores. The net result is substantially faster iteration times for large models, lower inference latency and improved energy efficiency per token compared with prior generations on many LLM configurations.
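As an illustration of how the Transformer Engine is typically reached from application code, the sketch below uses NVIDIA's Transformer Engine Python package (transformer_engine.pytorch) to run a linear layer inside an FP8 autocast region. It is a minimal sketch, assuming the transformer_engine package and an FP8-capable GPU are available; the layer size and batch shape are illustrative, not tuned values.

```python
# Minimal sketch: running a layer through Transformer Engine's FP8 path.
# Assumes the transformer_engine package and an FP8-capable GPU are present.
import torch
import transformer_engine.pytorch as te

# A Transformer-Engine-backed linear layer (dimensions are illustrative).
layer = te.Linear(1024, 1024, bias=True).cuda()
inp = torch.randn(32, 1024, device="cuda")

# Inside fp8_autocast, supported GEMMs execute in FP8 on the Tensor Cores,
# with scaling factors managed by the default recipe.
with te.fp8_autocast(enabled=True):
    out = layer(inp)

print(out.shape)  # torch.Size([32, 1024])
```

In practice the same fp8_autocast context wraps full transformer blocks rather than a single layer; the point here is simply that the FP8 path is reached through a library context manager rather than per-kernel changes to model code.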
Typical Blackwell use cases
- Trillion-parameter and multi-trillion-parameter model training.
- Real-time inference for agentic or multi-turn LLM use cases.
- Large-scale multimodal models and foundation model fine-tuning.
- HPC workloads that benefit from very high GPU ↔ GPU bandwidth.
Deployment and Integration Patterns
Blackwell systems are commonly deployed in the following ways:
- Rack-scale NVLink domains (NVL72): For the largest models, where GPUs must operate in a tightly coupled manner.
- DGX and HGX offerings: Vendor-integrated systems optimised for data centre form factors, power and cooling.
- Hybrid fabrics: NVLink within racks for tightly coupled GPU-to-GPU traffic, with Spectrum-X Ethernet or Quantum InfiniBand fabrics handling inter-rack scale-out, storage and multi-tenant east-west traffic as required.
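To make the hybrid-fabric split concrete, the sketch below shows a hedged PyTorch example: NCCL keeps intra-rack GPU-to-GPU traffic on NVLink automatically, while environment variables steer inter-node traffic onto the scale-out fabric. The interface names (mlx5, eth0) and the torchrun launch are assumptions for illustration; substitute the HCA and NIC names from your own fabric.

```python
# Minimal sketch of the hybrid-fabric pattern (assumptions: torchrun launch,
# NCCL backend, ConnectX HCAs named mlx5_*; adjust names to your fabric).
import os
import torch
import torch.distributed as dist

# Steer NCCL's inter-node traffic onto the scale-out fabric; intra-rack
# GPU-to-GPU traffic stays on NVLink automatically when NCCL detects it.
os.environ.setdefault("NCCL_IB_HCA", "mlx5")         # InfiniBand / RoCE HCAs (Quantum or Spectrum-X)
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # TCP bootstrap / fallback interface
os.environ.setdefault("NCCL_DEBUG", "WARN")          # raise to INFO when validating topology

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# All-reduce a small tensor as a fabric smoke test.
x = torch.ones(1, device="cuda")
dist.all_reduce(x)
print(f"rank {dist.get_rank()}: sum across {dist.get_world_size()} ranks = {x.item()}")
dist.destroy_process_group()
```

Launched with, for example, `torchrun --nproc_per_node=8` on each node, this confirms that collectives traverse both the in-rack NVLink domain and the inter-rack fabric.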
Why choose Boston Limited for Blackwell Projects
Boston Limited offers practical expertise across the entire Blackwell deployment lifecycle:
- Architecture advice: Right-sized NVLink domains, memory budgeting and topology guidance for your models.
- Proof of concept and benchmarking: Run model scale-out tests in our labs with your training scripts and datasets to validate throughput and memory behaviour.
- Integration and deployment services: Rack design, liquid cooling integration, firmware and OS stack bring-up and long-term support plans.
- Model performance tuning: Partner engineering sessions to align infrastructure with model parallelism strategies and NVLink topology.
Example Technical Checklist for GB200 NVL72 Projects
- Rack cooling design and CDU capacity planning
- NVLink switch fabric design and cabling
- Power distribution and PDU planning for sustained 24/7 operation
- Storage fabric design with Spectrum or Quantum interconnects
- Firmware, driver and OS validation pass (a minimal smoke-test sketch follows this checklist)
- Operator training and runbook handover
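As one deliberately minimal example of what the firmware, driver and OS validation pass can include, the sketch below checks that PyTorch can see the GPUs, reports their memory and SM counts, and launches a small kernel. It assumes a CUDA-enabled PyTorch install on the node and is a smoke test only, not a full validation suite.

```python
# Minimal driver/stack smoke test (a sketch, not a full validation suite;
# assumes PyTorch with CUDA support is installed on the node).
import torch

assert torch.cuda.is_available(), "CUDA runtime not visible to PyTorch"
print(f"PyTorch {torch.__version__}, CUDA toolkit {torch.version.cuda}")

for idx in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(idx)
    print(f"GPU {idx}: {props.name}, "
          f"{props.total_memory / 2**30:.0f} GiB, "
          f"SM count {props.multi_processor_count}")

# Quick kernel launch to confirm the driver actually executes work.
x = torch.randn(4096, 4096, device="cuda")
torch.cuda.synchronize()
print("matmul checksum:", (x @ x).sum().item())
```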
FAQs
Do I need liquid cooling for Blackwell?
GB200 NVL72 is a liquid-cooled design and relies on liquid cooling to reach its highest sustained performance, but air-cooled DGX/HGX Blackwell options are available for smaller clusters or different power envelopes.
How does Blackwell interact with Ethernet fabrics like Spectrum?
Blackwell uses NVLink for tightly coupled GPU communication inside racks and relies on Ethernet (Spectrum) or InfiniBand (Quantum) for larger multi-rack fabrics, storage connectivity and multi-tenant traffic.
Specifications and product availability change over time; contact Boston Limited for the most recent datasheets and configuration guides.
Speak with our sales team to learn how NVIDIA Blackwell Architecture can propel your business to new heights.
Tags: nvidia, blackwell, blackwell architecture, gpu, nvlink, ai