GTC Spring 2022 Keynote

Category: Main Keynote · Year: 2022

Speakers: Jensen Huang - CEO, NVIDIA

Segments (11)

00:00 · Introduction & The I AM AI Video
- Opening video showcasing the impact of AI across various industries, followed by Jensen’s introduction.
05:25 · The Million-X Computing Speedup
- Jensen discusses how accelerated computing and AI have delivered a million-X speedup over the last decade.
09:10 · Climate Science & FourCastNet
- Introduction of FourCastNet, a physics-informed AI model for predicting extreme weather events.
12:18 · NVIDIA AI Software Stack
- Overview of NVIDIA’s AI software, including Triton, Riva, Maxine, Merlin, and NeMo Megatron.
27:08 · Hopper Architecture & H100 GPU
- Unveiling the new Hopper architecture, the H100 GPU, and its groundbreaking features like the Transformer Engine.
32:36 · HGX, DGX, and EOS Supercomputer
- Announcing the systems built around H100, scaling up to the massive EOS AI supercomputer.
40:37 · Grace CPU Superchip & NVLink C2C
- Introduction of the Grace CPU Superchip for AI factories and the NVLink Chip-to-Chip interconnect.
44:22 · Accelerated Computing Libraries
- Updates on SDKs including RAPIDS, cuOpt, Morpheus, cuQuantum, Aerial, Sionna, Modulus, and MONAI.
51:11 · Omniverse & OVX
- Presenting Omniverse for digital twins, the new OVX server, and Omniverse Cloud for collaboration.
01:03:39 · Robotics: Isaac & Metropolis
- Updates on the Isaac robotics platform, Metropolis for tracking, and Clara Holoscan for medical devices.
01:08:14 · Autonomous Vehicles: DRIVE
- Announcing DRIVE Hyperion 9, DRIVE Map, and showcasing DRIVE Sim capabilities.

Product Announcements (19)

[18:35] Riva 2.0
- SDK for Speech AI
- specs: Speech recognition in 7 languages, neural text-to-speech with male/female voices, custom tuning with TAO.
- availability: General Availability
[23:18] Merlin 1.0
- AI Framework for Hyperscale Recommender Systems
- specs: End-to-end pipeline including feature transforms, retrieval, and ranking models.
- availability: General Availability
[23:45] NeMo Megatron
- AI Framework for Training Large Language Models
- specs: Supports models up to trillions of parameters, automated data curation, distributed training.
- availability: Not explicitly stated
[27:08] NVIDIA H100 GPU
- Next-generation data center GPU based on Hopper architecture
- specs: 80 Billion transistors, TSMC 4N process, 4.9 TB/s bandwidth, PCIe Gen5, Transformer Engine, DPX instructions.
- availability: Not explicitly stated
[33:00] HGX H100
- Server board with 8 H100 GPUs
- specs: 32 PFLOPS FP8, 16 PFLOPS FP16, 3.6 TFLOPS in-network compute via SHARP.
- availability: Not explicitly stated
[34:00] DGX H100
- AI computing system
- specs: 8 H100 GPUs, 32 PFLOPS AI performance, 640 GB HBM3 memory, 24 TB/s memory bandwidth.
- availability: Not explicitly stated
[34:46] NVLink Switch System
- External switch to connect multiple DGX nodes
- specs: Connects up to 32 DGX nodes, enabling 256 GPUs to act as one.
- availability: Not explicitly stated
[36:58] EOS Supercomputer
- NVIDIA’s internal AI supercomputer
- specs: 18 DGX Pods, 4608 H100 GPUs, 18.4 EFLOPS AI performance, 275 PFLOPS FP64.
- availability: Online in a few months
[39:00] H100 CNX
- Converged H100 GPU and ConnectX-7 SmartNIC
- specs: Direct DMA from network to H100 at 50 GB/s, bypassing CPU bottlenecks.
- availability: Not explicitly stated
[40:37] Grace CPU Superchip
- Data center CPU designed for AI and HPC
- specs: 144 CPU cores, 1 TB/s memory bandwidth, SPECrate2017_int_base over 740, 500W power.
- availability: On track to ship next year (2023)
[42:40] NVLink Chip-to-Chip (C2C)
- High-speed interconnect for custom silicon integration
- specs: 900 GB/s bandwidth, ultra-energy efficient, low latency.
- availability: Available to customers and partners
[45:56] cuOpt
- AI-accelerated solver for route optimization
- specs: Multi-agent, multi-constraint route planning optimization.
- availability: Not explicitly stated
[52:59] NVIDIA OVX Server
- Computing system designed for Omniverse digital twins
- specs: 8 A40 GPUs, 3 ConnectX-6 NICs, 2 Intel Ice Lake CPUs, 1 TB system memory.
- availability: Available now from top computer makers
[56:58] Spectrum-4
- 400G Ethernet Switch
- specs: 100 Billion transistors, 51.2 Tbps bandwidth, 128 ports of 400GbE.
- availability: Will sample in Q3
[58:43] Omniverse Cloud
- Cloud-based suite for Omniverse collaboration
- specs: One-click design collaboration from anywhere without needing local RTX hardware.
- availability: Not explicitly stated
[01:04:00] Isaac Nova Orin
- Reference AMR (Autonomous Mobile Robot) architecture
- specs: Powered by Jetson AGX Orin, includes 2 cameras, 2 lidars, 8 ultrasonics, 4 fisheye cameras.
- availability: Available in Q2
[01:07:49] Clara Holoscan MGX
- Medical-grade platform for robotic medical devices
- specs: Designed to IEC 62304 standards, powered by Orin and CX7.
- availability: Early access today, GA in May, medical-grade readiness in Q1 2023
[01:08:55] DRIVE Hyperion 9
- Open reference AV platform
- specs: Powered by dual Atlan SoCs, 14 cameras, 9 radars, 3 lidars, 20 ultrasonics.
- availability: For cars shipping starting in 2026
[01:09:22] DRIVE Map
- Multi-modal map engine for autonomous vehicles
- specs: Includes camera, radar, and lidar layers; auto-generated from ground truth and crowd-sourced data.
- availability: Expect to map 500,000 km by end of 2024
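
Several of the system-level figures above follow from the per-GPU numbers by plain multiplication. The sketch below is only a cross-check of this summary's arithmetic, not an NVIDIA tool. It takes the 4,000 TFLOPS FP8 per-H100 figure and the 8-GPU DGX configuration from this page; the value of 32 DGX nodes per pod is an assumption inferred from the NVLink Switch limit and the EOS GPU total, and all function names are hypothetical.

```python
# Cross-check of system-level specs from the per-GPU figures quoted in this summary.
H100_FP8_PFLOPS = 4.0   # 4,000 TFLOPS FP8 per H100, as listed on this page
GPUS_PER_DGX = 8        # DGX H100 and HGX H100 each carry 8 H100 GPUs
DGX_PER_POD = 32        # assumption: inferred from the 32-node NVLink Switch limit
EOS_PODS = 18           # EOS is listed as 18 DGX Pods


def dgx_fp8_pflops() -> float:
    """FP8 throughput of one 8-GPU DGX H100, in PFLOPS."""
    return GPUS_PER_DGX * H100_FP8_PFLOPS


def nvlink_domain_gpus(nodes: int = 32) -> int:
    """GPUs reachable through one NVLink Switch System."""
    return nodes * GPUS_PER_DGX


def eos_gpu_count() -> int:
    """Total H100 GPUs in EOS under the pod-size assumption above."""
    return EOS_PODS * DGX_PER_POD * GPUS_PER_DGX


def eos_fp8_eflops() -> float:
    """Aggregate FP8 throughput of EOS, in EFLOPS."""
    return eos_gpu_count() * H100_FP8_PFLOPS / 1000.0


def spectrum4_tbps(ports: int = 128, gbps_per_port: int = 400) -> float:
    """Aggregate Spectrum-4 port bandwidth, in Tbps."""
    return ports * gbps_per_port / 1000.0


if __name__ == "__main__":
    print("DGX H100 FP8 PFLOPS:", dgx_fp8_pflops())
    print("NVLink domain GPUs:", nvlink_domain_gpus())
    print("EOS GPUs:", eos_gpu_count())
    print("EOS FP8 EFLOPS:", round(eos_fp8_eflops(), 1))
    print("Spectrum-4 Tbps:", spectrum4_tbps())
```

Under these assumptions the products line up with the listed totals: 32 PFLOPS per DGX H100, 256 GPUs per NVLink domain, 4608 GPUs and roughly 18.4 EFLOPS for EOS, and 51.2 Tbps for Spectrum-4.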

Specific Numbers (12)

Timestamp	Metric	Value	Context
06:00	Speedup	1,000,000x	Increase in computing performance over the past decade due to accelerated computing and ML.
07:28	Developers	3,000,000	Number of developers in the NVIDIA ecosystem.
10:20	Speedup	4 to 5 orders of magnitude	How much faster FourCastNet predicts weather compared to classical numerical models.
27:30	Transistors	80 Billion	Number of transistors in the H100 GPU.
28:20	Performance	4,000 TFLOPS	FP8 performance of the H100 GPU.
31:42	Speedup	40x	Speedup for dynamic programming algorithms using new DPX instructions on Hopper.
34:00	Performance	32 PFLOPS	AI performance of a single DGX H100 system.
36:58	Performance	18.4 EFLOPS	AI performance of the EOS supercomputer.
38:08	Speedup	30x	Inference throughput increase of H100 over A100 for large language models.
41:26	Cores	144	Number of CPU cores in the Grace CPU Superchip.
41:26	Performance	740	Estimated SPECrate2017_int_base score for the Grace CPU Superchip.
56:58	Bandwidth	51.2 Tbps	Total bandwidth of the Spectrum-4 switch.

Benchmark Claims (4)

[29:12] AI Processing (FP8 vs FP16): 6x
- vs: Ampere A100
- gain: Hopper H100 delivers 6x the performance of Ampere A100 for AI processing.
[38:08] Large Language Model Inference (Megatron 530B): 30x
- vs: Ampere A100
- gain: H100 provides 30x higher throughput at 1-second response latency compared to A100.
[41:26] CPU Memory Bandwidth: 1 TB/s
- vs: Top Gen5 CPUs
- gain: Grace CPU Superchip provides 2 to 3 times the memory bandwidth of top Gen5 CPUs.
[41:26] CPU Energy Efficiency: 2x
- vs: Best CPUs at the time
- gain: Grace CPU Superchip is twice as energy efficient as the best CPUs.

Customer Stories (4)

[22:47] Snap
- Used NVIDIA Merlin for ad and content recommendations.
- outcome: Cut costs by 50% and halved serving latency.
[22:57] Tencent WeChat
- Used NVIDIA Merlin for short video recommendations.
- outcome: Achieved 4x lower latency and 10x throughput, halving costs by moving from CPU to GPU.
[01:01:21] Amazon Robotics
- Used Omniverse to build digital twins of fulfillment centers to train and optimize autonomous robots.
- outcome: Enabled safer, more efficient inventory movement and optimized warehouse design before physical deployment.
[01:04:18] PepsiCo
- Used Omniverse and Metropolis to create digital twins of distribution centers.
- outcome: Optimized conveyor belt speeds in real-time, preventing congestion and reducing energy usage.

Key Technologies (4)

Transformer Engine: Dynamically chooses between FP8 and FP16 precision for each layer of a transformer network, speeding up training without losing accuracy.
Hopper Confidential Computing: Protects data and AI models while in use on the GPU, isolating them from the host OS and hypervisor.
DPX Instructions: Accelerate dynamic programming algorithms (such as Smith-Waterman for genomics) by up to 40x.
NVLink Chip-to-Chip (C2C): An ultra-fast, energy-efficient interconnect allowing custom silicon to connect directly to NVIDIA GPUs, CPUs, and DPUs.
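
To make the DPX entry concrete, here is a minimal pure-Python sketch of the Smith-Waterman local-alignment score. It does not use DPX or any NVIDIA API; it only shows the kind of recurrence (a maximum over a few additions per cell) that dynamic programming hardware support targets. The function name and the scoring parameters (`match`, `mismatch`, `gap`) are illustrative assumptions, and the gap penalty is linear for simplicity.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local-alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell is a max over: restart (0), diagonal step, and two gap steps.
            curr[j] = max(0,
                          prev[j - 1] + sub,
                          prev[j] + gap,
                          curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # prints 8 (four matches at +2 each)
```

Only two rows of the table are kept because the score alone is needed; recovering the alignment itself would require storing the full table or traceback pointers.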

Demos Shown (7)

[11:03] FourCastNet predicting an atmospheric river.
[14:59] A physically simulated character learning to walk and fight using reinforcement learning.
[19:09] Riva FastPitch generating expressive text-to-speech.
[20:30] Maxine maintaining eye contact and translating speech in real-time video conferencing.
[59:05] Multiple designers collaborating in real-time using Omniverse Cloud.
[01:00:02] Tokkio (Omniverse Avatar) interacting conversationally and answering questions.
[01:10:00] DRIVE Sim reconstructing a real-world driving scenario into a modifiable 3D simulation.

Predictions / Commitments (5)

[36:58, In a few months] EOS supercomputer will be online.
[40:37, Next year (2023)] Grace CPU Superchip will ship.
[56:58, Q3 2022] Spectrum-4 switch will begin sampling.
[01:08:55, Starting in 2026] DRIVE Hyperion 9 will ship in vehicles.
[01:09:22, By the end of 2024] DRIVE Map will map 500,000 kilometers of roadways.

Companies Mentioned (4)

TSMC · Intel · BYD · Lucid

Notable Quotes (3)

AI is racing in every direction. New architectures, new learning strategies, larger and more robust models, new science, new applications, new industries. — Jensen Huang @ 06:43

Companies are manufacturing intelligence and operating giant AI factories. — Jensen Huang @ 26:00

A digital twin is a virtual world that’s connected to the physical world. And in the context of the internet, it is the next evolution. — Jensen Huang @ 51:11

Key Topics

Artificial Intelligence · Accelerated Computing · Hopper Architecture · H100 GPU · Grace CPU · Data Center Infrastructure · Omniverse · Digital Twins · Robotics · Autonomous Vehicles · Large Language Models · Transformers · Networking (InfiniBand/Ethernet)

Takeaways

NVIDIA is transitioning data centers into ‘AI Factories’ designed to manufacture intelligence.
The new Hopper architecture and H100 GPU provide massive leaps in performance, specifically tailored for Transformer models.
NVIDIA is expanding its silicon footprint beyond GPUs with the Grace CPU Superchip and advanced networking (Spectrum-4, ConnectX-7).
Omniverse is positioned as the foundational platform for the next era of the internet, focusing on physically accurate digital twins.
Software and SDKs (like Riva, Merlin, and Isaac) are critical to NVIDIA’s strategy, making complex AI accessible across industries.
Robotics and autonomous systems are moving from perception to action, heavily relying on simulation (DRIVE Sim, Isaac Sim) before real-world deployment.