GTC Spring 2022 Keynote
Category: Main Keynote · Year: 2022
Speakers: Jensen Huang - CEO, NVIDIA
Segments (11)
- 00:00 · Introduction & The I AM AI Video
- Opening video showcasing the impact of AI across various industries, followed by Jensen’s introduction.
- 05:25 · The Million-X Computing Speedup
- Jensen discusses how accelerated computing and AI have delivered a million-X speedup over the last decade.
- 09:10 · Climate Science & FourCastNet
- Introduction of FourCastNet, a physics-informed AI model for predicting extreme weather events.
- 12:18 · NVIDIA AI Software Stack
- Overview of NVIDIA’s AI software, including Triton, Riva, Maxine, Merlin, and NeMo Megatron.
- 27:08 · Hopper Architecture & H100 GPU
- Unveiling the new Hopper architecture, the H100 GPU, and its groundbreaking features like the Transformer Engine.
- 32:36 · HGX, DGX, and EOS Supercomputer
- Announcing the systems built around H100, scaling up to the massive EOS AI supercomputer.
- 40:37 · Grace CPU Superchip & NVLink C2C
- Introduction of the Grace CPU Superchip for AI factories and the NVLink Chip-to-Chip interconnect.
- 44:22 · Accelerated Computing Libraries
- Updates on SDKs including RAPIDS, cuOpt, Morpheus, cuQuantum, Aerial, Sionna, Modulus, and MONAI.
- 51:11 · Omniverse & OVX
- Presenting Omniverse for digital twins, the new OVX server, and Omniverse Cloud for collaboration.
- 01:03:39 · Robotics: Isaac & Metropolis
- Updates on the Isaac robotics platform, Metropolis for tracking, and Clara Holoscan for medical devices.
- 01:08:14 · Autonomous Vehicles: DRIVE
- Announcing DRIVE Hyperion 9, DRIVE Map, and showcasing DRIVE Sim capabilities.
Product Announcements (19)
- [18:35] Riva 2.0
- SDK for Speech AI
- specs: Speech recognition in 7 languages, neural text-to-speech with male/female voices, custom tuning with TAO.
- availability: General Availability
- [23:18] Merlin 1.0
- AI Framework for Hyperscale Recommender Systems
- specs: End-to-end pipeline including feature transforms, retrieval, and ranking models.
- availability: General Availability
- [23:45] NeMo Megatron
- AI Framework for Training Large Language Models
- specs: Supports models up to trillions of parameters, automated data curation, distributed training.
- availability: Not explicitly stated
- [27:08] NVIDIA H100 GPU
- Next-generation data center GPU based on Hopper architecture
- specs: 80 billion transistors, TSMC 4N process, 4.9 TB/s external connectivity bandwidth, PCIe Gen5, Transformer Engine, DPX instructions.
- availability: Not explicitly stated
- [33:00] HGX H100
- Server board with 8 H100 GPUs
- specs: 32 PFLOPS FP8, 16 PFLOPS FP16, 3.6 TFLOPS in-network compute via SHARP.
- availability: Not explicitly stated
- [34:00] DGX H100
- AI computing system
- specs: 8 H100 GPUs, 32 PFLOPS AI performance, 640 GB HBM3 memory, 24 TB/s memory bandwidth.
- availability: Not explicitly stated
- [34:46] NVLink Switch System
- External switch to connect multiple DGX nodes
- specs: Connects up to 32 DGX nodes, enabling 256 GPUs to act as one.
- availability: Not explicitly stated
- [36:58] EOS Supercomputer
- NVIDIA’s internal AI supercomputer
- specs: 18 DGX Pods, 4608 H100 GPUs, 18.4 EFLOPS AI performance, 275 PFLOPS FP64.
- availability: Online in a few months
- [39:00] H100 CNX
- Converged H100 GPU and ConnectX-7 SmartNIC
- specs: Direct DMA from network to H100 at 50 GB/s, bypassing CPU bottlenecks.
- availability: Not explicitly stated
- [40:37] Grace CPU Superchip
- Data center CPU designed for AI and HPC
- specs: 144 CPU cores, 1 TB/s memory bandwidth, estimated SPECrate2017_int_base of 740, 500 W power.
- availability: On track to ship next year (2023)
- [42:40] NVLink Chip-to-Chip (C2C)
- High-speed interconnect for custom silicon integration
- specs: 900 GB/s bandwidth, ultra-energy efficient, low latency.
- availability: Available to customers and partners
- [45:56] cuOpt
- AI-accelerated solver for route optimization
- specs: Multi-agent, multi-constraint route planning optimization.
- availability: Not explicitly stated
- [52:59] NVIDIA OVX Server
- Computing system designed for Omniverse digital twins
- specs: 8 A40 GPUs, 3 ConnectX-6 NICs, 2 Intel Ice Lake CPUs, 1 TB system memory.
- availability: Available now from top computer makers
- [56:58] Spectrum-4
- 400G Ethernet Switch
- specs: 100 Billion transistors, 51.2 Tbps bandwidth, 128 ports of 400GbE.
- availability: Will sample in Q3
- [58:43] Omniverse Cloud
- Cloud-based suite for Omniverse collaboration
- specs: One-click design collaboration from anywhere without needing local RTX hardware.
- availability: Not explicitly stated
- [01:04:00] Isaac Nova Orin
- Reference AMR (Autonomous Mobile Robot) architecture
- specs: Powered by Jetson AGX Orin, includes 2 cameras, 2 lidars, 8 ultrasonics, 4 fisheye cameras.
- availability: Available in Q2
- [01:07:49] Clara Holoscan MGX
- Medical-grade platform for robotic medical devices
- specs: Designed to the IEC 62304 standard, powered by Orin and ConnectX-7.
- availability: Early access today, GA in May, medical-grade readiness in Q1 2023
- [01:08:55] DRIVE Hyperion 9
- Open reference AV platform
- specs: Powered by dual Atlan SoCs, 14 cameras, 9 radars, 3 lidars, 20 ultrasonics.
- availability: For cars shipping starting in 2026
- [01:09:22] DRIVE Map
- Multi-modal map engine for autonomous vehicles
- specs: Includes camera, radar, and lidar layers; auto-generated from ground truth and crowd-sourced data.
- availability: Expect to map 500,000 km by end of 2024
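Several of the system-level figures above follow from per-GPU numbers by simple multiplication. The sketch below re-derives them as a consistency check, using only values stated in this summary (the 4,000 TFLOPS FP8 figure for one H100 comes from the numbers table). The one assumption, labeled in the code, is that each of the 18 EOS pods is a full 32-node NVLink Switch domain; the summary does not state the pod size explicitly.

```python
# Consistency check of headline system numbers, using figures from this summary.

GPUS_PER_DGX = 8
H100_FP8_TFLOPS = 4_000      # stated FP8 performance of one H100
DGX_NODES_PER_POD = 32       # ASSUMPTION: one pod = one 32-node NVLink Switch domain
EOS_PODS = 18

DGX_MEMORY_GB = 640          # stated HBM3 capacity of one DGX H100
DGX_MEM_BW_TBPS = 24         # stated aggregate memory bandwidth of one DGX H100

SPECTRUM4_PORTS = 128
SPECTRUM4_PORT_GBPS = 400


def dgx_fp8_pflops() -> float:
    """FP8 throughput of one DGX H100: 8 GPUs x per-GPU TFLOPS, in PFLOPS."""
    return GPUS_PER_DGX * H100_FP8_TFLOPS / 1_000


def eos_gpu_count() -> int:
    """Total GPUs in EOS under the 32-nodes-per-pod assumption."""
    return EOS_PODS * DGX_NODES_PER_POD * GPUS_PER_DGX


def eos_fp8_eflops() -> float:
    """Aggregate EOS FP8 throughput in EFLOPS (1 EFLOPS = 1,000,000 TFLOPS)."""
    return eos_gpu_count() * H100_FP8_TFLOPS / 1_000_000


def per_gpu_share(total: float) -> float:
    """Split a DGX-level total evenly across its 8 GPUs."""
    return total / GPUS_PER_DGX


def spectrum4_tbps() -> float:
    """Switch bandwidth: ports x per-port Gb/s, in Tb/s."""
    return SPECTRUM4_PORTS * SPECTRUM4_PORT_GBPS / 1_000


if __name__ == "__main__":
    print(f"DGX H100 FP8: {dgx_fp8_pflops():.0f} PFLOPS")
    print(f"EOS GPUs: {eos_gpu_count()}")
    print(f"EOS FP8: {eos_fp8_eflops():.2f} EFLOPS")
    print(f"Per-GPU memory: {per_gpu_share(DGX_MEMORY_GB):.0f} GB")
    print(f"Per-GPU memory bandwidth: {per_gpu_share(DGX_MEM_BW_TBPS):.0f} TB/s")
    print(f"Spectrum-4: {spectrum4_tbps():.1f} Tbps")
```

The derived values (32 PFLOPS per DGX, 4,608 GPUs and about 18.4 EFLOPS for EOS, 51.2 Tbps for Spectrum-4) match the stated specs. The per-GPU memory bandwidth obtained this way (3 TB/s) is a different quantity from the 4.9 TB/s figure quoted in the H100 entry.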
Specific Numbers (12)
| Timestamp | Metric | Value | Context |
|---|---|---|---|
| 06:00 | Speedup | 1,000,000x | Increase in computing performance over the past decade due to accelerated computing and ML. |
| 07:28 | Developers | 3,000,000 | Number of developers in the NVIDIA ecosystem. |
| 10:20 | Speedup | 4 to 5 orders of magnitude | How much faster FourCastNet predicts weather compared to classical numerical models. |
| 27:30 | Transistors | 80 Billion | Number of transistors in the H100 GPU. |
| 28:20 | Performance | 4,000 TFLOPS | FP8 performance of the H100 GPU. |
| 31:42 | Speedup | 40x | Speedup for dynamic programming algorithms using new DPX instructions on Hopper. |
| 34:00 | Performance | 32 PFLOPS | AI performance of a single DGX H100 system. |
| 36:58 | Performance | 18.4 EFLOPS | AI performance of the EOS supercomputer. |
| 38:08 | Speedup | 30x | Inference throughput increase of H100 over A100 for large language models. |
| 41:26 | Cores | 144 | Number of CPU cores in the Grace CPU Superchip. |
| 41:26 | Performance | 740 | Estimated SPECrate2017_int_base score for the Grace CPU Superchip. |
| 56:58 | Bandwidth | 51.2 Tbps | Total bandwidth of the Spectrum-4 switch. |
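The "million-X over the past decade" figure is easier to compare with other growth rates once it is converted to a per-year factor. The sketch below assumes "decade" means exactly 10 years and that the gain compounded at a steady rate; the keynote makes neither claim, so treat the result as an average, not a year-by-year measurement. It also converts the FourCastNet "orders of magnitude" range into plain multipliers.

```python
import math


def annual_factor(total_speedup: float, years: float) -> float:
    """Constant per-year multiplier that compounds to total_speedup over `years`."""
    return total_speedup ** (1.0 / years)


def doubling_time_months(factor_per_year: float) -> float:
    """Months needed to double performance at a constant annual growth factor."""
    return 12 * math.log(2) / math.log(factor_per_year)


def orders_to_factor(orders: float) -> float:
    """Convert orders of magnitude into a plain multiplier (4 -> 10,000x)."""
    return 10 ** orders


if __name__ == "__main__":
    per_year = annual_factor(1_000_000, 10)
    print(f"Million-X over 10 years ~ {per_year:.2f}x per year")
    print(f"Doubling time ~ {doubling_time_months(per_year):.1f} months")
    print(f"FourCastNet range: {orders_to_factor(4):,.0f}x to {orders_to_factor(5):,.0f}x")
```

Under these assumptions, a million-X gain over ten years works out to roughly 4x per year, or a doubling about every six months.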
Benchmark Claims (4)
- [29:12] AI Processing (FP8 vs FP16): 6x
- vs: Ampere A100
- gain: Hopper H100 running FP8 delivers 6x the AI processing performance of Ampere A100 running FP16.
- [38:08] Large Language Model Inference (Megatron 530B): 30x
- vs: Ampere A100
- gain: H100 provides 30x higher throughput at 1-second response latency compared to A100.
- [41:26] CPU Memory Bandwidth: 1 TB/s
- vs: Top Gen5 CPUs
- gain: Grace CPU Superchip provides 2 to 3 times the memory bandwidth of top Gen5 CPUs.
- [41:26] CPU Energy Efficiency: 2x
- vs: Best CPUs at the time
- gain: Grace CPU Superchip is twice as energy efficient as the best CPUs.
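The memory bandwidth claim states Grace's absolute figure (1 TB/s) and a ratio (2 to 3 times), so the baseline it is compared against can be backed out. This sketch assumes the 2 to 3x ratio applies directly to the 1 TB/s figure and treats 1 TB/s as 1,000 GB/s; the keynote summary does not name the comparison CPUs beyond "top Gen5 CPUs".

```python
def implied_baseline(value: float, min_ratio: float, max_ratio: float) -> tuple[float, float]:
    """Baseline range implied by 'value is min_ratio to max_ratio times the baseline'."""
    if min_ratio <= 0 or max_ratio < min_ratio:
        raise ValueError("ratios must be positive and ordered")
    # A larger ratio implies a smaller baseline, so the bounds swap.
    return value / max_ratio, value / min_ratio


if __name__ == "__main__":
    GRACE_MEM_BW_GBPS = 1_000  # 1 TB/s, as stated for the Grace CPU Superchip
    low, high = implied_baseline(GRACE_MEM_BW_GBPS, 2, 3)
    print(f"Implied baseline CPU memory bandwidth: {low:.0f}-{high:.0f} GB/s")
```

Under that reading, the comparison CPUs would deliver roughly 333 to 500 GB/s of memory bandwidth.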
Customer Stories (4)
- [22:47] Snap
- Used NVIDIA Merlin for ad and content recommendations.
- outcome: Reduced costs by 50% and halved serving latency.
- [22:57] Tencent WeChat
- Used NVIDIA Merlin for short video recommendations.
- outcome: Achieved 4x lower latency and 10x higher throughput, halving costs by moving from CPU to GPU.
- [01:01:21] Amazon Robotics
- Used Omniverse to build digital twins of fulfillment centers to train and optimize autonomous robots.
- outcome: Enabled safer, more efficient inventory movement and optimized warehouse design before physical deployment.
- [01:04:18] PepsiCo
- Used Omniverse and Metropolis to create digital twins of distribution centers.
- outcome: Optimized conveyor belt speeds in real-time, preventing congestion and reducing energy usage.
Key Technologies (4)
- Transformer Engine: Dynamically chooses between FP8 and FP16 precision for each layer of a transformer network, substantially speeding up training without losing accuracy.
- Hopper Confidential Computing: Protects data and AI models while in use on the GPU, isolating them from the host OS and hypervisor.
- DPX Instructions: Accelerates dynamic programming algorithms (like Smith-Waterman for genomics) by up to 40x.
- NVLink Chip-to-Chip (C2C): An ultra-fast, energy-efficient interconnect allowing custom silicon to connect directly to NVIDIA GPUs, CPUs, and DPUs.
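DPX instructions target dynamic-programming recurrences such as Smith-Waterman local sequence alignment. The sketch below is a plain-Python reference version of that recurrence, not NVIDIA code; the scoring values (match +2, mismatch -1, linear gap penalty -1) are illustrative assumptions. The per-cell step, which adds a score to a neighboring cell and takes a maximum clamped at zero, is the kind of work DPX instructions are described as speeding up.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score between a and b, using a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # Add-then-max, clamped at 0 so an alignment can restart anywhere.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("ACTG", "ACG"))  # aligns AC-TG / AC-G style with one gap
```

The recurrence fills an (m+1) x (n+1) table, so the work grows as O(mn) in the two sequence lengths; that quadratic per-cell cost on long genomic sequences is why hardware support for the inner step matters.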
Demos Shown (7)
- [11:03] FourCastNet predicting an atmospheric river.
- [14:59] A physically simulated character learning to walk and fight using reinforcement learning.
- [19:09] Riva FastPitch generating expressive text-to-speech.
- [20:30] Maxine maintaining eye contact and translating speech in real-time video conferencing.
- [59:05] Multiple designers collaborating in real-time using Omniverse Cloud.
- [01:00:02] Tokkio (Omniverse Avatar) interacting conversationally and answering questions.
- [01:10:00] DRIVE Sim reconstructing a real-world driving scenario into a modifiable 3D simulation.
Predictions / Commitments (5)
- [36:58, In a few months] EOS supercomputer will be online.
- [40:37, Next year (2023)] Grace CPU Superchip will ship.
- [56:58, Q3 2022] Spectrum-4 switch will begin sampling.
- [01:08:55, Starting in 2026] DRIVE Hyperion 9 will ship in vehicles.
- [01:09:22, By the end of 2024] DRIVE Map will map 500,000 kilometers of roadways.
Companies Mentioned (4)
TSMC · Intel · BYD · Lucid
Notable Quotes (3)
AI is racing in every direction. New architectures, new learning strategies, larger and more robust models, new science, new applications, new industries. — Jensen Huang @ 06:43
Companies are manufacturing intelligence and operating giant AI factories. — Jensen Huang @ 26:00
A digital twin is a virtual world that’s connected to the physical world. And in the context of the internet, it is the next evolution. — Jensen Huang @ 51:11
Key Topics
Artificial Intelligence · Accelerated Computing · Hopper Architecture · H100 GPU · Grace CPU · Data Center Infrastructure · Omniverse · Digital Twins · Robotics · Autonomous Vehicles · Large Language Models · Transformers · Networking (InfiniBand/Ethernet)
Takeaways
- NVIDIA is transitioning data centers into ‘AI Factories’ designed to manufacture intelligence.
- The new Hopper architecture and H100 GPU provide massive leaps in performance, specifically tailored for Transformer models.
- NVIDIA is expanding its silicon footprint beyond GPUs with the Grace CPU Superchip and advanced networking (Spectrum-4, ConnectX-7).
- Omniverse is positioned as the foundational platform for the next era of the internet, focusing on physically accurate digital twins.
- Software and SDKs (like Riva, Merlin, and Isaac) are critical to NVIDIA’s strategy, making complex AI accessible across industries.
- Robotics and autonomous systems are moving from perception to action, heavily relying on simulation (DRIVE Sim, Isaac Sim) before real-world deployment.