GTC Spring 2022 Keynote

Category: Main Keynote · Year: 2022

Speakers: Jensen Huang - CEO, NVIDIA

Segments (11)

  • 00:00 · Introduction & The I AM AI Video
    • Opening video showcasing the impact of AI across various industries, followed by Jensen’s introduction.
  • 05:25 · The Million-X Computing Speedup
    • Jensen discusses how accelerated computing and AI have delivered a million-X speedup over the last decade.
  • 09:10 · Climate Science & FourCastNet
    • Introduction of FourCastNet, a physics-informed AI model for predicting extreme weather events.
  • 12:18 · NVIDIA AI Software Stack
    • Overview of NVIDIA’s AI software, including Triton, Riva, Maxine, Merlin, and NeMo Megatron.
  • 27:08 · Hopper Architecture & H100 GPU
    • Unveiling the new Hopper architecture, the H100 GPU, and its groundbreaking features like the Transformer Engine.
  • 32:36 · HGX, DGX, and EOS Supercomputer
    • Announcing the systems built around H100, scaling up to the massive EOS AI supercomputer.
  • 40:37 · Grace CPU Superchip & NVLink C2C
    • Introduction of the Grace CPU Superchip for AI factories and the NVLink Chip-to-Chip interconnect.
  • 44:22 · Accelerated Computing Libraries
    • Updates on SDKs including RAPIDS, cuOpt, Morpheus, cuQuantum, Aerial, Sionna, Modulus, and MONAI.
  • 51:11 · Omniverse & OVX
    • Presenting Omniverse for digital twins, the new OVX server, and Omniverse Cloud for collaboration.
  • 01:03:39 · Robotics: Isaac & Metropolis
    • Updates on the Isaac robotics platform, Metropolis for tracking, and Clara Holoscan for medical devices.
  • 01:08:14 · Autonomous Vehicles: DRIVE
    • Announcing DRIVE Hyperion 9, DRIVE Map, and showcasing DRIVE Sim capabilities.

Product Announcements (19)

  • [18:35] Riva 2.0
    • SDK for Speech AI
    • specs: Speech recognition in 7 languages, neural text-to-speech with male/female voices, custom tuning with TAO.
    • availability: General Availability
  • [23:18] Merlin 1.0
    • AI Framework for Hyperscale Recommender Systems
    • specs: End-to-end pipeline including feature transforms, retrieval, and ranking models.
    • availability: General Availability
  • [23:45] NeMo Megatron
    • AI Framework for Training Large Language Models
    • specs: Supports models up to trillions of parameters, automated data curation, distributed training.
    • availability: Not explicitly stated
  • [27:08] NVIDIA H100 GPU
    • Next-generation data center GPU based on Hopper architecture
    • specs: 80 Billion transistors, TSMC 4N process, 4.9 TB/s bandwidth, PCIe Gen5, Transformer Engine, DPX instructions.
    • availability: Not explicitly stated
  • [33:00] HGX H100
    • Server board with 8 H100 GPUs
    • specs: 32 PFLOPS FP8, 16 PFLOPS FP16, 3.6 TFLOPS in-network compute via SHARP.
    • availability: Not explicitly stated
  • [34:00] DGX H100
    • AI computing system
    • specs: 8 H100 GPUs, 32 PFLOPS AI performance, 640 GB HBM3 memory, 24 TB/s memory bandwidth.
    • availability: Not explicitly stated
  • [34:46] NVLink Switch System
    • External switch to connect multiple DGX nodes
    • specs: Connects up to 32 DGX nodes, enabling 256 GPUs to act as one.
    • availability: Not explicitly stated
  • [36:58] EOS Supercomputer
    • NVIDIA’s internal AI supercomputer
    • specs: 18 DGX Pods, 4608 H100 GPUs, 18.4 EFLOPS AI performance, 275 PFLOPS FP64.
    • availability: Online in a few months
  • [39:00] H100 CNX
    • Converged H100 GPU and ConnectX-7 SmartNIC
    • specs: Direct DMA from network to H100 at 50 GB/s, bypassing CPU bottlenecks.
    • availability: Not explicitly stated
  • [40:37] Grace CPU Superchip
    • Data center CPU designed for AI and HPC
    • specs: 144 CPU cores, 1 TB/s memory bandwidth, SPECrate2017_int_base over 740, 500W power.
    • availability: On track to ship next year (2023)
  • [42:40] NVLink Chip-to-Chip (C2C)
    • High-speed interconnect for custom silicon integration
    • specs: 900 GB/s bandwidth, ultra-energy efficient, low latency.
    • availability: Available to customers and partners
  • [45:56] cuOpt
    • AI-accelerated solver for route optimization
    • specs: Multi-agent, multi-constraint route planning optimization.
    • availability: Not explicitly stated
  • [52:59] NVIDIA OVX Server
    • Computing system designed for Omniverse digital twins
    • specs: 8 A40 GPUs, 3 ConnectX-6 NICs, 2 Intel Ice Lake CPUs, 1 TB system memory.
    • availability: Available now from top computer makers
  • [56:58] Spectrum-4
    • 400G Ethernet Switch
    • specs: 100 Billion transistors, 51.2 Tbps bandwidth, 128 ports of 400GbE.
    • availability: Will sample in Q3
  • [58:43] Omniverse Cloud
    • Cloud-based suite for Omniverse collaboration
    • specs: One-click design collaboration from anywhere without needing local RTX hardware.
    • availability: Not explicitly stated
  • [01:04:00] Isaac Nova Orin
    • Reference AMR (Autonomous Mobile Robot) architecture
    • specs: Powered by Jetson AGX Orin, includes 2 cameras, 2 lidars, 8 ultrasonics, 4 fisheye cameras.
    • availability: Available in Q2
  • [01:07:49] Clara Holoscan MGX
    • Medical-grade platform for robotic medical devices
    • specs: Designed to IEC 62304 standards, powered by Orin and CX7.
    • availability: Early access today, GA in May, medical-grade readiness in Q1 2023
  • [01:08:55] DRIVE Hyperion 9
    • Open reference AV platform
    • specs: Powered by dual Atlan SoCs, 14 cameras, 9 radars, 3 lidars, 20 ultrasonics.
    • availability: For cars shipping starting in 2026
  • [01:09:22] DRIVE Map
    • Multi-modal map engine for autonomous vehicles
    • specs: Includes camera, radar, and lidar layers; auto-generated from ground truth and crowd-sourced data.
    • availability: Expect to map 500,000 km by end of 2024

Specific Numbers (12)

Timestamp · Metric · Value · Context
06:00 · Speedup · 1,000,000x · Increase in computing performance over the past decade due to accelerated computing and ML.
07:28 · Developers · 3,000,000 · Number of developers in the NVIDIA ecosystem.
10:20 · Speedup · 4 to 5 orders of magnitude · How much faster FourCastNet predicts weather compared to classical numerical models.
27:30 · Transistors · 80 Billion · Number of transistors in the H100 GPU.
28:20 · Performance · 4,000 TFLOPS · FP8 performance of the H100 GPU.
31:42 · Speedup · 40x · Speedup for dynamic programming algorithms using new DPX instructions on Hopper.
34:00 · Performance · 32 PFLOPS · AI performance of a single DGX H100 system.
36:58 · Performance · 18.4 EFLOPS · AI performance of the EOS supercomputer.
38:08 · Speedup · 30x · Inference throughput increase of H100 over A100 for large language models.
41:26 · Cores · 144 · Number of CPU cores in the Grace CPU Superchip.
41:26 · Performance · 740 · Estimated SPECrate2017_int_base score for the Grace CPU Superchip.
56:58 · Bandwidth · 51.2 Tbps · Total bandwidth of the Spectrum-4 switch.
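
Several of the system-level figures above follow from the per-device figures by simple multiplication. The sketch below cross-checks them using only numbers quoted in this summary. The function name `derived_specs` is hypothetical, and the code assumes that each EOS pod corresponds to one 32-node NVLink Switch domain, which the summary implies but does not state.

```python
def derived_specs() -> dict:
    """Recompute system-level figures from the per-device numbers quoted above."""
    h100_fp8_pflops = 4.0      # 4,000 TFLOPS FP8 per H100
    gpus_per_dgx = 8           # DGX H100 and HGX H100 both carry 8 GPUs
    dgx_per_domain = 32        # NVLink Switch System connects up to 32 DGX nodes
    eos_pods = 18              # EOS is described as 18 DGX Pods
    dgx_hbm3_gb = 640          # total HBM3 in one DGX H100
    spectrum4_ports = 128      # Spectrum-4 ports
    port_gbps = 400            # each port is 400GbE

    gpus_per_domain = gpus_per_dgx * dgx_per_domain
    # Assumption: one pod equals one 32-node NVLink domain (256 GPUs).
    eos_gpus = eos_pods * gpus_per_domain

    return {
        "dgx_fp8_pflops": h100_fp8_pflops * gpus_per_dgx,
        "gpus_per_nvlink_domain": gpus_per_domain,
        "eos_gpus": eos_gpus,
        "eos_fp8_eflops": eos_gpus * h100_fp8_pflops / 1000,
        "hbm3_per_gpu_gb": dgx_hbm3_gb / gpus_per_dgx,
        "spectrum4_tbps": spectrum4_ports * port_gbps / 1000,
    }


if __name__ == "__main__":
    for name, value in derived_specs().items():
        print(f"{name}: {value}")
```

Under these assumptions the derived values agree with the quoted 32 PFLOPS per DGX H100, 256 GPUs per NVLink domain, 4608 GPUs and roughly 18.4 EFLOPS for EOS, and 51.2 Tbps for Spectrum-4.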

Benchmark Claims (4)

  • [29:12] AI Processing (FP8 vs FP16): 6x
    • vs: Ampere A100
    • gain: Hopper H100 delivers 6x the performance of Ampere A100 for AI processing.
  • [38:08] Large Language Model Inference (Megatron 530B): 30x
    • vs: Ampere A100
    • gain: H100 provides 30x higher throughput at 1-second response latency compared to A100.
  • [41:26] CPU Memory Bandwidth: 1 TB/s
    • vs: Top Gen5 CPUs
    • gain: Grace CPU Superchip provides 2 to 3 times the memory bandwidth of top Gen5 CPUs.
  • [41:26] CPU Energy Efficiency: 2x
    • vs: Best CPUs at the time
    • gain: Grace CPU Superchip is twice as energy efficient as the best CPUs.

Customer Stories (4)

  • [22:47] Snap
    • Used NVIDIA Merlin for ad and content recommendations.
    • outcome: Reduced costs by 50% and cut serving latency in half.
  • [22:57] Tencent WeChat
    • Used NVIDIA Merlin for short video recommendations.
    • outcome: Achieved 4x lower latency and 10x throughput, halving costs by moving from CPU to GPU.
  • [01:01:21] Amazon Robotics
    • Used Omniverse to build digital twins of fulfillment centers to train and optimize autonomous robots.
    • outcome: Enabled safer, more efficient inventory movement and optimized warehouse design before physical deployment.
  • [01:04:18] PepsiCo
    • Used Omniverse and Metropolis to create digital twins of distribution centers.
    • outcome: Optimized conveyor belt speeds in real-time, preventing congestion and reducing energy usage.

Key Technologies (4)

  • Transformer Engine: Dynamically processes layers of a transformer network using FP8 and FP16 formats to drastically speed up training without losing accuracy.
  • Hopper Confidential Computing: Protects data and AI models while in use on the GPU, isolating them from the host OS and hypervisor.
  • DPX Instructions: Accelerate dynamic programming algorithms (like Smith-Waterman for genomics) by up to 40x.
  • NVLink Chip-to-Chip (C2C): An ultra-fast, energy-efficient interconnect allowing custom silicon to connect directly to NVIDIA GPUs, CPUs, and DPUs.
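
For readers unfamiliar with the kind of workload DPX instructions target, here is a minimal CPU reference sketch of the Smith-Waterman local-alignment recurrence. It is plain Python, not DPX or GPU code, and the scoring parameters (`match`, `mismatch`, `gap`) and the function name are illustrative assumptions rather than values from the keynote. The inner loop is a chain of additions followed by a max, which is the pattern dynamic programming accelerators are built to speed up.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local-alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Open or extend a gap in either sequence.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GATTACA", "ATTAC"))
```

Keeping only two rows of the table is enough when just the score is needed; recovering the alignment itself would require storing the full table and tracing back from the best cell.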

Demos Shown (7)

  • [11:03] FourCastNet predicting an atmospheric river.
  • [14:59] A physically simulated character learning to walk and fight using reinforcement learning.
  • [19:09] Riva FastPitch generating expressive text-to-speech.
  • [20:30] Maxine maintaining eye contact and translating speech in real-time video conferencing.
  • [59:05] Multiple designers collaborating in real-time using Omniverse Cloud.
  • [01:00:02] Tokkio (Omniverse Avatar) interacting conversationally and answering questions.
  • [01:10:00] DRIVE Sim reconstructing a real-world driving scenario into a modifiable 3D simulation.

Predictions / Commitments (5)

  • [36:58, In a few months] EOS supercomputer will be online.
  • [40:37, Next year (2023)] Grace CPU Superchip will ship.
  • [56:58, Q3 2022] Spectrum-4 switch will begin sampling.
  • [01:08:55, Starting in 2026] DRIVE Hyperion 9 will ship in vehicles.
  • [01:09:22, By the end of 2024] DRIVE Map will map 500,000 kilometers of roadways.

Companies Mentioned (4)

TSMC · Intel · BYD · Lucid

Notable Quotes (3)

AI is racing in every direction. New architectures, new learning strategies, larger and more robust models, new science, new applications, new industries. — Jensen Huang @ 06:43

Companies are manufacturing intelligence and operating giant AI factories. — Jensen Huang @ 26:00

A digital twin is a virtual world that’s connected to the physical world. And in the context of the internet, it is the next evolution. — Jensen Huang @ 51:11

Key Topics

Artificial Intelligence · Accelerated Computing · Hopper Architecture · H100 GPU · Grace CPU · Data Center Infrastructure · Omniverse · Digital Twins · Robotics · Autonomous Vehicles · Large Language Models · Transformers · Networking (InfiniBand/Ethernet)

Takeaways

  • NVIDIA is transitioning data centers into ‘AI Factories’ designed to manufacture intelligence.
  • The new Hopper architecture and H100 GPU provide massive leaps in performance, specifically tailored for Transformer models.
  • NVIDIA is expanding its silicon footprint beyond GPUs with the Grace CPU Superchip and advanced networking (Spectrum-4, ConnectX-7).
  • Omniverse is positioned as the foundational platform for the next era of the internet, focusing on physically accurate digital twins.
  • Software and SDKs (like Riva, Merlin, and Isaac) are critical to NVIDIA’s strategy, making complex AI accessible across industries.
  • Robotics and autonomous systems are moving from perception to action, heavily relying on simulation (DRIVE Sim, Isaac Sim) before real-world deployment.