GTC China 2020 Keynote
Category: China Keynote · Year: 2020
Speakers: Ashok Pandey - VP, Operations & Partners, APAC, NVIDIA · Bill Dally - Chief Scientist and SVP of Research, NVIDIA · Greg Estes - VP, Corporate Marketing & Developer Programs, NVIDIA · Jay Puri - EVP, Worldwide Field Operations, NVIDIA · Kimberly Powell - VP, Healthcare, NVIDIA · Raymond Teh - VP, Sales & Marketing, APAC, NVIDIA
Segments (15)
- 00:00 · Introduction
- Opening video highlighting NVIDIA’s impact across various industries.
- 03:14 · Keynote: Ampere Architecture and Software Stack
- Bill Dally introduces the Ampere A100 GPU, its new features like TF32 and structured sparsity, and the CUDA software ecosystem.
- 09:21 · Keynote: DGX Systems and Supercomputing
- Overview of DGX A100, DGX SuperPOD, and the Selene supercomputer’s ranking on the Top500 and Green500 lists.
- 11:56 · Keynote: Deep Learning Performance and MLPerf
- Discussion on the evolution of Tensor Cores, Huang’s Law, and NVIDIA’s dominance in MLPerf training and inference benchmarks.
- 17:28 · Keynote: Real-Time Graphics and Ray Tracing
- Showcase of RTX DI, RTX GI, and DLSS 2.0 enabling photorealistic real-time rendering.
- 25:56 · Keynote: AI Applications - GANs, NLP, and Recommenders
- Exploration of Generative Adversarial Networks (GANs), conversational AI with Jarvis, Megatron NLP, and the Merlin recommender framework.
- 35:10 · Keynote: AI in Healthcare
- Introduction to Clara Discovery for drug discovery, genomics with Parabricks, and AI’s role in fighting COVID-19.
- 42:59 · Keynote: Robotics and Autonomous Vehicles
- Advancements in robotic manipulation, reinforcement learning in simulation, and the NVIDIA DRIVE platform for autonomous vehicles.
- 50:18 · Keynote: NVIDIA Research Projects
- Deep dive into future technologies including efficient inference accelerators (RC18, MAGNet), silicon photonics for interconnects, and the Legate programming system.
- 01:01:00 · Executive Panel: Introduction
- Raymond Teh introduces the executive panel to discuss NVIDIA’s business and strategy in China.
- 01:10:59 · Panel: Importance of the China Market
- Jay Puri and Greg Estes discuss the strategic importance of China, its massive developer base, and the gaming ecosystem.
- 01:16:45 · Panel: AI in Healthcare and COVID-19 Response
- Kimberly Powell explains how AI and accelerated computing are creating a ‘computational global defense system’ for healthcare.
- 01:26:59 · Panel: Cloud Service Providers and Live Streaming
- Ashok Pandey details collaborations with Chinese CSPs (Alibaba, Tencent, Baidu) and the use of GPUs in the booming live streaming industry.
- 01:46:49 · Panel: Startups and the Inception Program
- Greg Estes highlights NVIDIA’s support for over 800 AI startups in China through the Inception program.
- 01:50:30 · Panel: DGX Strategy and Partner Ecosystem
- Jay Puri clarifies the strategy behind NVIDIA’s DGX systems and how they enable OEM partners to build certified AI platforms.
Product Announcements (8)
- [03:42] Ampere A100 GPU
- Data center GPU architecture
- specs: 7nm, 54 billion transistors, 3rd Gen Tensor Cores, TF32 support, Multi-Instance GPU (MIG), Structured Sparsity
- availability: Available
- [09:21] DGX A100
- AI system
- specs: 8x A100 GPUs, 9x Mellanox ConnectX-6 NICs, 160 Teraflops FP64
- availability: Available
- [19:14] RTX DI and RTX GI
- Rendering technologies
- specs: Direct Illumination using ReSTIR algorithm, Global Illumination using light probes for real-time path tracing
- availability: Available in NVIDIA graphics products
- [21:29] DLSS 2.0
- Deep Learning Super Sampling
- specs: AI-driven upscaling, temporally stable, generalized network across games
- availability: Available
- [31:31] NVIDIA Jarvis
- Multimodal conversational AI service
- specs: Speech-to-text, NLP, text-to-speech pipeline
- availability: Available
- [35:50] Triton Inference Server
- Open-source inference serving software
- specs: Supports multiple frameworks (TensorFlow, PyTorch, ONNX), dynamic batching, concurrent model execution
- availability: Available
- [38:00] Clara Discovery
- Computational drug discovery platform
- specs: Genomics (Parabricks), Cryo-EM (CryoSPARC), molecular docking (AutoDock), NLP (BioMegatron)
- availability: Available
- [49:20] DRIVE AGX Orin
- Autonomous vehicle compute platform
- specs: Scalable from 10 TOPS (5W) for ADAS to 2,000 TOPS (800W) for Level 5 Robotaxi
- availability: Announced
Specific Numbers (9)
| Timestamp | Metric | Value | Context |
|---|---|---|---|
| 05:51 | Transistors | 54 billion | Number of transistors on the Ampere A100 chip. |
| 07:28 | TFLOPS | 19.5 | FP64 Tensor Core performance on A100. |
| 07:36 | TFLOPS | 156 | TF32 Tensor Core performance on A100 for deep learning training. |
| 07:45 | PETAOPS | 1.25 | INT8 inference performance on A100 with sparsity. |
| 10:26 | Ranking | #5 | Selene supercomputer ranking on the Top500 list. |
| 14:18 | Performance Multiplier | 317x | Increase in single-chip inference performance over 8 years (Huang’s Law). |
| 01:11:15 | Developers | 400,000+ | Registered NVIDIA developers in China. |
| 01:11:59 | CPUs Sold | 22 billion | Number of ARM CPUs sold annually. |
| 01:47:38 | Startups | 800+ | Number of startups in the NVIDIA Inception program in China. |
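
As a rough consistency check on the figures in the table, the arithmetic below derives three implied values: the average yearly gain behind the 317x Huang's Law figure, the combined FP64 Tensor Core throughput of eight A100s, and the dense INT8 rate implied by the sparse figure. The function and variable names are illustrative. The calculation assumes steady compound growth and that sparsity exactly doubles throughput, neither of which this summary states precisely.

```python
# Sanity-check arithmetic on figures quoted in the summary (a sketch, not official data).

def annual_factor(total_gain: float, years: float) -> float:
    """Average per-year multiplier, assuming steady compound growth."""
    return total_gain ** (1.0 / years)

# Huang's Law: 317x single-chip inference gain over 8 years.
huang_per_year = annual_factor(317, 8)

# DGX A100: 8 GPUs, each quoted at 19.5 TFLOPS FP64 Tensor Core.
dgx_fp64_tflops = 8 * 19.5

# A100 INT8 with sparsity is quoted at 1.25 PetaOPS; assuming structured
# sparsity doubles throughput, the dense rate would be half of that.
int8_dense_petaops = 1.25 / 2

print(f"Implied yearly gain: {huang_per_year:.2f}x")
print(f"8x A100 FP64: {dgx_fp64_tflops:.0f} TFLOPS")
print(f"Implied dense INT8: {int8_dense_petaops:.3f} PetaOPS")
```

The yearly factor comes out slightly above 2, which agrees with the quoted claim of more than doubling each year. The eight-GPU product of 156 TFLOPS is close to the 160 Teraflops listed for DGX A100, which may be a rounded figure.
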
Benchmark Claims (3)
- [15:10] MLPerf Training: Up to 2.5x
- vs: Volta V100
- gain: A100 is up to 2.5x faster than V100 in training benchmarks, sweeping all categories.
- [16:10] MLPerf Data Center Inference: Up to 237x
- vs: CPU
- gain: A100 is up to 237x faster than CPU and 6-8x faster than the previous generation T4.
- [17:00] MLPerf Edge Inference: Leading
- vs: Centaur
- gain: Jetson AGX Xavier and T4 sweep categories, outperforming competitors like Centaur.
Customer Stories (4)
- [01:18:18] Ping An, United Imaging, Infervision
- Deployed Clara medical imaging COVID AI technology into thousands of hospitals across China.
- outcome: Provided frontline workers with AI tools to make better choices and treat patients faster.
- [01:28:40] Alibaba Cloud, Tencent Cloud, Baidu Cloud
- Adopted the A100 GPU architecture for their cloud services.
- outcome: Achieved significant performance-to-price improvements and supported complex AI models.
- [01:44:50] Taobao
- Used GPUs to accelerate computer vision and NLP during live streams.
- outcome: Improved real-time content understanding and user experience.
- [01:45:00] Bigo Live
- Used GPUs to improve real-time content understanding and creation capabilities.
- outcome: Enhanced live streaming features.
Key Technologies (6)
- TensorFloat-32 (TF32): A new math format that combines the range of FP32 (8-bit exponent) with the precision of FP16 (10-bit mantissa), accelerating AI training without code changes.
- Structured Sparsity: Exploits networks pruned so that 2 of every 4 consecutive weights are zero, doubling math throughput and reducing memory bandwidth requirements.
- RTX Direct Illumination (RTX DI): Uses ReSTIR algorithm to render millions of dynamic lights with physically accurate shadows in real-time.
- RTX Global Illumination (RTX GI): Computes infinite bounces of indirect light using light probes without light leaks, enabling real-time global illumination.
- DLSS 2.0: Uses a deep neural network to upscale lower-resolution rendered images to higher resolutions (e.g., 4K) while maintaining temporal stability.
- Silicon Photonics: Uses light instead of electrical signals for chip-to-chip communication, offering higher bandwidth and longer reach at lower power.
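
The first two entries in the list above, TF32 and structured sparsity, can be illustrated with a minimal sketch in plain Python. It is not NVIDIA's implementation: `to_tf32` and `prune_2_of_4` are hypothetical helper names, the TF32 emulation truncates the low mantissa bits (real hardware may round differently), and choosing which two weights to zero by smallest magnitude is an assumption, since the summary does not say how the zeros are selected.

```python
import struct
from typing import List, Sequence


def to_tf32(x: float) -> float:
    """Emulate TF32 precision: FP32's 8 exponent bits, but only 10 mantissa bits.

    Assumption: the dropped bits are truncated; actual hardware may round.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 of FP32's 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def prune_2_of_4(weights: Sequence[float]) -> List[float]:
    """Zero 2 of every 4 consecutive weights, keeping the 2 largest magnitudes.

    Assumption: magnitude-based selection; the source only states the 2-of-4 pattern.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group of four.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    # 2**-10 fits in a 10-bit mantissa, 2**-11 does not.
    print(to_tf32(1.0 + 2**-10), to_tf32(1.0 + 2**-11))
    # A value far beyond FP16's range still survives, as in FP32.
    print(to_tf32(1e38))
    print(prune_2_of_4([0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.0]))
```

After pruning, every group of four holds exactly two zeros, which is the regular pattern that lets hardware skip half of the multiplications.
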
Demos Shown (6)
- [18:19] Marbles RTX tech demo showcasing real-time path tracing, soft shadows, and reflections.
- [23:00] DLSS 2.0 comparison in Death Stranding, showing native 4K vs DLSS 4K.
- [30:09] Maxine video conferencing demo using GANs to animate a face from keypoints, including mapping to a cartoon avatar.
- [32:20] GauGAN demo turning simple painted shapes into photorealistic landscapes.
- [43:30] Robotic arm using Riemannian Motion Policies to avoid obstacles and grasp unknown objects.
- [45:00] Four-legged robots learning to walk in simulation and transferring that skill to the real world.
Predictions / Commitments (3)
- [25:11, Long term] In the long run, we expect computer graphics to be generated by AI… without ever having geometry.
- [50:41, Future generations] We are looking at an alternative technology to actually signal out of our GPUs… using light, using photonics.
- [54:23, Ongoing] We are continuing this evolution of Huang’s Law, continuing to more than double inference performance each year.
Companies Mentioned (6)
Google · Huawei · Intel · Xilinx · ARM · Alibaba, Tencent, Baidu
Notable Quotes (3)
This curve has come to be known as Huang’s Law, which is that inference performance doubles every year. Actually, we’re more than doubling it every year. — Bill Dally @ 14:23
The future of graphics is AI. In fact, the future of almost everything is AI. — Bill Dally @ 25:55
It’s absolutely what I call the perfect storm for a computational global defense system. — Kimberly Powell @ 01:17:34
Key Topics
Ampere Architecture · Deep Learning Inference · Ray Tracing · Generative AI · Healthcare & Drug Discovery · Robotics · Autonomous Vehicles · Silicon Photonics · China Market Strategy · ARM Acquisition · Cloud Computing · Startup Ecosystem
Takeaways
- The Ampere A100 GPU delivers massive performance leaps for both AI training and inference, driven by TF32 and structured sparsity.
- NVIDIA is outpacing Moore’s Law with ‘Huang’s Law’, achieving a 317x increase in inference performance over 8 years through architectural innovations.
- AI is fundamentally transforming computer graphics, enabling real-time path tracing and AI-driven upscaling (DLSS), with a future where graphics are entirely AI-generated.
- The COVID-19 pandemic has accelerated the adoption of AI in healthcare, creating a ‘computational global defense system’ for drug discovery and medical imaging.
- NVIDIA is heavily investing in future technologies like silicon photonics to overcome electrical bandwidth limitations in data center interconnects.
- The China market is highly strategic for NVIDIA, supported by deep partnerships with major Cloud Service Providers (Alibaba, Tencent, Baidu) and a massive developer base.
- NVIDIA’s planned acquisition of ARM aims to bring ARM’s energy-efficient architecture into the data center, creating a viable alternative to x86.