GTC China 2020 Keynote

Category: China Keynote · Year: 2020

Speakers: Ashok Pandey - VP, Operations & Partners, APAC, NVIDIA · Bill Dally - Chief Scientist and SVP of Research, NVIDIA · Greg Estes - VP, Corporate Marketing & Developer Programs, NVIDIA · Jay Puri - EVP, Worldwide Field Operations, NVIDIA · Kimberly Powell - VP, Healthcare, NVIDIA · Raymond Teh - VP, Sales & Marketing, APAC, NVIDIA

Segments (15)

00:00 · Introduction
- Opening video highlighting NVIDIA’s impact across various industries.
03:14 · Keynote: Ampere Architecture and Software Stack
- Bill Dally introduces the Ampere A100 GPU, its new features like TF32 and structured sparsity, and the CUDA software ecosystem.
09:21 · Keynote: DGX Systems and Supercomputing
- Overview of DGX A100, DGX SuperPOD, and the Selene supercomputer’s ranking on the Top500 and Green500 lists.
11:56 · Keynote: Deep Learning Performance and MLPerf
- Discussion of the evolution of Tensor Cores, Huang’s Law, and NVIDIA’s lead in MLPerf training and inference benchmarks.
17:28 · Keynote: Real-Time Graphics and Ray Tracing
- Showcase of RTX DI, RTX GI, and DLSS 2.0 enabling photorealistic real-time rendering.
25:56 · Keynote: AI Applications - GANs, NLP, and Recommenders
- Exploration of Generative Adversarial Networks (GANs), conversational AI with Jarvis, Megatron NLP, and the Merlin recommender framework.
35:10 · Keynote: AI in Healthcare
- Introduction to Clara Discovery for drug discovery, genomics with Parabricks, and AI’s role in fighting COVID-19.
42:59 · Keynote: Robotics and Autonomous Vehicles
- Advancements in robotic manipulation, reinforcement learning in simulation, and the NVIDIA DRIVE platform for autonomous vehicles.
50:18 · Keynote: NVIDIA Research Projects
- Deep dive into future technologies including efficient inference accelerators (RC18, MAGNet), silicon photonics for interconnects, and the Legate programming system.
01:01:00 · Executive Panel: Introduction
- Raymond Teh introduces the executive panel to discuss NVIDIA’s business and strategy in China.
01:10:59 · Panel: Importance of the China Market
- Jay Puri and Greg Estes discuss the strategic importance of China, its massive developer base, and the gaming ecosystem.
01:16:45 · Panel: AI in Healthcare and COVID-19 Response
- Kimberly Powell explains how AI and accelerated computing are creating a ‘computational global defense system’ for healthcare.
01:26:59 · Panel: Cloud Service Providers and Live Streaming
- Ashok Pandey details collaborations with Chinese CSPs (Alibaba, Tencent, Baidu) and the use of GPUs in the booming live streaming industry.
01:46:49 · Panel: Startups and the Inception Program
- Greg Estes highlights NVIDIA’s support for over 800 AI startups in China through the Inception program.
01:50:30 · Panel: DGX Strategy and Partner Ecosystem
- Jay Puri clarifies the strategy behind NVIDIA’s DGX systems and how they enable OEM partners to build certified AI platforms.

Product Announcements (8)

[03:42] Ampere A100 GPU
- Data center GPU architecture
- specs: 7nm, 54 billion transistors, 3rd Gen Tensor Cores, TF32 support, Multi-Instance GPU (MIG), Structured Sparsity
- availability: Available
[09:21] DGX A100
- AI system
- specs: 8x A100 GPUs, 9x Mellanox ConnectX-6 NICs, roughly 160 teraflops FP64 (8 x 19.5 TFLOPS per GPU = 156)
- availability: Available
[19:14] RTX DI and RTX GI
- Rendering technologies
- specs: Direct Illumination using ReSTIR algorithm, Global Illumination using light probes for real-time path tracing
- availability: Available in NVIDIA graphics products
[21:29] DLSS 2.0
- Deep Learning Super Sampling
- specs: AI-driven upscaling, temporally stable, generalized network across games
- availability: Available
[31:31] NVIDIA Jarvis
- Multimodal conversational AI service
- specs: Speech-to-text, NLP, text-to-speech pipeline
- availability: Available
[35:50] Triton Inference Server
- Open-source inference serving software
- specs: Supports multiple frameworks (TensorFlow, PyTorch, ONNX), dynamic batching, concurrent model execution
- availability: Available
[38:00] Clara Discovery
- Computational drug discovery platform
- specs: Genomics (Parabricks), Cryo-EM (CryoSPARC), molecular docking (AutoDock), NLP (BioMegatron)
- availability: Available
[49:20] DRIVE AGX Orin
- Autonomous vehicle compute platform
- specs: Scalable from 10 TOPS (5W) for ADAS to 2,000 TOPS (800W) for Level 5 Robotaxi
- availability: Announced

Specific Numbers (9)

Timestamp	Metric	Value	Context
05:51	Transistors	54 billion	Number of transistors on the Ampere A100 chip.
07:28	TFLOPS	19.5	FP64 Tensor Core performance on A100.
07:36	TFLOPS	156	TF32 Tensor Core performance on A100 for deep learning training.
07:45	PETAOPS	1.25	INT8 inference performance on A100 with sparsity.
10:26	Ranking	#5	Selene supercomputer ranking on the Top500 list.
14:18	Performance Multiplier	317x	Increase in single-chip inference performance over 8 years (Huang’s Law).
01:11:15	Developers	400,000+	Registered NVIDIA developers in China.
01:11:59	CPUs Sold	22 billion	Number of ARM CPUs sold annually.
01:47:38	Startups	800+	Number of startups in the NVIDIA Inception program in China.
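
The 317x figure can be checked against the "more than doubling every year" claim with a short compound-growth calculation. This is a minimal sketch that uses only the two numbers from the table above (317x, 8 years); the variable names are illustrative and not from the talk.

```python
# Annualized growth implied by the 317x-over-8-years figure quoted in the talk.
total_gain = 317.0   # single-chip inference gain (from the table above)
years = 8            # time span (from the table above)

# Compound growth: total_gain = annual_factor ** years
annual_factor = total_gain ** (1.0 / years)

# Reference point: exact doubling every year over the same span.
doubling_gain = 2 ** years

print(f"implied annual factor: {annual_factor:.2f}x")
print(f"pure yearly doubling over {years} years: {doubling_gain}x")
```

Exact yearly doubling would give 256x over 8 years, so 317x corresponds to an annual factor slightly above 2 (about 2.05x), consistent with the quoted claim.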

Benchmark Claims (3)

[15:10] MLPerf Training: Up to 2.5x
- vs: Volta V100
- gain: A100 is up to 2.5x faster than V100 in training benchmarks, sweeping all categories.
[16:10] MLPerf Data Center Inference: Up to 237x
- vs: CPU
- gain: A100 is up to 237x faster than CPU and 6-8x faster than the previous generation T4.
[17:00] MLPerf Edge Inference: Leading
- vs: Centaur
- gain: Jetson AGX Xavier and T4 sweep categories, outperforming competitors like Centaur.

Customer Stories (4)

[01:18:18] Ping An, United Imaging, Infervision
- Deployed Clara medical imaging COVID AI technology into thousands of hospitals across China.
- outcome: Provided frontline workers with AI tools to make better choices and treat patients faster.
[01:28:40] Alibaba Cloud, Tencent Cloud, Baidu Cloud
- Adopted the A100 GPU architecture for their cloud services.
- outcome: Achieved significant performance-to-price improvements and supported complex AI models.
[01:44:50] Taobao
- Used GPUs to accelerate computer vision and NLP during live streams.
- outcome: Improved real-time content understanding and user experience.
[01:45:00] Bigo Live
- Used GPUs to improve real-time content understanding and creation capabilities.
- outcome: Enhanced live streaming features.

Key Technologies (6)

TensorFloat-32 (TF32): A new math format that keeps the range of FP32 (8-bit exponent) with the precision of FP16 (10-bit mantissa), accelerating AI training without code changes.
Structured Sparsity: Prunes weights in a fixed 2:4 pattern (two of every four consecutive weights are zero), doubling math throughput and reducing memory bandwidth requirements.
RTX Direct Illumination (RTX DI): Uses ReSTIR algorithm to render millions of dynamic lights with physically accurate shadows in real-time.
RTX Global Illumination (RTX GI): Computes infinite bounces of indirect light using light probes without light leaks, enabling real-time global illumination.
DLSS 2.0: Uses a deep neural network to upscale lower-resolution rendered images to higher resolutions (e.g., 4K) while maintaining temporal stability.
Silicon Photonics: Uses light instead of electrical signals for chip-to-chip communication, offering higher bandwidth and longer reach at lower power.
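
The TF32 and Structured Sparsity entries above can be illustrated with a small software sketch. It is not NVIDIA's implementation: `to_tf32` and `prune_2_of_4` are hypothetical helper names, the TF32 emulation truncates rather than rounds, and choosing which weights to zero by magnitude is an assumption made for illustration.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 precision: keep FP32's sign and 8-bit exponent,
    but only the top 10 of its 23 mantissa bits.
    Assumption: truncation is used here; hardware rounding may differ."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def prune_2_of_4(weights):
    """Zero 2 of every 4 consecutive weights (2:4 pattern).
    Assumption: the two smallest-magnitude values in each group are dropped."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes in this group of four.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    # 2**-10 survives the 10-bit mantissa; 2**-11 is truncated away.
    print(to_tf32(1.0 + 2 ** -10), to_tf32(1.0 + 2 ** -11))
    # A value far beyond FP16's range stays finite in TF32.
    print(to_tf32(1e38))
    print(prune_2_of_4([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]))
```

With the sample weights, each group of four keeps its two largest-magnitude entries, giving `[0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]`. Half of the stored values are zero in a predictable layout, which is what lets hardware skip them.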

Demos Shown (6)

[18:19] Marbles RTX tech demo showcasing real-time path tracing, soft shadows, and reflections.
[23:00] DLSS 2.0 comparison in Death Stranding, showing native 4K vs DLSS 4K.
[30:09] Maxine video conferencing demo using GANs to animate a face from keypoints, including mapping to a cartoon avatar.
[32:20] GauGAN demo turning simple painted shapes into photorealistic landscapes.
[43:30] Robotic arm using Riemannian Motion Policies to avoid obstacles and grasp unknown objects.
[45:00] Four-legged robots learning to walk in simulation and transferring that skill to the real world.

Predictions / Commitments (3)

[25:11, Long term] In the long run, we expect computer graphics to be generated by AI… without ever having geometry.
[50:41, Future generations] We are looking at an alternative technology to actually signal out of our GPUs… using light, using photonics.
[54:23, Ongoing] We are continuing this evolution of Huang’s Law, continuing to more than double inference performance each year.

Companies Mentioned (6)

Google · Huawei · Intel · Xilinx · ARM · Alibaba, Tencent, Baidu

Notable Quotes (3)

This curve has come to be known as Huang’s Law, which is that inference performance doubles every year. Actually, we’re more than doubling it every year. — Bill Dally @ 14:23

The future of graphics is AI. In fact, the future of almost everything is AI. — Bill Dally @ 25:55

It’s absolutely what I call the perfect storm for a computational global defense system. — Kimberly Powell @ 01:17:34

Key Topics

Ampere Architecture · Deep Learning Inference · Ray Tracing · Generative AI · Healthcare & Drug Discovery · Robotics · Autonomous Vehicles · Silicon Photonics · China Market Strategy · ARM Acquisition · Cloud Computing · Startup Ecosystem

Takeaways

The Ampere A100 GPU delivers massive performance leaps for both AI training and inference, driven by TF32 and structured sparsity.
NVIDIA is outpacing Moore’s Law with ‘Huang’s Law’, achieving a 317x increase in inference performance over 8 years through architectural innovations.
AI is fundamentally transforming computer graphics, enabling real-time path tracing and AI-driven upscaling (DLSS), with a future where graphics are entirely AI-generated.
The COVID-19 pandemic has accelerated the adoption of AI in healthcare, creating a ‘computational global defense system’ for drug discovery and medical imaging.
NVIDIA is heavily investing in future technologies like silicon photonics to overcome electrical bandwidth limitations in data center interconnects.
The China market is highly strategic for NVIDIA, supported by deep partnerships with major Cloud Service Providers (Alibaba, Tencent, Baidu) and a massive developer base.
NVIDIA’s planned acquisition of ARM aims to bring ARM’s energy-efficient architecture into the data center, creating a viable alternative to x86.