GTC Taiwan Jensen Keynote
Category: Taiwan Keynote · Year: 2018
Segments (22)
- 00:00 · Introduction
- Jensen Huang welcomes the audience to GTC Taiwan 2018.
- 01:06 · The Rise of GPU Computing
- Discussion on the end of Moore’s Law and the necessity of GPU-accelerated computing.
- 04:08 · The Computing Gap
- Highlighting the massive future demand for computing power that CPUs alone cannot meet.
- 06:50 · NVIDIA Accelerated Computing Stack
- Explanation of NVIDIA’s full-stack optimization approach from architecture to applications.
- 13:20 · GPU-Accelerated HPC Cluster
- Comparing traditional CPU clusters to GPU clusters in cost, space, and power.
- 19:40 · AI Training Demand
- Showcasing the exponential growth in compute required for training neural networks.
- 23:20 · The Tensor Core GPU
- Introduction to the Volta architecture and its fusion of HPC and AI computing.
- 27:20 · NVSwitch
- Unveiling the high-speed interconnect that allows multiple GPUs to act as one.
- 29:30 · DGX-2 Announcement
- Introduction of the DGX-2, the world’s largest GPU.
- 35:20 · DGX-2 Physical Reveal
- Jensen Huang physically unveils the 350-pound DGX-2 system on stage.
- 37:30 · 10X Performance in 6 Months
- Demonstrating the rapid performance gains achieved through full-stack optimization.
- 44:30 · 5 Speed Records
- Highlighting record-setting AI training and inference performance metrics.
- 46:30 · AI Inference & TensorRT 4
- Focusing on the challenges of AI inference and the introduction of TensorRT 4.
- 57:00 · Kubernetes on NVIDIA GPUs
- Announcing GPU support for Kubernetes to scale out AI workloads.
- 01:01:00 · PLASTER Framework
- Introducing a framework for evaluating inference performance.
- 01:10:00 · Inference Demos
- Live demonstrations of image recognition and scale-out inference.
- 01:20:00 · HGX-2 Announcement
- Unveiling the HGX-2 cloud server platform for hyperscale data centers.
- 01:30:00 · NVIDIA RTX
- Introducing real-time ray tracing technology for computer graphics.
- 01:43:00 · NVIDIA Clara
- Announcing the Clara medical imaging supercomputer platform.
- 01:56:00 · NVIDIA Metropolis
- Discussing AI applications for smart and safe cities.
- 02:01:00 · NVIDIA DRIVE & Autonomous Vehicles
- Overview of the end-to-end platform for autonomous driving.
- 02:06:00 · Project Wakanda Demo
- A live VR telepresence driving demonstration.
Product Announcements (6)
- [29:30] DGX-2
- The world’s largest GPU system, combining 16 Volta GPUs.
- specs: 2 PFLOPS, 512 GB HBM2 memory, 10 kW power, 350 lbs.
- availability: $399,000, available in Q3.
- [46:30] TensorRT 4
- An optimizing compiler for deep learning inference.
- specs: Integrates with TensorFlow, ONNX, and accelerates various network types.
- availability: Not explicitly stated.
- [57:00] Kubernetes on NVIDIA GPUs
- Container orchestration support for NVIDIA GPUs.
- specs: Allows scaling out AI workloads across data centers and clouds.
- availability: Not explicitly stated.
- [01:20:00] HGX-2
- A cloud server platform baseboard.
- specs: Fuses HPC and AI computing, 2 PFLOPS, uses NVSwitch.
- availability: Not explicitly stated.
- [01:30:00] NVIDIA RTX
- Real-time ray tracing technology.
- specs: Combines programmable shading, ray tracing, and AI.
- availability: Not explicitly stated.
- [01:43:00] NVIDIA Clara
- A medical imaging supercomputer platform.
- specs: Virtualizes medical imaging instruments, uses iterative reconstruction and AI.
- availability: Not explicitly stated.
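The DGX-2 figures listed above can be broken down per GPU with simple arithmetic. The sketch below is illustrative only: it assumes the quoted totals are split evenly across the 16 GPUs, which the summary does not state, and the variable names are made up for this example.

```python
# Figures quoted in the DGX-2 entry above.
NUM_GPUS = 16
TOTAL_PFLOPS = 2.0       # system-level performance as quoted
TOTAL_MEMORY_GB = 512    # HBM2 across the whole system
POWER_KW = 10.0          # system power as quoted

# Assumption: compute and memory are divided evenly across the GPUs.
tflops_per_gpu = TOTAL_PFLOPS * 1000 / NUM_GPUS
memory_per_gpu_gb = TOTAL_MEMORY_GB / NUM_GPUS

# System-level efficiency implied by the quoted performance and power.
tflops_per_kw = TOTAL_PFLOPS * 1000 / POWER_KW

print(f"{tflops_per_gpu:.0f} TFLOPS per GPU")     # 125 TFLOPS per GPU
print(f"{memory_per_gpu_gb:.0f} GB HBM2 per GPU") # 32 GB HBM2 per GPU
print(f"{tflops_per_kw:.0f} TFLOPS per kW")       # 200 TFLOPS per kW
```

The per-kW figure covers the entire system (CPUs, switches, and cooling included), so it should not be read as a GPU-only efficiency number.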
Specific Numbers (8)
| Timestamp | Metric | Value | Context |
|---|---|---|---|
| 01:25 | Performance advance | 100,000x | CPU performance advancement over 25 years before Moore’s Law slowed. |
| 02:50 | CUDA Developers | 850,000 | Number of CUDA developers globally. |
| 05:45 | Computing Demand | 1,000 Exaflops | Estimated computing demand by the year 2028. |
| 20:37 | Compute Demand Increase | 300,000x | Increase in compute required for AI training over a 5-year period (OpenAI data). |
| 33:45 | Performance | 2 PFLOPS | Computing power of the DGX-2 system. |
| 39:31 | Price | $399,000 | Cost of the DGX-2 system. |
| 45:00 | Training Speed | 15,500 images/sec | DGX-2 record for ResNet-50 training. |
| 45:20 | Inference Latency | 1.1 milliseconds | Record latency for ResNet-50 inference. |
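The two growth figures in the table (100,000x over 25 years for CPUs, 300,000x over 5 years for AI training compute) are easier to compare as implied doubling times. The following is a minimal sketch, assuming smooth exponential growth over each quoted period; `implied_doubling_months` is a helper name introduced here for illustration.

```python
import math

def implied_doubling_months(growth_factor: float, years: float) -> float:
    """Doubling time in months, assuming steady exponential growth."""
    doublings = math.log2(growth_factor)
    return years * 12 / doublings

# CPU performance: 100,000x over 25 years (as quoted in the table).
cpu_doubling = implied_doubling_months(100_000, 25)

# AI training compute: 300,000x over 5 years (as quoted in the table).
ai_doubling = implied_doubling_months(300_000, 5)

print(f"CPU performance doubled about every {cpu_doubling:.1f} months")
print(f"AI training compute doubled about every {ai_doubling:.1f} months")
print(f"Ratio of doubling times: {cpu_doubling / ai_doubling:.1f}x")
```

Under these assumptions the CPU figure works out to roughly 18 months per doubling, while the AI training figure works out to a little over 3 months.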
Benchmark Claims (3)
- [40:30] DGX-2 vs Traditional Hyperscale Cluster: 1 DGX-2
- vs: 300 Dual-CPU Servers
- gain: 1/8 the cost, 1/60 the space, 1/18 the power.
- [37:30] DGX-2 vs DGX-1 Training Time: 1.5 days
- vs: 15 days
- gain: 10x faster training in just 6 months of stack optimization.
- [01:08:00] TensorRT 4 Inference Speedup: Up to 190x
- vs: CPU-only inference
- gain: 190x for Image/Video, 50x for NLP, 45x for Recommender systems.
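The DGX-2 cluster comparison and the training-time claim above can be checked with back-of-the-envelope arithmetic. This sketch assumes the 1/8 and 1/18 ratios are measured against the DGX-2's quoted $399,000 price and 10 kW power draw; the summary does not state the baseline, so the derived cluster figures are estimates, not numbers from the keynote.

```python
# Figures quoted elsewhere in this summary.
DGX2_PRICE_USD = 399_000
DGX2_POWER_KW = 10.0
CPU_SERVERS = 300

# Assumption: "1/8 the cost" and "1/18 the power" are relative to one DGX-2.
cluster_cost_usd = DGX2_PRICE_USD * 8
cluster_power_kw = DGX2_POWER_KW * 18

# Implied per-server figures for the 300 dual-CPU servers.
cost_per_server_usd = cluster_cost_usd / CPU_SERVERS
power_per_server_w = cluster_power_kw * 1000 / CPU_SERVERS

# Training-time claim: 15 days reduced to 1.5 days.
training_speedup = 15 / 1.5

print(f"Implied cluster cost:  ${cluster_cost_usd:,.0f}")
print(f"Implied cluster power: {cluster_power_kw:.0f} kW")
print(f"Per server: ${cost_per_server_usd:,.0f}, {power_per_server_w:.0f} W")
print(f"Training speedup: {training_speedup:.0f}x")
```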
Customer Stories (2)
- [15:25] Quantum Chemist
- Used CUDA on consumer GeForce GPUs to run quantum chemistry simulations.
- outcome: Achieved massive speedups that let him complete his life’s work within his lifetime; Huang described the result as a ‘time machine’.
- [19:57] OpenAI
- Measured the amount of computation necessary to train state-of-the-art neural networks.
- outcome: Found a 300,000x increase in compute demand over 5 years.
Key Technologies (6)
- CUDA: NVIDIA’s parallel computing platform and programming model.
- Tensor Core: A specialized core that fuses HPC and AI computing, performing mixed-precision matrix math.
- NVSwitch: A high-speed interconnect switch that allows multiple GPUs to communicate at massive bandwidth.
- TensorRT: An optimizing compiler and runtime for deep learning inference.
- Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications.
- RTX: NVIDIA’s technology for real-time ray tracing, combining rasterization, ray tracing, and AI.
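To make the Kubernetes entry concrete, the sketch below builds a pod manifest that requests one GPU. It is not taken from the keynote: it assumes a cluster running NVIDIA's Kubernetes device plugin, which exposes GPUs under the `nvidia.com/gpu` resource name, and the pod name, image, and `gpu_pod_manifest` helper are hypothetical.

```python
import json

def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested as an extended resource under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }

# Hypothetical pod name and image, for illustration only.
manifest = gpu_pod_manifest("inference-worker", "example.com/inference:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.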
Demos Shown (5)
- [01:10:00] Flower image recognition inference comparing CPU vs GPU performance.
- [01:16:00] Scale-out AI inference using Kubernetes to dynamically add GPU nodes to handle increased load.
- [01:38:00] Star Wars Reflections demo showcasing real-time ray tracing using RTX technology.
- [01:46:00] Clara medical imaging demo comparing CPU vs GPU iterative reconstruction of a CT scan.
- [02:01:00] Project Wakanda: A VR telepresence demo driving a miniature car and a real car remotely.
Predictions / Commitments (3)
- [04:57, 10 years] Over the next 10 years, computing demand will grow more than 100-fold.
- [16:45, Future] Every single supercomputer in the future will be accelerated.
- [01:51:50, Future] Everything that moves will be autonomous.
Companies Mentioned (5)
TSMC · Google (TensorFlow) · Quanta, Wistron, Foxconn, Inventec · Epic Games, ILM · GE Healthcare, Philips, Siemens, Canon
Notable Quotes (3)
The more you buy, the more you save. — Jensen Huang @ 14:38
We created for him a time machine. — Jensen Huang @ 16:14
There is a new law in town… if you optimize across the entire stack, the performance improvement you can achieve is incredibly fast. — Jensen Huang @ 38:12
Key Topics
GPU Computing · Moore's Law · Supercomputing · Deep Learning Training · Deep Learning Inference · Tensor Core · NVSwitch · DGX-2 · HGX-2 · TensorRT · Kubernetes · Real-time Ray Tracing · Medical Imaging · Autonomous Vehicles
Takeaways
- CPU scaling has stalled, making GPU-accelerated computing essential for future performance gains.
- NVIDIA is optimizing across the entire computing stack (chips, systems, software, algorithms) to deliver exponential speedups.
- The DGX-2, powered by NVSwitch, acts as a single giant GPU to tackle massive AI training workloads.
- TensorRT 4 and Kubernetes integration make NVIDIA GPUs highly efficient and scalable for AI inference in data centers.
- NVIDIA RTX brings real-time ray tracing to computer graphics, revolutionizing content creation and gaming.
- NVIDIA’s platforms are expanding into vertical domains like medical imaging (Clara) and autonomous machines (DRIVE).