O’Reilly Media, Flux Capacitor, 2025. – 365 p. – ISBN 979-8-341-62778-9.
Elevate your AI system performance with this definitive guide to unlocking peak efficiency across every layer of your AI infrastructure. In today's era of ever-growing generative models, AI Systems Performance Engineering equips professionals with actionable strategies to co-optimize hardware, software, and algorithms for high-performance, cost-effective AI systems. Written by Chris Fregly, a performance-focused engineering and product leader, this comprehensive resource shows how to turn complex systems into streamlined, high-impact AI solutions.
Preface
Introduction and AI System Overview (available)
AI System Hardware Overview (available)
OS, Docker, and Kubernetes Tuning for GPU-based Environments (available)
Distributed Communication and I/O Optimizations (available)
CUDA Programming, Profiling, and Debugging (unavailable)
Optimizing CUDA Performance (unavailable)
PyTorch Profiling and Tuning (unavailable)
Distributed Training at Ultra-Scale (unavailable)
Multi-Node Inference Optimizations (unavailable)
AI System Optimization Case Studies (available)
Future Trends in Ultra-Scale AI Systems Performance Engineering (available)
AI Systems Performance Checklist (175+ Items) (available)