CppCon 25 Matrix Multiplication Deep Dive || Cache Blocking, SIMD & Parallelization -- Aliaksei Sala
Registration is now open for CppCon 2026! The conference starts on September 12 and will be held in person in Aurora, CO. To whet your appetite for this year’s conference, we’re posting videos of some of the top-rated talks from last year's conference. Here’s another CppCon talk video we hope you will enjoy – and why not register today for CppCon 2026!
Matrix Multiplication Deep Dive || Cache Blocking, SIMD & Parallelization
by Aliaksei Sala
Summary of the talk:
Matrix multiplication is a fundamental operation in scientific computing, game development, AI, and numerous high-performance applications. While its mathematical definition is simple, achieving optimal performance in C++ is far from trivial.
In this talk, we will explore different optimization techniques for matrix multiplication, from naive implementations to highly tuned versions leveraging modern hardware features. We will cover key performance-enhancing strategies such as loop unrolling, cache blocking, SIMD vectorization, parallelization using threads, and more. Through benchmarking and profiling, we will measure the real impact of these optimizations.
By the end of this session, attendees will gain insights into two critical questions:
How hard is it to implement an optimized matrix multiplication in C++?
How effective is C++ for achieving peak performance in this task?
This talk is suitable for developers interested in performance optimization, computational efficiency, and modern C++ techniques for numerical computing.
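
As a rough illustration of the cache-blocking idea mentioned in the summary, here is a minimal sketch that is not taken from the talk itself. The function names matmul_naive and matmul_blocked, the row-major flat-vector layout, and the default block size of 64 are all assumptions chosen for the example. A tuned implementation would pick the block size from the target cache sizes and would add SIMD and threading on top.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration: square n x n matrices stored
// row-major in flat vectors, so element (i, j) lives at index i * n + j.

// Naive triple loop: each c(i, j) is the dot product of row i of a
// and column j of b. The column walk over b strides through memory.
std::vector<double> matmul_naive(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Cache-blocked variant: the three outer loops walk over tiles of at
// most block x block elements, and the three inner loops multiply one
// tile of a by one tile of b, accumulating into a tile of c. The goal
// is to reuse each tile while it is still resident in cache.
std::vector<double> matmul_blocked(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   std::size_t n,
                                   std::size_t block = 64) {
    assert(a.size() == n * n && b.size() == n * n);
    assert(block > 0);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        // Innermost loop is contiguous in both b and c.
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Both functions compute the same product; the blocked version only changes the order in which the multiply-adds are performed. Whether it is actually faster, and by how much, depends on the matrix size, the block size, and the hardware, so it should be measured rather than assumed.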
