Speeding up C++ functions with a thread_local cache -- Daniel Lemire
When working with legacy or rigid codebases, performance bottlenecks can emerge from designs you cannot easily change, such as interfaces that force you to access map values by positional index. This article explores how a simple thread_local cache can dramatically improve performance in such cases, turning repeated indexed lookups from quadratic total cost into roughly constant time per access.
Speeding up C++ functions with a thread_local cache
by Daniel Lemire
From the article:
In large code bases, we are often stuck with unpleasant designs that harm our performance. We might be looking for a non-intrusive method to improve performance. For example, you may not want to change the function signatures.
Let us consider a concrete example. Maybe someone designed the programming interface so that you have to access the values from a map using an index.
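To make the idea concrete, here is a minimal sketch of this kind of fix, not the article's actual code: the function name value_at, the cache variables, and the assumption that the rigid interface takes the map plus a positional index are all hypothetical. The thread_local cache flattens the map's values into a vector the first time a given map is seen on a thread, so subsequent indexed accesses avoid walking the tree from the beginning while the function signature stays untouched.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical rigid interface: callers must fetch values by position.
// Without a cache, each call costs O(index) because std::map offers no
// random access, so visiting all n values costs O(n^2) overall.
const std::string& value_at(const std::map<std::string, std::string>& m,
                            std::size_t index) {
  // Per-thread cache (names are illustrative): remember which map we last
  // saw and a flat view of its values, so repeated calls against the same,
  // unmodified map become O(1) per access.
  thread_local const std::map<std::string, std::string>* cached_map = nullptr;
  thread_local std::size_t cached_size = 0;
  thread_local std::vector<const std::string*> cached_values;

  if (cached_map != &m || cached_size != m.size()) {
    // Rebuild the flat view; this is the only O(n) step.
    cached_values.clear();
    cached_values.reserve(m.size());
    for (const auto& kv : m) {
      cached_values.push_back(&kv.second);
    }
    cached_map = &m;
    cached_size = m.size();
  }
  return *cached_values[index];  // assumes index < m.size()
}
```

Because the cache is thread_local, each thread keeps its own copy and no locking is needed. Note that the address-plus-size check is only a crude invalidation heuristic: a map could be modified without changing its size, so a production version would need a more reliable way to detect changes.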
