When Specs Aren't Enough: The Clash Between Linux Kernel's Restartable Sequences and Google's TCMalloc

Last updated: 2026-05-02 23:37:09 Intermediate

Hyrum's Law, a well-known adage in software engineering, states that any observable behavior of a system—no matter how incidental—will eventually become a dependency for someone. This principle is currently on full display in the Linux kernel community, where a recent performance improvement to the restartable sequences mechanism has inadvertently clashed with Google's TCMalloc memory allocator. While the kernel update adhered strictly to the documented API, TCMalloc's actual usage deviated from those specifications, causing breakage and forcing a delicate balancing act between progress and backward compatibility.

Understanding Restartable Sequences

Restartable sequences (rseq) are a Linux kernel feature, merged in Linux 4.18, that speeds up user-space concurrency primitives. They let a short critical section run without atomic operations or locks in the common case: if the thread is preempted, migrated to another CPU, or interrupted by a signal partway through, the kernel aborts the section and user space restarts it. This keeps overhead low for operations such as per-CPU data updates, which makes the mechanism attractive to performance-sensitive libraries.
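To make the abort-and-retry idea concrete, here is a toy model in Python. It is only a simulation: the real mechanism is implemented with a kernel-registered memory area and hand-written assembly, and the names SimulatedCpu, Preempted, and rseq_increment are invented for this sketch.

```python
class Preempted(Exception):
    """Raised by the simulated scheduler to abort a critical section."""


class SimulatedCpu:
    """Holds one per-CPU counter and a scripted list of interruptions."""

    def __init__(self, preempt_at):
        # preempt_at: attempt numbers (0-based) that will be interrupted
        self.preempt_at = set(preempt_at)
        self.attempts = 0
        self.counter = 0

    def maybe_preempt(self):
        """Stand-in for the window between 'prepare' and 'commit'."""
        attempt = self.attempts
        self.attempts += 1
        if attempt in self.preempt_at:
            raise Preempted


def rseq_increment(cpu):
    """Increment cpu.counter without a lock; return how many restarts occurred."""
    restarts = 0
    while True:
        try:
            new_value = cpu.counter + 1   # prepare: read and compute
            cpu.maybe_preempt()           # the section may be aborted here
            cpu.counter = new_value       # commit: one final store
            return restarts
        except Preempted:
            restarts += 1                 # abort path: discard work and retry


cpu = SimulatedCpu(preempt_at={0, 1})
print(rseq_increment(cpu), cpu.counter)
```

The point of the model is that nothing becomes visible until the single commit store, so an aborted attempt leaves no trace and can simply be repeated.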

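The API defined for restartable sequences is intentionally narrow. Each thread registers a small shared memory area with the kernel through the rseq() system call, and each critical section is described to the kernel by a descriptor that gives its start address, its length, and the address of an abort handler. The kernel may interrupt the thread at any instruction; what it guarantees is that if the thread is preempted, migrated, or handed a signal while inside a described critical section, execution resumes at the abort handler, which typically retries the sequence from the start. These constraints ensure predictable behavior across different users.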
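The kernel's side of this contract can be sketched as a small address check. The field names below mirror struct rseq_cs from the kernel's UAPI header (start_ip, post_commit_offset, abort_ip); everything else, including the function name resume_ip, is a simplified assumption that leaves out signature checks, flags, and the rest of the real logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RseqCs:
    """Simplified stand-in for a critical-section descriptor."""
    start_ip: int             # address of the first instruction in the section
    post_commit_offset: int   # length of the section in bytes
    abort_ip: int             # where to resume if the section is interrupted


def resume_ip(ip: int, cs: Optional[RseqCs]) -> int:
    """Return the address the thread should resume at after an interruption."""
    if cs is None:
        return ip                           # no critical section is active
    if cs.start_ip <= ip < cs.start_ip + cs.post_commit_offset:
        return cs.abort_ip                  # interrupted inside: go to abort handler
    return ip                               # outside the section: resume normally


cs = RseqCs(start_ip=0x1000, post_commit_offset=0x20, abort_ip=0x2000)
print(hex(resume_ip(0x1010, cs)), hex(resume_ip(0x1020, cs)))
```

Note that the check depends only on the instruction pointer and the descriptor. Nothing in it promises how often, or under which workloads, an interruption will happen.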

TCMalloc's Divergent Implementation

Google's TCMalloc, a widely used memory allocator, leverages restartable sequences to accelerate thread-local caching. However, internal audits revealed that TCMalloc's usage stretched beyond the documented API. Specifically, it relied on behaviors not guaranteed by the kernel, such as assuming that restart events would not occur under certain workloads or that the restart point would always align with particular memory states. These assumptions, while functional in earlier kernel versions, were never part of the official contract.

This is a classic case of Hyrum's Law in action. TCMalloc's developers depended on observable behavior—the fact that restart events were rare or predictable—even though the kernel never promised such guarantees. As long as no changes disturbed this equilibrium, everything worked smoothly.
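A hypothetical example shows how such a dependency can hide. The sketch below is not TCMalloc's code; run_restartable, fragile_alloc, and robust_alloc are invented names, and the "restart" is modeled by simply running the prepare step extra times and discarding the results.

```python
def run_restartable(prepare, commit, restarts):
    """Run prepare() `restarts` times as aborted attempts, then once for real.

    Documented contract in this model: prepare may run any number of times,
    and only the final attempt reaches commit.
    """
    for _ in range(restarts):
        prepare()                 # aborted attempt; its result is discarded
    return commit(prepare())      # final attempt reaches the commit step


def fragile_alloc(free_list, restarts):
    # Mutates shared state during prepare, so each aborted attempt loses
    # one slot. Correct only when restarts happens to be zero.
    return run_restartable(lambda: free_list.pop(), lambda slot: slot, restarts)


def robust_alloc(free_list, restarts):
    # Prepare only reads; the single mutation happens in the commit step.
    return run_restartable(
        lambda: len(free_list) - 1,
        lambda index: free_list.pop(index),
        restarts,
    )


slots = [1, 2, 3]
print(fragile_alloc(slots, 2), slots)
slots = [1, 2, 3]
print(robust_alloc(slots, 2), slots)
```

Both functions behave identically while restarts never occur, which is why a test suite run against one particular pattern of restarts would not tell them apart.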

The Catalyst: Linux 6.19's Performance Changes

In the Linux 6.19 release cycle, kernel developers introduced optimizations to restartable sequences to address performance bottlenecks identified in high-throughput systems. These changes altered the timing and frequency of restarts, but did not modify the documented API. From the kernel's perspective, the interface remained intact.

However, TCMalloc's implicit dependencies were broken. The new restart patterns triggered cases where TCMalloc's internal state became inconsistent, leading to crashes or subtle data corruption. The library, which had previously worked flawlessly, now failed—despite the kernel's adherence to its written specification.

The No-Regressions Rule: A Developer's Dilemma

The Linux kernel project enforces a strict no-regressions rule: no change may cause previously working user-space code to break. This policy is designed to maintain stability for the vast ecosystem of applications and libraries. Faced with the TCMalloc failure, kernel developers had two unpalatable options: revert the performance improvement (losing gains for other users) or force TCMalloc to conform to the API (breaking Google's software and potentially millions of deployments).

Neither option is acceptable. Instead, the community is exploring a middle ground: extending the restartable sequences API so that TCMalloc can explicitly request the old behavior, or adding a compatibility mode that preserves the incidental behaviors TCMalloc relied on. This approach acknowledges Hyrum's Law while honoring the no-regressions rule, but it adds complexity and a long-term maintenance burden.

Lessons from Hyrum's Law in Practice

This incident underscores several important lessons for API designers and library developers:

  • Documentation is not enough. Even the clearest API spec cannot prevent users from relying on unstated behaviors. Mechanism designers must anticipate that any observable side effect may become a de facto part of the interface.
  • Testing matters. TCMalloc's test suite likely did not cover the exact restart patterns introduced in 6.19. Comprehensive testing against a range of kernel behaviors could have caught the dependency earlier.
  • Leave room in the specification. When possible, APIs should be designed to allow future changes without breaking users. For restartable sequences, this might mean explicitly documenting which behaviors are not guaranteed and giving users a way to request the guarantees they need.
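The testing point can be illustrated with a small fault-injection sweep. This is a sketch under stated assumptions: Injector, push, and sweep are hypothetical names, aborts are simulated with exceptions, and a real test would have to provoke restarts through the operating system rather than a scripted schedule.

```python
import itertools


class Abort(Exception):
    """Simulated abort of a critical section."""


class Injector:
    """Aborts the critical section on the attempt numbers in `schedule`."""

    def __init__(self, schedule):
        self.schedule = set(schedule)
        self.attempt = 0

    def checkpoint(self):
        current = self.attempt
        self.attempt += 1
        if current in self.schedule:
            raise Abort


def push(stack, item, injector):
    """Append item, retrying whenever the injector aborts the attempt."""
    while True:
        try:
            injector.checkpoint()   # an abort may land before the commit
            stack.append(item)      # commit
            return
        except Abort:
            continue                # nothing was modified, so just retry


def sweep(op_count=4, max_attempt=6):
    """Check the result under every abort schedule; return schedules tested."""
    tested = 0
    for size in range(max_attempt + 1):
        for schedule in itertools.combinations(range(max_attempt), size):
            stack = []
            injector = Injector(schedule)
            for item in range(op_count):
                push(stack, item, injector)
            if stack != list(range(op_count)):
                raise AssertionError(f"invariant broken under {schedule}")
            tested += 1
    return tested


print(sweep())
```

Sweeping every schedule, instead of relying on whatever restarts the current kernel happens to produce, is what turns a hidden assumption about restart frequency into a test failure.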

The kernel community's response—seeking a compromise rather than insisting on strict conformance—shows wisdom. It acknowledges that the real-world contract between kernel and user-space is shaped by both the written API and the history of observed behavior. As Hyrum's Law reminds us, every detail matters.

Looking Ahead

As of this writing, the discussion is ongoing. A preliminary patch set proposes a new registration flag that tells the kernel to use the old restart heuristics, letting TCMalloc opt into backward compatibility. If merged, this would restore functionality while keeping the performance optimizations for other users. However, it sets a precedent: future adjustments to restartable sequences may need to account for a growing set of legacy behaviors.
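The shape of such an opt-in can be sketched in a few lines. The flag name LEGACY_RESTART, its value, and the register function are purely hypothetical and do not correspond to any real kernel flag or to the patch set under discussion.

```python
from enum import IntFlag


class RegFlags(IntFlag):
    NONE = 0
    LEGACY_RESTART = 1   # hypothetical flag, invented for this sketch


KNOWN_BITS = int(RegFlags.LEGACY_RESTART)


def register(flags) -> str:
    """Return the restart policy selected by a registration request."""
    bits = int(flags)
    if bits & ~KNOWN_BITS:
        # Reject bits we do not understand so they stay free for future use.
        raise ValueError("unknown flag bits")
    if bits & RegFlags.LEGACY_RESTART:
        return "legacy"      # caller asked for the old behavior
    return "optimized"       # default: everyone else gets the new behavior


print(register(RegFlags.NONE), register(RegFlags.LEGACY_RESTART))
```

The design choice worth noticing is that the default stays on the new behavior: only callers that explicitly ask for the legacy policy pay for it, and every such flag is one more behavior the interface has to keep supporting.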

For the broader open-source ecosystem, this episode is a vivid example of how even well-intentioned improvements can run afoul of real-world dependencies. It reinforces the need for careful communication between kernel maintainers and downstream projects, and for some tolerance of the undocumented dependencies that Hyrum's Law predicts.