
Accelerating Linux Page Migration with AMD’s New Batch Copy Patches: A Developer’s Guide

2026-05-01 02:02:53

Overview

Page migration is a critical operation in modern Linux memory management, especially on systems with heterogeneous memory (e.g., NUMA nodes, disaggregated memory, or CXL-attached devices). When a process accesses memory on a remote node, performance degrades due to the added latency. Migration moves pages to the accessing node, but traditional page-at-a-time migration is CPU-intensive and carries high per-page overhead. The new patch series, initially proposed by an NVIDIA engineer in early 2025 and now advanced by AMD engineers, accelerates page migration using batch copies and hardware offloading. This guide will help you understand, apply, and test these patches to boost system performance.


Prerequisites

  - A test machine (or VM) on an AMD platform with the IOMMU enabled in firmware, if you want to exercise hardware offloading.
  - A kernel build toolchain: git, gcc, make, and your distribution's usual kernel build dependencies.
  - The b4 tool for fetching patch series from LKML.
  - numactl and the libnuma-dev package for the benchmark in step 5.
  - Root access to install and boot the custom kernel.

Step-by-Step Instructions

1. Understand the Patch Series

The patches extend the existing migrate_pages() system call and the internal migrate_vma mechanism. The key innovation is turning single-page copy operations into batch requests sent to hardware DMA engines (like the AMD IOMMU or CXL.mem controllers). This reduces per-page overhead and leverages dedicated copy engines. The series also adds a new MIGRATE_BATCH flag and modifies the kernel's page migration path to aggregate multiple pages before offloading.

Read the cover letter on LKML (link). Focus on the changes to mm/migrate.c, include/linux/migrate.h, and the architecture-specific DMA setup (e.g., the AMD IOMMU driver under drivers/iommu/amd/).

2. Apply the Patches to Your Kernel

  1. Clone the latest linux-next tree: git clone https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  2. Download the patch series from LKML. The easiest route is the b4 tool, which fetches a whole thread by message ID into an am-ready mbox: b4 am <message-id-of-cover-letter>.
  3. Apply the series in order with git am (e.g. git am *.patch, or point git am at the mbox b4 produced).
  4. Configure kernel: Enable CONFIG_MIGRATION (on by default in most configurations) and the new, experimental CONFIG_MIGRATE_BATCH option. Also ensure CONFIG_AMD_IOMMU is enabled if you want hardware offloading.
  5. Build kernel: make -j$(nproc)
  6. Install modules and new kernel: sudo make modules_install install
  7. Reboot into the new kernel.

3. Verify Feature Availability

After booting, check kernel messages for batch migration support:

dmesg | grep -i "migrate_batch"

You should see something like: "migrate_batch: acceleration enabled via IOMMU". Also examine /sys/kernel/debug/migration if debugfs is mounted. The directory may contain a batch_stats file.

4. Configure Hardware Offloading (Optional)

By default, batch offloading may be disabled. To enable it, write to the module parameter via sysfs:

echo 1 > /sys/module/migrate_batch/parameters/enable_offload

To set batch size (number of pages per batch, default 32):

echo 64 > /sys/module/migrate_batch/parameters/batch_size

Note: Larger batches may reduce overhead but increase latency for synchronous ops. Tune based on workload.

5. Run Benchmark and Compare Performance

  1. Test workload: Write a simple program that allocates memory on node 0, then binds to node 1 and accesses memory repeatedly, triggering page migration.
  2. Compile with gcc -o migrate_test migrate_test.c -lnuma (install libnuma-dev).
  3. Run with and without batch/hardware offloading. Disable offloading by writing 0 to the parameter.
  4. Measure migration time: Use perf stat or add internal timing. Example command:
sudo numactl --cpunodebind=1 --membind=0 timeout 10 ./migrate_test

Collect results and compare. The expected improvement is roughly a 2x-5x reduction in migration latency for large data sets, though actual gains depend on hardware and workload.

Common Mistakes

  - Forgetting to enable CONFIG_MIGRATE_BATCH or CONFIG_AMD_IOMMU, which leaves the new path compiled out.
  - Looking for /sys/kernel/debug/migration without mounting debugfs first.
  - Benchmarking "with" vs. "without" offloading without actually writing 0 to enable_offload for the baseline run, so both runs use the accelerated path.
  - Setting batch_size too high for latency-sensitive synchronous workloads (see the note in step 4).

Summary

The AMD batch migration patches represent a significant step toward reducing page migration overhead in Linux. By grouping migrations into large batches and offloading copy operations to hardware DMA engines, the kernel can improve performance for memory-intensive workloads on NUMA and CXL systems. This guide provided an overview, prerequisites, step-by-step instructions for applying and testing the patches, and common pitfalls to avoid. With careful tuning, developers can achieve substantial latency reductions. Keep an eye on LKML for future revisions that might extend support to other architectures or improve batch scheduling.
