Quick Facts
- Category: Science & Space
- Published: 2026-05-08 05:52:22
Overview
Recent breakthroughs in spatial multi-omics technologies have given scientists the power to map gene and protein activity at single-cell resolution within intact tissues. However, these ultra-high-resolution maps are often generated from different tissue samples, platforms, or experimental batches, leaving them fragmented and hard to compare. A new computational method, detailed in a Nature Genetics study, addresses this by unifying the fragmented maps into coherent spatial atlases. This tutorial walks you through the process, from understanding the method to applying it to your own data.

Prerequisites
Required Skills and Knowledge
- Familiarity with single-cell sequencing data (e.g., scRNA-seq, scATAC-seq)
- Basic programming in Python or R
- Understanding of spatial transcriptomics (e.g., MERFISH, Visium, Slide-seq)
- Experience with statistical analysis (normalization, dimensionality reduction)
Software and Tools
- Python 3.8+ with packages: numpy, pandas, scanpy, squidpy, anndata
- Optional: R with Seurat or SpatialExperiment
- Computational resource: 16 GB RAM minimum (for moderate-sized datasets)
Data Requirements
You will need at least two spatial transcriptomics datasets from the same tissue type (e.g., mouse brain, human lymph node). Each dataset should contain:
- Gene expression matrix (genes × spots/cells)
- Spatial coordinates (x, y) for each spot
- Optional: metadata such as section ID, batch label
Step-by-Step Instructions
Step 1: Data Acquisition and Quality Control
Begin by loading your spatial datasets into AnnData objects (or Seurat). For each dataset, perform basic quality control:
- Filter out spots with few detected genes (e.g., < 200 genes) or high mitochondrial content
- Normalize using library size scaling or SCTransform
- Identify highly variable genes for downstream integration
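# Python example
import scanpy as sc
adata1 = sc.read_h5ad('dataset1.h5ad')
# Flag mitochondrial genes ('MT-' in human, 'mt-' in mouse) and compute QC metrics
adata1.var['mt'] = adata1.var_names.str.upper().str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata1, qc_vars=['mt'], percent_top=None, inplace=True)
sc.pp.filter_cells(adata1, min_genes=200)
adata1 = adata1[adata1.obs['pct_counts_mt'] < 20].copy()  # example cutoff; tune per tissue
sc.pp.normalize_total(adata1, target_sum=1e4)
sc.pp.log1p(adata1)  # the default highly_variable_genes flavor expects log-transformed data
sc.pp.highly_variable_genes(adata1, n_top_genes=2000)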
Step 2: Preliminary Clustering and Annotation
Before integration, cluster each dataset independently to identify cell types or regions. This helps later in aligning spatial patterns.
- Perform PCA on highly variable genes
- Compute neighborhood graph and cluster (e.g., Leiden algorithm)
- Optionally annotate clusters using known markers
Store these cluster labels in adata.obs for reference.
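The per-dataset clustering described above can be sketched with scikit-learn alone. This is a minimal stand-in, not the workflow from the paper: it uses PCA followed by k-means instead of a neighborhood graph with Leiden clustering, and the function name `cluster_expression` and the synthetic data are hypothetical. In a scanpy workflow the corresponding calls would be `sc.pp.pca`, `sc.pp.neighbors`, and `sc.tl.leiden`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_expression(X, n_pcs=20, n_clusters=3, seed=0):
    """Reduce a (spots x genes) matrix with PCA, then cluster the embedding.

    k-means is used here as a simple stand-in for graph-based Leiden clustering.
    """
    n_pcs = min(n_pcs, X.shape[0] - 1, X.shape[1])
    embedding = PCA(n_components=n_pcs, random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embedding)
    return embedding, labels


# Synthetic demo: three well-separated "cell types", 40 spots each, 50 genes.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 50))
X_demo = np.vstack([c + rng.normal(size=(40, 50)) for c in centers])

embedding_demo, labels_demo = cluster_expression(X_demo, n_pcs=10, n_clusters=3)
```

With an AnnData object, the returned labels could be stored as a categorical column in `adata.obs`, as suggested above.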
Step 3: Feature Selection for Integration
The unifying method relies on shared features across tissues. Select features (genes or proteins) that are:
- Consistently expressed in both datasets (mean expression > 0.1)
- Spatially variable (using Moran’s I or SPARK-X)
This reduces noise and focuses on spatial patterns.
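The two criteria above can be sketched with NumPy and SciPy. The sketch assumes dense (spots x genes) matrices whose columns are already in the same gene order in both datasets, binary k-nearest-neighbor spatial weights, and no duplicated coordinates. The function names and the Moran's I cutoff are hypothetical choices for illustration; tools such as squidpy or SPARK-X would be the usual route on real data.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_weights(coords, k=6):
    """Binary k-nearest-neighbor weight matrix (dense; fine for small examples)."""
    _, idx = cKDTree(coords).query(coords, k=k + 1)  # first hit is the spot itself
    n = coords.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx[:, 1:].ravel()] = 1.0
    return W


def morans_i(X, W):
    """Moran's I for every gene (column) of a dense spots x genes matrix."""
    Z = X - X.mean(axis=0)
    num = (Z * (W @ Z)).sum(axis=0)          # sum_ij w_ij * z_i * z_j per gene
    den = (Z ** 2).sum(axis=0)
    den = np.where(den == 0, np.nan, den)    # constant genes get NaN
    return (X.shape[0] / W.sum()) * num / den


def select_shared_features(X1, coords1, X2, coords2, min_mean=0.1, min_moran=0.1, k=6):
    """Indices of genes that are expressed and spatially variable in both datasets."""
    expressed = (X1.mean(axis=0) > min_mean) & (X2.mean(axis=0) > min_mean)
    i1 = morans_i(X1, knn_weights(coords1, k))
    i2 = morans_i(X2, knn_weights(coords2, k))
    spatial = (i1 > min_moran) & (i2 > min_moran)
    return np.flatnonzero(expressed & spatial)


# Synthetic demo on a 15 x 15 grid with three genes:
# gene 0 = spatial gradient, gene 1 = unstructured noise, gene 2 = gradient but barely expressed.
gx, gy = np.meshgrid(np.arange(15.0), np.arange(15.0))
coords_demo = np.column_stack([gx.ravel(), gy.ravel()])


def make_demo(rng):
    gradient = coords_demo[:, 0] / 14.0 + 0.5
    noise = rng.uniform(0.5, 1.5, size=coords_demo.shape[0])
    return np.column_stack([gradient, noise, 0.01 * gradient])


rng = np.random.default_rng(0)
X1_demo, X2_demo = make_demo(rng), make_demo(rng)
selected = select_shared_features(X1_demo, coords_demo, X2_demo, coords_demo, min_moran=0.3)
```

Requiring both criteria in both datasets is deliberately conservative: a gene that is spatially structured in only one section contributes little to aligning the two.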
Step 4: Aligning Coordinate Systems
Fragmented maps often come from different sections or orientations. Use a landmark-based approach or a neural network (like a U-Net) to find a transformation that aligns tissue shapes. For simplicity, you can:
- Manually identify a few corresponding points (e.g., tissue boundaries)
- Apply a similarity transformation (translation, rotation, and uniform scaling) using Procrustes analysis
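from scipy.spatial import procrustes
# landmarks1 and landmarks2 are (n_points, 2) arrays of corresponding landmarks
# from the reference and the moving section (same shape, same row order)
# procrustes() standardizes both point sets, then fits the second onto the first
ref_std, moving_aligned, disparity = procrustes(landmarks1, landmarks2)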
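`scipy.spatial.procrustes` returns standardized copies of both landmark sets rather than a transformation that can be reused on every spot. One way around this, sketched below with NumPy only, is to estimate the scale, rotation, and translation explicitly from the landmarks (a least-squares fit in the style of Kabsch and Umeyama) and then apply them to all coordinates of the moving section. The function names are hypothetical, and the sketch assumes 2D coordinates and at least two non-coincident landmark pairs.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares scale, rotation, and translation mapping src landmarks onto dst."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # SVD of the cross-covariance between the centered point sets
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, d])                      # avoid returning a reflection
    R = U @ D @ Vt
    scale = (S * np.diag(D)).sum() / (src_c ** 2).sum()
    t = dst_mean - scale * (R @ src_mean)
    return scale, R, t


def apply_similarity(coords, scale, R, t):
    """Apply y = scale * R @ x + t to every row of coords."""
    return scale * coords @ R.T + t


# Demo: recover a known transformation from five landmark pairs.
rng = np.random.default_rng(0)
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
scale_true, t_true = 1.5, np.array([10.0, -4.0])

landmarks_moving = rng.uniform(0, 100, size=(5, 2))
landmarks_ref = apply_similarity(landmarks_moving, scale_true, R_true, t_true)

scale_hat, R_hat, t_hat = fit_similarity(landmarks_moving, landmarks_ref)

# Reuse the fitted transformation on every spot of the moving section.
all_spots_moving = rng.uniform(0, 100, size=(200, 2))
all_spots_aligned = apply_similarity(all_spots_moving, scale_hat, R_hat, t_hat)
```

A rigid or similarity fit cannot correct local tissue deformation, so sections that were stretched or torn during preparation may still need a non-linear registration step.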
Step 5: Integrating Expression Data with Spatial Constraints
This is the core step. Use a graph-based integration that preserves both expression similarity and spatial proximity. The method from the paper leverages a spatial mutual nearest neighbors (MNN) approach. Pseudo-code:
- Build spatial k-nearest neighbor graphs within each dataset (using coordinates)
- Identify MNN pairs across datasets after PCA embedding
- Compute batch-correction vectors only for spatially consistent MNN pairs
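# Conceptual (simplified). Assumes adata1 and adata2 share one PCA space
# (e.g., PCA fitted on the concatenated data) and aligned spatial coordinates.
import numpy as np
from sklearn.neighbors import NearestNeighbors
pca1 = adata1.obsm['X_pca']
pca2 = adata2.obsm['X_pca']
xy1 = adata1.obsm['spatial']
xy2 = adata2.obsm['spatial']
# Find cross-dataset nearest neighbors in PCA space, in both directions
idx_1to2 = NearestNeighbors(n_neighbors=5).fit(pca2).kneighbors(pca1, return_distance=False)
idx_2to1 = NearestNeighbors(n_neighbors=5).fit(pca1).kneighbors(pca2, return_distance=False)
# Keep mutual pairs (i in dataset 1, j in dataset 2)
pairs = [(i, j) for i, row in enumerate(idx_1to2) for j in row if i in idx_2to1[j]]
pairs = np.array(pairs, dtype=int).reshape(-1, 2)
# Keep only pairs whose two members lie within a spatial distance threshold
pair_dists = np.linalg.norm(xy1[pairs[:, 0]] - xy2[pairs[:, 1]], axis=1)
valid_pairs = pairs[pair_dists < 50]  # adjust threshold to your coordinate units
# Correct batch effect using only valid_pairs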
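To make the last step concrete, here is a self-contained sketch of spatially filtered MNN pairing followed by a simple correction. It is an illustration under stated assumptions, not the algorithm from the paper: both embeddings are assumed to live in a shared space, coordinates are assumed to be already aligned, and each spot in the second dataset is shifted by a Gaussian-weighted average of the pair difference vectors. The function names and the parameters `max_dist` and `sigma` are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def spatial_mnn_pairs(emb1, emb2, xy1, xy2, k=5, max_dist=50.0):
    """Mutual nearest neighbors in embedding space, kept only if spatially close."""
    nn12 = NearestNeighbors(n_neighbors=k).fit(emb2).kneighbors(emb1, return_distance=False)
    nn21 = NearestNeighbors(n_neighbors=k).fit(emb1).kneighbors(emb2, return_distance=False)
    back = {(int(i), j) for j, row in enumerate(nn21) for i in row}
    pairs = [(i, int(j)) for i, row in enumerate(nn12) for j in row if (i, int(j)) in back]
    pairs = np.array(pairs, dtype=int).reshape(-1, 2)
    dist = np.linalg.norm(xy1[pairs[:, 0]] - xy2[pairs[:, 1]], axis=1)
    return pairs[dist < max_dist]


def correct_embedding(emb1, emb2, pairs, sigma=1.0):
    """Shift dataset 2 toward dataset 1 using Gaussian-weighted pair vectors."""
    if len(pairs) == 0:
        return emb2.copy()
    vectors = emb1[pairs[:, 0]] - emb2[pairs[:, 1]]
    anchors = emb2[pairs[:, 1]]
    # Squared distance from every dataset-2 spot to every anchor (dense; small data only)
    d2 = ((emb2[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    return emb2 + w @ vectors


# Synthetic demo: same biology in two dimensions, batch offset in a third.
rng = np.random.default_rng(0)
n = 100
emb1_demo = np.column_stack([rng.normal(size=(n, 2)), 0.1 * rng.normal(size=n)])
emb2_demo = emb1_demo + 0.05 * rng.normal(size=(n, 3))
emb2_demo[:, 2] += 3.0                                   # simulated batch effect
xy1_demo = rng.uniform(0, 100, size=(n, 2))
xy2_demo = xy1_demo + rng.normal(scale=1.0, size=(n, 2))  # already aligned sections

pairs_demo = spatial_mnn_pairs(emb1_demo, emb2_demo, xy1_demo, xy2_demo, k=5, max_dist=50.0)
emb2_corrected = correct_embedding(emb1_demo, emb2_demo, pairs_demo)
```

The demo places the batch offset in a dimension that carries little biological variation, which is the situation MNN-style correction is designed for; an offset along a major axis of biological variation would be corrected only partially.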
Step 6: Visualization and Quality Assessment
After integration, visualize the unified atlas. Common plots:
- Joint UMAP colored by dataset to check mixing
- Spatial scatter plot with integrated clusters
- Expression of key markers to verify spatial patterns
sc.pl.spatial(adata_combined, color=['leiden', 'dataset_id'], spot_size=10)
Evaluate integration success using:
- Silhouette score for batch labels (values near zero or below indicate well-mixed batches; values near 1 indicate residual batch separation)
- Correlation of spatial expression of conserved genes
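Both metrics can be sketched with scikit-learn and NumPy. The sketch assumes an integrated embedding with one batch label per spot and, for the correlation, averages each gene on a shared regular grid so that sections with different spot positions can be compared. The helper names, the grid size, and the coordinate extent are hypothetical and would need to match your aligned coordinate system.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def binned_mean(xy, values, bins, extent):
    """Average a per-spot value on a regular grid; empty bins become NaN."""
    sums, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins, range=extent, weights=values)
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins, range=extent)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


def spatial_correlation(xy1, expr1, xy2, expr2, bins=5, extent=((0, 100), (0, 100))):
    """Pearson correlation of one gene's binned expression between two sections."""
    g1 = binned_mean(xy1, expr1, bins, extent).ravel()
    g2 = binned_mean(xy2, expr2, bins, extent).ravel()
    ok = ~np.isnan(g1) & ~np.isnan(g2)       # compare only bins covered by both sections
    return float(np.corrcoef(g1[ok], g2[ok])[0, 1])


rng = np.random.default_rng(0)

# Batch silhouette: a well-mixed embedding versus one with a residual batch shift.
batch = np.repeat([0, 1], 200)
mixed = rng.normal(size=(400, 5))
separated = mixed.copy()
separated[batch == 1, 0] += 10.0
sil_mixed = silhouette_score(mixed, batch)
sil_separated = silhouette_score(separated, batch)

# Spatial correlation: the same left-to-right gradient measured in two sections.
xy_a = rng.uniform(0, 100, size=(500, 2))
xy_b = rng.uniform(0, 100, size=(500, 2))
expr_a = xy_a[:, 0] / 100 + rng.normal(scale=0.05, size=500)
expr_b = xy_b[:, 0] / 100 + rng.normal(scale=0.05, size=500)
gene_corr = spatial_correlation(xy_a, expr_a, xy_b, expr_b)
```

A low batch silhouette on its own can also result from over-correction, so it is worth reading alongside the spatial correlation and marker plots rather than in isolation.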
Common Mistakes
Ignoring Batch Effects Within a Single Dataset
If your data comes from multiple runs, treat each run as a separate map. Failure to correct intra-dataset batch effects will cause misalignment.
Over-Aligning with Too Many Dimensions
Using 50+ PCs for MNN can over-correct and wash out biological variation. Stick to 15–30 PCs depending on dataset complexity.
Not Verifying Spatial Correspondence
MNN pairs must be spatially plausible. Without spatial filtering, you may link cells from opposite sides of the tissue, producing falsely seamless maps.
Using Different Gene Panels
If technologies measure distinct gene sets (e.g., MERFISH vs. Visium), restrict integration to the intersection and confirm that housekeeping genes are consistent.
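Restricting two matrices to their shared genes can be sketched as follows. The helper name `restrict_to_shared_genes` and the toy gene lists are hypothetical, and the sketch assumes gene names are unique within each dataset and use the same naming convention. With AnnData objects, the same effect is typically achieved by subsetting both objects to the shared names.

```python
import numpy as np


def restrict_to_shared_genes(X1, genes1, X2, genes2):
    """Subset two (spots x genes) matrices to their shared genes, in the same order."""
    shared = sorted(set(genes1) & set(genes2))
    pos1 = {g: i for i, g in enumerate(genes1)}
    pos2 = {g: i for i, g in enumerate(genes2)}
    cols1 = [pos1[g] for g in shared]
    cols2 = [pos2[g] for g in shared]
    return X1[:, cols1], X2[:, cols2], shared


# Toy example: a four-gene panel and a three-gene panel with two genes in common.
genes_a = ["Gad1", "Slc17a7", "Actb", "Pvalb"]
genes_b = ["Actb", "Gad1", "Sst"]
X_a = np.arange(8).reshape(2, 4)
X_b = np.arange(6).reshape(2, 3)

X_a_shared, X_b_shared, shared_genes = restrict_to_shared_genes(X_a, genes_a, X_b, genes_b)
```

Sorting the shared names gives both matrices an identical, reproducible column order, which later steps such as joint PCA rely on.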
Summary
Unifying fragmented cell maps into a single spatial atlas requires careful data handling, alignment, and integration that respects both gene expression and physical location. By following this guide (preprocessing individual datasets, selecting shared spatially variable features, aligning coordinates, and applying spatially constrained MNN correction), you can create integrated atlases that reveal how cells organize across different sections or experiments. This approach, rooted in recent Nature Genetics methodology, can accelerate the construction of whole-body spatial maps and support deeper insight into complex tissues like the brain and immune system.