Demystifying CITE-seq: A Practical and Comprehensive Guide to the Cellular Indexing of Transcriptomes and Epitopes by Sequencing

The field of single-cell biology has been transformed by techniques that merge protein and transcript information at the level of individual cells. Among these, CITE-seq stands out for its ability to couple surface protein profiling with transcriptome-wide mRNA measurement through a single sequencing readout. In this guide, we explore the fundamentals of CITE-seq, discuss practical considerations for experimental design, walk through analysis workflows, and highlight how this approach is reshaping insights in immunology, cancer research and beyond. Whether you write CITE-seq or, in informal notes, cite-seq, the underlying concept remains the same: a multi-omic, single-cell perspective that integrates data streams to produce richer biological narratives.
What is CITE-seq? An overview of the core idea behind CITE-seq
CITE-seq, or Cellular Indexing of Transcriptomes and Epitopes by sequencing, is a multimodal single-cell technology that combines transcriptome profiling with quantitative measurement of cell-surface proteins. Instead of relying solely on mRNA abundance to infer cell identity and state, CITE-seq adds a layer of direct protein quantification via antibody-derived tags (ADTs) that are sequenced alongside the cellular RNA. This simultaneous readout enables more accurate cell typing, better discrimination of closely related populations, and a richer view of cellular phenotypes.
The lowercase form cite-seq often appears in informal writing. In formal contexts and in most current literature, however, the accepted nomenclature is CITE-seq, with CITE capitalised and joined to seq by a hyphen. Either way, the message is clear: a single cell is not merely a snapshot of its transcriptome but a combined transcript and protein portrait, captured by sequencing.
The components of CITE-seq: ADTs, antibodies and oligonucleotide tags
Antibody-derived tags (ADTs) and oligo-conjugated antibodies
Central to CITE-seq are antibody-derived tags, or ADTs. In practice, antibodies against cell-surface proteins are conjugated to unique, short DNA oligonucleotides. These oligos function as barcodes: upon sequencing, the counts attributed to each tag reflect the abundance of the corresponding surface protein on the cell. Because the ADTs are integrated into the sequencing library, researchers can quantify on a per-cell basis both the RNA and the surface protein landscape.
Panel construction is deliberate: one antibody per targeted protein, each with a distinct DNA barcode. A CITE-seq panel may include dozens of surface markers, and panels of more than a hundred antibodies are in use, allowing fine-grained resolution of immune cell subsets, activation states, and maturation stages. Because DNA oligos are read out by sequencing rather than fluorescence, the panel is not constrained by spectral overlap, which makes the approach highly scalable, multiplexed, and compatible with standard sequencing workflows.
How CITE-seq integrates ADTs with mRNA capture
The sequencing workflow in CITE-seq is designed to co-capture both messenger RNA and ADT-derived DNA fragments. In a typical droplet-based workflow, cells are partitioned with barcoded beads that enable individual cell indexing. The mRNA transcripts are reverse-transcribed and amplified to generate cDNA libraries. In parallel, the ADTs are captured and amplified, producing a separate library of antibody-oligo sequences. Sequencing data then yield two orthogonal data streams per cell: gene expression counts and ADT counts, which can be integrated in downstream analysis.
Designing a CITE-seq experiment: planning and considerations
Defining scientific questions and selecting targets
A successful CITE-seq study begins with clear hypotheses. Are you seeking to refine cell-type annotations, identify activation states, or map protein landscape changes in response to treatment? Your choice of surface markers (the ADT panel) should align with these goals. In immunology, panels often span T cell markers (e.g., CD3, CD4, CD8), activation and exhaustion markers (e.g., PD-1, CTLA-4), and lineage-defining antigens. A well-chosen panel amplifies the power of CITE-seq by enabling nuanced dissection of phenotypes that RNA alone might miss.
Panel design and controls: balancing breadth, specificity and cost
Panel design involves trade-offs between breadth (number of markers) and depth (sequencing capacity and cost). Each additional ADT adds complexity to library preparation and sequencing requirements. It is prudent to include controls such as isotype controls, known-positive and known-negative populations, and spike-ins where feasible. Don’t overlook the importance of antibody validation: non-specific binding or cross-reactivity can distort ADT signals, undermining the reliability of CITE-seq measurements. A pilot panel can help calibrate staining conditions and confirm antigen accessibility on your cell type of interest.
Sample handling and multiplexing strategies
When processing multiple samples, sample multiplexing can reduce batch effects and increase throughput. Techniques such as antibody-based hashing and sample barcodes allow pooling of cells from different donors or conditions. In CITE-seq, Cell Hashing with hashtag oligonucleotides (HTOs) enables robust demultiplexing after sequencing. A thoughtful multiplexing plan can save time and resources while preserving analytic power. Ensure that sample handling steps minimise cell loss and preserve surface epitopes, which can be sensitive to proteolytic digestion and harsh processing.
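To make the demultiplexing step concrete, here is a minimal sketch that assigns each cell to a sample from its per-sample hashing tag counts. It is a simplified threshold heuristic written for illustration, not the algorithm used by any particular tool; the function name demultiplex, the thresholds min_count and min_ratio, and the toy counts are all assumptions.

```python
import numpy as np

def demultiplex(hto_counts, sample_names, min_count=10, min_ratio=3.0):
    """Label each cell as a sample name, 'doublet' or 'negative'.

    hto_counts: cells x hashtags matrix of raw tag counts (at least 2 hashtags).
    min_count and min_ratio are hypothetical thresholds, to be tuned per dataset.
    """
    hto_counts = np.asarray(hto_counts, dtype=float)
    labels = []
    for row in hto_counts:
        order = np.argsort(row)[::-1]           # hashtags sorted by count, highest first
        top, second = row[order[0]], row[order[1]]
        if top < min_count:
            labels.append("negative")           # no hashtag clearly detected
        elif second >= min_count and top < min_ratio * second:
            labels.append("doublet")            # two hashtags both strongly present
        else:
            labels.append(sample_names[order[0]])
    return labels

# Toy counts (made up): four cells, three pooled samples A, B and C.
toy_counts = [[200, 3, 1], [2, 1, 0], [150, 120, 2], [4, 5, 300]]
toy_labels = demultiplex(toy_counts, ["A", "B", "C"])
print(toy_labels)
```

In real data the thresholds are usually derived from the count distribution of each hashtag rather than fixed by hand, so treat the fixed values here as placeholders.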
Sequencing depth and read structure: planning the data yield
For CITE-seq experiments, sequencing depth must cover two logical layers: the transcriptome and the ADTs. A common approach is to allocate a higher read depth to gene expression libraries and a more modest depth to the ADT libraries, since fewer reads per ADT are often sufficient for robust quantification. Accurate planning requires estimates of anticipated cell numbers, panel size, and desired detection thresholds for low-abundance markers. Consulting sequencing core facilities for platform-specific recommendations can help balance cost with data quality.
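The read-budget arithmetic described above can be sketched in a few lines of Python. The function name plan_reads and every number in the example (cell count, reads per cell, reads per antibody) are illustrative assumptions rather than platform recommendations; substitute the figures your sequencing core suggests.

```python
def plan_reads(n_cells, rna_reads_per_cell, adt_reads_per_antibody, panel_size):
    """Estimate total reads for the RNA and ADT libraries of one experiment."""
    rna_total = n_cells * rna_reads_per_cell
    adt_per_cell = adt_reads_per_antibody * panel_size   # ADT depth scales with panel size
    adt_total = n_cells * adt_per_cell
    return {
        "rna_total": rna_total,
        "adt_per_cell": adt_per_cell,
        "adt_total": adt_total,
        "adt_fraction": adt_total / (rna_total + adt_total),
    }

# Hypothetical inputs: 5,000 cells, a 30-antibody panel.
plan = plan_reads(n_cells=5000, rna_reads_per_cell=20000,
                  adt_reads_per_antibody=100, panel_size=30)
print(plan)
```

Expressing the ADT share as a fraction of the run makes it easy to see how adding markers or cells shifts the balance between the two libraries.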
From bench to data: the CITE-seq workflow in the lab
Sample preparation and staining
Beginning with a high-quality single-cell suspension is crucial. Cells are stained with a panel of oligo-conjugated antibodies under conditions that preserve surface epitopes and minimise non-specific binding. After staining, cells are washed to reduce background, then loaded into a single-cell capture system (such as a droplet-based platform) that partitions them individually, so that the molecules from each cell receive an oligo barcode identifying their cell of origin. Optional dead cell discrimination helps improve data quality by excluding damaged cells that confound measurements.
Library preparation and sequencing
Post-staining, mRNA capture proceeds through reverse transcription and library preparation, generating cDNA libraries representing transcriptomes. Parallel preparation of the ADT library follows, wherein the antibody-derived DNA tags are amplified and prepared for sequencing. The sequencing run then yields two data streams per cell: RNA-derived reads and ADT-derived reads. The dual-output nature of CITE-seq makes it a powerful approach for integrating phenotypic and transcriptional landscapes.
Quality control: data cleanliness from the outset
Quality control is carried out at several levels. In the transcriptome data, typical QC checks include metrics such as the number of detected genes per cell, the total counts, and the proportion of mitochondrial gene reads. For ADTs, one monitors the distribution of tag counts, the presence of non-specific tags, and potential cross-contamination signals. Low-quality cells may be filtered based on combined RNA and ADT metrics to retain a high-quality dataset for downstream analysis.
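The RNA-side metrics named above (detected genes, total counts, mitochondrial proportion) can be computed directly from a count matrix. The sketch below uses NumPy on a toy matrix; the function name rna_qc_metrics, the reliance on an "MT-" gene-symbol prefix to identify mitochondrial genes, and the filter thresholds are assumptions to adapt to your organism and tissue.

```python
import numpy as np

def rna_qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Return (genes detected, total counts, % mitochondrial counts) per cell.

    counts: cells x genes matrix of raw counts.
    mito_prefix: assumed symbol prefix for mitochondrial genes (case-insensitive).
    """
    counts = np.asarray(counts)
    is_mito = np.array([g.upper().startswith(mito_prefix) for g in gene_names])
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Avoid division by zero for empty cells: their percentage stays at 0.
    pct_mito = np.divide(mito, total, out=np.zeros(len(total)), where=total > 0) * 100
    return n_genes, total, pct_mito

# Toy data (made up): three cells, four genes.
genes = ["CD3E", "CD4", "MT-CO1", "MT-ND1"]
counts = [[5, 0, 3, 2], [0, 0, 0, 10], [1, 1, 1, 1]]
n_genes, total, pct_mito = rna_qc_metrics(counts, genes)

# Hypothetical thresholds, for illustration only.
keep = (n_genes >= 2) & (pct_mito < 60)
print(n_genes, total, pct_mito, keep)
```

The same pattern extends to the ADT matrix, for example by computing total tag counts per cell and inspecting their distribution before choosing cut-offs.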
Data processing and analysis: turning CITE-seq data into biological insight
Pre-processing: aligning reads and generating count matrices
Initial processing involves demultiplexing reads, aligning RNA reads to reference genomes, and counting gene-level transcripts. In parallel, ADT counts are extracted by mapping sequences to their corresponding antibody barcodes. The resulting data structures include a gene expression matrix (cells by genes) and an ADT count matrix (cells by antibodies). These matrices form the basis for integrated analyses that reveal both transcriptomic and proteomic states of single cells.
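As an illustration of how ADT counts are extracted, the sketch below maps tag sequences to antibodies through a lookup table and tallies them per cell. It assumes each read has already been parsed into a (cell barcode, UMI, tag sequence) triple, uses exact matching only, and collapses duplicate UMIs; the barcode sequences, the function name count_adts and the toy reads are all hypothetical. Production pipelines typically also tolerate sequencing errors in the tag.

```python
from collections import defaultdict

def count_adts(reads, barcode_to_antibody):
    """Count unique UMIs per (cell, antibody) from parsed ADT reads."""
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for cell, umi, tag in reads:
        antibody = barcode_to_antibody.get(tag)
        if antibody is None:
            continue                      # tag sequence not in the panel
        key = (cell, umi, antibody)
        if key in seen:
            continue                      # PCR duplicate of a molecule already counted
        seen.add(key)
        counts[cell][antibody] += 1
    return {cell: dict(per_ab) for cell, per_ab in counts.items()}

# Hypothetical panel barcodes and reads, for illustration only.
panel = {"ACGTACGT": "CD3", "TTGGCCAA": "CD4"}
reads = [
    ("AAAC", "U1", "ACGTACGT"),
    ("AAAC", "U1", "ACGTACGT"),   # duplicate UMI, counted once
    ("AAAC", "U2", "ACGTACGT"),
    ("AAAC", "U3", "TTGGCCAA"),
    ("CCCG", "U1", "TTGGCCAA"),
    ("CCCG", "U9", "GGGGGGGG"),   # unknown tag, ignored
]
adt_counts = count_adts(reads, panel)
print(adt_counts)
```

The nested dictionary is the sparse form of the cells-by-antibodies matrix described above.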
Quality control and data filtering
Integrated QC involves identifying doublets, low-quality cells, and potential technical artefacts. Doublet detection helps prevent artificial cell states arising from two cells captured together. Tools specialising in single-cell QC now accommodate multi-omic data, enabling joint assessment of RNA and ADT signals to flag suspect cells more accurately.
Normalisation strategies for CITE-seq: stabilising the signal
Normalisation is essential to reduce technical variability and highlight biological differences. For RNA data, log-normalisation with a scale factor is common. For ADTs, several approaches exist: simple log-transformation with a pseudocount, centred log-ratio (CLR) transformation, or more sophisticated methods that account for ambient or background signals. CLR is particularly popular for ADT data because it treats each tag as a compositional component, expressing every count relative to the geometric mean of the counts and so mitigating the influence of varying total counts across cells. It is important to apply appropriate normalisation to both data modalities to enable meaningful integration.
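A minimal sketch of the CLR transformation follows, computed within each cell across its tags: every count is log-transformed and the cell's mean log value (the log of the geometric mean) is subtracted. The pseudocount of 1 is an assumption added so that zero counts can be logged, and the function name clr_per_cell is illustrative. Individual packages implement their own variants, so results need not match theirs exactly.

```python
import numpy as np

def clr_per_cell(adt_counts, pseudocount=1.0):
    """Centred log-ratio transform of a cells x antibodies count matrix.

    For each cell: log(x + pseudocount) minus the mean of those logs,
    i.e. the log-ratio of each tag to the cell's geometric mean.
    """
    x = np.asarray(adt_counts, dtype=float) + pseudocount
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Toy counts (made up): two cells, three antibodies.
toy = [[1, 3, 7], [10, 0, 250]]
clr = clr_per_cell(toy)
print(clr)
```

By construction each row of the result sums to zero, which is a quick sanity check when wiring the transform into a pipeline.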
Dimensionality reduction and clustering
After normalisation, principal component analysis (PCA) is typically applied to the RNA data, and non-linear embeddings such as UMAP or t-SNE are then used to visualise the cells, often in conjunction with the ADT layer. Several studies have shown that including ADT information improves cluster separation and allows the identification of subtle cell states that RNA data alone may obscure. Clustering can be performed on a concatenated feature set or via multi-omic integration frameworks that balance contributions from RNA and ADTs.
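The concatenated-feature option can be sketched as follows: each modality is standardised, scaled so that its share of the total variance matches a chosen weight, and the joined matrix is reduced with PCA via a singular value decomposition. This is a deliberately simple illustration, not the method used by any specific integration framework; the function names and the adt_weight parameter are assumptions, and the inputs are expected to be already normalised.

```python
import numpy as np

def zscore(m):
    """Standardise each column; constant columns are left at zero."""
    m = np.asarray(m, dtype=float)
    sd = m.std(axis=0)
    sd[sd == 0] = 1.0
    return (m - m.mean(axis=0)) / sd

def joint_pca(rna, adt, adt_weight=0.5, n_components=2):
    """PCA on weighted, concatenated RNA and ADT features (cells x features)."""
    rna_z, adt_z = zscore(rna), zscore(adt)
    # Scale each block so its total variance is roughly its weight,
    # regardless of how many features the modality has.
    rna_block = rna_z * np.sqrt((1.0 - adt_weight) / rna_z.shape[1])
    adt_block = adt_z * np.sqrt(adt_weight / adt_z.shape[1])
    joint = np.hstack([rna_block, adt_block])
    u, s, _ = np.linalg.svd(joint, full_matrices=False)
    return u[:, :n_components] * s[:n_components]   # cell coordinates on the top PCs

# Random stand-in data: 20 cells, 50 genes, 5 antibodies.
rng = np.random.default_rng(0)
embedding = joint_pca(rng.normal(size=(20, 50)), rng.normal(size=(20, 5)))
print(embedding.shape)
```

Re-running the function with different adt_weight values is a quick way to see how strongly each modality drives the embedding.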
Integrating RNA and ADT data: multi-omic approaches
The real strength of CITE-seq lies in integrating the transcriptome with surface proteomics. Approaches to integration range from simple weighting of modalities to advanced probabilistic models. One widely adopted framework is totalVI, a probabilistic model fitted by variational inference and designed for CITE-seq data, which combines RNA and ADT information within a single statistical framework. Using totalVI can improve cell-type annotation, reveal transcript-protein discordance, and enable more accurate lineage and state inferences.
Biological interpretation: from numbers to biology
Interpreting CITE-seq results involves connecting the dots between mRNA and protein levels. It is not uncommon to observe discordance between transcript and protein abundance for certain markers due to post-transcriptional regulation, protein turnover, or rapid signalling events. Visualisation tools—heatmaps, violin plots, and parallel coordinate plots—help highlight markers that distinguish cell subsets or track responses to perturbations. The combination of gene expression with ADT data frequently leads to more robust cell-type annotations and sharper delineation of functional states.
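One simple way to look for transcript-protein discordance is to correlate each ADT with the gene encoding its target across cells. The sketch below does this with a Pearson correlation; the function name, the toy values and the two protein-to-gene pairs are chosen for illustration, and in practice the inputs would be normalised values rather than raw counts.

```python
import numpy as np

def rna_protein_correlation(rna, gene_names, adt, adt_names, pairs):
    """Pearson correlation across cells for each protein and its encoding gene.

    pairs: dict mapping ADT name -> gene symbol. Returns NaN where either
    vector is constant, since the correlation is undefined there.
    """
    rna = np.asarray(rna, dtype=float)
    adt = np.asarray(adt, dtype=float)
    gene_idx = {g: i for i, g in enumerate(gene_names)}
    adt_idx = {a: i for i, a in enumerate(adt_names)}
    result = {}
    for protein, gene in pairs.items():
        x = rna[:, gene_idx[gene]]
        y = adt[:, adt_idx[protein]]
        if x.std() == 0 or y.std() == 0:
            result[protein] = float("nan")
        else:
            result[protein] = float(np.corrcoef(x, y)[0, 1])
    return result

# Toy values (made up): four cells; CD4 agrees, PD-1 is deliberately discordant.
rna = [[0, 1], [1, 2], [2, 3], [3, 4]]
adt = [[10, 4], [20, 3], [30, 2], [40, 1]]
corr = rna_protein_correlation(rna, ["CD4", "PDCD1"], adt, ["CD4", "PD-1"],
                               {"CD4": "CD4", "PD-1": "PDCD1"})
print(corr)
```

Markers with low or negative correlation are candidates for closer inspection, bearing in mind that sparse RNA counts alone can depress the correlation.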
Software tools and pipelines for CITE-seq analysis
A number of software packages have matured to support CITE-seq workflows. Popular choices include Seurat, Scanpy, and dedicated modules within scRNA-seq toolkits. For multi-omic analysis, packages such as Seurat v4 and comparable pipelines can handle integrated RNA and ADT data, including support for CLR normalisation and multi-modal clustering. Tutorials and example datasets are widely available, making it feasible for laboratories to implement end-to-end CITE-seq analyses with community-backed best practices. Documentation often emphasises reproducibility, version control, and the importance of keeping track of panel designs and antibody lots for traceability.
Applications of CITE-seq: what this technology enables in practice
Immunology: dissection of immune cell landscapes
CITE-seq has become a staple in immunology, enabling precise identification of immune cell subsets in blood, tissues and tumours. Researchers routinely use CITE-seq to distinguish T cell subsets (naïve, central memory, effector memory) and to profile activation or exhaustion states with surface markers that RNA data alone cannot fully resolve. The ability to quantify surface proteins clarifies lineage relationships, reveals heterogeneity within phenotypically similar populations, and supports discoveries in vaccine responses and immune monitoring.
Oncology: characterising tumour microenvironments
In cancer research, the tumour microenvironment is a mosaic of malignant cells, immune infiltrates and stromal components. CITE-seq enables simultaneous profiling of tumour cells and the surrounding immune context, providing insights into infiltration, immune evasion, and the functional status of cytotoxic cells. By mapping ADT signals to gene expression profiles, researchers can identify resistance mechanisms, track antigen presentation dynamics, and discover markers that predict response to therapies.
Infectious disease and beyond
Beyond immunology and oncology, CITE-seq supports studies in infectious disease, transplantation biology, and developmental biology. The method is adaptable to various tissue types and model systems. In infectious disease, for example, CITE-seq can reveal how pathogens alter surface phenotypes of host cells, while in developmental studies it can capture transient expression programmes that unfold during differentiation processes.
Comparing CITE-seq with related technologies
REAP-seq and other parallel approaches
REAP-seq (RNA Expression And Protein sequencing) emerged as an early alternative that similarly integrates RNA and protein information via antibody-derived tags. While conceptually aligned with CITE-seq, practical differences in library preparation, antibody conjugation chemistry and resolution can influence performance. When deciding between CITE-seq and REAP-seq, researchers consider panel design, compatibility with their sequencing platform, and the maturity of available analysis tools. In many laboratories, CITE-seq has become the more widely adopted framework due to its established workflows and robust community resources.
Other multi-omics and multiplexing strategies
In addition to ADT-based approaches, researchers leverage other multi-omics modalities such as ATAC-seq (chromatin accessibility) at the single-cell level, enabling a broader view of regulatory landscapes alongside transcriptomes. Multiplexed antibody-based approaches, and newer technologies under the umbrella of multi-omics, share the aim of integrating complementary data streams to enhance inference. When selecting a strategy, consider goals such as whether you prioritise surface proteomics, chromatin state, or cytokine signatures, and align the methodology accordingly.
Challenges, limitations and best practices for CITE-seq
Technical limitations and potential artefacts
As with any high-throughput technology, CITE-seq presents challenges. Background signal, cross-reactivity, and non-specific binding can bias ADT measurements. Ambient antibody-derived tags—oligos that are present in the solution rather than bound to a cell—can inflate counts if not properly controlled. Batch effects across antibody lots and staining conditions can also confound interpretation. Thoughtful experimental design, thorough controls, and robust normalisation strategies are essential to mitigate these risks.
Normalisation caveats and data integration pitfalls
Normalisation is not one-size-fits-all. The choice between log-normalisation, CLR, or alternative methods depends on your data characteristics and analysis goals. Blending ADT and RNA data for joint clustering requires careful weighting; overemphasising one modality can mask meaningful signals from the other. It is prudent to perform sensitivity analyses to understand how different normalisation schemes influence downstream results and to validate key findings with orthogonal assays when possible.
Practical tips for robust CITE-seq experiments
- Pilot studies: test a smaller panel to calibrate staining conditions and optimise antibody concentrations.
- Document everything: antibody lot numbers, fixation conditions, and sequencing depth should be recorded for reproducibility.
- Include appropriate controls: known-positive and known-negative samples help set interpretation thresholds for ADT signals.
- Plan sequencing thoughtfully: allocate adequate reads to both RNA and ADT libraries, considering panel size and cell count.
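- Quality filters: remove cells with implausible ADT patterns, such as high counts for mutually exclusive lineage markers, and review cells with strongly inconsistent RNA-ADT profiles before excluding them, since some discordance is biological.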
- Validation: corroborate marker-driven findings with independent methods, such as flow cytometry, where feasible.
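The tip about using known-negative samples to set interpretation thresholds can be sketched as a percentile cut-off: take the signal of a marker in a population known not to express it and call a cell positive only if it exceeds a high percentile of that background. The function names, the 99th percentile and the toy values are assumptions; other gating strategies are equally valid.

```python
import numpy as np

def positivity_threshold(background_values, percentile=99.0):
    """Threshold from a known-negative (or isotype-control) signal distribution."""
    return float(np.percentile(background_values, percentile))

def call_positive(marker_values, threshold):
    """Boolean mask of cells whose marker signal exceeds the threshold."""
    return np.asarray(marker_values) > threshold

# Toy background (made up): signal of the marker in a known-negative population.
background = np.arange(101)
threshold = positivity_threshold(background)

# Toy test cells: only the last one clears the background.
positive = call_positive([50, 98, 120], threshold)
print(threshold, positive)
```

Because the threshold depends on staining conditions and antibody lot, it is worth recomputing for each batch rather than reusing a fixed value.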
Future directions for CITE-seq and multi-omics
The landscape around CITE-seq continues to evolve. Advances in antibody panels, improved conjugation chemistries, and higher-throughput sequencing platforms will expand the scope and accuracy of ADT measurements. On the analysis front, more sophisticated integration models, better handling of batch effects, and enhanced visualisation tools will empower researchers to extract deeper biological meaning from the joint RNA-ADT space. As single-cell multi-omics matures, CITE-seq is poised to play a central role in precision immunology, translational research and systems biology, enabling more nuanced characterisation of cell states and dynamic responses to perturbations.
Case studies: real-world examples of CITE-seq in action
Case study 1: delineating T cell subsets in a vaccine study
In a recent vaccine-related investigation, researchers employed CITE-seq to map T cell heterogeneity in peripheral blood before and after vaccination. By integrating surface marker data with transcriptomes, they could distinguish a spectrum of CD8+ T cell states and track clonal expansions with higher confidence than RNA alone would allow. The ADT panel highlighted protein-level changes in cytotoxic markers that aligned with functional assays, providing a robust, multimodal perspective on vaccine-induced immunity.
Case study 2: characterising immune infiltrates in solid tumours
A pancreatic cancer study utilised CITE-seq to profile tumour-infiltrating lymphocytes alongside malignant cells. The combination of surface proteins and gene expression revealed immune suppressive phenotypes that correlated with patient outcomes. The approach helped identify markers for potential therapeutic targeting and provided a richer map of the tumour microenvironment than would be possible with RNA data in isolation.
Best practices: building a reliable CITE-seq workflow in your own lab
Documentation, standards and reproducibility
Maintaining rigorous documentation is essential in CITE-seq projects. Record panel designs, antibody clones, conjugation chemistry, and lot identifiers. Keep a clear record of sequencing runs, sample metadata and analysis pipelines. Reproducibility is the cornerstone of credible single-cell multi-omics work, so adopting standard operating procedures and version-controlled analysis scripts will pay dividends in the long term.
Validation and cross-checks
Validate key findings with orthogonal methods such as flow cytometry, immunofluorescence, or mass cytometry when appropriate. Cross-validation helps confirm that observed ADT signals reflect true protein abundance and surface expression, reinforcing conclusions drawn from the integrated CITE-seq data.
Ethical and regulatory considerations
When working with human samples, adhere to ethical guidelines, obtain appropriate approvals, and manage data privacy carefully. Transparent reporting of methods and data access levels contributes to the responsible advancement of multi-omic single-cell technologies.
Conclusion: the value proposition of CITE-seq in modern biology
CITE-seq represents a powerful convergence of transcriptomics and proteomics at single-cell resolution. By capturing both gene expression and surface protein abundance in the same cell, researchers gain a more accurate, multidimensional view of cellular identity and function. The technique has matured into a versatile framework applicable to immunology, oncology, infectious disease and beyond. Challenges remain, notably antibody panel design, background signal and data integration, but the field continues to innovate rapidly, steadily expanding the reach and impact of CITE-seq. For teams seeking to illuminate complex biological systems with high-resolution, multi-omic insights, CITE-seq offers a compelling and increasingly accessible path to discovery. Pairing the assay with sound informatic practice (robust normalisation, thoughtful integration, and rigorous validation) will help researchers unlock a deeper understanding of how cells orchestrate health and disease. In short, CITE-seq is not merely a method; it is a gateway to richer, more actionable biology.