Ava Khamseh

Stator: High order expression dependencies finely resolve cryptic states and subtypes in single cell data

To attribute disease to cell type, and molecular features of cells to disease state, we need to define and distinguish cell types, subtypes and states. The Human Cell Atlas has taken a step in this direction by seeking to define all human cell types and their molecular features, most often gene expression, within a multidimensional ‘cell space’. Typing of cells is easiest when their lineages are well separated, and hardest when they are distinguished only by state (such as cell cycle phase, level of maturity, or response to stimulus) or spatial location. Commonly used approaches, including clustering, group cells by proximity in expression space and therefore yield cell type definitions at relatively low resolution.

We introduce Stator, a novel method that finely resolves cell types, subtypes and states among cells whose transcriptomes appear homogeneous upon clustering. Stator takes advantage of lowly expressed as well as unexpressed genes, and can identify rare biological states (~0.2% of 10k single cells). The approach: (i) utilises structure learning, (ii) applies a model-free estimator of higher-order interactions to quantify expression dependencies amongst n-tuples of genes (beyond pairwise), (iii) extracts significantly deviating combinatorial gene signatures (tuples) driving these higher-order gene dependencies, and finally (iv) combines tuples into Stator states when they commonly co-occur in the same cell. Typically, Stator labels a cell not just by type and subtype but also by biological state, for example an immature interneuron in G2/M cell cycle phase. Stator generates molecular and cellular hypotheses for subsequent experimental testing. To facilitate this, we provide the Stator Shiny App (https://shiny.igc.ed.ac.uk/MFIs/). This flexible app takes the output of Stator’s Nextflow pipeline and, through an interactive and user-friendly interface, performs downstream analyses such as differential expression and gene ontology analyses amongst Stator states.
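
To give a feel for step (ii), the sketch below estimates an n-point interaction among binarised genes as an alternating sum of log state frequencies, restricted to cells in which a chosen set of conditioning genes is unexpressed. This is an illustration only and is not taken from the Stator code base: the function name `interaction`, its parameters, the pseudocount, and the assumption that expression has already been binarised (0 = not expressed, 1 = expressed) are all hypothetical choices made here, and the conditioning set is supplied by hand rather than learned as in step (i).

```python
from itertools import product

import numpy as np


def interaction(X, genes, conditioned=(), pseudocount=0.5):
    """Illustrative n-point interaction estimate from binarised expression.

    X           : (cells x genes) array of 0/1 values (assumed input format).
    genes       : column indices of the n genes in the tuple.
    conditioned : column indices of genes required to be 0 (hypothetical
                  stand-in for a learned conditioning set).
    pseudocount : added to every state count to avoid log(0) (an assumption).
    """
    X = np.asarray(X)
    # Keep only cells in which every conditioning gene is unexpressed.
    if len(conditioned) > 0:
        keep = (X[:, list(conditioned)] == 0).all(axis=1)
        X = X[keep]
    sub = X[:, list(genes)]
    n = len(genes)
    total = 0.0
    # Alternating sum of log counts over all 2^n on/off states of the tuple.
    # The sign is +1 when an even number of genes are 0 relative to n, so for
    # n = 2 this reduces to the log odds ratio. Normalising counts to
    # frequencies is unnecessary because the signs sum to zero.
    for state in product((0, 1), repeat=n):
        count = (sub == np.array(state)).all(axis=1).sum() + pseudocount
        sign = (-1) ** (n - sum(state))
        total += sign * np.log(count)
    return total


# Two independent genes: every on/off combination is equally frequent.
independent = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 25)
# Two genes that are always expressed together or not at all.
coupled = np.array([[0, 0]] * 50 + [[1, 1]] * 50)

independent_score = interaction(independent, genes=[0, 1])
coupled_score = interaction(coupled, genes=[0, 1])
```

For the independent pair the estimate is zero, while the coupled pair gives a positive value; a tuple whose estimate deviates significantly from zero is the kind of signal that step (iii) would then examine.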