Graduate Student Researcher, Johns Hopkins University, Biomedical Engineering, Baltimore, Maryland, United States
Introduction: Cells control their identity and behavior through the interaction of transcription factors (TFs) with accessible cis-regulatory elements (CREs), which together regulate target genes. This interaction forms a TF-CRE-gene “triplet”, in which a trans-acting TF binds a CRE and together they change the likelihood that a target gene is expressed. Single-nucleus multi-omics sequencing (snMulti-omics) profiles RNA and chromatin accessibility simultaneously from the same cell, providing an exciting opportunity to gain a deeper understanding of cell regulatory mechanisms. Currently, most snMulti-omics analysis methods infer TF-CRE interactions by motif detection, using motif databases and scanning algorithms to identify putative TF binding sites within accessible CREs, and infer CRE-gene interactions only within a fixed distance threshold. This approach has several potential downsides. First, motif databases are incomplete, so important regulatory interactions between TFs and chromatin may be missed. Second, these methods assume that CREs lie close to their target genes, often limiting the range over which CREs are correlated with a gene. This assumption has repeatedly been shown not to hold: CREs can regulate genes over long distances, in some cases as far as megabase pairs away (Friman et al.). To resolve these complications and infer more comprehensive interactions, we developed Epoch+, a framework that constructs dynamic gene regulatory networks (GRNs) using the context likelihood of relatedness (CLR) method. Previous work in our lab has shown that this method successfully captures gene regulatory information.
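The CLR method named above scores each candidate regulator-target pair against the background distribution of relatedness values involving either partner, so a pair stands out only if it is unusually strong in both contexts. Below is a minimal Python sketch of that scoring, not Epoch+'s implementation. It assumes absolute Pearson correlation as the relatedness measure (the original CLR formulation uses mutual information), and the function names `pearson` and `clr_scores`, the dictionary input format and the toy values are hypothetical. Each relatedness value is z-scored against its regulator's row and its target's column, each z-score is clipped at zero, and the two are combined as the square root of the sum of their squares.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def clr_scores(regulators, targets, eps=1e-12):
    """CLR score for every (regulator, target) pair.

    regulators, targets: dict of feature name -> per-cell values (same cell order).
    Relatedness is |Pearson r|, an assumption standing in for mutual information.
    """
    rel = {(r, t): abs(pearson(rv, tv))
           for r, rv in regulators.items() for t, tv in targets.items()}

    # Background (mean, sd) of each regulator's row and each target's column.
    row_bg = {r: [rel[(r, t)] for t in targets] for r in regulators}
    col_bg = {t: [rel[(r, t)] for r in regulators] for t in targets}
    row_stats = {r: (mean(v), pstdev(v)) for r, v in row_bg.items()}
    col_stats = {t: (mean(v), pstdev(v)) for t, v in col_bg.items()}

    def z(value, stats):
        # z-score against one background, clipped at 0; flat backgrounds give 0
        mu, sd = stats
        return max(0.0, (value - mu) / sd) if sd > eps else 0.0

    return {(r, t): math.hypot(z(v, row_stats[r]), z(v, col_stats[t]))
            for (r, t), v in rel.items()}


# Toy example: 6 cells, two TFs, three genes (hypothetical values).
regulators = {"TF1": [0, 1, 2, 3, 4, 5], "TF2": [1, 0, 1, 0, 1, 0]}
targets = {"G1": [0, 1, 2, 3, 4, 5], "G2": [1, 1, 0, 0, 1, 1], "G3": [2, 2, 2, 2, 2, 3]}
scores = clr_scores(regulators, targets)
print(max(scores, key=scores.get))  # ('TF1', 'G1')
```

Because each pair is judged relative to its own row and column, a TF that correlates moderately with many genes does not automatically dominate; only pairs that exceed both backgrounds receive high scores.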
Materials and Methods: This study aims to elucidate the GRNs that govern cell regulation by leveraging paired single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) data measured simultaneously in the same cells. Epoch+ posits a regulatory relationship between a CRE peak and a putative target gene if the CLR method determines that the CRE peak is correlated with both the ATAC signal at the gene body and the gene's expression. Candidate CRE-gene pairs are restricted to the same chromosome. Similarly, a regulatory relationship between a TF and a gene exists either through direct CLR correlation between the TF and the gene or through indirect CLR correlation, in which the TF regulates the gene via a CRE. Together, these steps reconstruct an initial, static TF-CRE-gene GRN. Epoch+ then uses linear regression to classify each TF-CRE and CRE-gene edge as activating or repressive. Taking this static GRN as a backbone, Epoch+ transforms it into a dynamic GRN by partitioning cells into time epochs, which can be assigned by cell trajectory inference or defined by the user. Finally, Epoch+ generates an epoch-specific GRN by pruning any regulatory edge whose TF expression or CRE accessibility, averaged across the cells of that epoch, is undetectable.
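The steps that follow CLR scoring (same-chromosome candidate selection, the dual CLR requirement for CRE-gene links, signing edges by regression, and epoch-specific pruning) can be sketched in plain Python as below. This is a simplified illustration under our own assumptions, not the Epoch+ code: the helper names, the dictionary-based inputs, the CLR `cutoff`, the sign rule (the sign of a univariate least-squares slope) and the detection `threshold` of 0.0 are all hypothetical choices.

```python
from statistics import mean


def same_chromosome_pairs(peak_chrom, gene_chrom):
    """All candidate (CRE, gene) pairs that lie on the same chromosome."""
    return [(p, g) for p, pc in peak_chrom.items()
            for g, gc in gene_chrom.items() if pc == gc]


def link_cre_gene(pairs, clr_gene_body_atac, clr_expression, cutoff):
    """Keep a pair only if CLR links the CRE to BOTH the gene-body ATAC signal
    and the gene's expression (cutoff is a hypothetical CLR threshold)."""
    return [pg for pg in pairs
            if clr_gene_body_atac.get(pg, 0.0) > cutoff
            and clr_expression.get(pg, 0.0) > cutoff]


def regression_sign(x, y):
    """+1 (activating), -1 (repressive) or 0 from the least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0  # regulator is constant: direction cannot be estimated
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return (slope > 0) - (slope < 0)


def epoch_specific_grns(edges, regulator_signal, epochs, threshold=0.0):
    """Prune each signed edge per epoch.

    edges: (regulator, target, sign); the regulator is a TF (expression signal)
           for TF-CRE edges or a CRE (accessibility signal) for CRE-gene edges.
    epochs: epoch name -> indices of the cells assigned to it.
    An edge survives in an epoch only if the regulator's mean signal there
    exceeds threshold, i.e. is treated as detectable.
    """
    grns = {}
    for epoch, cells in epochs.items():
        grns[epoch] = [(r, t, s) for r, t, s in edges
                       if mean([regulator_signal[r][i] for i in cells]) > threshold]
    return grns


# Toy example with hypothetical names and values.
pairs = same_chromosome_pairs({"peak1": "chr1", "peak2": "chr2"},
                              {"geneA": "chr1", "geneB": "chr3"})
print(pairs)  # [('peak1', 'geneA')]
edges = [("TF1", "peak1", regression_sign([0, 0, 2, 3], [0, 1, 2, 3])),
         ("peak1", "geneA", regression_sign([1, 1, 0, 0], [0, 0, 4, 5]))]
signal = {"TF1": [0, 0, 2, 3], "peak1": [1, 1, 0, 0]}
print(epoch_specific_grns(edges, signal, {"early": [0, 1], "late": [2, 3]}))
# {'early': [('peak1', 'geneA', -1)], 'late': [('TF1', 'peak1', 1)]}
```

Using only the sign of a one-variable slope keeps the sketch minimal; a regression that models all regulators of a target jointly would be a natural, but here unconfirmed, extension.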
Results, Conclusions, and Discussions: To construct GRNs, we performed MULTI-seq-multiplexed single-nucleus multi-omics (snMulti-omics) sequencing on murine bone marrow stromal cells (BMSCs) undergoing osteoblast differentiation, sampled at four time points. To minimize batch-to-batch variation between time points, differentiation was initiated at staggered times so that all time points could be collected and processed for snMulti-omics simultaneously. The FASTQ files were processed with the 10x Genomics Cell Ranger ARC 2.0.2 pipeline to perform alignment, peak calling, and chromosomal location assignment for both the ATAC and gene expression (GEX) fractions. The aligned RNA data were processed with Scanpy to remove potential doublets and low-quality cells, and the ATAC data were processed with Muon to filter out cells with suboptimal nucleosome signal and fragment counts. Both are Python packages. After quality control, 3,858 nuclei were retained. For features, 16,211 genes were captured, and 1,385 TFs important for mouse gene regulation were selected. After filtering out low-readout peaks, 184,638 peaks were retained; of these, 33,299 were annotated to genes and 151,301 were intergenic CREs. Gene expression changes during BMSC osteo-lineage differentiation are well studied, so known osteogenic genes and the enhancers that regulate them are included as positive controls to verify that Epoch+ correctly captures TF-CRE-gene interactions. The BMSC dataset will also serve as a benchmark for comparing Epoch+'s performance with that of other snMulti-omics methods.
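One way to turn the positive-control check into a number is to measure how many known TF-CRE-gene triplets reappear in the inferred network, both as whole triplets and as their individual TF-CRE and CRE-gene edges. The sketch below is our illustration, not a metric described for Epoch+; the function name `positive_control_recovery`, the tuple format, and the placeholder names in the example are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Triplet = Tuple[str, str, str]  # (TF, CRE, gene)


def positive_control_recovery(inferred: Iterable[Triplet],
                              positives: Iterable[Triplet]) -> Dict[str, float]:
    """Fraction of positive-control triplets recovered in an inferred GRN.

    Reports exact triplet recovery plus partial recovery of the TF-CRE and
    CRE-gene edges, since a method may link the right CRE to the right gene
    while assigning a different TF.
    """
    inferred_set = set(inferred)
    positive_set = set(positives)
    if not positive_set:
        raise ValueError("positive-control set is empty")
    tf_cre = {(tf, cre) for tf, cre, _ in inferred_set}
    cre_gene = {(cre, gene) for _, cre, gene in inferred_set}
    n = len(positive_set)
    return {
        "triplet": sum(p in inferred_set for p in positive_set) / n,
        "tf_cre": sum((tf, cre) in tf_cre for tf, cre, _ in positive_set) / n,
        "cre_gene": sum((cre, gene) in cre_gene for _, cre, gene in positive_set) / n,
    }


# Placeholder names, not real osteogenic regulators.
inferred = [("TF1", "enh1", "geneA"), ("TF2", "enh2", "geneB")]
positives = [("TF1", "enh1", "geneA"), ("TF3", "enh2", "geneB")]
print(positive_control_recovery(inferred, positives))
# {'triplet': 0.5, 'tf_cre': 0.5, 'cre_gene': 1.0}
```

Reporting the edge-level fractions alongside exact triplet recovery makes it visible whether a method misses a long-range CRE-gene link or only the TF assigned to it, which is the distinction the Introduction draws between motif-based and CLR-based inference.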