Introduction: Optimal transport (OT) methods have been used to predict cell differentiation trajectories from time-series single-cell data, including scRNA-Seq [2] and spatial transcriptomics [4]. In these methods, an optimization problem is solved to find a mapping between cells at two time points, estimating ancestor-descendant relationships. These maps can be used to study how cells and cell populations change over time. Incorporating cell image data allows morphology information to be included in these trajectory mappings. With large amounts of high-throughput data, OT can become computationally expensive, so many methods first reduce dimensionality with linear techniques such as PCA or ICA. However, PCA is poorly suited to images, because it assumes a linear structure that image data may not have [3]. Autoencoders offer an alternative: they use unsupervised learning to create a low-dimensional representation of an image, or feature space, that retains meaningful features. Here, we use a neural network autoencoder to learn a latent space representation of images of cells, paired with gene expression data, which can then be used as input to OT methods. Because the morphological properties of a cell do not depend on its orientation, it is desirable to use a rotationally invariant autoencoder. Our autoencoder allows us to reduce the dimensionality of the image while preserving important features that PCA may miss [1].
Materials and Methods: Multiplexed error-robust fluorescence in situ hybridization (MERFISH) was performed on mouse ovary samples collected 0 and 4 hours after stimulation with luteinizing hormone. Images of individual cells were obtained from the MERFISH data using standard processing. The O2VAE autoencoder developed by Burgess et al. [1] was trained on half of the data from time point 0. After training, the learned weights were loaded to rebuild the model, and all collected images were passed through the trained encoder. The latent representation of each cell was then retrieved and concatenated with the PCA representation of the gene expression profile of the same cell. Finally, a mapping between cells in the two time point datasets was produced using a modified version of the Sinkhorn algorithm [5] with the concatenated morphological and gene expression data as inputs.
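The final mapping step can be illustrated with a minimal sketch of the standard Sinkhorn iteration [5] applied to concatenated features. This is not the modified algorithm used in this work: the function names, the uniform marginals, the squared Euclidean cost, the `morph_weight` scaling, and the values of `epsilon` and `n_iters` are all assumptions made for illustration, and the inputs are random placeholders rather than real data.

```python
import numpy as np

def combine_features(latent, expr_pcs, morph_weight=1.0):
    """Concatenate image latents with gene expression PCs (hypothetical helper).

    morph_weight is an assumed knob for balancing the two modalities.
    """
    return np.hstack([morph_weight * latent, expr_pcs])

def sinkhorn_plan(x0, x1, epsilon=0.1, n_iters=500):
    """Entropy-regularized OT plan between rows of x0 and rows of x1.

    Plain Sinkhorn with uniform marginals; assumes the cells are not all identical.
    """
    # Squared Euclidean cost between every early/late cell pair.
    sq0 = (x0 ** 2).sum(axis=1)
    sq1 = (x1 ** 2).sum(axis=1)
    cost = np.maximum(sq0[:, None] + sq1[None, :] - 2.0 * x0 @ x1.T, 0.0)
    cost = cost / cost.max()  # rescale so epsilon has a comparable meaning

    a = np.full(x0.shape[0], 1.0 / x0.shape[0])  # mass on early cells
    b = np.full(x1.shape[0], 1.0 / x1.shape[0])  # mass on late cells
    kernel = np.exp(-cost / epsilon)

    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # match column marginals
        u = a / (kernel @ v)    # match row marginals
    return u[:, None] * kernel * v[None, :]

# Random placeholders standing in for encoder latents and expression PCs.
rng = np.random.default_rng(0)
latent_t0, latent_t4 = rng.normal(size=(30, 8)), rng.normal(size=(40, 8))
pcs_t0, pcs_t4 = rng.normal(size=(30, 5)), rng.normal(size=(40, 5))

plan = sinkhorn_plan(combine_features(latent_t0, pcs_t0),
                     combine_features(latent_t4, pcs_t4))
```

Each row of `plan` describes how the mass of one 0-hour cell is distributed over the 4-hour cells. A smaller `epsilon` gives sharper rows but may require computing the iteration in log space for numerical stability.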
Results, Conclusions, and Discussions: To quantify and confirm the rotational invariance of the autoencoder, a dataset of 500 cells was created, and copies of each cell rotated by 0 degrees, 45 degrees, 90 degrees, 180 degrees, and 235 degrees were generated. The trained encoder was then used to embed each cell and its rotated copies, creating a so-called “Rotated Training Set”. Using Euclidean distance, the embedding of each cell was compared first to the embeddings of its own rotated copies and then to those of the other cells in the Rotated Training Set. The Euclidean distances between rotated copies of the same cell were much smaller than the distances between different cells, as seen in Figure 1. The map created by the OT algorithm had an average Shannon index of approximately 7185.6, meaning that on average a cell from the 0-hour time point maps to roughly 7,186 cells at the 4-hour time point.
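Both measurements above can be sketched in a few lines of NumPy. The sketch rests on assumptions: the function names and array layout are hypothetical, the embeddings are synthetic, and the "number of cells" reading of the Shannon index is taken here to be the exponential of each row's Shannon entropy, which may differ from how the reported value was computed.

```python
import numpy as np

def rotation_distance_summary(emb):
    """Compare distances within and between cells.

    emb has shape (n_cells, n_rotations, latent_dim). Returns the mean
    Euclidean distance between rotated copies of the same cell and the
    mean distance between embeddings of different cells.
    """
    n_cells, n_rot, dim = emb.shape
    flat = emb.reshape(n_cells * n_rot, dim)

    # Pairwise Euclidean distances via the Gram matrix.
    sq = (flat ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T, 0.0)
    dists = np.sqrt(d2)

    cell_id = np.repeat(np.arange(n_cells), n_rot)
    same_cell = cell_id[:, None] == cell_id[None, :]
    not_self = ~np.eye(n_cells * n_rot, dtype=bool)

    within = dists[same_cell & not_self].mean()
    between = dists[~same_cell].mean()
    return within, between

def effective_descendants(plan):
    """Exponentiated Shannon entropy of each row of a transport plan."""
    rows = plan / plan.sum(axis=1, keepdims=True)
    # Treat 0 * log(0) as 0 by only taking logs of positive entries.
    logs = np.log(rows, where=rows > 0, out=np.zeros_like(rows))
    entropy = -(rows * logs).sum(axis=1)
    return np.exp(entropy)

# Synthetic embeddings: 20 cells, 5 rotations each, nearly identical per cell.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 1, 16))
emb = base + 0.01 * rng.normal(size=(20, 5, 16))
within, between = rotation_distance_summary(emb)

# A uniform plan spreads each early cell evenly over all late cells.
uniform_plan = np.full((4, 10), 1.0 / 40)
spread = effective_descendants(uniform_plan)
```

Under this reading, a row that places all of its mass on one late cell scores 1, and a row spread evenly over n late cells scores n, so a value near the number of late cells would indicate an almost uniform, uninformative mapping.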
Conclusions: We showed that our autoencoder was rotationally invariant, but our OT mappings did not appear to be effective. Adding the morphological data did not improve our trajectory estimate, but tuning the algorithm's hyperparameters before running OT may lead to stronger mappings. We also plan to apply the Waddington-OT method [2] to obtain a better mapping. Another issue may be the loss of resolution in the encoded images. Despite the quantitative success of the autoencoder, the reconstructed images lose some nuance in cell morphology, which may mean that the latent space does not contain enough information to provide meaningful context for the OT mapping.
Acknowledgements: This research was conducted at the Algorithmic Lens on Biology Lab at Boston University under the supervision of Brian Cleary and Peter Bryan.
References:
[1] Burgess, J., Nirschl, J.J., Zanellati, MC. et al. Orientation-invariant autoencoders learn robust representations for shape profiling of cells and organelles. Nat Commun 15, 1022 (2024). https://doi.org/10.1038/s41467-024-45362-4
[2] Schiebinger G, Shu J, Tabaka M, Cleary B, Subramanian V, Solomon A, Gould J, Liu S, Lin S, Berube P, Lee L, Chen J, Brumbaugh J, Rigollet P, Hochedlinger K, Jaenisch R, Regev A, Lander ES. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. Cell. 2019 Feb 7;176(4):928-943.e22. doi: 10.1016/j.cell.2019.01.006. Epub 2019 Jan 31. Erratum in: Cell. 2019 Mar 7;176(6):1517. doi: 10.1016/j.cell.2019.02.026. PMID: 30712874; PMCID: PMC6402800.
[3] Yang KD, Damodaran K, Venkatachalapathy S, Soylemezoglu AC, Shivashankar GV, et al. (2020) Predicting cell lineages using autoencoders and optimal transport. PLOS Computational Biology 16(4): e1007828. https://doi.org/10.1371/journal.pcbi.1007828
[4] Klein, D., Palla, G., Lange, M., Klein, M., Piran, Z., Gander, M., Meng-Papaxanthos, L., Sterr, M., Bastidas-Ponce, A., Tarquis-Medina, M., Lickert, H., Bakhti, M., Nitzan, M., Cuturi, M., & Theis, F. J. (2023). Mapping cells through time and space with Moscot. BioRxiv. https://doi.org/10.1101/2023.05.11.540374
[5] Cuturi, M. (2013). Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In Advances in Neural Information Processing Systems (pp. 2292–2300).