Contrastive learning for dimensionality reduction and visualization of transcriptomic data

cyvy Research Project

While most successful applications of machine learning to date have been in supervised learning, unsupervised learning is often seen as a more challenging, and possibly more important, problem. Turing Award winner Yann LeCun, one of the so-called “Godfathers of AI”, famously compared supervised learning to the thin “icing on the cake” whose bulk is unsupervised learning. An approach called contrastive learning has recently emerged as a powerful method for unsupervised learning on image data, making it possible, for example, to separate photos of cats from photos of dogs without any labeled training data. The key idea is to train a neural network to map each image as close as possible to the representation of a slightly distorted copy of itself, and as far as possible from all other images. The balance between these attractive and repulsive forces makes similar images end up close together in the learned representation.

In this project, these ideas will be applied to single-cell transcriptomics, a very active field of biology in which a single experiment can measure the activity of thousands of genes in millions of individual cells. The group will use contrastive learning to find structure in such datasets and to visualize them in two dimensions. They will then return to image data and use two-dimensional embeddings as a tool to build intuition about how different modeling and optimization choices affect the final representation.
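The attraction/repulsion idea can be made concrete with one widely used contrastive objective, the InfoNCE (NT-Xent) loss popularized by SimCLR. The sketch below is an illustration only: the project description does not say which loss will be used, and the function name `info_nce_loss`, the use of cosine similarity, and the `temperature` value are assumptions. Row `z_a[i]` and row `z_b[i]` stand for the network outputs for two distorted views of the same sample. The positive partner appears in the numerator of a softmax (attraction), while every other sample in the batch appears only in the denominator (repulsion).

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.5):
    """Illustrative InfoNCE / NT-Xent-style loss (assumed, not the project's method).

    z_a[i] and z_b[i] are embeddings of two distorted views of sample i.
    For each of the 2n embeddings, the loss is the negative log-probability
    of picking its partner out of all other 2n - 1 embeddings in the batch.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)            # shape (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # a point is never its own partner
    # Partner index for each row: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log of the softmax denominator for each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return -log_prob_pos.mean()


# Usage: well-aligned view pairs should give a lower loss than mismatched ones.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                                  # 8 samples, 16-dim embeddings
aligned = info_nce_loss(x, x + 0.01 * rng.normal(size=x.shape))  # partners nearly identical
shifted = info_nce_loss(x, np.roll(x, 1, axis=0))              # every partner is the wrong sample
print(aligned < shifted)  # expected: True
```

In a real setting the embeddings would come from a neural network and the loss would be minimized by gradient descent in an automatic-differentiation framework; NumPy is used here only to make the objective itself explicit.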