A deep learning model to predict RNA-Seq expression of tumours from whole slide images

« Deep learning is an increasingly prominent topic in medical research. Convolutional neural networks have become commonplace and help, among other things, to improve the diagnosis and treatment of patients. However, medical images have not yet been used to extract molecular information. The HE2RNA model was created to fill this gap. It predicts gene expression from whole slide images, which could make it possible to avoid costly traditional sequencing.

HE2RNA can also provide a virtual spatialization of gene expression. Applied to TCGA (The Cancer Genome Atlas) data, which cover several data types across different types of cancer, the network has proven robust.

In oncology, pathologists perform a histological analysis to confirm a diagnosis based on a patient's symptoms and to define the exact type and stage of a cancer. The image resulting from a histological analysis is called a whole slide image (WSI). Histological analysis has two limitations. First, it has low predictive power. Second, it is subject to error: two pathologists can reach different conclusions from identical WSIs.

To improve the characterization of a tumor (cancer type, stage) and determine which genes are involved in the development of cancer, pathologists, biostatisticians, and bioinformaticians rely on RNA sequencing (RNA-seq). Although this technique has greatly improved the understanding and treatment of cancers, it is expensive, time-consuming, and requires specialized equipment and knowledge. It is therefore not available in all hospitals and is rarely used. In addition, gene expression in a cell depends on the cell type and the stage of its life cycle, which biases the results obtained with RNA-seq.

Machine learning has the potential to overcome the weaknesses of RNA-seq. The HE2RNA algorithm detailed in this article aims to predict RNA-seq profiles from WSIs and to provide a virtual spatialization of gene expression.

HE2RNA's predictions are close to RNA-seq data, which raises the hope that it can reduce costs considerably, since hospitals are already well equipped for medical imaging ».
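
The summary does not say how closeness between predicted and measured expression is quantified. One common way to do it, shown here purely as an illustrative sketch and not as the authors' evaluation protocol, is to compute a Pearson correlation per gene across patients. The function names, the gene labels (GENE_A, GENE_B, GENE_C), and all values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A gene with no variation across patients has an undefined correlation.
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def per_gene_correlation(predicted, measured):
    """Map each gene to the correlation between predicted and measured values.

    Both arguments map a gene name to a list of expression values,
    one value per patient, in the same patient order.
    """
    return {gene: pearson(predicted[gene], measured[gene]) for gene in measured}


# Made-up expression values for three patients (assumption, not real data).
measured = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [5.0, 3.0, 1.0],
    "GENE_C": [2.0, 2.0, 2.0],
}
predicted = {
    "GENE_A": [1.1, 2.0, 2.9],
    "GENE_B": [1.0, 3.0, 5.0],
    "GENE_C": [0.5, 0.7, 0.9],
}

correlations = per_gene_correlation(predicted, measured)
```

In this toy data, GENE_A is predicted well (correlation close to 1), GENE_B is predicted in the wrong direction (correlation of -1), and GENE_C is constant in the measurements, so its correlation is undefined and reported as NaN. Reporting the metric gene by gene rather than as one global number makes it visible which genes an image-based model can and cannot recover.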