17:00 - 18:00
Tue-P
Room: Foyer Conde De Cantanhede
Odor prediction via graph neural networks and representation learning
Poster presentation
Matej Hladiš, Sébastien Fiorucci, Jérôme Golebiowski, Jérémie Topin
Institute of Chemistry, Université Côte d’Azur
Our sense of smell relies on approximately 400 genes expressing functional odorant receptors (ORs), which endow us with the ability to perceive the complex chemical space surrounding us. ORs are transmembrane proteins that belong to the family of class A G protein-coupled receptors (GPCRs). Establishing a relationship between the structure of a molecule and the smell it triggers is a long-standing challenge. The first step toward cracking the combinatorial code of olfaction is the identification of OR-ligand pairs. Currently, data linking a molecule to a set of ORs remain scarce, and only 131 ORs have an identified ligand. Thus, building a machine learning protocol that takes the ORs’ sequences explicitly into account remains challenging. To tackle this issue, we leverage recent advances in representation learning and combine them with a graph neural network (GNN) to build a receptor-ligand interaction prediction model. Several methods inspired by the success of representation learning in natural language processing (NLP) have been proposed to represent protein sequences. Here we represent ORs with a BERT-based architecture that was previously trained on more than 200M protein sequences, and we use its output as the starting point for the receptor representation. We treat ligands as graphs and process ORs and ligands simultaneously using the GNN. This receptor-ligand model has been evaluated on a set of more than 7500 OR-ligand pairs. The model achieves a Matthews correlation coefficient (MCC) of 0.40 when all receptors are included in the training set (i.e. random split). Performance on the much more difficult deorphanization task (i.e. discarding all pairs of a given receptor from the training set) remains acceptable, with an MCC of 0.27. For comparison, an exhaustive in vitro search would lead to a success rate of ~3% and an MCC equal to 0.
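The baseline comparison at the end of the abstract can be illustrated with a short calculation. The sketch below is not the authors' evaluation code. It assumes that an exhaustive search amounts to testing (i.e. labelling as positive) every OR-ligand pair, and it uses hypothetical counts (1000 pairs, 30 of them active) chosen only to match the ~3% success rate quoted above. It also assumes the common convention that the MCC is set to 0 when its denominator vanishes.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), which happens when only one class is ever predicted.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def success_rate(tp: int, fp: int) -> float:
    """Fraction of tested pairs that turn out to be active (precision)."""
    tested = tp + fp
    return tp / tested if tested else 0.0


# Hypothetical data set: 1000 OR-ligand pairs, 3% of which are active.
n_pairs = 1000
n_active = 30

# Exhaustive search: every pair is tested, so every pair counts as a
# positive prediction. There are no negative predictions at all.
tp, fp, tn, fn = n_active, n_pairs - n_active, 0, 0

exhaustive_rate = success_rate(tp, fp)
exhaustive_mcc = mcc(tp, fp, tn, fn)

print(f"success rate: {exhaustive_rate:.2%}, MCC: {exhaustive_mcc:.2f}")
```

Under these assumptions the success rate simply equals the fraction of active pairs, while the MCC is 0 because a strategy that never predicts a negative carries no information about which pairs are active. This is why MCC values of 0.40 and 0.27 represent an improvement over exhaustive testing.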