Institut Jacques Monod Seminar – Pablo Meyer Rojas

20 March 2026, 11:45 - 13:00

Invited by the Minc Lab, Pablo Meyer Rojas, Manager of the Biological Analytics and Modeling group and Senior Research Scientist at IBM Research (USA), will present an Institut Jacques Monod seminar on the theme:

Pre-training DNA language models to make them understand Biology

Abstract:

Large language models (LLMs) trained on text have demonstrated remarkable results on natural language processing (NLP) tasks. These models have been adapted to decipher the language of DNA, where sequences of nucleotides act as “words” that encode genomic functions. However, the human genome differs fundamentally from natural language, as it lacks clearly defined words or a consistent grammar. DNA language models (DNALMs) such as EVO or DNABERT have achieved a high level of performance on genome-related biological tasks using DNA sequences as input. Deep learning (DL) models trained for specific tasks, such as AlphaGenome or those developed through the DREAM crowdsourcing challenges, often surpass DNALMs. However, neither DNALMs nor DL models explain how they encode the biological functions they are good at predicting. To address this problem, we pre-train foundation models that effectively integrate sequence variations, in particular Single Nucleotide Polymorphisms (SNPs), as they underlie important biological functions and disease. We show that integrating this and other biological knowledge into the pre-training of DNALMs yields models that perform better and are more interpretable.

Details

  • Date: 20 March 2026
  • Time: 11:45 - 13:00

Venue

  • Institut Jacques Monod, Salle François Jacob
  • 15 rue Hélène Brion
    Paris, 75013 France