22-22 May 2025, Marseille (France)

Invited speakers

  • Céline Brochier, Laboratoire de Biométrie et Biologie Evolutive (LBBE), Lyon  

 

Identification of thermoadaptation fingerprints in three-dimensional protein structures using machine learning and persistent homology

Proteins contain a phylogenetic signal that allows the history of organisms to be traced, but they also contain information about the environmental and genomic constraints to which they are subject. Until recently, most evolutionary studies were based on protein sequence analysis (> 250 million sequences available in UniProt, compared to 230,000 experimentally resolved structures in the PDB). The recent development of reliable methods for predicting three-dimensional protein structures opens new perspectives (> 250 million structures available). However, this avalanche of data requires new analysis methods. In this context, we have initiated a new project at the intersection of topological data analysis, bioinformatics and molecular biology with the aim of developing: (i) new geometric representations of structures (called biogeometric markers) (1), (ii) new methods based on persistent homology to analyse these representations (2, 3), and (iii) new prediction models based on machine learning. These developments will be illustrated using the example of thermoadaptation, an adaptive process that strongly constrains the evolution of proteins, in particular the abundance of certain amino acids. As a consequence, strains with different optimal growth temperatures have proteomes with different amino acid compositions (4, 5). However, environmental temperature is not the only factor influencing protein amino acid composition: genomic GC content, salinity, and expression level also have an effect. In a recent study, we showed that Methanococcales is an interesting model to study thermoadaptation (6, 7). Indeed, in these archaea, the optimal growth temperature is the main factor influencing amino acid abundance in proteomes, explaining 70% of the observed variance, whereas in other prokaryotic lineages it explains at most 20%.
By coupling molecular phylogenetics with ancestral sequence reconstruction, we have unravelled the underlying substitution patterns, revealing lysine as a key player in this process (6). We have also shown that the protein structures do indeed contain a strong signal, which has allowed us to build a molecular thermometer that can predict the optimal growth temperature of strains from individual protein structures with a margin of error of ±5°C and an accuracy greater than 0.90.
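
As a rough illustration of the composition signal discussed in this abstract, the sketch below computes the relative frequency of each standard amino acid over a set of protein sequences. It is not the authors' method: the function name `amino_acid_composition` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def amino_acid_composition(sequences):
    """Return the relative frequency of each standard amino acid.

    Characters outside the 20 standard residues (gaps, X, etc.) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up sequences standing in for a proteome (hypothetical data).
toy_proteome = ["MKKLVE", "MKEKIA"]
composition = amino_acid_composition(toy_proteome)
print(f"Lysine (K) frequency: {composition['K']:.3f}")
```

Computed over whole proteomes, such frequency vectors are the kind of quantity that can be compared between strains with different optimal growth temperatures.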

 

 

  • Etienne Danchin, Genomics & Adaptive Molecular Evolution (GAME), Institut Sophia Agrobiotech, Sophia-Antipolis


 

End of the beginning: long-read genome assemblies of hybrid parasitic worms reveal unusual chromosome ends

Telomeres are nucleoprotein complexes that cap linear chromosomes and protect them from fusion and degradation. They are involved in the regulation of cell ageing, and their dysfunction can cause serious disease. In the model nematode C. elegans, telomeric DNA is made of (TTAGGC)n terminal repeats associated with single-strand as well as double-strand DNA binding proteins, forming a protective terminal complex. Telomeric repeats are added at chromosome ends after DNA replication by a telomerase reverse transcriptase, using an RNA template. This system is assumed to be widely conserved in eukaryotes, including in the phylum Nematoda. Using long-read sequencing, we have assembled the genomes of the three most economically important root-knot nematodes (genus Meloidogyne). Meloidogyne incognita, M. javanica and M. arenaria are devastating plant pests with polyploid (3n – 4n) genomes as a result of complex interspecific hybridizations, which poses challenges for correct separation of the haplotypes. We have assembled the genomes at high contiguity levels, with N50 values of ~2 Mb, and have mostly unzipped the assembly into A and B sub-genomes. The biggest contigs represent nearly complete chromosomes, allowing investigations of how they start and end. The canonical (TTAGGC)n repeat was not found in any of the Meloidogyne genomes analyzed, and no evidence for a telomerase or orthologs of C. elegans telomere-binding proteins could be found. Instead, bioinformatics analyses revealed complex motifs at one end of several contigs. Using DNA FISH experiments, we revealed that these complex motifs were mostly at one end of chromosomes in the three species. These complex repeats are specific to mitotic parthenogenetic root-knot nematodes and return no significant similarity to any other species. Yet, they present several characteristics of bona fide telomeric repeats, including the ability to form G-quadruplexes, their stranded orientation and evidence for transcription.
Taken together, these results suggest that mitotic parthenogenetic root-knot nematodes possess very specific complex motifs at one end of their chromosomes. Proteins and RNA molecules interacting with these repeats remain to be discovered. These findings open new perspectives towards understanding how genome integrity is maintained in these polyploid mitotic pests of worldwide agricultural importance.
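
Two quantities mentioned in this abstract, the N50 contiguity statistic and the presence of a tandem repeat at contig ends, can be sketched in a few lines. This is only an illustrative sketch and not the authors' pipeline: the function names, the `window` parameter and the toy contig are assumptions made for the example.

```python
import re


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string made of A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def terminal_repeat_runs(contig, motif="TTAGGC", window=60):
    """Longest tandem run of `motif` (either strand) near each contig end.

    Returns (run_at_start, run_at_end), counted in motif copies.
    `window` is a hypothetical number of terminal bases to inspect.
    """
    rc = reverse_complement(motif)
    pattern = re.compile(f"(?:{motif})+|(?:{rc})+")

    def longest_run(region):
        runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(region)]
        return max(runs, default=0)

    contig = contig.upper()
    return longest_run(contig[:window]), longest_run(contig[-window:])


# Toy data (made up): contig lengths and one contig with repeats at both ends.
assembly_n50 = n50([10, 8, 5, 3, 2])
toy_contig = "GCCTAA" * 3 + "ACGTACGTAC" * 5 + "TTAGGC" * 4
end_runs = terminal_repeat_runs(toy_contig, window=30)
print(assembly_n50, end_runs)
```

A scan of this kind returns zero runs at both ends when the canonical motif is absent, which is the situation the abstract reports for the Meloidogyne assemblies.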

 

 

  • William Ritchie, Institut de Génétique Humaine, Montpellier

 

Data security, parsimony and specificity. New AI tools to tackle persistent problems.

Biological data, especially genomic data, are expensive to produce and difficult to share because of anonymization issues. In the medical field, this is compounded by missing data caused by participant dropout or by financially constrained experimental designs. In my talk, I will show how these limitations on available data can be partly circumvented using Artificial Intelligence methods and bio-inspired heuristics.

 
