September 27, 2018

BRANE power: Gene network inference with graph optimization

And the IFPEN 2018 Yves Chauvin PhD prize is awarded to Aurélie Pirayre, for IFPEN's first thesis on bioinformatics, with graph optimization for gene networks. In French, you can now read "BRANE Power": gènes et algorithmes, une alliance pour la chimie verte.

Aurélie Pirayre, Grégoire Allaire, Didier Houssin, Pierre-Henri Bigeard, Eric Heintzé

PhD manuscript and slides for her thesis:
  • Reconstruction and clustering with graph optimization and priors on gene networks and images (manuscript)
    • Abstract: The discovery of novel gene regulatory processes improves the understanding of cell phenotypic responses to external stimuli for many biological applications, such as medicine, environment or biotechnologies. To this purpose, transcriptomic data are generated and analyzed from DNA microarrays or more recently RNAseq experiments. They consist of genetic expression level sequences obtained for all genes of a studied organism placed in different living conditions. From these data, gene regulation mechanisms can be recovered by revealing topological links encoded in graphs. In regulatory graphs, nodes correspond to genes. A link between two nodes is identified if a regulation relationship exists between the two corresponding genes. Such networks are called Gene Regulatory Networks (GRNs). Their construction as well as their analysis remain challenging despite the large number of available inference methods. In this thesis, we propose to address this network inference problem with recently developed techniques pertaining to graph optimization. Given all the pairwise gene regulation information available, we propose to determine the presence of edges in the final GRN by adopting an energy optimization formulation integrating additional constraints. Either biological (information about gene interactions) or structural (information about node connectivity) a priori have been considered to restrict the space of possible solutions. Different priors lead to different properties of the global cost function, for which various optimization strategies, either discrete or continuous, can be applied. The post-processing network refinements we designed led to computational approaches named BRANE for "Biologically-Related A priori for Network Enhancement".
For each of the proposed methods (BRANE Cut, BRANE Relax and BRANE Clust), our contributions are threefold: a priori-based formulation, design of the optimization strategy and validation (numerical and/or biological) on benchmark datasets from DREAM4 and DREAM5 challenges showing numerical improvement reaching 20%. In a ramification of this thesis, we slide from graph inference to more generic data processing such as inverse problems. We notably invest in HOGMep, a Bayesian-based approach using a Variational Bayesian Approximation framework for its resolution. This approach allows reconstruction and clustering/segmentation tasks to be performed jointly on multi-component data (for instance signals or images). Its performance in a color image deconvolution context demonstrates both quality of reconstruction and segmentation. A preliminary study in a medical data classification context linking genotype and phenotype yields promising results for forthcoming bioinformatics adaptations.
  • Slides (PhD defense on July 3rd, 2017)
Both are finally online (check the EURASIP Library of Ph.D. Theses). The work was driven by the concept of BRANE power: a methodology for gene regulatory network inference and clustering based on graph optimization and biological priors. BRANE stands for Biologically-Related A priori Network Enhancement. It rhymes with cell membrane (and brain, for what it's worth).
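To give a feel for what "edge selection by energy minimization with a biological prior" can mean, here is a deliberately simplified toy sketch in Python. It is not the formulation used in the thesis: the separable energy, the function name `select_edges`, the penalty values `lam_tf` and `lam_non_tf`, and the gene names are all assumptions made up for illustration. Each candidate edge carries a score w in [0, 1] and a binary label x, and the energy w(1 - x) + λx is minimized edge by edge, with a smaller penalty λ when a transcription factor is involved.

```python
def select_edges(weights, tfs, lam_tf=0.3, lam_non_tf=0.8):
    """Toy edge selection by minimizing a separable energy.

    weights: dict mapping a gene pair (i, j) to a score w in [0, 1].
    tfs:     set of genes assumed to be transcription factors.

    For each pair, the energy w * (1 - x) + lam * x is minimized over
    x in {0, 1}. The minimizer is x = 1 exactly when w > lam, so the
    prior acts as an edge-dependent threshold (hypothetical values).
    """
    selected = set()
    for (i, j), w in weights.items():
        # Edges touching a TF are penalized less than non-TF/non-TF edges.
        lam = lam_tf if (i in tfs or j in tfs) else lam_non_tf
        if w > lam:
            selected.add((i, j))
    return selected


# Made-up scores for four genes; only g1 is assumed to be a TF.
weights = {
    ("g1", "g2"): 0.5,
    ("g1", "g3"): 0.2,
    ("g2", "g3"): 0.5,
    ("g3", "g4"): 0.9,
}
edges = select_edges(weights, tfs={"g1"})
print(sorted(edges))
```

Note how the two pairs with the same score 0.5 are treated differently: the one involving the assumed TF `g1` is kept, the other is dropped. Richer priors, which couple the decisions on several edges, no longer reduce to a per-edge threshold, and that is where dedicated graph optimization becomes useful.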

Gene regulatory network inference with BRANE Cut

State-of-the-art results are obtained on synthetic and real transcriptomic data (DREAM4 and DREAM5 from the DREAM consortium challenges, and an Escherichia coli dataset). Derived methods are BRANE Cut (with graph cuts), BRANE Relax (with proximal optimization) and BRANE Clust (with graph Laplacian).
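As a reminder of the primitive behind graph cuts, the sketch below computes a maximum flow and the associated minimum cut with the classical Edmonds-Karp algorithm, in plain Python. It is a generic textbook illustration, not the graph construction used by BRANE Cut: the function name `min_cut`, the node names and the capacities are assumptions chosen for the example.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp max flow. Returns (flow value, source-side node set)."""
    # Residual capacities, with zero-capacity reverse arcs added.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    def reachable():
        # Breadth-first search over arcs with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # No augmenting path: the reachable nodes form the source
            # side of a minimum cut (max-flow min-cut theorem).
            return flow, set(parent)
        # Bottleneck capacity along the shortest augmenting path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along that path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Made-up capacities on a four-node graph.
capacity = {"s": {"a": 3, "b": 1}, "a": {"t": 1}, "b": {"t": 3}}
value, source_side = min_cut(capacity, "s", "t")
print(value, sorted(source_side))
```

In this small graph the cut separates `s` and `a` from `b` and `t`, and its value equals the sum of the capacities of the two arcs crossing it (`a` to `t` and `s` to `b`). In a binary labeling problem, the two sides of such a cut play the role of the two labels.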

Gene network joint inference and clustering with BRANE Clust



Used concepts include:
  • data science, optimization on graphs: maximal flow, minimum cut, random walker algorithm, variational and variational Bayes formalism, convex relaxation, alternating optimization, combinatorial Dirichlet problem, hard-clustering and soft-clustering
  • biology, biotechnology, bioinformatics: transcription factors (TFs) as regulators and non-transcription factors (non-TFs) as targets, modular networks, biological priors, in-silico data, second-generation biofuel production, DREAM4 challenge, DREAM5 challenge
  • application to biofuel and green chemistry production (with the fungus Trichoderma reesei)
Supervising team:
PhD Thesis reviewers

PhD Thesis Examiners
More links: