April 11, 2017

BRANE Clust: cluster-assisted gene regulatory network inference refinement

The joint Gene Regulatory Network (GRN) inference and clustering tool BRANE Clust has just been published as BRANE Clust: cluster-assisted gene regulatory network inference refinement in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017 (doi:10.1109/TCBB.2017.2688355).

It is also featured on RNA-seq blog and OMIC tools.

Alternative versions are available as a preprint, on bioRxiv, with a page and software, and in HAL. It is another brick in the BRANE series wall: a series of bioinformatics tools based on graphs and optimization, dedicated to -omics gene expression data for GRN inference.

While traditional Next-generation sequencing (NGS) pipelines often combine motley assumptions (correlation, normalization, clustering, inference), this work is a first step toward gracefully combining network inference and clustering.

BRANE Clust works as a post-processing tool that refines classical network thresholding. From a complete weighted network (obtained from any network inference method), BRANE Clust favors edges that both have higher weights (as in standard thresholding) and link nodes belonging to the same cluster. It relies on an optimization procedure that jointly computes an optimal gene clustering (random walker algorithm) and an optimal edge selection. The introduction of a clustering step in the edge selection process improves gene regulatory network inference. This is demonstrated on both synthetic (five networks of DREAM4 and network 1 of DREAM5) and real (network 3 of DREAM5) data. These conclusions are drawn after comparing classical thresholding on CLR and GENIE3 networks to our proposed post-processing. Significant improvements in terms of Area Under the Precision-Recall curve are obtained. The predictive power on real data is promising: predicted links specific to BRANE Clust have plausible biological interpretations. GRN approaches that produce a complete weighted network to prune could benefit from BRANE Clust post-processing.
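To make the edge selection idea concrete, here is a minimal Python sketch. It is not the BRANE Clust implementation: cluster labels are assumed to be given (the joint optimization with the random walker clustering is left out), and the function name `select_edges`, the threshold `lam`, the penalty `beta` and the toy weights are all assumptions made for illustration. An edge inside a cluster is kept when its weight exceeds `lam`, while an edge across clusters must exceed the stricter threshold `lam * beta / (beta - 1)`.

```python
def select_edges(weights, labels, lam, beta):
    """Keep edges whose weight passes a cluster-dependent threshold.

    weights: dict mapping (gene_i, gene_j) -> edge weight
    labels:  dict mapping gene -> cluster label (assumed known here)
    lam:     base threshold, as in classical thresholding (hypothetical name)
    beta:    penalty (> 1) making inter-cluster edges harder to keep
    """
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    inter_threshold = lam * beta / (beta - 1.0)
    kept = set()
    for (i, j), w in weights.items():
        same_cluster = labels[i] == labels[j]
        threshold = lam if same_cluster else inter_threshold
        if w > threshold:
            kept.add((i, j))
    return kept


# Toy weighted network (made-up values, not taken from DREAM data).
weights = {
    ("g1", "g2"): 0.90,
    ("g1", "g3"): 0.60,
    ("g2", "g3"): 0.55,
    ("g3", "g4"): 0.52,
}
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}

# Classical thresholding ignores the clusters.
classical = {edge for edge, w in weights.items() if w > 0.5}
# Cluster-aware selection penalizes edges across clusters.
refined = select_edges(weights, labels, lam=0.5, beta=4.0)
```

With these toy values, classical thresholding at 0.5 keeps all four edges, whereas the cluster-aware rule keeps only the two intra-cluster ones, including g3-g4 despite its weight being lower than that of the discarded g2-g3.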

Escherichia coli network built using BRANE Clust on GENIE3 weights and containing 236 edges. Large dark gray nodes refer to transcription factors (TFs). Inferred edges also reported in the ground truth are colored in black while predictive edges are light gray. Dashed edges correspond to links inferred by both BRANE Clust and GENIE3 while solid links refer to edges specifically inferred by BRANE Clust.
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells.
Accurately building the related graphs remains challenging due to the large number of possible solutions from available data. Nonetheless, enforcing a priori knowledge on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference thanks to cluster information. It works as a post-processing tool for inference methods (e.g. CLR, GENIE3). In BRANE Clust, the clustering is based on the inversion of a system of linear equations involving a graph-Laplacian matrix promoting a modular structure. Our approach is validated on DREAM4 and DREAM5 datasets with objective measures, showing significant comparative improvements. We provide additional insights on the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network evaluated using the STRING database. The comparative pertinence of clustering is discussed computationally (SIMoNe, WGCNA, X-means) and biologically (RegulonDB). BRANE Clust software is available at:

March 17, 2017

Co-simulation, state-of-the-art by Claudio Gomes

A few days ago, we had a seminar given by Claudio Gomes (University of Antwerp, Belgium). He recently produced a research report (turned into a paper) on Co-simulation: State of the Art with C. Thule, D. Broman, P. G. Larsen and H. Vangheluwe (arxiv). This is an impressive body of work on tools enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems. It overviews "co-simulation approaches, research challenges, and research opportunities" and "summarizes, bridges, and enhances future research in this multidisciplinary area".

The accompanying slides nicely summarize, in a didactic way, the main issues pertaining to coupled systems, via the composition of sub-system simulations. They deal with Terminology, Simulation units, Input extrapolation techniques, Orchestration algorithms, Algebraic loops, Convergence and Stability.

It is essential to find new ways of enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems, under increasing market pressures. One possible solution for this challenge is to use a heterogeneous model-based approach where different teams can produce their conventional models and carry out their usual mono-disciplinary analysis, but in addition, the different models can be coupled for simulation (co-simulation), allowing the study of the global behavior of the system. Due to its potential, co-simulation is being studied in many different disciplines but with limited sharing of findings. Our aim with this work is to summarize, bridge, and enhance future research in this multidisciplinary area. We provide an overview of co-simulation approaches, research challenges, and research opportunities, together with a detailed taxonomy with different aspects of the state of the art of co-simulation and classification for the past five years. The main research needs identified are: finding generic approaches for modular, stable and accurate coupling of simulation units; and expressing the adaptations required to ensure that the coupling is correct.
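As a toy illustration of what coupling simulation units looks like, here is a minimal Python sketch of a Jacobi-type orchestration: at each communication point the two units exchange outputs, then each advances on its own with the received input held constant. Everything in it is an assumption made for the example (the `SimulationUnit` class, the dynamics dx/dt = -x + u, the forward Euler micro-steps, the step sizes); it is not taken from the report or the slides.

```python
class SimulationUnit:
    """Hypothetical unit integrating dx/dt = -x + u with forward Euler."""

    def __init__(self, x0, micro_step):
        self.x = x0
        self.h = micro_step

    def do_step(self, u, macro_step):
        # The input u is held constant over the whole macro step
        # (zero-order hold, the simplest input extrapolation).
        n_micro = round(macro_step / self.h)
        for _ in range(n_micro):
            self.x += self.h * (-self.x + u)
        return self.x


def jacobi_cosimulation(unit_a, unit_b, macro_step, n_steps):
    """Advance both units in lockstep, exchanging outputs at each macro step."""
    history = [(unit_a.x, unit_b.x)]
    for _ in range(n_steps):
        # Outputs are sampled before either unit moves (Jacobi scheme).
        u_a, u_b = unit_b.x, unit_a.x
        unit_a.do_step(u_a, macro_step)
        unit_b.do_step(u_b, macro_step)
        history.append((unit_a.x, unit_b.x))
    return history


# Each unit is driven by the state of the other one.
unit_a = SimulationUnit(x0=1.0, micro_step=0.01)
unit_b = SimulationUnit(x0=0.0, micro_step=0.01)
history = jacobi_cosimulation(unit_a, unit_b, macro_step=0.1, n_steps=100)
final_a, final_b = history[-1]
```

On this toy problem both states settle near 0.5. Shrinking `macro_step` reduces the transient error introduced by holding inputs constant between communication points, which is where smarter input extrapolation comes in.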

This opportunity was initiated through a very open discussion around extrapolation for co-simulation in cyber-physical systems in CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems.
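For a flavor of what polynomial input extrapolation means, here is a minimal Python sketch: past samples of a coupling signal are interpolated by a polynomial, which is then evaluated at the next communication time. This is plain Lagrange extrapolation, used here as a stand-in; it is not the CHOPtrey algorithm itself, and the function name `extrapolate` and the sample values are assumptions for the example.

```python
def extrapolate(times, values, t_next):
    """Evaluate at t_next the Lagrange polynomial through (times, values)."""
    total = 0.0
    for i, (t_i, v_i) in enumerate(zip(times, values)):
        basis = 1.0
        for j, t_j in enumerate(times):
            if j != i:
                basis *= (t_next - t_j) / (t_i - t_j)
        total += v_i * basis
    return total


# Past samples of a signal that happens to follow u(t) = t**2.
times = [0.0, 1.0, 2.0]
values = [0.0, 1.0, 4.0]

held = values[-1]                             # zero-order hold: reuse last value
predicted = extrapolate(times, values, 3.0)   # quadratic extrapolation
```

Here the true value at t = 3 is 9: holding the last sample would feed 4 to the next simulation step, while the quadratic prediction recovers 9. Higher degrees are not always better, since they amplify noise and fast variations.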

February 25, 2017

BARCHAN: Blob Alignment for Robust CHromatographic ANalysis (GCxGC)

In 1987, G. P. Barchan et al. wrote a paper called: Gas chromatographic method of determining carbon monoxide and dioxide. But barkhan, or barchan, means more outside gas chromatography. It refers to crescent-shaped sand dunes, modeled by the wind.

BARCHAN: crescent-shaped sand dunes

They inspired our image registration tool for comprehensive 2D chromatography peak alignment (GCxGC or GC2D). It was just published as BARCHAN: Blob Alignment for Robust CHromatographic ANalysis (GCxGC), Journal of Chromatography A, February 2017. For given 2D chromatogram areas of interest, a baseline removal (BEADS) is applied, a peak detection is performed, blobs are detected and registered with a mixed rigid/non-rigid transformation based on the Coherent Point Drift technique.

A pair of GCxGC chromatogram areas of interest.
BARCHAN registration, example.

The preprint is available, along with HAL and arXiv versions. The abstract follows:

Abstract: (Comprehensive) Two dimensional gas chromatography (GCxGC) plays a central role in the elucidation of complex samples. The automation of the identification of peak areas is of prime interest to obtain a fast and repeatable analysis of chromatograms. To determine the concentration of compounds or pseudo-compounds, templates of blobs are defined and superimposed on a reference chromatogram. The templates then need to be modified when different chromatograms are recorded. In this study, we present a chromatogram and template alignment method based on peak registration called BARCHAN. Peaks are identified using a robust mathematical morphology tool. The alignment is performed by a probabilistic estimation of a rigid transformation along the first dimension, and a non-rigid transformation in the second dimension, taking into account noise, outliers and missing peaks in a fully automated way. Resulting aligned chromatograms and masks are presented on two datasets. The proposed algorithm proves to be fast and reliable. It significantly reduces the time to results for GCxGC analysis.


• BARCHAN: 2D chromatogram and template alignment based on peak registration.
• The alignment is performed by probabilistic estimation with a Gaussian Mixture Model.
• It combines a rigid and a non-rigid transformation for complex samples analysis.
• The method accounts for noise, outliers and missing peaks in an automated way.
• BARCHAN significantly reduces the time to results for GC×GC analysis.
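The probabilistic flavor of such a registration can be sketched in a few lines of Python. The example below is a heavily simplified, one-dimensional cousin of Coherent Point Drift, not BARCHAN itself: it only estimates a single shift between two lists of peak positions, using an expectation-maximization loop on a Gaussian mixture with a uniform outlier term. The function name `estimate_shift`, the `outlier_weight` parameter, the variance floor and the peak positions are all assumptions made for illustration.

```python
import math


def estimate_shift(reference, target, outlier_weight=0.1,
                   max_iter=100, floor=1e-9):
    """Estimate the shift t such that target + t matches reference.

    Each shifted target peak is the center of a Gaussian with shared
    variance sigma2; a uniform term absorbs unmatched reference peaks.
    """
    n_ref, n_tgt = len(reference), len(target)
    shift = 0.0
    # Start from a broad variance so every pairing is initially plausible.
    sigma2 = sum((x - y) ** 2 for x in reference for y in target)
    sigma2 /= n_ref * n_tgt
    for _ in range(max_iter):
        if sigma2 <= floor:
            break  # the mixture has collapsed onto the matched peaks
        uniform = (math.sqrt(2.0 * math.pi * sigma2)
                   * outlier_weight / (1.0 - outlier_weight)
                   * n_tgt / n_ref)
        posteriors = []  # pairs (probability, reference - target)
        for x in reference:
            gauss = [math.exp(-(x - y - shift) ** 2 / (2.0 * sigma2))
                     for y in target]
            denom = sum(gauss) + uniform
            # E-step: soft assignment of reference peak x to each target peak.
            posteriors.extend((g / denom, x - y) for g, y in zip(gauss, target))
        total = sum(p for p, _ in posteriors)
        if total == 0.0:
            break
        # M-step: weighted mean displacement, then weighted residual variance.
        shift = sum(p * d for p, d in posteriors) / total
        sigma2 = sum(p * (d - shift) ** 2 for p, d in posteriors) / total
    return shift


# Made-up peak positions along one chromatographic dimension.
reference_peaks = [0.0, 1.0, 3.0, 7.0]
target_peaks = [-0.5, 0.5, 2.5, 6.5]  # same peaks, shifted by -0.5
shift = estimate_shift(reference_peaks, target_peaks)
```

On this clean toy case the loop recovers a shift of about 0.5. A real 2D registration additionally has to handle two dimensions, a non-rigid component, and genuinely missing or spurious peaks, which this sketch does not attempt.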

This work is a follow-up of preceding chromatography papers:
  1. Chromatogram baseline estimation and denoising using sparsity (BEADS)
  2. Comprehensive Two-Dimensional Gas Chromatography for Detailed Characterisation of Petroleum Products
  3. Characterization of middle-distillates by comprehensive two-dimensional gas chromatography (GCxGC): a powerful alternative for performing various standard analysis of middle distillates
  4. Comparison of conventional gas chromatography and comprehensive two-dimensional gas chromatography for the detailed analysis of petrochemical samples