October 6, 2018

A Monty Python space Odyssey

After 2001: A Space Odyssey, the second-best movie ever could be Monty Python and the Holy Grail:
Mønti Pythøn ik den Hølie Gräilen Røtern nik Akten Di Wik Alsø wik Alsø alsø wik Wi nøt trei a høliday in Sweden this yër? See the løveli lakes The wøndërful telephøne system And mäni interesting furry animals

A Pink Floyd odyssey: Jupiter and beyond the infinite


September 27, 2018

BRANE power: Gene network inference with graph optimization

And the IFPEN 2018 Yves Chauvin PhD prize is awarded to Aurélie Pirayre, for IFPEN's first thesis in bioinformatics, on graph optimization for gene networks. In French, you can now read "BRANE Power": gènes et algorithmes, une alliance pour la chimie verte (genes and algorithms, an alliance for green chemistry).

Aurélie Pirayre, Grégoire Allaire, Didier Houssin, Pierre-Henri Bigeard, Eric Heintzé

PhD manuscript and slides for her thesis:
  • Reconstruction and clustering with graph optimization and priors on gene networks and images (manuscript)
    • Abstract: The discovery of novel gene regulatory processes improves the understanding of cell phenotypic responses to external stimuli for many biological applications, such as medicine, environment or biotechnologies. To this purpose, transcriptomic data are generated and analyzed from DNA microarrays or, more recently, RNAseq experiments. They consist of genetic expression level sequences obtained for all genes of a studied organism placed in different living conditions. From these data, gene regulation mechanisms can be recovered by revealing topological links encoded in graphs. In regulatory graphs, nodes correspond to genes. A link between two nodes is identified if a regulation relationship exists between the two corresponding genes. Such networks are called Gene Regulatory Networks (GRNs). Their construction as well as their analysis remain challenging despite the large number of available inference methods. In this thesis, we propose to address this network inference problem with recently developed techniques pertaining to graph optimization. Given all the pairwise gene regulation information available, we propose to determine the presence of edges in the final GRN by adopting an energy optimization formulation integrating additional constraints. Either biological (information about gene interactions) or structural (information about node connectivity) priors have been considered to restrict the space of possible solutions. Different priors lead to different properties of the global cost function, for which various optimization strategies, either discrete or continuous, can be applied. The post-processing network refinements we designed led to computational approaches named BRANE for "Biologically-Related A priori for Network Enhancement".
For each of the proposed methods (BRANE Cut, BRANE Relax and BRANE Clust), our contributions are threefold: a priori-based formulation, design of the optimization strategy, and validation (numerical and/or biological) on benchmark datasets from the DREAM4 and DREAM5 challenges, showing numerical improvements reaching 20%. In a ramification of this thesis, we slide from graph inference to more generic data processing such as inverse problems. We notably invest in HOGMep, a Bayesian-based approach using a Variational Bayesian Approximation framework for its resolution. This approach allows one to jointly perform reconstruction and clustering/segmentation tasks on multi-component data (for instance signals or images). Its performance in a color image deconvolution context demonstrates the quality of both reconstruction and segmentation. A preliminary study in a medical data classification context linking genotype and phenotype yields promising results for forthcoming bioinformatics adaptations.
  • Slides (PhD defense on July 3rd, 2017)
Both are finally online (check the EURASIP Library of Ph.D. Theses). The work was driven by the concept of BRANE Power: a methodology for gene regulatory network inference and clustering based on graph optimization and biological priors. BRANE stands for Biologically Related Apriori Network Enhancement. It rhymes with cell membrane (and brain, for what it's worth).

Gene regulatory network inference with BRANE Cut

State-of-the-art results are obtained on synthetic and real transcriptomic data (DREAM4 and DREAM5 from the DREAM consortium challenges, and an Escherichia coli dataset). The derived methods are BRANE Cut (with graph cuts), BRANE Relax (with proximal optimization) and BRANE Clust (with graph Laplacian).

Gene network joint inference and clustering with BRANE Clust



Used concepts include:
  • data science, optimization on graphs: maximal flow, minimum cut, random walker algorithm, variational and variational Bayes formalisms, convex relaxation, alternating optimization, combinatorial Dirichlet problem, hard-clustering and soft-clustering
  • biology, biotechnology, bioinformatics: transcription factors (TFs) as regulators and non-transcription factors (non-TFs) as targets, modular networks, biological priors, in-silico data, second-generation biofuel production, DREAM4 challenge, DREAM5 challenge
  • application to biofuel and green chemistry production (with the fungus Trichoderma reesei)
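Maximal flow and minimum cut are the graph tools behind BRANE Cut. As a generic illustration only (this is not the BRANE Cut energy or its graph construction), here is a minimal Edmonds-Karp max-flow sketch in Python on a hypothetical toy graph. By the max-flow/min-cut theorem, the flow value equals the capacity of a minimum s-t cut, and the nodes still reachable from the source in the final residual graph form the source side of that cut, which is how a binary labeling is read off a graph cut.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph.

    Returns (flow_value, source_side), where source_side is the set of
    nodes on the source side of a minimum s-t cut.
    """
    # Residual capacities, with reverse edges initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: decrease forward and increase reverse residuals.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

    # The last BFS explored every node reachable from the source.
    return flow, set(parent)


# Hypothetical toy graph (a classic textbook example).
graph = {
    "s": {"v1": 16, "v2": 13},
    "v1": {"v3": 12},
    "v2": {"v1": 4, "v4": 14},
    "v3": {"v2": 9, "t": 20},
    "v4": {"v3": 7, "t": 4},
    "t": {},
}
flow, cut = max_flow_min_cut(graph, "s", "t")
print(flow, sorted(cut))  # 23 ['s', 'v1', 'v2', 'v4']
```

In a graph-cut formulation, nodes on the source side would receive one label and the others the opposite label; dedicated solvers such as Boykov-Kolmogorov are much faster than this textbook version on large graphs.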
Supervising team:
PhD Thesis reporters

PhD Thesis Examiners
More links:

September 9, 2018

Kultur Pop 44 : Brain et évolution

[Update, September 29, 2018, for Pascale Casanova] She hosted Les mardis littéraires, Les jeudis littéraires and L'atelier littéraire. As a one-off ("une vie n'est pas coutume", a twist on the French idiom "une fois n'est pas coutume", once is not a habit), Kultur Pop adds a ninth track, their theme tune, to Kultur Pop 44, Brain. It was already in Kultur Pop 01, 11 years ago, back in 2007. For 3^2 - 2^3 = 1 is a rare equality.
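The "rare equality" is no exaggeration: 3^2 - 2^3 = 9 - 8 = 1, and Catalan's conjecture, proved by Preda Mihăilescu in 2002, states that 8 and 9 are the only two consecutive perfect powers. A brute-force Python check up to an arbitrarily chosen bound of 10^6 (a sanity check, not a proof) illustrates it:

```python
def perfect_powers(limit):
    """Sorted perfect powers m**k <= limit, with m >= 2 and k >= 2."""
    powers = set()
    base = 2
    while base * base <= limit:
        value = base * base
        while value <= limit:
            powers.add(value)
            value *= base
        base += 1
    return sorted(powers)


def consecutive_pairs(limit):
    """Pairs (p, p + 1) where both p and p + 1 are perfect powers <= limit."""
    powers = perfect_powers(limit)
    power_set = set(powers)
    return [(p, p + 1) for p in powers if p + 1 in power_set]


print(consecutive_pairs(10**6))  # [(8, 9)]
```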
"We are using your brain's electrical system as a receiver,We are unable to transmit to your conscious neural interference"
Meanwhile,
the 44th volume, Brain, of the Kultur Pop theme-tune compilations (France Culture/France Inter, and sometimes a few intruders) has (finally) been released.



On the program: Kultur Pop 2018.44: Brain
  • France Culture, Interlude nuits: Alain Romans, Quel Temps Fait-il à Paris? (Les vacances de monsieur Hulot)
  • France Culture, Culture protestante: Ensemble Lucidarium, O prebstres, prebstres
  • France Culture, Science publique: Brian Eno & David Byrne, The Jezebel Spirit
  • France Culture, Concordance des temps: Louis Sclavis Sextet, Charmes
  • France Culture, Interlude nuits: Alexandre Desplat, Camera Obscura (Girl With A Pearl Earring)
  • France Culture, Agora: L'Orchestre de Contrebasses, Sablier
  • France Culture, Grands reportages: Bonobo, Kerala
  • France Culture, Culture de soi, cultures des autres: Music Ensemble of Benares, Kathak Nritya, part 1 & 2
  • Ghost track. France Culture, Atelier littéraire, mardis littéraires, jeudis littéraires (Pascale Casanova, September 29, 2018): DJ Shadow, Stem long stem

There is no hope, there's only chaos and evolution (Evereve, Fade to grey, Visage cover)


And at the same time (it's all the rage), we celebrate the BRANE Power, and evolution: Gene network inference with graph optimization

July 29, 2018

Multiscale representation of hexahedral meshes & compression

Companion pages:
A full-scale geological grid structure is decomposed onto embedded wavelet-like scales while preserving the discontinuities, here geological faults (red), using a morphological 2D wavelet:
Geological grid structures and discontinuities preservation (red painted faults)
Categorical properties like rock types (sandstone, limestone, shale) can be upscaled according to a dedicated non-linear decomposition called modelet (patent #20170344676: Method of exploitation of hydrocarbons of an underground formation by means of optimized scaling):


Hexahedral mesh categorical property: rock type
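Averaging does not apply to categorical data: the mean of a sandstone code and a shale code is not a rock type, hence the need for a non-linear scheme. The modelet itself is not reproduced here. As a hypothetical stand-in, the NumPy sketch below coarsens a 3D grid of non-negative integer rock-type codes by majority vote over each 2x2x2 block, breaking ties towards the smallest code. Unlike the HexaShrink decomposition, this stand-in keeps no detail information and is therefore not reversible.

```python
import numpy as np


def majority_upscale(codes):
    """Coarsen a 3D non-negative integer array by majority vote over 2x2x2 blocks."""
    nx, ny, nz = codes.shape
    assert nx % 2 == 0 and ny % 2 == 0 and nz % 2 == 0, "even dimensions expected"
    # Gather the 8 fine cells of each coarse cell along a trailing axis.
    blocks = (codes.reshape(nx // 2, 2, ny // 2, 2, nz // 2, 2)
                   .transpose(0, 2, 4, 1, 3, 5)
                   .reshape(nx // 2, ny // 2, nz // 2, 8))
    n_classes = int(codes.max()) + 1
    # Count each class per block; argmax returns the most frequent class,
    # and the smallest class code in case of a tie.
    counts = np.stack([(blocks == c).sum(axis=-1) for c in range(n_classes)],
                      axis=-1)
    return counts.argmax(axis=-1)


# Hypothetical rock-type codes: 0 = sandstone, 1 = limestone, 2 = shale.
codes = np.zeros((4, 4, 4), dtype=int)
codes[2:, :, :] = 2          # shale in the lower half
codes[0, 0, 0] = 1           # a single limestone cell, outvoted by sandstone
coarse = majority_upscale(codes)
print(coarse[:, 0, 0])       # [0 2]
```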

Continuous properties (saturation, porosity, permeability, temperature) can be homogenized with a 3D Haar wavelet:

Hexahedral mesh continuous property: porosity
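For continuous properties, one level of a 3D Haar transform can be written as an average/difference step applied successively along each axis. The NumPy sketch below assumes even grid dimensions and ignores the mesh geometry and fault handling that HexaShrink also takes care of. It produces a coarse approximation (the mean of each 2x2x2 block) plus seven detail subbands, and inverts them; the round-trip error is at floating-point level, in line with the reversibility claimed for HexaShrink.

```python
import numpy as np


def haar_step(x, axis):
    """One Haar analysis step along `axis`: averages first, then differences."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    approx = (even + odd) / 2.0   # local mean
    detail = odd - even           # local difference
    return np.concatenate([approx, detail], axis=axis)


def ihaar_step(y, axis):
    """Inverse of haar_step along `axis`."""
    half = y.shape[axis] // 2
    approx = np.take(y, np.arange(half), axis=axis)
    detail = np.take(y, np.arange(half, 2 * half), axis=axis)
    out = np.empty_like(y)
    idx = [slice(None)] * y.ndim
    idx[axis] = slice(0, None, 2)
    out[tuple(idx)] = approx - detail / 2.0   # even samples
    idx[axis] = slice(1, None, 2)
    out[tuple(idx)] = approx + detail / 2.0   # odd samples
    return out


def haar3d(volume):
    """One separable 3D Haar level: the coarse block sits in the first octant."""
    y = volume.astype(float)
    for axis in range(3):
        y = haar_step(y, axis)
    return y


def ihaar3d(coeffs):
    """Exact inverse of haar3d, undoing the axes in reverse order."""
    x = coeffs
    for axis in reversed(range(3)):
        x = ihaar_step(x, axis)
    return x


rng = np.random.default_rng(0)
porosity = rng.random((4, 4, 4))                  # hypothetical porosity field
coeffs = haar3d(porosity)
coarse = coeffs[:2, :2, :2]                       # mean of each 2x2x2 block
print(np.allclose(ihaar3d(coeffs), porosity))     # True
```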

The HexaShrink methodology described above is detailed in the recently submitted paper: 
With the huge progress in data acquisition realized over the past decades, and acquisition systems now able to produce high-resolution point clouds, the digitization of physical terrains becomes increasingly precise. Such extreme quantities of generated and modeled data greatly impact computational performance on many levels: storage media, memory requirements, transfer capability, and finally simulation interactivity, necessary to exploit this instance of big data. Efficient representations and storage are thus becoming "enabling technologies" in simulation science. We propose HexaShrink, an original decomposition scheme for structured hexahedral volume meshes. The latter are used for instance in biomedical engineering, materials science, or geosciences. HexaShrink provides a comprehensive framework allowing efficient mesh visualization and storage. Its exactly reversible multiresolution decomposition yields a hierarchy of meshes of increasing levels of detail, in terms of either geometry, continuous or categorical properties of cells. Starting with an overview of volume mesh compression techniques, our contribution coherently blends different multiresolution wavelet schemes. It results in a global framework preserving discontinuities (faults) across scales, implemented as a fully reversible upscaling. Experimental results are provided on meshes of varying complexity. They emphasize the consistency of the proposed representation, in terms of visualization, attribute downsampling and distribution at different resolutions. Finally, HexaShrink yields gains in storage space when combined with lossless compression techniques.
And there is a patent associated with HexaShrink, Method of exploitation of hydrocarbons of an underground formation by means of optimized scaling:

Method of exploitation of hydrocarbons of an underground formation by means of optimized scaling


July 10, 2018

Bioinformatics & datascience: Internship & PhD on multi-omics data

A PhD position is still available on Graph-based learning from integrated multi-omics and multi-species data (genomic, transcriptomic, epigenetic) between IFP Energies nouvelles and CentraleSupélec/INRIA Saclay. All the information is gathered at this address.

Some information is duplicated below:
Micro-organisms are studied here for their application to bio-based chemistry from renewable sources. Such organisms are driven by their genome expression, with very diverse mechanisms acting at various biological scales and sensitive to external conditions (nutrients, environment). The advent of novel high-throughput experimental technologies provides complementary omics data and, therefore, a better capability for understanding the studied biological systems. Innovative analysis methods are required for such highly integrated data. Their handling increasingly requires advanced bioinformatics, data science and optimization tools to provide insights into multi-level regulation mechanisms (Editorial: Multi-omic data integration). The main objective of this subject is to offer an improved understanding of the different regulation levels in the cell (from model organisms to Trichoderma reesei strains). The underlying prediction task requires the normalization and integration of heterogeneous biological data (genomic, transcriptomic and epigenetic) from different microorganisms. The chosen path is that of graph modelling and network optimization techniques, which allow combining data of different natures with the incorporation of biological priors (in the line of the BRANE Cut and BRANE Clust algorithms). Learning models relating genomic and transcriptomic data to epigenomic traits could be associated with network inference, source separation and clustering techniques to achieve this aim. The methodology would inherit from a wealth of techniques developed over graphs for scattered data and social networks. Attention will also be paid to novel evaluation metrics, as their standardization remains a crucial stake in bioinformatics. A preliminary internship position (summer/fall 2018) is suggested before engaging in the PhD program. Information at: http://www.laurent-duval.eu/lcd-2018-intern-phd-epigenetics-omics-graph-processing.html

May 6, 2018

Hungarian Syzygies - Trauma Memorial - Werckmeister harmonies - 2001

I am sorry David, I am afraid I should do that :) This thought gathering stems from a talk in Budapest, Hungary, with David, from Shakespeare's Helmet Collective. He stands in the eye of a political storm (the recent Hungarian elections and Viktor Orbán's declarations), and proposed (with a collective) a Trauma Memorial in the center of Heroes' Square (Hősök tere) in Budapest. I was lucky enough to witness it directly. It appeared as a black monolith with a video strip:

Budapest, Heroes Square, Trauma Memorial installation around the right of freedom
Here is a story, but you can skip it and go directly to the video: A millenniumi emlékmű kiegészítése 100 év hordalékával (supplementing the Millennium Monument with 100 years of sediment). As a side note, I was happy to be in Hungary for the first time, home of many Hungarian scientists, mostly mathematicians (some known as the Martians), some of them prominent in the history of wavelets, like Alfréd Haar or Frigyes Riesz, who were put in a multiscale perspective in a 2D wavelet panorama review paper, detailed below:

Haar and Riesz with multiscale wavelets

A few weeks ago, I was given the opportunity to watch Werckmeister harmóniák (Les Harmonies Werckmeister, or Werckmeister Harmonies) with a friend. He insisted that we watch the movie, given the following pitch:
A man in a small drunkards' bar builds a choreography with the local boozers, making them reenact the motions of the planets and satellites of the solar system. In black and white.
Werckmeister Harmonies: satellites and planets in motion
The movie was stunning, with shadows, a whale and rising violence. I could not help relating it to Stanley Kubrick's 2001: A Space Odyssey (as we celebrate its 50th anniversary) and to Ian Watson's The Embedding (foreign languages and the whale). We ended this cinema session with the 1962 dystopian black-and-white short film La jetée (The Jetty), by Chris Marker, aka Christian Bouche-Villeneuve, which inspired Terry Gilliam's Twelve Monkeys. A story about global war, time travel, memories, love and death. It can be viewed on Vimeo: La jetée.

Chris Marker (or his Hungarian avatars in Sans Soleil, Sandor and Michel Krasna, born in Kolozsvar in 1932 and in Budapest in 1946, respectively) is currently the subject of a retrospective at La Cinémathèque in Paris: Chris Marker, les sept vies des cinéastes (May 3 to July 29, 2018).

Then I was in Budapest in April 2018, for an all-too-short weekend. Being an obsessive 2001 fan, I saw reminiscences of 2001 everywhere: in Budapest's magnificent parliament, in the Vasarely Museum, or in the mere streets of Budapest.

2001 space odyssey reminiscence from Budapest
So I went to Hősök tere, a beautiful square with monuments celebrating Magyar history. Right in the center of the square, an installation displayed an intriguing video and sound on one side of a black cube:

Trauma memorial, pixels in a silhouette
This marked a Hungarian syzygy, a conjunction of seemingly unrelated events. The video showed the upper half of a figure in a dark suit, delivering speeches I could not understand. Of course, Hungarian is known as a special language, from the Finno-Ugric branch of the Uralic family. Funnily, the subtitles were in esoteric ASCII characters. But after a few seconds, it became clear that the sounds were reversed, spoken backwards. Causality issues aside, I must confess that, backwards or forwards, Hungarian remains foreign to me. Hopping inside the cube, one was welcomed by its "kind wards".
Our insatiable stomach
This picture summarizes a state of affairs: a rising tide of nationalism and autocratic power, growing on sedimented ancient traumas and more recent anger and fears (as far as I can understand). The above "Our insatiable stomach" is a timely snapshot, playing on the close sounds of Hungary and hungry. So here it is, a mere addendum to this black, blocky sedimentation of history, cast in reverse: A millenniumi emlékmű kiegészítése 100 év hordalékával. A fun detail: this Shakespeare's Helmet Collective work is curated by... Byron (for those with an eye for the finest details).


Links:

January 15, 2018

Theories of Deep Learning, videos and slides

With a touch of provocation carried by its poster, Stanford University's STATS 385 (Fall 2017) offers a series of talks on the Theories of Deep Learning, with videos, lecture slides, and a cheat sheet (stuff that everyone needs to know). Outside the yellow submarine, Nemo-like sea creatures depict Fei-Fei Li, Yoshua Bengio, Geoffrey Hinton and Yann LeCun on a Deep Dream background. So, wrapping up material on CNNs (convolutional neural networks):

The spectacular recent successes of deep learning are purely empirical. Nevertheless intellectuals always try to explain important developments theoretically. In this literature course we will review recent work of Bruna and Mallat, Mhaskar and Poggio, Papyan and Elad, Bolcskei and co-authors, Baraniuk and co-authors, and others, seeking to build theoretical frameworks deriving deep networks as consequences. After initial background lectures, we will have some of the authors presenting lectures on specific papers. This course meets once weekly.
Videos and slides are gathered as follows:
  1. Theories of Deep Learning, Lecture 01: Deep Learning Challenge. Is There Theory? (Donoho/Monajemi/Papyan) : video, slides
  2. Theories of Deep Learning, Lecture 02: Overview of Deep Learning From a Practical Point of View (Donoho/Monajemi/Papyan) : video, slides
  3. Theories of Deep Learning, Lecture 03: Harmonic Analysis of Deep Convolutional Neural Networks (Helmut Bolcskei) : video, slides
  4. Theories of Deep Learning, Lecture 04: Convnets from First Principles: Generative Models, Dynamic Programming & EM (Ankit Patel) : video, slides
  5. Theories of Deep Learning, Lecture 05: When Can Deep Networks Avoid the Curse of Dimensionality and Other Theoretical Puzzles (Tomaso Poggio) : video, slides
  6. Theories of Deep Learning, Lecture 06: Views of Deep Networks from Reproducing Kernel Hilbert Spaces (Zaid Harchaoui) : video, slides
  7. Theories of Deep Learning, Lecture 07: Understanding and Improving Deep Learning With Random Matrix Theory (Jeffrey Pennington) : video, slides
  8. Theories of Deep Learning, Lecture 08: Topology and Geometry of Half-Rectified Network Optimization (Joan Bruna) : video, slides
  9. Theories of Deep Learning, Lecture 09: What’s Missing from Deep Learning? (Bruno Olshausen) : video, slides
  10. Theories of Deep Learning, Lecture 10: Convolutional Neural Networks in View of Sparse Coding (Vardan Papyan and David Donoho) : video, slides


August 29, 2017

April 11, 2017

BRANE Clust: cluster-assisted gene regulatory network inference refinement

The joint Gene Regulatory Network (GRN) inference and clustering tool BRANE Clust has just been published as BRANE Clust: cluster-assisted gene regulatory network inference refinement in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017 (doi:10.1109/TCBB.2017.2688355).

It is also featured on the RNA-seq blog and on OMIC tools.

Alternative versions are available as a preprint on bioRxiv, on a page with software, and in HAL. This is another brick in the BRANE series wall, a series of bioinformatics tools based on graphs and optimization, dedicated to -omics gene expression data for GRN (Gene Regulatory Network) inference.

While traditional next-generation sequencing (NGS) pipelines often combine motley assumptions (correlation, normalization, clustering, inference), this work is a first step toward gracefully combining network inference and clustering.

BRANE Clust works as a post-processing refinement of classical network thresholding. From a complete weighted network (obtained from any network inference method), BRANE Clust favors edges that both have higher weights (as in standard thresholding) and link nodes belonging to the same cluster. It relies on an optimization procedure that jointly computes an optimal gene clustering (random walker algorithm) and an optimal edge selection. Introducing a clustering step into the edge selection process improves gene regulatory network inference. This is demonstrated on both synthetic (the five networks of DREAM4 and network 1 of DREAM5) and real (network 3 of DREAM5) data, by comparing classical thresholding on CLR and GENIE3 networks to our proposed post-processing. Significant improvements in terms of Area Under the Precision-Recall curve are obtained. The predictive power on real data yields promising results: links predicted specifically by BRANE Clust have plausible biological interpretations. GRN approaches that produce a complete weighted network to prune could benefit from BRANE Clust post-processing.
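The Area Under the Precision-Recall curve (AUPR) used for such comparisons can be estimated from a ranked edge list. The Python sketch below uses the average-precision estimator (precision averaged at the rank of each true edge), which is one common convention; the DREAM challenge scoring scripts may interpolate the curve differently, and the gene names and edges are hypothetical. Edges are treated as ordered pairs, so undirected pairs should be normalized (for instance by sorting the two gene names) beforehand.

```python
def average_precision(ranked_edges, true_edges):
    """Average precision of a ranked edge list (best-scored edge first).

    ranked_edges: list of (gene_a, gene_b) pairs, sorted by decreasing weight.
    true_edges: set of (gene_a, gene_b) pairs from the ground truth.
    True edges never retrieved contribute zero precision.
    """
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / len(true_edges) if true_edges else 0.0


# Hypothetical ranking produced by some inference method.
ranked = [("G1", "G2"), ("G1", "G3"), ("G2", "G4"), ("G3", "G4")]
truth = {("G1", "G2"), ("G2", "G4")}
ap = average_precision(ranked, truth)
print(round(ap, 4))   # 0.8333, i.e. (1/1 + 2/3) / 2
```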

Escherichia coli network built using BRANE Clust on GENIE3 weights and containing 236 edges. Large dark gray nodes refer to transcription factors (TFs). Inferred edges also reported in the ground truth are colored black, while predictive edges are light gray. Dashed edges correspond to links inferred by both BRANE Clust and GENIE3, while solid edges refer to links specifically inferred by BRANE Clust.
Abstract:
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells.
Accurately building the related graphs remains challenging due to the large number of possible solutions from available data. Nonetheless, enforcing priors on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference thanks to cluster information. It works as a post-processing tool for inference methods (e.g. CLR, GENIE3). In BRANE Clust, the clustering is based on the inversion of a system of linear equations involving a graph-Laplacian matrix promoting a modular structure. Our approach is validated on DREAM4 and DREAM5 datasets with objective measures, showing significant comparative improvements. We provide additional insights on the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network evaluated using the STRING database. The comparative pertinence of clustering is discussed computationally (SIMoNe, WGCNA, X-means) and biologically (RegulonDB). BRANE Clust software is available at:
http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-clust.html
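The random walker algorithm used for the clustering step amounts to solving a combinatorial Dirichlet problem on the graph. Given a few seed nodes with known labels, the probabilities x_U that a walker starting from each unlabeled node first reaches a seed of each label satisfy L_UU x_U = -L_UM x_M, where L = D - W is the graph Laplacian split into unlabeled (U) and marked (M) blocks. The NumPy sketch below is a generic random walker (Grady's formulation) on a hypothetical six-gene graph, not BRANE Clust's joint cost function.

```python
import numpy as np


def random_walker(weights, seeds):
    """Random walker clustering on a weighted undirected graph.

    weights: symmetric (n, n) array of non-negative edge weights.
    seeds: dict {node_index: label} with labels 0..K-1 (every label seeded).
    Every unseeded node must be connected to some seed, otherwise L_UU is singular.
    Returns (labels, probabilities), where probabilities[i, k] is the chance
    that a walker from node i first reaches a seed labelled k.
    """
    n = weights.shape[0]
    laplacian = np.diag(weights.sum(axis=1)) - weights
    marked = np.array(sorted(seeds))
    unmarked = np.array([i for i in range(n) if i not in seeds])
    n_labels = max(seeds.values()) + 1
    # One indicator column per label for the seeded nodes.
    x_marked = np.zeros((len(marked), n_labels))
    for row, node in enumerate(marked):
        x_marked[row, seeds[node]] = 1.0
    # Combinatorial Dirichlet problem: L_UU x_U = -L_UM x_M.
    l_uu = laplacian[np.ix_(unmarked, unmarked)]
    l_um = laplacian[np.ix_(unmarked, marked)]
    x_unmarked = np.linalg.solve(l_uu, -l_um @ x_marked)
    probabilities = np.zeros((n, n_labels))
    probabilities[marked] = x_marked
    probabilities[unmarked] = x_unmarked
    return probabilities.argmax(axis=1), probabilities


# Hypothetical graph: two tightly linked triangles joined by a weak edge.
w = np.zeros((6, 6))
for a, b, weight in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                     (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                     (2, 3, 0.1)]:
    w[a, b] = w[b, a] = weight
labels, probs = random_walker(w, {0: 0, 5: 1})
print(labels)   # [0 0 0 1 1 1]
```

Each row of probabilities sums to one, so thresholding these soft memberships gives hard clusters, which is the hard-clustering/soft-clustering distinction listed among the BRANE concepts.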

March 17, 2017

Co-simulation, state-of-the-art by Claudio Gomes

A few days ago, we had a seminar given by Claudio Gomes (University of Antwerp, Belgium). He recently produced a research report (later turned into a paper), Co-simulation: State of the Art, with C. Thule, D. Broman, P. G. Larsen and H. Vangheluwe (arXiv). It is an impressive survey of tools enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems. It overviews "co-simulation approaches, research challenges, and research opportunities" and "summarizes, bridges, and enhances future research in this multidisciplinary area".

The accompanying slides summarize, in a nicely didactic way, the main issues pertaining to coupled systems, via the composition of sub-system simulations. They deal with Terminology, Simulation units, Input extrapolation techniques, Orchestration algorithms, Algebraic loops, Convergence and Stability.
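To make the orchestration and input-extrapolation vocabulary concrete, here is a minimal Jacobi-type master algorithm for two coupled simulation units. Each unit advances its own state over a communication (macro) step while holding the other unit's output constant (zeroth-order hold), then both exchange their results. The coupled linear system, initial conditions and step sizes are hypothetical choices, and each unit solves its own equation exactly for a constant input, so that the remaining error comes from the coupling alone; a real setup would go through an interface such as FMI.

```python
import math


def unit_step(state, coupled_input, rate, dt):
    """Advance s' = -rate * s + u over dt, exactly, for a constant input u."""
    equilibrium = coupled_input / rate
    return equilibrium + (state - equilibrium) * math.exp(-rate * dt)


def jacobi_cosim(t_end, macro_step):
    """Jacobi master: unit A solves x' = -2x + y, unit B solves y' = x - 2y."""
    x, y = 1.0, 0.0
    for _ in range(round(t_end / macro_step)):
        # Both units use the other's output from the start of the macro step
        # (zeroth-order hold), then exchange results at the communication time.
        x, y = unit_step(x, y, 2.0, macro_step), unit_step(y, x, 2.0, macro_step)
    return x, y


def monolithic(t):
    """Exact solution of the coupled system with x(0) = 1, y(0) = 0."""
    fast, slow = math.exp(-3 * t), math.exp(-t)
    return (slow + fast) / 2, (slow - fast) / 2


def cosim_error(macro_step, t_end=1.0):
    """Largest absolute deviation from the monolithic solution at t_end."""
    x, y = jacobi_cosim(t_end, macro_step)
    x_ref, y_ref = monolithic(t_end)
    return max(abs(x - x_ref), abs(y - y_ref))


for h in (0.2, 0.1, 0.05):
    print(f"macro step {h}: error {cosim_error(h):.4f}")
```

The error shrinks as the macro step decreases, at the cost of more communications. A Gauss-Seidel master would instead let the second unit use the first unit's freshly updated output within the same macro step.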

Abstract:
It is essential to find new ways of enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems, under increasing market pressures. One possible solution for this challenge is to use a heterogeneous model-based approach where different teams can produce their conventional models and carry out their usual mono-disciplinary analysis, but in addition, the different models can be coupled for simulation (co-simulation), allowing the study of the global behavior of the system. Due to its potential, co-simulation is being studied in many different disciplines but with limited sharing of findings. Our aim with this work is to summarize, bridge, and enhance future research in this multidisciplinary area. We provide an overview of co-simulation approaches, research challenges, and research opportunities, together with a detailed taxonomy with different aspects of the state of the art of co-simulation and classification for the past five years. The main research needs identified are: finding generic approaches for modular, stable and accurate coupling of simulation units; and expressing the adaptations required to ensure that the coupling is correct.


This opportunity arose from a very open discussion around extrapolation for co-simulation in cyber-physical systems, in CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems.

February 25, 2017

BARCHAN: Blob Alignment for Robust CHromatographic ANalysis (GCxGC)

In 1987, G. P. Barchan et al. wrote a paper called Gas chromatographic method of determining carbon monoxide and dioxide. But barkhan, or barchan, means more outside gas chromatography: it refers to crescent-shaped sand dunes, modeled by the wind.

BARCHAN: crescent-shaped sand dunes

They inspired our image registration tool for comprehensive 2D chromatography peak alignment (GCxGC or GC2D). It was just published as BARCHAN: Blob Alignment for Robust CHromatographic ANalysis (GCxGC), Journal of Chromatography A, February 2017. Given 2D chromatogram areas of interest, baseline removal (BEADS) is applied, peaks are detected, and the resulting blobs are registered with a mixed rigid/non-rigid transformation based on the Coherent Point Drift technique.

A pair of GCxGC chromatogram areas of interest.
BARCHAN registration, example.
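Coherent Point Drift treats one point set as the centroids of a Gaussian mixture and fits it to the other point set by expectation-maximization, with a uniform component absorbing outliers. As a much-simplified illustration (translation only, isotropic variance, outlier weight w), the NumPy sketch below aligns hypothetical template peak positions onto detected peaks that include one spurious peak. BARCHAN's actual model (rigid along the first dimension, non-rigid along the second) is richer, and the coordinates are made up.

```python
import numpy as np


def cpd_translation(x, y, w=0.1, n_iter=50):
    """Translation-only Coherent Point Drift (EM on a Gaussian mixture).

    x: (N, D) target points (e.g. detected peaks).
    y: (M, D) moving points (e.g. template peaks), used as GMM centroids.
    w: weight of the uniform outlier component, in [0, 1).
    Returns the translation t such that y + t best matches x.
    """
    n, d = x.shape
    m, _ = y.shape
    t = np.zeros(d)
    diff = x[None, :, :] - y[:, None, :]              # (M, N, D): x_n - y_m
    sigma2 = (diff ** 2).sum() / (d * m * n)          # initial variance
    for _ in range(n_iter):
        # E-step: posterior that x_n was generated by centroid y_m + t.
        dist2 = ((diff - t) ** 2).sum(axis=2)         # (M, N)
        gauss = np.exp(-dist2 / (2 * sigma2))
        c = (2 * np.pi * sigma2) ** (d / 2) * w / (1 - w) * m / n
        post = gauss / (gauss.sum(axis=0, keepdims=True) + c)
        # M-step: weighted mean displacement, then variance update.
        n_p = post.sum()
        t = (post[:, :, None] * diff).sum(axis=(0, 1)) / n_p
        residual2 = ((diff - t) ** 2).sum(axis=2)
        sigma2 = max((post * residual2).sum() / (n_p * d), 1e-10)
    return t


# Hypothetical template peak positions (first, second retention times).
template = np.array([[0.0, 0.0], [10.0, 2.0], [4.0, 8.0],
                     [15.0, 12.0], [7.0, -5.0]])
shift = np.array([3.0, -1.5])
rng = np.random.default_rng(1)
detected = np.vstack([template + shift + rng.normal(scale=0.05, size=template.shape),
                      [[40.0, 40.0]]])            # one spurious peak (outlier)
t_hat = cpd_translation(detected, template)
print(t_hat)
```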

The preprint is here, along with the HAL and arXiv versions. The abstract follows:

Abstract: (Comprehensive) two-dimensional gas chromatography (GCxGC) plays a central role in the elucidation of complex samples. The automation of the identification of peak areas is of prime interest to obtain a fast and repeatable analysis of chromatograms. To determine the concentration of compounds or pseudo-compounds, templates of blobs are defined and superimposed on a reference chromatogram. The templates then need to be modified when different chromatograms are recorded. In this study, we present a chromatogram and template alignment method based on peak registration called BARCHAN. Peaks are identified using a robust mathematical morphology tool. The alignment is performed by a probabilistic estimation of a rigid transformation along the first dimension, and a non-rigid transformation in the second dimension, taking into account noise, outliers and missing peaks in a fully automated way. Resulting aligned chromatograms and masks are presented on two datasets. The proposed algorithm proves to be fast and reliable. It significantly reduces the time to results for GCxGC analysis.

Highlights

• BARCHAN: 2D chromatogram and template alignment based on peak registration.
• The alignment is performed by probabilistic estimation with a Gaussian Mixture Model.
• It combines a rigid and a non-rigid transformation for complex samples analysis.
• The method accounts for noise, outliers and missing peaks in an automated way.
• BARCHAN significantly reduces the time to results for GC×GC analysis.

This work is a follow-up of preceding chromatography papers:
  1. Chromatogram baseline estimation and denoising using sparsity (BEADS)
  2. Comprehensive Two-Dimensional Gas Chromatography for Detailed Characterisation of Petroleum Products
  3. Characterization of middle-distillates by comprehensive two-dimensional gas chromatography (GCxGC): a powerful alternative for performing various standard analysis of middle distillates
  4. Comparison of conventional gas chromatography and comprehensive two-dimensional gas chromatography for the detailed analysis of petrochemical samples
A news-like description on IFP Energies nouvelles is provided:





December 22, 2016

Signal and image classification with invariant descriptors (scattering transforms): Internship

[Internship position closed]

Application and additional details

Description

The field of complex data analysis (data science) is interested in the extraction of suitable indicators used for dimension reduction, data comparison or classification. Whereas analyses were initially based on application-dependent, physics-based descriptors or features, novel methods employ more generic and potentially multiscale descriptors that can be used for machine learning or classification. Examples can be found in SIFT-like (scale-invariant feature transform) techniques (ORB, SURF), and in unsupervised or deep learning. The present internship focuses on the framework of the scattering transform (S. Mallat et al.) and the associated classification techniques. It yields signal, image or graph representations with invariance properties relative to data-modifying transformations: translation, rotation, scale… Its performance has been well studied for classical data (audio signals, image databases, handwritten digit recognition).
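A first-order scattering coefficient is obtained by filtering a signal with a wavelet, taking the modulus, and averaging: S1 x(j) = |x * ψ_j| * φ. The NumPy sketch below uses Gabor (Morlet-like) band-pass filters defined in the Fourier domain and, as a simplifying assumption, a global average for φ. With circular convolution, this makes the coefficients invariant (up to rounding) to circular translations of the signal. The filter shapes, scales and test signal are illustrative choices, not those of any scattering library, and a real scattering transform averages over a finite scale to keep more information.

```python
import numpy as np


def gabor_filters(n, n_scales, q=1.0):
    """Fourier-domain Gabor band-pass filters, centre frequency halving per scale."""
    freqs = np.fft.fftfreq(n)                  # cycles per sample, in [-0.5, 0.5)
    filters = []
    for j in range(n_scales):
        centre = 0.25 / 2 ** j                 # illustrative dyadic centre frequencies
        width = centre / (2 * q)               # bandwidth proportional to the centre
        filters.append(np.exp(-((freqs - centre) ** 2) / (2 * width ** 2)))
    return np.array(filters)


def scattering_order1(x, n_scales=4):
    """First-order scattering coefficients with a global average as phi."""
    filters = gabor_filters(len(x), n_scales)
    x_hat = np.fft.fft(x)
    # |x * psi_j| for every scale, via circular convolution in Fourier.
    modulus = np.abs(np.fft.ifft(x_hat[None, :] * filters, axis=1))
    return modulus.mean(axis=1)                # global averaging plays the role of phi


rng = np.random.default_rng(0)
signal = rng.normal(size=256)                  # hypothetical test signal
shifted = np.roll(signal, 37)                  # circular translation
print(np.allclose(scattering_order1(signal), scattering_order1(shifted)))  # True
```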

This internship aims at dealing with lesser studied data: identification of the closest match to a template image inside a database of underground models, extraction of suitable fingerprints from 1D spectrometric signals from complex chemical compounds for macroscopic chemometric property learning. 

The challenge in the first case lies in the different scales and natures of template and model images, the latter being sketch- or cartoon-like versions of the templates. In the second case, signals are composed of a superposition of hundreds of (positive) peaks, whose nature differs from the standard information processed by scattering transforms. Depending on the successes or difficulties met, the internship may focus on one of the proposed applications.

References

December 17, 2016

Kultur Pop 36 : Rebirth

Volume 36 (Rebirth) of Kultur Pop, compilations of Radio France theme tunes, has just been released. On the program (title [station, program]: artist):
  1. Zapateado Opus 23 (Pablo de Sarasate) [France Culture, Culture matin]: Itzhak Perlman & Sam Sanders
  2. Quatre danseries: L'échappée [France Culture, Etat d'alerte (ending)]: Jean-Philippe Goude
  3. Changanya [France Culture, La matinale du samedi]: Lakuta
  4. La lune rousse [France Culture, Backstage]: Fakear
  5. Satta [France Culture, Notre époque]: Boozoo Bajou
  6. Siegfried [France Culture, Interlude nuits]: Erik Truffaz
  7. El condor pasa [France Culture, Paso doble (opening)]: Paul Desmond
  8. Soleil Rouge [France Culture, Interlude nuits]: Jean-Louis Matinier

November 18, 2016

Utilitarian scientific research? Jean Perrin, and La méthode scientifique

Particularly moved to speak these words in a corner of this garden that Marie Curie loved [...] I simply want to draw a lesson, and to show you through their example how any novelty truly useful to mankind can only be obtained through the discovery of unknown things, pursued without any utilitarian concern. It was not by wishing to fight cancer that Marie Curie and Pierre Curie made their immortal discoveries [...]. Thus, in every field, to acquire power, to reduce those chores that must not be confused with noble work, to push back old age and death itself, and finally to break the narrow frame in which our destiny seemed forever confined, we must facilitate disinterested scientific research. All of you who are going to hear me by the tens of thousands, you who see me without my seeing you, hear my appeal, and contribute with all your influence to facilitating this conquering research that will bring happiness and freedom to mankind.
This text is read by Jean Perrin, founder of the CNRS, in 1938, in the small garden of the Institut du radium. It is still located not far from the Institut de biologie physico-chimique, rue Pierre et Marie Curie. It can be heard at 2'30" in the montage Jean Perrin et la réalité moléculaire, on the occasion of the 40th anniversary of the discovery of radium, during the international week against cancer.

This excerpt was broadcast on the France Culture show La méthode scientifique, with Alain Fuchs: Quelle politique de la recherche en France ? (What research policy for France?)


The show's theme tune is on Kultur Pop: Leonard Nimoy, Music to watch space girls by


October 31, 2016

CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems

XKCD. My hobby: extrapolating

CHOPtrey. A long, extrapolated and progressive scientific story. Ace, deuce, trey... ready?

It all started with Cyril Faure, then a PhD student with Nicolas Pernet in real-time computing. He used to stop by my office. We had coffee, discussed books (from Descartes to sci-fi) and music (mostly Loudblast, Slayer, Pink Floyd, Manowar, Bolt Thrower). We exchanged ideas and concerns. One day, he told me about a problem that worried him in his thesis. Caveat: I am very bad at computer science and advanced programming, and had little clue about partitioned/slacked real-time co-simulation systems.

So this was not about programming, but about simulation and co-simulation. Big physical systems (engines, aircraft, boats) are complicated to simulate. Protocols and methods include the FMI standard (Functional Mockup Interface) and FMUs (Functional Mockup Units). Partitioning a system into subsystems may help the simulation, but the split, discretized subsystems must then communicate: often enough to stay precise, rarely enough to allow speed-ups.
Hoher, 2011, A contribution to the real-time simulation...
So when simulated subsystems communicate at regular communication times, and one wants data at a fraction of the communication interval, one generally uses the last known value. This is called zeroth-order hold (ZOH).
So I wondered: "this sounds a little bit like aliasing, in a causal setting; let's interpolate, say, with a line or a parabola". Not so easy. In the co-simulation domain, interpolation is "known" to be unstable, even with FOH (first-order hold) or SOH (second-order hold).
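To make the three holds concrete, here is a minimal sketch, assuming uniformly spaced communication times (time measured in units of the communication step H) and taking FOH/SOH to mean extrapolating the line or parabola through the last two or three received samples (Lagrange-type extrapolation). The function name `hold_extrapolate` and its parameters are illustrative, not taken from any co-simulation tool.

```python
import numpy as np

def hold_extrapolate(history, tau, order):
    """Extrapolate a signal known only at communication times.

    history: most recent samples, oldest first, at times -(n-1), ..., -1, 0
             (in units of the communication step H; hypothetical convention).
    tau:     fractional time since the last communication point, e.g. in [0, 1].
    order:   0 (ZOH), 1 (FOH), 2 (SOH); higher orders work the same way.
    """
    if order < 0 or len(history) < order + 1:
        raise ValueError("need at least order + 1 samples")
    y = np.asarray(history[-(order + 1):], dtype=float)
    if order == 0:
        return float(y[-1])  # zeroth-order hold: keep the last known value
    t = np.arange(-order, 1, dtype=float)  # nodes -order, ..., -1, 0
    coeffs = np.polyfit(t, y, order)       # exact polynomial through order + 1 points
    return float(np.polyval(coeffs, tau))

# Samples of t**2 at t = -2, -1, 0, predicted half a step ahead:
for k in (0, 1, 2):
    print(k, hold_extrapolate([4.0, 1.0, 0.0], 0.5, k))
```

On this parabola, ZOH returns 0, FOH returns -0.5 and SOH recovers the exact 0.25. The flip side shows at tau = 1: the SOH weights on the last three samples (oldest first) are 1, -3 and 3, so independent sample noise of standard deviation sigma comes out amplified by sqrt(19), about 4.4, which is one hint as to why such predictors have a reputation for instability.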

A few implementations (Matlab prototypes, C++ and a final embedding in the XMOD software) and some tuning later, we produced a hierarchical scheme (with Abir, Cyril, Daniel and Mongi) based on morphological contexts, as in lossless image compression (PNG for instance). It proved very cheap, quite robust, and apparently stable. We called it CHOPtrey, after the old names "ace, deuce, trey" (1, 2, 3, for the faces of a die), which come from Old French. The name also refers to a chop tray (or cutting board), because the method allows one to chop (or cut) a big simulated system into several smaller ones, and because it is composed of three parts, or chops:
  • CHOPred: a Computationally Hasty Online Prediction system (improving the trade-off between integration speed-ups, needing large communication steps, and simulation precision)
  • CHOPoly: Causal Hopping Oblivious Polynomials, whose smoothed adaptive forward prediction improves co-simulation accuracy (they are similar to Savitzky-Golay filters or to LOESS and LOWESS regression methods)
  • CHOPatt: a Contextual & Hierarchical Ontology of Patterns, where data behavior is segmented into different classes to handle the discontinuities of the exchanged signals
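The smoothed, past-weighted polynomial extrapolation behind the CHOPoly item can be sketched as a causal weighted least-squares fit. This is a minimal illustration in that spirit, not the published CHOPoly algorithm: the exponential forgetting factor `forget` (one possible reading of "oblivious") and the function name `causal_poly_extrapolate` are assumptions.

```python
import numpy as np

def causal_poly_extrapolate(history, tau, degree=2, forget=0.8):
    """Weighted least-squares polynomial fit on past samples, evaluated ahead.

    history: past samples, oldest first, at times -(n-1), ..., 0 (units of H).
    tau:     prediction horizon, as a fraction of a communication step.
    degree:  polynomial degree (2 = parabola).
    forget:  hypothetical forgetting factor in (0, 1]; a sample of age a
             gets weight forget**a in the squared-error criterion.
    """
    y = np.asarray(history, dtype=float)
    n = y.size
    if n < degree + 1:
        raise ValueError("need at least degree + 1 samples")
    t = np.arange(-(n - 1), 1, dtype=float)  # sample times, newest at 0
    w = forget ** (-t)                        # age 0 for the newest sample
    # np.polyfit multiplies residuals by its weights, so pass sqrt(w)
    # to minimize sum(w * residual**2)
    coeffs = np.polyfit(t, y, degree, w=np.sqrt(w))
    return float(np.polyval(coeffs, tau))

# Noisy parabola t**2 - t sampled at t = -7, ..., 0, predicted half a step ahead
rng = np.random.default_rng(0)
t = np.arange(-7, 1)
noisy = t**2 - t + 0.05 * rng.standard_normal(t.size)
print(causal_poly_extrapolate(noisy, 0.5))
```

With exactly degree + 1 samples the fit interpolates them, and the method reduces to plain Lagrange extrapolation (FOH or SOH). Longer windows trade a little lag for noise smoothing, much like a causal Savitzky-Golay filter, while the forgetting factor keeps the fit focused on recent behavior.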
During this work, I learned a lot about co-simulation, parallel computing, multi-core, etc. But also a lot about least-squares parabolic regression, which I thought I had known for decades. I had not. Not the deepest theory ever, but one of my nicest experiences of cross-domain collaboration (along with bioinformatics and our BRANE work on gene regulatory networks): some genuine computer engineering, with a final packaged product (implemented in xMod). Here it is:
CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems 
Abstract: The growing complexity of Cyber-Physical Systems (CPS), together with the increasing parallelism provided by multi-core chips, fosters the parallelization of simulation. Simulation speed-ups are expected from co-simulation and parallelization based on model splitting into weakly coupled sub-models, as for instance in the framework of the Functional Mockup Interface (FMI). However, slackened synchronization between sub-models and their associated solvers running in parallel introduces integration errors, which must be kept within acceptable bounds. CHOPtrey denotes a forecasting framework enhancing the performance of complex system co-simulation, with a trivalent articulation. First, we consider the framework of a Computationally Hasty Online Prediction system (CHOPred). It improves the trade-off between integration speed-ups, which need large communication steps, and simulation precision, which needs frequent updates of model inputs. Second, smoothed adaptive forward prediction improves co-simulation accuracy. It is obtained by past-weighted extrapolation based on Causal Hopping Oblivious Polynomials (CHOPoly). Third, signal behavior is segmented to handle the discontinuities of the exchanged signals: the segmentation is performed in a Contextual & Hierarchical Ontology of Patterns (CHOPatt). Implementation strategies and simulation results demonstrate the framework's ability to adaptively relax data communication constraints beyond synchronization points, which noticeably accelerates simulation. The CHOPtrey framework extends the range of applications of standard Lagrange-type methods, often deemed unstable. The embedding of predictions in lag-dependent smoothing and discontinuity handling demonstrates its practical efficiency.
Keywords: parallel simulation; Functional Mockup Interface (FMI); smoothed online prediction; causal polynomial extrapolation; context-based decision; internal combustion engine
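The context-based switching described in the abstract (CHOPatt) can be caricatured in a few lines. This is a hypothetical sketch, not the published rules: the three classes ("flat", "jump", "smooth"), the median-based jump test and the threshold `jump_ratio` are assumptions chosen for illustration. The idea is to hold the last value on plateaus and right after a discontinuity, where a polynomial fitted across the jump would overshoot, and to extrapolate only on smooth stretches.

```python
import numpy as np

def classify_context(history, jump_ratio=5.0, flat_tol=1e-9):
    """Hypothetical context classifier on the last samples (oldest first).

    Returns 'flat' if recent increments are numerically zero, 'jump' if the
    latest increment dwarfs the typical previous ones, else 'smooth'.
    """
    d = np.diff(np.asarray(history, dtype=float))
    if np.all(np.abs(d) <= flat_tol):
        return "flat"
    previous = np.abs(d[:-1])
    typical = float(np.median(previous)) if previous.size else 0.0
    if abs(d[-1]) > jump_ratio * max(typical, flat_tol):
        return "jump"
    return "smooth"

def contextual_extrapolate(history, tau, degree=2):
    """Pick a predictor from the context and return (prediction, context)."""
    y = np.asarray(history, dtype=float)
    if y.size == 0:
        raise ValueError("empty history")
    context = classify_context(y)
    if context in ("flat", "jump") or y.size < degree + 1:
        return float(y[-1]), context  # zeroth-order hold
    t = np.arange(-(y.size - 1), 1, dtype=float)  # newest sample at t = 0
    coeffs = np.polyfit(t, y, degree)             # least-squares polynomial
    return float(np.polyval(coeffs, tau)), context

print(contextual_extrapolate([0.0, 0.1, 0.2, 0.3, 5.0], 0.5))  # step: hold 5.0
print(contextual_extrapolate([9.0, 4.0, 1.0, 0.0], 0.5))       # smooth parabola
```

A hierarchical version would refine each class further, for instance by choosing the polynomial degree or the window length per context. Here, a single threshold stands in for that logic.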

The full set of names for the six faces of a die is: ace, deuce, trey, cater, cinque, sice. They come from Old French (cf. un, deux, trois, quatre, cinq, six in modern French). Ace originally comes from the Latin for "unit".

June 12, 2016

Pêcheurs de perles, une parabole (Pearl fishers, a parabola or parable; in French, BEADS and CHOPtrey)

This post is about pearls found in apparently simple yet complex industrial questions, and a handful of parabolas. Two practical applications are found in analytical chemistry, with BEADS: Baseline Estimation And Denoising w/ Sparsity, and in cyber-physical system co-simulation, with CHOPtrey: Contextual Polynomial extrapolation for real-time forecasting. The whole thing is just a parabola, or a parable.

I was not at my most confident in this talk, even in French: I had to change it completely a couple of days before. Politics... The best parts are the Dave Gilmour (or Pink Floyd) ones.
The talk, in French, is about pearls found in questions that are both simple and complex, and about a handful of parabolas, with two applications: baseline filtering in physico-chemical analysis, and co-simulation of complex systems with contextual polynomial extrapolation. It was given at Maths en mouvement. The related works are BEADS and CHOPtrey.

 
Pêcheurs de perles, by Laurent Duval, from Contact FSMP on Vimeo.

I just needed some parabolic relaxation.
Parabolic relaxation