December 22, 2016

Signal and image classification with invariant descriptors (scattering transforms): Internship

[Internship position closed]

Application and additional details

Description

The field of complex data analysis (data science) is concerned with extracting suitable indicators for dimension reduction, data comparison or classification. Whereas early methods relied on application-dependent, physics-based descriptors or features, novel methods employ more generic and potentially multiscale descriptors that can be used for machine learning or classification. Examples are found in SIFT-like (scale-invariant feature transform) techniques (ORB, SURF) and in unsupervised or deep learning. The present internship focuses on the framework of the scattering transform (S. Mallat et al.) and the associated classification techniques. It yields signal, image or graph representations with invariance properties relative to data-modifying transformations: translation, rotation, scale… Its performance has been well studied on classical data (audio signals, image databases, handwritten digit recognition).
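To make the invariance idea concrete, here is a minimal toy sketch, not the full scattering transform: one layer of band-pass filtering, a modulus, and an averaging. The Gaussian filter bank, its dyadic center frequencies and the function names are assumptions made for this illustration only. Actual scattering transforms use proper wavelets, local rather than global averaging, and cascade several such layers.

```python
import numpy as np

def gaussian_bandpass_bank(n, num_scales=4):
    """Hypothetical bank of Gaussian band-pass filters, defined in the Fourier domain."""
    freqs = np.fft.fftfreq(n)              # normalized frequencies in [-0.5, 0.5)
    bank = []
    for j in range(num_scales):
        center = 0.25 / 2**j               # dyadic center frequencies (assumption)
        width = center / 2.0
        bank.append(np.exp(-0.5 * ((freqs - center) / width) ** 2))
    return np.array(bank)

def scattering_order1(x, bank):
    """Toy first-order coefficients: global average of |x * psi_j| for each scale j."""
    X = np.fft.fft(x)
    # Circular convolution with each filter, followed by the modulus nonlinearity.
    u = np.abs(np.fft.ifft(X[None, :] * bank, axis=1))
    # Global averaging removes the dependence on (circular) translation.
    return u.mean(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
bank = gaussian_bandpass_bank(x.size)

s = scattering_order1(x, bank)
s_shifted = scattering_order1(np.roll(x, 17), bank)
print(np.allclose(s, s_shifted))  # the descriptors of x and of its shifted copy coincide
```

The modulus is what keeps information after averaging: averaging the band-pass outputs directly would give values close to zero.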

This internship aims at dealing with less studied data: identifying the closest match to a template image inside a database of underground models, and extracting suitable fingerprints from 1D spectrometric signals of complex chemical compounds for learning macroscopic chemometric properties.

The challenge in the first case lies in the different scale and nature of the template and model images, the latter being sketch- or cartoon-like versions of the templates. In the second case, signals are composed of a superposition of hundreds of (positive) peaks, whose nature differs from the standard data processed by scattering transforms. The internship may focus on one of the two proposed applications, depending on the successes or difficulties met.

References

December 17, 2016

Kultur Pop 36: Rebirth

Volume 36 (Rebirth) of Kultur Pop, the compilations of Radio France theme tunes, has just been released. On the program:
Title [station, show] | Artist | Track
Zapateado Opus 23 (Pablo de Sarasate) [France Culture, Culture matin] | Itzhak Perlman & Sam Sanders | 01
Quatre danseries : L'échappée [France Culture, Etat d'alerte (end)] | Jean-Philippe Goude | 02
Changanya [France Culture, La matinale du samedi] | Lakuta | 03
La lune rousse [France Culture, Backstage] | Fakear | 04
Satta [France Culture, Notre époque] | Boozoo Bajou | 05
Siegfried [France Culture, Interlude nuits] | Erik Truffaz | 06
El condor pasa [France Culture, Paso doble, opening] | Paul Desmond | 07
Soleil Rouge [France Culture, Interlude nuits] | Jean-Louis Matinier | 08

November 18, 2016

Utilitarian scientific research? Jean Perrin, and the scientific method

"Particularly moved to speak these words in a corner of this garden that Marie Curie loved [...] I simply want to draw a lesson, and to show you through their example how any novelty truly useful to mankind can only be obtained through the discovery of unknown things, pursued without any utilitarian concern. It was not by wishing to fight cancer that Marie Curie and Pierre Curie made their immortal discoveries [...]. Thus in every domain, to acquire power, to lessen those chores that must not be confused with noble work, to push back old age and death itself, to break at last the narrow frame in which our destiny seemed forever enclosed, we must facilitate disinterested scientific research. All of you who will listen to me by the tens of thousands, you who see me without my seeing you, hear my call, and contribute with all your influence to facilitating this conquering research that will bring happiness and freedom to mankind."
This text is read by Jean Perrin, founder of the CNRS, in 1938, in the small garden of the Institut du radium. The garden still lies not far from the Institut de biologie physico-chimique, rue Pierre et Marie Curie. The speech can be heard at 2'30" in the montage Jean Perrin et la réalité moléculaire, recorded on the occasion of the 40th anniversary of the discovery of radium, during the international week against cancer.

This excerpt was broadcast in the program La méthode scientifique, on France Culture, with Alain Fuchs: Quelle politique de la recherche en France ? (What research policy for France?)


Its theme tune is on Kultur Pop: Leonard Nimoy, Music to watch space girls by


October 31, 2016

CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems

XKCD. My hobby: extrapolating

CHOPtrey. A long, extrapolated and progressive scientific story. Ace, deuce, trey... ready?

It all started with Cyril Faure, then a PhD student with Nicolas Pernet in real-time computing. He used to stop by my office. We had coffee, discussed books (from Descartes to Sci-Fi) and music (mostly Loudblast, Slayer, Pink Floyd, Manowar, Bolt Thrower). We exchanged ideas and concerns. One day, he told me about a worry in his thesis. Caveat: I am very bad at computer science and advanced programming, and had little idea about partitioned/slacked real-time co-simulation systems.

So this was not about programming, but about simulation and co-simulation. Big physical systems (engines, aircraft, boats) are complicated to simulate. Protocols and methods include the FMI standard (Functional Mockup Interface) and FMUs (Functional Mockup Units). Partitioning such systems into subsystems may help the simulation, but the split discrete subsystems must still communicate: often enough to remain precise, rarely enough to preserve speed-ups.
Hoher, 2011, A contribution to the real-time simulation...
So when simulated subsystems communicate at regular communication times, and one wants data at a fractional time interval, one generally uses the last known value. This is called zeroth-order hold (ZOH).
So I wondered: "this sounds a little bit like aliasing, in a causal setting; let's interpolate, say, with a line or a parabola". Not so easy. In that domain, interpolation is "known" to be unstable, even with FOH (first-order hold) or SOH (second-order hold).
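For readers unfamiliar with these holds, here is a minimal sketch of what ZOH, FOH and SOH compute from the last known samples. It assumes equally spaced communication times and a fractional look-ahead `tau` expressed in units of the communication step; the function name `hold_extrapolate` is made up for this illustration and does not come from any co-simulation software.

```python
def hold_extrapolate(samples, tau, order=0):
    """Predict the value tau communication steps after the last sample.

    order 0: zeroth-order hold, repeat the last value.
    order 1: first-order hold, line through the last two samples.
    order 2: second-order hold, parabola through the last three samples.
    """
    y0 = samples[-1]
    if order == 0:
        return y0
    d1 = samples[-1] - samples[-2]                     # first backward difference
    if order == 1:
        return y0 + tau * d1
    d2 = samples[-1] - 2 * samples[-2] + samples[-3]   # second backward difference
    if order == 2:
        return y0 + tau * d1 + 0.5 * tau * (tau + 1) * d2
    raise ValueError("order must be 0, 1 or 2")

# y = t**2 sampled at t = 0, 1, 2; the true value at t = 2.5 is 6.25
samples = [0.0, 1.0, 4.0]
for order in (0, 1, 2):
    print(order, hold_extrapolate(samples, 0.5, order))
# prints 4.0 (ZOH), 5.5 (FOH) and 6.25 (SOH)
```

On smooth data the higher orders are clearly more accurate. Because they pass exactly through the last samples, they also amplify noise and jumps in those samples.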

A few implementations (Matlab prototypes, C++ and a final embedding in the XMOD software) and some tuning later, we produced a hierarchical scheme (with Abir, Cyril, Daniel and Mongi), based on morphological contexts, as in lossless image compression (PNG for instance). It proved very cheap, quite robust, and apparently stable. We called it CHOPtrey, from the old French words "ace, deuce, trey" (1, 2, 3, referring to sides of dice), and in reference to a chop tray (or cutting board): because it allows chopping a big simulated system into several smaller ones, and because it is composed of three parts:
  • CHOPred: a Computationally Hasty Online Prediction system (improving the trade-off between integration speed-ups, needing large communication steps, and simulation precision)
  • CHOPoly: Causal Hopping Oblivious Polynomials, whose smoothed adaptive forward prediction improves co-simulation accuracy (they are similar to Savitzky-Golay filters, or to LOESS or LOWESS regression methods)
  • CHOPatt: a Contextual & Hierarchical Ontology of Patterns, where data behavior is segmented into different classes to handle the discontinuities of the exchanged signals
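To give a flavor of the polynomial part, here is a rough sketch of causal least-squares polynomial prediction: fit a low-degree polynomial to the last few samples only, then evaluate it slightly ahead of the last one. It is not the CHOPoly implementation. The window length, the geometric `decay` weighting of older samples and the function name are assumptions chosen for illustration.

```python
import numpy as np

def ls_poly_predict(samples, tau, degree=2, window=5, decay=1.0):
    """Least-squares polynomial fit on the last `window` samples, evaluated tau steps ahead.

    Times are counted in communication steps, the last sample being at t = 0.
    decay < 1 down-weights older samples (hypothetical weighting); decay = 1 is uniform.
    """
    y = np.asarray(samples[-window:], dtype=float)
    if y.size <= degree:
        raise ValueError("need more samples than the polynomial degree")
    t = np.arange(-y.size + 1, 1, dtype=float)   # ..., -2, -1, 0
    w = decay ** (-t)                            # weight 1 at t = 0, smaller in the past
    coeffs = np.polyfit(t, y, degree, w=w)       # weighted least-squares fit
    return np.polyval(coeffs, tau)

# Noisy samples of a parabola: the fit smooths the noise before extrapolating.
rng = np.random.default_rng(1)
t_past = np.arange(-4, 1, dtype=float)
clean = t_past**2 - 3 * t_past + 1
noisy = clean + 0.05 * rng.standard_normal(t_past.size)
print(ls_poly_predict(noisy, 0.5), "vs exact", 0.5**2 - 3 * 0.5 + 1)
```

With `window = degree + 1` the fit interpolates the samples and reduces to the classical holds. Using more samples than coefficients is what provides the smoothing.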
During this work, I learned a lot about co-simulation, parallel computing, multi-core, etc. But also a lot about least-squares parabolic regression, which I thought I had known for decades. I had not. Not the deepest theory ever, but one of my nicest experiences of cross-domain collaboration (along with bioinformatics and our BRANE work on gene regulatory networks), with some genuine computer engineering and a final packaged product (implemented in xMod). Here it is:
CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems 
Abstract : The growing complexity of Cyber-Physical Systems (CPS), together with increasingly available parallelism provided by multi-core chips, fosters the parallelization of simulation. Simulation speed-ups are expected from co-simulation and parallelization based on model splitting into weak-coupled sub-models, as for instance in the framework of Functional Mockup Interface (FMI). However, slackened synchronization between sub-models and their associated solvers running in parallel introduces integration errors, which must be kept inside acceptable bounds. CHOPtrey denotes a forecasting framework enhancing the performance of complex system co-simulation, with a trivalent articulation. First, we consider the framework of a Computationally Hasty Online Prediction system (CHOPred). It allows to improve the trade-off between integration speed-ups, needing large communication steps, and simulation precision, needing frequent updates for model inputs. Second, smoothed adaptive forward prediction improves co-simulation accuracy. It is obtained by past-weighted extrapolation based on Causal Hopping Oblivious Polynomials (CHOPoly). And third, signal behavior is segmented to handle the discontinuities of the exchanged signals: the segmentation is performed in a Contextual & Hierarchical Ontology of Patterns (CHOPatt). Implementation strategies and simulation results demonstrate the framework ability to adaptively relax data communication constraints beyond synchronization points which sensibly accelerate simulation. The CHOPtrey framework extends the range of applications of standard Lagrange-type methods, often deemed unstable. The embedding of predictions in lag-dependent smoothing and discontinuity handling demonstrates its practical efficiency.
The full set of numbers for the six sides of a die is ace, deuce, trey, cater, cinque, sice. They are from Old French (cf. un, deux, trois, quatre, cinq, six in modern French). Ace is originally from the Latin for 'unit'.

June 12, 2016

Pêcheurs de perles, une parabole (in French, BEADS and CHOPtrey)

This post is about pearls found in apparently simple yet complex industrial-type questions, and a handful of parabolas. Two practical applications are found in analytical chemistry, with BEADS: Baseline Estimation And Denoising w/ Sparsity, and in cyber-physical system co-simulation, with CHOPtrey: Contextual Polynomial extrapolation for real-time forecasting. The whole thing is just a parabola, or a parable.

I was not at my best level of confidence in this talk, even in French. I had to completely change the talk a couple of days before. Politics... The best lies in the Dave Gilmour (or Pink Floyd) parts:
The talk, in French, is about pearls found in questions that are both simple and complex, and about a handful of parabolas, with two applications: baseline filtering in physico-chemical analysis, and the co-simulation of complex systems with contextual polynomial extrapolations. It was given at Maths en mouvement. The related works are:

Pêcheurs de perles, par Laurent Duval from Contact FSMP on Vimeo.

I just needed some parabolic relaxation.
Parabolic relaxation

April 30, 2016

Trainlets: cropped wavelet decomposition for high-dimensional learning

It's been a long time: element 120 from the aperiodic table of wavelets is the trainlet, from Jeremias Sulam, Boaz Ophir, Michael Zibulevsky, and Michael Elad, Trainlets: Dictionary Learning in High Dimensions:
Abstract: Sparse representations has shown to be a very powerful model for real world signals, and has enabled the development of applications with notable performance. Combined with the ability to learn a dictionary from signal examples, sparsity-inspired algorithms are often achieving state-of-the-art results in a wide variety of tasks. Yet, these methods have traditionally been restricted to small dimensions mainly due to the computational constraints that the dictionary learning problem entails. In the context of image processing, this implies handling small image patches. In this work we show how to efficiently handle bigger dimensions and go beyond the small patches in sparsity-based signal and image processing methods. We build our approach based on a new cropped wavelet decomposition, which enables a multi-scale analysis with virtually no border effects. We then employ this as the base dictionary within a double sparsity model to enable the training of adaptive dictionaries. To cope with the increase of training data, while at the same time improving the training performance, we present an Online Sparse Dictionary Learning (OSDL) algorithm to train this model effectively, enabling it to handle millions of examples. This work shows that dictionary learning can be up-scaled to tackle a new level of signal dimensions, obtaining large adaptable atoms that we call trainlets.
They offer a base dictionary used within a double sparsity model to enable the training of adaptive dictionaries. The associated package is here, from Michael Elad's software page. The cropped wavelet decomposition enables a multi-scale analysis with virtually no border effects. An entry on trainlets has been added to WITS, the aperiodic table of wavelets.

But things always end up with a song! Two of my favorite train songs, by Porcupine Tree (Trains) and the Nits (The train).




April 24, 2016

M-band 2D dual-tree (Hilbert) wavelet multicomponent image denoising

The toolbox implements a parametric nonlinear estimator that generalizes several wavelet shrinkage denoising methods. Dedicated to additive Gaussian noise, it adopts a multivariate statistical approach to take into account both the spatial and the inter-component correlations existing between the different wavelet subbands, using a Stein Unbiased Risk Estimator (SURE) principle to derive optimal parameters. The wavelet choice is a slightly redundant multi-band geometrical dual-wavelet frame. In experiments on multispectral remote sensing images, it outperforms conventional wavelet denoising techniques (including curvelets). Since these wavelets are based on MIMO (multi-input, multi-output) filter banks, in a multi-band fashion, we can quite safely call them MIMOlets. The dual-tree wavelet consists of two directional wavelet trees, displayed below for a 4-band filter:

4-band directional dual-tree wavelets
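As a pointer to what "shrinkage with a SURE principle" means in its simplest form, here is a scalar sketch: soft thresholding of noisy coefficients, with the threshold chosen by minimizing Stein's unbiased risk estimate (the idea behind SureShrink). It is only an illustration in Python, not the toolbox's multivariate estimator and not its Matlab code. The function names and the synthetic sparse coefficients are assumptions, and the noise standard deviation `sigma` is supposed known.

```python
import numpy as np

def soft_threshold(y, t):
    """Shrink coefficients toward zero by t, zeroing those with magnitude below t."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def sure_soft(y, t, sigma):
    """Stein's unbiased estimate of the squared-error risk of soft thresholding at t."""
    n = y.size
    return (n * sigma**2
            - 2 * sigma**2 * np.count_nonzero(np.abs(y) <= t)
            + np.sum(np.minimum(y**2, t**2)))

def sure_threshold(y, sigma):
    """Pick the threshold minimizing SURE among 0 and the coefficient magnitudes."""
    candidates = np.concatenate(([0.0], np.sort(np.abs(y))))
    risks = [sure_soft(y, t, sigma) for t in candidates]
    return candidates[int(np.argmin(risks))]

# Synthetic sparse "wavelet coefficients" in unit-variance Gaussian noise (assumption).
rng = np.random.default_rng(0)
theta = np.zeros(512)
theta[:20] = 5.0
y = theta + rng.standard_normal(theta.size)

t = sure_threshold(y, sigma=1.0)
denoised = soft_threshold(y, t)
print("threshold:", t)
print("MSE noisy:", np.mean((y - theta) ** 2), "MSE denoised:", np.mean((denoised - theta) ** 2))
```

The risk estimate only needs the noisy data and `sigma`, never the clean coefficients, which is what makes parameter selection possible in practice.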

The set of wavelet functions implements:
The demonstration script is Init_Demo.m, and the functions for M-band dual-tree wavelets are provided in the directory TOOLBOX_DTMband_solo. For instance, the clean multispectral image (port of Tunis, only one channel):


The (very) noisy version:

The denoised one: