September 9, 2018

Kultur Pop 44: Brain and evolution

While:
  • a Monsieur Hulot is on vacation, and one cannot tell whether that is great news,
  • the ice is melting, and not only in a glass of rosé,
  • we await a back-to-school ANPéRo to warm all this up:
the 44th volume, Brain, of the Kultur Pop theme-tune compilations (France Culture/France Inter, and sometimes intruders) has (at last) been released.



On the program: Kultur Pop 2018.44: Brain
  • France Culture, Interlude nuits: Alain Romans, Quel temps fait-il à Paris ? (Les vacances de monsieur Hulot)
  • France Culture, Culture protestante: Ensemble Lucidarium, O prebstres, prebstres
  • France Culture, Science publique: Brian Eno & David Byrne, The Jezebel Spirit
  • France Culture, Concordance des temps: Louis Sclavis Sextet, Charmes
  • France Culture, Interlude nuits: Alexandre Desplat, Camera Obscura (Girl With A Pearl Earring)
  • France Culture, Agora: L'Orchestre de Contrebasses, Sablier
  • France Culture, Grands reportages: Bonobo, Kerala
  • France Culture, Culture de soi, cultures des autres: Music Ensemble of Benares, Kathak Nritya, part 1 & 2

There is no hope, there's only chaos and evolution (Evereve, Fade to grey, Visage cover)

July 29, 2018

Multiscale representation of hexahedral meshes & compression

Companion pages:
A full-scale geological grid structure is decomposed onto embedded wavelet-like scales while preserving the discontinuities, here geological faults (red), using a morphological 2D wavelet:
Geological grid structures and discontinuities preservation (red painted faults)
Categorical properties like rock types (sandstone, limestone, shale)  can be upscaled according to a dedicated non-linear decomposition called modelet (patent #20170344676: Method of exploitation of hydrocarbons of an underground formation by means of optimized scaling):


Hexahedral mesh categorical property: rock type
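
To give a feel for what upscaling a categorical property involves, here is a minimal sketch. It assumes (my reading of the name "modelet", not something stated in this post) that each coarse cell takes the most frequent rock type among its 2x2x2 fine cells. The function name `upscale_categorical` and the tie-breaking rule are purely illustrative, and this is not the patented method itself.

```python
import numpy as np

def upscale_categorical(labels):
    """Coarsen a 3D array of integer labels by a factor 2 along each axis.

    Assumption: each coarse cell receives the mode (most frequent label)
    of its 2x2x2 block; ties go to the smallest label.
    """
    a = np.asarray(labels)
    nx, ny, nz = a.shape
    if nx % 2 or ny % 2 or nz % 2:
        raise ValueError("each dimension must be even")
    # Gather the 8 fine cells of every 2x2x2 block along the last axis.
    blocks = (a.reshape(nx // 2, 2, ny // 2, 2, nz // 2, 2)
                .transpose(0, 2, 4, 1, 3, 5)
                .reshape(nx // 2, ny // 2, nz // 2, 8))
    coarse = np.empty(blocks.shape[:3], dtype=a.dtype)
    for idx in np.ndindex(coarse.shape):
        # np.unique returns sorted values, so argmax picks the smallest label on ties.
        values, counts = np.unique(blocks[idx], return_counts=True)
        coarse[idx] = values[np.argmax(counts)]
    return coarse
```

Unlike an average, a vote of this kind never invents an intermediate "rock type" that does not exist in the fine grid, which is why categorical properties need a non-linear treatment.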

Continuous properties (saturation, porosity, permeability, temperature) can be homogenized with a 3D Haar wavelet:

Hexahedral mesh continuous property: porosity
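
As an illustration of the homogenization idea, the sketch below computes one level of a block-mean decomposition on a regular 3D array. The 2x2x2 block mean is the low-pass branch of a separable 3D Haar transform, up to normalization. For simplicity, the details are stored here as a redundant residual rather than as proper Haar detail coefficients, so this is a simplified stand-in, not the HexaShrink implementation. The function names are hypothetical.

```python
import numpy as np

def _upsample(coarse):
    """Repeat each coarse cell over its 2x2x2 block of fine cells."""
    return np.repeat(np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1), 2, axis=2)

def haar3d_level(volume):
    """One decomposition level: coarse block means plus a residual."""
    v = np.asarray(volume, dtype=float)
    nx, ny, nz = v.shape
    if nx % 2 or ny % 2 or nz % 2:
        raise ValueError("each dimension must be even")
    blocks = v.reshape(nx // 2, 2, ny // 2, 2, nz // 2, 2)
    coarse = blocks.mean(axis=(1, 3, 5))   # Haar-like approximation
    details = v - _upsample(coarse)        # what the coarse level misses
    return coarse, details

def haar3d_inverse(coarse, details):
    """Rebuild the fine volume from the coarse level and the residual."""
    return _upsample(coarse) + details
```

Applying `haar3d_level` repeatedly to the coarse output gives a hierarchy of resolutions, each preserving the global mean of a property such as porosity.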

The HexaShrink methodology described above is detailed in the recently submitted paper: 
With huge data acquisition progresses realized in the past decades and acquisition systems now able to produce high resolution point clouds, the digitization of physical terrains becomes increasingly more precise. Such extreme quantities of generated and modeled data greatly impact computational performances on many levels: storage media, memory requirements, transfer capability, and finally simulation interactivity, necessary to exploit this instance of big data. Efficient representations and storage are thus becoming "enabling technologies" in simulation science. We propose HexaShrink, an original decomposition scheme for structured hexahedral volume meshes. The latter are used for instance in biomedical engineering, materials science, or geosciences. HexaShrink provides a comprehensive framework allowing efficient mesh visualization and storage. Its exactly reversible multiresolution decomposition yields a hierarchy of meshes of increasing levels of details, in terms of either geometry, continuous or categorical properties of cells. Starting with an overview of volume meshes compression techniques, our contribution blends coherently different multiresolution wavelet schemes. It results in a global framework preserving discontinuities (faults) across scales, implemented as a fully reversible upscaling. Experimental results are provided on meshes of varying complexity. They emphasize the consistency of the proposed representation, in terms of visualization, attribute downsampling and distribution at different resolutions. Finally, HexaShrink yields gains in storage space when combined to lossless compression techniques.
And there is a patent associated with HexaShrink, Method of exploitation of hydrocarbons of an underground formation by means of optimized scaling:

Method of exploitation of hydrocarbons of an underground formation by means of optimized scaling


July 10, 2018

Bioinformatics & datascience: Internship & PhD on multi-omics data

A PhD position is still available on Graph-based learning from integrated multi-omics and multi-species data (genomic, transcriptomic, epigenetic) between IFP Energies nouvelles and CentraleSupélec/INRIA Saclay. All the information is gathered at this address.

Some information is duplicated below:
Micro-organisms are studied here for their application to bio-based chemistry from renewable sources. Such organisms are driven by their genome expression, with very diverse mechanisms acting at various biological scales, sensitive to external conditions (nutrients, environment). The emergence of novel high-throughput experimental technologies provides complementary omics data and, therefore, a better capability for understanding the studied biological systems. Innovative analysis methods are required for such highly integrated data. Their handling increasingly requires advanced bioinformatics, data science and optimization tools to provide insights into the multi-level regulation mechanisms (Editorial: Multi-omic data integration). The main objective of this subject is to offer an improved understanding of the different regulation levels in the cell (from model organisms to Trichoderma reesei strains). The underlying prediction task requires the normalization and the integration of heterogeneous biological data (genomic, transcriptomic and epigenetic) from different microorganisms. The path chosen is that of graph modelling and network optimization techniques, allowing the combination of different natures of data, with the incorporation of biological a priori (in the line of BRANE Cut and BRANE Clust algorithms). Learning models relating genomic and transcriptomic data to epigenomic traits could be associated with network inference, source separation and clustering techniques to achieve this aim. The methodology would inherit from a wealth of techniques developed over graphs for scattered data and social networks. Attention will also be paid to novel evaluation metrics, as their standardization remains a crucial stake in bioinformatics. A preliminary internship position (summer/fall 2018) is suggested before engaging the PhD program. Information at: http://www.laurent-duval.eu/lcd-2018-intern-phd-epigenetics-omics-graph-processing.html

May 6, 2018

Hungarian Syzygies - Trauma Memorial - Werckmeister harmonies - 2001

I am sorry David, I am afraid I should do that :) This thought gathering stems from a talk with David in Budapest, Hungary, from Shakespeare's Helmet Collective. He stands in the eye of a political storm (recent Hungarian elections and Viktor Orbán's declarations), and proposed (with a collective) a Trauma Memorial in the center of Heroes' Square (Hősök tere) in Budapest. I was lucky enough to witness it directly. It appeared as a black monolith with a video stripe:

Budapest, Heroes Square, Trauma Memorial installation around the right of freedom
Here is a story. But you can skip straight to the video: A millenniumi emlékmű kiegészítése 100 év hordalékával. As a side note, I was happy to be in Hungary for the first time, home of many Hungarian scientists, mostly mathematicians (some known as the Martians), some being prominent in the history of wavelets, like Alfréd Haar or Frigyes Riesz, who were put in a multiscale perspective in a 2D wavelet panorama review paper, details below:

Haar and Riesz with multiscale wavelets

A few weeks ago, I was given the opportunity to watch Werckmeister harmóniák by Béla Tarr (Les Harmonies Werckmeister or Werckmeister Harmonies) with a friend. He insisted that we should watch the movie, given the following pitch:
A guy in a small drunkard bar builds a choreography with the local boozers, making them reproduce the planets and satellites' motions of the solar system. In black and white. 
Werckmeister Harmonies: satellites and planets in motion
The movie was stunning, with shades, a whale and rising violence. I could not help relating it to Stanley Kubrick's 2001: A Space Odyssey (as we are celebrating its 50th anniversary) and to Ian Watson's The Embedding (foreign languages and the whale). We ended this cinema show with the 1962 dystopian black-and-white short movie La jetée (The Jetty), by Chris Marker, aka Christian Bouche-Villeneuve, which was the inspiration for Terry Gilliam's Twelve Monkeys. A story about global war, time-travel, memories, love and death. It can be viewed on Vimeo: La jetée.

Chris Marker (or his Sans Soleil Hungarian avatars, Sandor and Michel Krasna, born in Kolozsvar, 1932 and Budapest, 1946, respectively) is currently the subject of a retrospective at La Cinémathèque in Paris: Chris Marker, les sept vies d'un cinéaste (May 3 to July 29, 2018).

Then, I was in Budapest in April 2018, for an all-too-short weekend. As I am an obsessed 2001 fan, reminiscences of 2001 were evident everywhere, whether in Budapest's magnificent parliament, the Vasarely museum or the mere streets of Budapest.

2001 space odyssey reminiscence from Budapest
So I went to Hősök tere, a beautiful square with monuments celebrating the Magyar historical background. And right in the center of the square, an installation displayed an intriguing video and sound on one side of a black cube:

Trauma memorial, pixels in a silhouette
This marked a Hungarian syzygy, a connection of seemingly unrelated events. The video displayed the upper half of a dark suit, uttering speeches I could not understand. Of course, Hungarian is known as a special language, in the Uralic (Finno-Ugric) family. Funnily, the subtitles were in esoteric ASCII characters. But after a few seconds, it became clear that the sounds were reversed, spoken backwards. Apart from causality issues, I should confess that backwards or forwards, Hungarian remains foreign to me. So hopping inside the cube, one could be welcomed by its "kind wards".
Our insatiable stomach
This picture summarizes a state of affairs: a rising tide of nationalism and autocratic power, growing on sedimented ancient trauma and more recent angers and fears (as far as I can understand). The above "Our insatiable stomach" is a timely snapshot, given how close Hungary and hungry sound. So here it is, a mere addendum to this black blocky sedimentation of history, cast in reverse: A millenniumi emlékmű kiegészítése 100 év hordalékával. For fun: this Shakespeare's Helmet Collective work is curated by... Byron (for those who have an eye for the finest details).



January 15, 2018

Theories of Deep Learning, videos and slides

With a touch of provocation carried by its poster, Stanford University's STATS 385 (Fall 2017) offers a series of talks on the Theories of Deep Learning, with deep learning videos, lecture slides, and a cheat sheet (stuff that everyone needs to know). Outside the yellow submarine, Nemo-like sea creatures depict Fei-Fei Li, Yoshua Bengio, Geoffrey Hinton and Yann LeCun on a Deep Dream background. So, wrapping up stuff about CNNs (convolutional neural networks):

The spectacular recent successes of deep learning are purely empirical. Nevertheless intellectuals always try to explain important developments theoretically. In this literature course we will review recent work of Bruna and Mallat, Mhaskar and Poggio, Papyan and Elad, Bolcskei and co-authors, Baraniuk and co-authors, and others, seeking to build theoretical frameworks deriving deep networks as consequences. After initial background lectures, we will have some of the authors presenting lectures on specific papers. This course meets once weekly.
Videos and slides are gathered as follows:
  1. Theories of Deep Learning, Lecture 01: Deep Learning Challenge. Is There Theory? (Donoho/Monajemi/Papyan) : video, slides
  2. Theories of Deep Learning, Lecture 02: Overview of Deep Learning From a Practical Point of View (Donoho/Monajemi/Papyan) : video, slides
  3. Theories of Deep Learning, Lecture 03: Harmonic Analysis of Deep Convolutional Neural Networks (Helmut Bolcskei) : video, slides
  4. Theories of Deep Learning, Lecture 04: Convnets from First Principles: Generative Models, Dynamic Programming & EM (Ankit Patel) : video, slides
  5. Theories of Deep Learning, Lecture 05: When Can Deep Networks Avoid the Curse of Dimensionality and Other Theoretical Puzzles (Tomaso Poggio) : video, slides
  6. Theories of Deep Learning, Lecture 06: Views of Deep Networks from Reproducing Kernel Hilbert Spaces (Zaid Harchaoui) : video, slides
  7. Theories of Deep Learning, Lecture 07: Understanding and Improving Deep Learning With Random Matrix Theory (Jeffrey Pennington) : video, slides
  8. Theories of Deep Learning, Lecture 08: Topology and Geometry of Half-Rectified Network Optimization (Joan Bruna) : video, slides
  9. Theories of Deep Learning, Lecture 09: What’s Missing from Deep Learning? (Bruno Olshausen) : video, slides
  10. Theories of Deep Learning, Lecture 10: Convolutional Neural Networks in View of Sparse Coding (Vardan Papyan and David Donoho) : video, slides


August 29, 2017

April 11, 2017

BRANE Clust: cluster-assisted gene regulatory network inference refinement

The joint Gene Regulatory Network (GRN) inference and clustering tool BRANE Clust has just been published in BRANE Clust: cluster-assisted gene regulatory network inference refinement in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017 (doi:10.1109/TCBB.2017.2688355).

It is also featured on RNA-seq blog and OMIC tools.

Alternative versions are available as a preprint, on bioRxiv, with a page and software, and in HAL. Another brick in the BRANE series wall, a series of bioinformatics tools based on graphs and optimization, dedicated to -omics gene expression data for GRN (Gene Regulatory Network) inference.

While traditional next-generation sequencing (NGS) pipelines often combine motley assumptions (correlation, normalization, clustering, inference), this work is a first step toward gracefully combining network inference and clustering.

BRANE Clust works as a post-processing tool upon classical network thresholding refinement. From a complete weighted network (obtained from any network inference method), BRANE Clust favors edges both having higher weights (as in standard thresholding) and linking nodes belonging to the same cluster. It relies on an optimization procedure that jointly computes an optimal gene clustering (random walker algorithm) and an optimal edge selection. The introduction of a clustering step in the edge selection process improves gene regulatory network inference. This is demonstrated on both synthetic (five networks of DREAM4 and network 1 of DREAM5) and real (network 3 of DREAM5) data. These conclusions are drawn after comparing classical thresholding on CLR and GENIE3 networks to our proposed post-processing. Significant improvements in terms of Area Under Precision-Recall curve are obtained. The predictive power on real data yields promising results: predicted links specific to BRANE Clust reveal plausible biological interpretation. GRN approaches that produce a complete weighted network to prune could benefit from BRANE Clust post-processing.
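
The difference between plain thresholding and a cluster-aware selection can be sketched in a few lines. This is only a toy illustration under strong assumptions: the cluster labels are given beforehand (BRANE Clust estimates them jointly with the edges), and the multiplicative `bonus` for intra-cluster edges is a hypothetical stand-in for the actual cost function. Calling `select_edges` without labels reproduces classical thresholding.

```python
import numpy as np

def select_edges(weights, threshold, labels=None, bonus=1.0):
    """Keep the edges (i, j), i < j, whose score exceeds the threshold.

    weights: symmetric matrix of edge weights from any inference method
    labels:  optional cluster label per node (assumed known here)
    bonus:   hypothetical factor (> 1) boosting edges inside a cluster
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            score = w[i, j]
            if labels is not None and labels[i] == labels[j]:
                score *= bonus      # favor links between nodes of the same cluster
            if score > threshold:
                edges.append((i, j))
    return edges
```

With a bonus above 1, a moderately weighted edge inside a cluster can survive while an equally weighted edge across clusters is pruned, which is the qualitative behavior described above.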

Escherichia coli network built using BRANE Clust on GENIE3 weights and containing 236 edges. Large dark gray nodes refer to transcription factors (TFs). Inferred edges also reported in the ground truth are colored in black while predictive edges are light gray. Dashed edges correspond to a link inferred by both BRANE Clust and GENIE3 while solid links refer to edges specifically inferred by BRANE Clust.
Abstract:
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells.
Building accurately the related graphs remains challenging due to the large number of possible solutions from available data. Nonetheless, enforcing a priori on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference thanks to cluster information. It works as a post-processing tool for inference methods (i.e. CLR, GENIE3). In BRANE Clust, the clustering is based on the inversion of a system of linear equations involving a graph-Laplacian matrix promoting a modular structure. Our approach is validated on DREAM4 and DREAM5 datasets with objective measures, showing significant comparative improvements. We provide additional insights on the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network evaluated using the STRING database. The comparative pertinence of clustering is discussed computationally (SIMoNe, WGCNA, X-means) and biologically (RegulonDB). BRANE Clust software is available at:
http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-clust.html

March 17, 2017

Co-simulation, state-of-the-art by Claudio Gomes

A few days ago, we had a seminar given by Claudio Gomes (University of Antwerp, Belgium). He recently produced a research report (turned into a paper) on Co-simulation: State of the Art with C. Thule, D. Broman, P. G. Larsen and H. Vangheluwe (arxiv). This is an impressive body of work on tools enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems. It overviews "co-simulation approaches, research challenges, and research opportunities" and "summarizes, bridges, and enhances future research in this multidisciplinary area".

The attendant slides nicely summarize, in a didactic way, the main issues pertaining to coupled systems, via the composition of sub-system simulations. They deal with Terminology, Simulation units, Input extrapolation techniques, Orchestration algorithms, Algebraic loops, Convergence and Stability.

Abstract:
It is essential to find new ways of enabling experts in different disciplines to collaborate more efficiently in the development of ever more complex systems, under increasing market pressures. One possible solution for this challenge is to use a heterogeneous model-based approach where different teams can produce their conventional models and carry out their usual mono-disciplinary analysis, but in addition, the different models can be coupled for simulation (co-simulation), allowing the study of the global behavior of the system. Due to its potential, co-simulation is being studied in many different disciplines but with limited sharing of findings. Our aim with this work is to summarize, bridge, and enhance future research in this multidisciplinary area. We provide an overview of co-simulation approaches, research challenges, and research opportunities, together with a detailed taxonomy with different aspects of the state of the art of co-simulation and classification for the past five years. The main research needs identified are: finding generic approaches for modular, stable and accurate coupling of simulation units; and expressing the adaptations required to ensure that the coupling is correct.


This opportunity was initiated through a very open discussion around extrapolation for co-simulation in cyber-physical systems in CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems.

February 25, 2017

BARCHAN: Blob Alignment for Robust CHromatographic ANalysis (GCxGC)

In 1987, G. P. Barchan et al. wrote a paper called: Gas chromatographic method of determining carbon monoxide and dioxide. But barkhan, or barchan, means more outside gas chromatography. It refers to sand dunes, with crescent shapes, shaped by the wind.

BARCHAN: crescent-shaped sand dunes

They inspired our image registration tool for comprehensive 2D chromatography peak alignment (GCxGC or GC2D). It was just published as BARCHAN: Blob Alignment for Robust CHromatographic ANalysis (GCxGC), Journal of Chromatography A, February 2017. For given 2D chromatogram areas of interest, a baseline removal (BEADS) is applied, a peak detection is performed, blobs are detected and registered with a mixed rigid/non-rigid transformation based on the Coherent Point Drift technique.

A pair of GCxGC chromatogram areas of interest.
BARCHAN registration, example.

The preprint is here, along with the HAL and arXiv versions. The abstract follows:

Abstract: (Comprehensive) Two dimensional gas chromatography (GCxGC) plays a central role into the elucidation of complex samples. The automation of the identification of peak areas is of prime interest to obtain a fast and repeatable analysis of chromatograms. To determine the concentration of compounds or pseudo-compounds, templates of blobs are defined and superimposed on a reference chromatogram. The templates then need to be modified when different chromatograms are recorded. In this study, we present a chromatogram and template alignment method based on peak registration called BARCHAN. Peaks are identified using a robust mathematical morphology tool. The alignment is performed by a probabilistic estimation of a rigid transformation along the first dimension, and a non-rigid transformation in the second dimension, taking into account noise, outliers and missing peaks in a fully automated way. Resulting aligned chromatograms and masks are presented on two datasets. The proposed algorithm proves to be fast and reliable. It significantly reduces the time to results for GCxGC analysis.

Highlights

• BARCHAN: 2D chromatogram and template alignment based on peak registration.
• The alignment is performed by probabilistic estimation with a Gaussian Mixture Model.
• It combines a rigid and a non-rigid transformation for complex samples analysis.
• The method accounts for noise, outliers and missing peaks in an automated way.
• BARCHAN significantly reduces the time to results for GC×GC analysis.

This work is a follow-up of preceding chromatography papers:
  1. Chromatogram baseline estimation and denoising using sparsity (BEADS)
  2. Comprehensive Two-Dimensional Gas Chromatography for Detailed Characterisation of Petroleum Products
  3. Characterization of middle-distillates by comprehensive two-dimensional gas chromatography (GCxGC): a powerful alternative for performing various standard analysis of middle distillates
  4. Comparison of conventional gas chromatography and comprehensive two-dimensional gas chromatography for the detailed analysis of petrochemical samples
A news-like description on IFP Energies nouvelles is provided:





December 22, 2016

Signal and image classification with invariant descriptors (scattering transforms): Internship

[Internship position closed]

Application and additional details

Description

The field of complex data analysis (data science) is interested in the extraction of suitable indicators used for dimension reduction, data comparison or classification. Initially based on application-dependent, physics-based descriptors or features, novel methods employ more generic and potentially multiscale descriptors, that can be used for machine learning or classification. Examples are to be found in SIFT-like (scale-invariant feature transform) techniques (ORB, SURF), in unsupervised or deep learning. The present internship focuses on the framework of scattering transform (S. Mallat et al.) and the associated classification techniques. It yields signal, image or graph representations with invariance properties relative to data-modifying transformations: translation, rotation, scale… Its performances have been well-studied for classical data (audio signals, image databases, handwritten digit recognition).

This internship aims at dealing with less-studied data: identification of the closest match to a template image inside a database of underground models, and extraction of suitable fingerprints from 1D spectrometric signals from complex chemical compounds for macroscopic chemometric property learning.

The stake in the first case resides in the different scale and nature of template and model images, the latter being sketch- or cartoon-like versions of the templates. In the second case, signals are composed of a superposition of hundreds of (positive) peaks. Their nature differs from standard information processed by scattering transforms. A focus on one of the proposed applications can be considered, depending on success or difficulties met. 


December 17, 2016

Kultur Pop 36: Rebirth

Volume 36 (Rebirth) of Kultur Pop, compilations of Radio France theme tunes, has just been released. On the program, in track order:
  1. France Culture, Culture matin: Itzhak Perlman & Sam Sanders, Zapateado Opus 23 (Pablo de Sarasate)
  2. France Culture, Etat d'alerte (end): Jean-Philippe Goude, Quatre danseries : L'échappée
  3. France Culture, La matinale du samedi: Lakuta, Changanya
  4. France Culture, Backstage: Fakear, La lune rousse
  5. France Culture, Notre époque: Boozoo Bajou, Satta
  6. France Culture, Interlude nuits: Erik Truffaz, Siegfried
  7. France Culture, Paso doble (start): Paul Desmond, El condor pasa
  8. France Culture, Interlude nuits: Jean-Louis Matinier, Soleil Rouge

November 18, 2016

Utilitarian scientific research? Jean Perrin, and the scientific method

Particularly moved to speak these words in a corner of this garden that Marie Curie loved [...] I simply want to draw a lesson, and show you through their example how any novelty truly useful to mankind can only be obtained through the discovery of unknown things, pursued without any utilitarian concern. It was not out of a desire to fight cancer that Marie Curie and Pierre Curie made their immortal discoveries [...]. Thus in every field, to gain power, to reduce those chores that must not be confused with noble work, to push back old age and death itself, to break at last the narrow frame in which our destiny seemed forever confined, we must facilitate disinterested scientific research. All of you who will listen to me by the tens of thousands, you who see me though I cannot see you, hear my call, and contribute with all your influence to facilitating this conquering research that will bring about the happiness and freedom of mankind.
This text is read by Jean Perrin, founder of the CNRS, in 1938, in the small garden of the Institut du radium. The garden is still there, not far from the Institut de biologie physico-chimique, rue Pierre et Marie Curie. The speech can be heard at 2'30" in the montage Jean Perrin et la réalité moléculaire, on the occasion of the 40th anniversary of the discovery of radium, during the international week against cancer.

This excerpt was broadcast in the program La méthode scientifique, on France Culture, with Alain Fuchs: Quelle politique de la recherche en France ?


Its theme tune is on Kultur Pop: Leonard Nimoy, Music to watch space girls by


October 31, 2016

CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems

XKCD. My hobby: extrapolating

CHOPtrey. A long, extrapolated and progressive scientific story. Ace, deuce, trey... ready?

It all started with Cyril Faure, then a PhD student with Nicolas Pernet in real-time computing. He used to stop by my office. We had coffee, discussed books (from Descartes to Sci-Fi) and music (mostly Loudblast, Slayer, Pink Floyd, Manowar, Bolt Thrower). We exchanged ideas and concerns. One day, he told me about a worry in his thesis. Caveat: I am very bad at computer science and advanced programming, and had little idea about partitioned/slacked real-time co-simulation systems.

So this was not about programming, but simulation and co-simulation. Big physical systems (engines, aircraft, boats) are complicated to simulate. Protocols and methods include the FMI standard (Functional Mockup Interface) and FMUs (Functional Mockup Units). Partitioning such systems into subsystems may help the simulation, but the split subsystems must still communicate: often enough to be precise, rarely enough to gain speed-ups.
Hoher, 2011, A contribution to the real-time simulation...
So when simulated subsystems communicate at regular communication times, and one wants data at a fractional time interval, one generally uses the last known value. This is called zeroth-order hold (ZOH).
So I wondered: "this sounds a little bit like aliasing, in a causal setting, let's interpolate, say, with a line or a parabola". Not so easy. In this co-simulation domain, interpolation is "known" to be unstable. Even with FOH (first-order hold) or SOH (second-order hold).
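
To make the three holds concrete, here is a minimal sketch for uniformly spaced communication times. It uses the Newton backward-difference form of the polynomial through the last one, two or three samples. The function name `hold_extrapolate` and the normalized time `s` (fraction of a communication step elapsed since the last exchange) are illustrative choices, not taken from the paper or from any co-simulation tool.

```python
def hold_extrapolate(samples, s, order=0):
    """Predict a signal value after its last known sample.

    samples: past values at uniform communication times, oldest first
    s:       fraction of a communication step since the last sample
    order:   0 (ZOH), 1 (FOH) or 2 (SOH)
    """
    if order not in (0, 1, 2):
        raise ValueError("order must be 0, 1 or 2")
    if len(samples) < order + 1:
        raise ValueError("not enough past samples for this order")
    last = samples[-1]
    value = last                                    # ZOH: repeat the last known value
    if order >= 1:
        d1 = last - samples[-2]                     # first backward difference
        value += s * d1                             # FOH: line through the last two samples
    if order == 2:
        d2 = last - 2 * samples[-2] + samples[-3]   # second backward difference
        value += 0.5 * s * (s + 1) * d2             # SOH: parabola through the last three
    return value
```

On smooth data the higher orders win: for samples of t squared taken at t = 0, 1, 2, the prediction at t = 2.5 is 4 with ZOH, 5.5 with FOH, and the exact 6.25 with SOH. The trouble starts with noise and discontinuities, where exact polynomial fits amplify whatever happened in the last few samples.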

A few implementations (Matlab prototypes, C++ and a final embedding in the XMOD software) and some tuning later, we produced a hierarchical scheme (with Abir, Cyril, Daniel and Mongi), based on morphological contexts, like in lossless image compression (PNG for instance). It proved very cheap, quite robust, and apparently stable. We called it CHOPtrey, from the Old French words "ace, deuce, trey" (1, 2, 3, referring to the sides of dice). The name also refers to a chop tray (or cutting board), because it allows one to chop (or cut) a big simulated system into several smaller ones, and because it is composed of three parts or chops:
  • CHOPred: a Computationally Hasty Online Prediction system (improving the trade-off between integration speed-ups, needing large communication steps, and simulation precision)
  • CHOPoly: Causal Hopping Oblivious Polynomials, whose smoothed adaptive forward prediction improves co-simulation accuracy (they are similar to Savitzky-Golay filters or LOESS/LOWESS regression methods)
  • CHOPatt: a Contextual & Hierarchical Ontology of Patterns, where data behavior is segmented into different classes to handle the discontinuities of the exchanged signals
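
The flavor of the CHOPoly part can be sketched as a weighted least-squares polynomial fitted on more past samples than its degree strictly requires, then evaluated ahead of the last sample. This is a rough sketch, not the published algorithm: the geometric `decay` weighting and the function name `ls_extrapolate` are assumptions made for illustration, and the paper's actual weights and closed-form predictors are not reproduced here.

```python
import numpy as np

def ls_extrapolate(past, tau, degree=2, decay=1.0):
    """Causal least-squares polynomial prediction.

    past:   past samples at uniform communication times, oldest first
    tau:    prediction time in communication steps after the last sample
    degree: polynomial degree (2 gives a least-squares parabola)
    decay:  assumed weighting in (0, 1]; values below 1 down-weight older samples
    """
    past = np.asarray(past, dtype=float)
    n = past.size
    if n <= degree:
        raise ValueError("need more samples than the polynomial degree")
    t = np.arange(-(n - 1), 1, dtype=float)   # sample times, last sample at t = 0
    w = decay ** (-t)                         # weight 1 for the newest sample
    coeffs = np.polyfit(t, past, degree, w=w)
    return float(np.polyval(coeffs, tau))
```

Fitting a parabola on, say, five samples instead of three smooths the prediction, in the spirit of the Savitzky-Golay filters mentioned above, at the price of a slightly longer memory.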
During this work, I learned a lot about co-simulation, parallel computing, multi-core, etc. But also a lot about least-squares parabolic regression, which I thought I had known for decades. I did not. Not the deepest theory ever, but one of my nicest experiences of cross-domain collaboration (along with bioinformatics and our BRANE work on gene regulatory networks), with some genuine computer engineering and a final packaged product (implemented in xMod). Here it is:
CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems 
Abstract : The growing complexity of Cyber-Physical Systems (CPS), together with increasingly available parallelism provided by multi-core chips, fosters the parallelization of simulation. Simulation speed-ups are expected from co-simulation and parallelization based on model splitting into weak-coupled sub-models, as for instance in the framework of Functional Mockup Interface (FMI). However, slackened synchronization between sub-models and their associated solvers running in parallel introduces integration errors, which must be kept inside acceptable bounds. CHOPtrey denotes a forecasting framework enhancing the performance of complex system co-simulation, with a trivalent articulation. First, we consider the framework of a Computationally Hasty Online Prediction system (CHOPred). It allows to improve the trade-off between integration speed-ups, needing large communication steps, and simulation precision, needing frequent updates for model inputs. Second, smoothed adaptive forward prediction improves co-simulation accuracy. It is obtained by past-weighted extrapolation based on Causal Hopping Oblivious Polynomials (CHOPoly). And third, signal behavior is segmented to handle the discontinuities of the exchanged signals: the segmentation is performed in a Contextual & Hierarchical Ontology of Patterns (CHOPatt). Implementation strategies and simulation results demonstrate the framework ability to adaptively relax data communication constraints beyond synchronization points which sensibly accelerate simulation. The CHOPtrey framework extends the range of applications of standard Lagrange-type methods, often deemed unstable. The embedding of predictions in lag-dependent smoothing and discontinuity handling demonstrates its practical efficiency.
Keywords: parallel simulation; Functional Mockup Interface (FMI); smoothed online prediction; causal polynomial extrapolation; context-based decision; internal combustion engine
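To make the idea of past-weighted polynomial extrapolation concrete, here is a minimal sketch of a weighted least-squares parabolic fit over the most recent samples, evaluated one step ahead. It is not the CHOPoly formula from the paper: the function names, the `decay` weighting exponent and the uniform sampling are assumptions made for illustration only.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def extrapolate(past, degree=2, horizon=1.0, decay=1.0):
    """Predict the signal `horizon` steps after the last sample.

    past    : uniformly spaced samples, oldest first (last one sits at t = 0)
    degree  : polynomial degree (2 gives a parabola)
    decay   : hypothetical weighting exponent; a sample at lag l gets
              weight (1 + l) ** -decay, so decay = 0 means uniform weights
    """
    n = len(past)
    if n <= degree:
        raise ValueError("need more samples than the polynomial degree")
    times = [i - (n - 1) for i in range(n)]           # ..., -2, -1, 0
    weights = [(1.0 - t) ** (-decay) for t in times]  # older samples count less
    size = degree + 1
    # Weighted normal equations for the coefficients of sum_k c_k t^k
    normal = [[sum(w * t ** (j + k) for w, t in zip(weights, times))
               for k in range(size)] for j in range(size)]
    rhs = [sum(w * y * t ** j for w, y, t in zip(weights, past, times))
           for j in range(size)]
    coeffs = solve_linear(normal, rhs)
    return sum(c * horizon ** k for k, c in enumerate(coeffs))
```

Only past samples are used (causality), and a larger `decay` makes the prediction follow the latest samples more closely, at the price of less smoothing.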

Links:
The full set of numbers for the six sides of a die are ace, deuce, trey, cater, cinque, sice. They are from Old French (cf un, deux, trois, quatre, cinq, six of modern French). Ace is originally from the Latin for 'unit'.

June 12, 2016

Pêcheurs de perles, une parabole (in French, BEADS and CHOPtrey)

This post is about pearls found in apparently simple yet complex industrial-type questions, and a handful of parabolas. Two practical applications are found in analytical chemistry with BEADS: Baseline Estimation And Denoising w/ Sparsity and in cyber-physical system co-simulation with CHOPtrey: Contextual Polynomial extrapolation for real-time forecasting. The whole thing is just a parabola, or a parable.

I was not at my best level of confidence in this talk, even in French. I had to completely change the talk a couple of days before. Politics... The best dwells in the Dave Gilmour (or Pink Floyd) parts:
The talk, in French, is about pearls found in questions that are both simple and complex, and about a handful of parabolas. It covers two applications: baseline filtering in physico-chemical analysis, and the co-simulation of complex systems with contextual polynomial extrapolations. It was given at Maths en mouvement. The related works are:

Pêcheurs de perles, par Laurent Duval from Contact FSMP on Vimeo.

I just needed some parabolic relaxation.
Parabolic relaxation

April 30, 2016

Trainlets: cropped wavelet decomposition for high-dimensional learning

It's been a long time: element 120 from the aperiodic table of wavelets is the trainlet, from Jeremias Sulam, Boaz Ophir, Michael Zibulevsky, and Michael Elad, Trainlets: Dictionary Learning in High Dimensions:
Abstract: Sparse representations has shown to be a very powerful model for real world signals, and has enabled the development of applications with notable performance. Combined with the ability to learn a dictionary from signal examples, sparsity-inspired algorithms are often achieving state-of-the-art results in a wide variety of tasks. Yet, these methods have traditionally been restricted to small dimensions mainly due to the computational constraints that the dictionary learning problem entails. In the context of image processing, this implies handling small image patches. In this work we show how to efficiently handle bigger dimensions and go beyond the small patches in sparsity-based signal and image processing methods. We build our approach based on a new cropped wavelet decomposition, which enables a multi-scale analysis with virtually no border effects. We then employ this as the base dictionary within a double sparsity model to enable the training of adaptive dictionaries. To cope with the increase of training data, while at the same time improving the training performance, we present an Online Sparse Dictionary Learning (OSDL) algorithm to train this model effectively, enabling it to handle millions of examples. This work shows that dictionary learning can be up-scaled to tackle a new level of signal dimensions, obtaining large adaptable atoms that we call trainlets.
They offer a base dictionary used within a double sparsity model to enable the training of adaptive dictionaries. The associated package is here, from Michael Elad's software page. The cropped wavelet decomposition enables a multi-scale analysis with virtually no border effects. An entry for trainlets has been added to WITS, the aperiodic table of wavelets.

But things always end up with a song! Two of my favorite train songs, by Porcupine Tree (Trains) and the Nits (The train).




April 24, 2016

M-band 2D dual-tree (Hilbert) wavelet multicomponent image denoising

The toolbox implements a parametric nonlinear estimator that generalizes several wavelet shrinkage denoising methods. Dedicated to additive Gaussian noise, it adopts a multivariate statistical approach to take into account both the spatial and the inter-component correlations existing between the different wavelet subbands, using a Stein Unbiased Risk Estimator (SURE) principle, which derives optimal parameters. The wavelet choice is a slightly redundant multi-band geometrical dual-wavelet frame. In experiments on multispectral remote sensing images, it outperforms conventional wavelet denoising techniques (including curvelets). Since they are based on MIMO filter banks (multi-input, multi-output), in a multi-band fashion, we can call them MIMOlets quite safely. The dual-tree wavelet consists of two directional wavelet trees, displayed below for a 4-band filter:

4-band directional dual-tree wavelets
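As a rough illustration of the SURE principle behind such shrinkage estimators, the sketch below picks a soft threshold for one subband of coefficients by minimizing Stein's unbiased risk estimate, in the classical scalar (SureShrink-style) setting. It is much simpler than the toolbox's multivariate parametric estimator and is not taken from it: the function names are hypothetical, and the noise standard deviation `sigma` is assumed known.

```python
import math


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (soft thresholding)."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in coeffs]


def sure_soft(coeffs, t, sigma):
    """Stein's unbiased estimate of the squared error of soft thresholding at t,
    for coefficients corrupted by Gaussian noise of standard deviation sigma."""
    n = len(coeffs)
    below = sum(1 for v in coeffs if abs(v) <= t)
    return sigma ** 2 * (n - 2 * below) + sum(min(v * v, t * t) for v in coeffs)


def sure_threshold(coeffs, sigma):
    """Return the candidate threshold with the smallest SURE value.

    The risk estimate is piecewise smooth between sorted magnitudes, so it is
    enough to test 0 and the magnitudes of the coefficients themselves.
    """
    candidates = [0.0] + sorted(abs(v) for v in coeffs)
    return min(candidates, key=lambda t: sure_soft(coeffs, t, sigma))


# Toy subband: three small (noise-like) coefficients and one large one
noisy = [0.1, -0.2, 5.0, 0.05]
t_best = sure_threshold(noisy, sigma=1.0)
denoised = soft_threshold(noisy, t_best)
```

The point of SURE is that the threshold is chosen from the noisy data alone, without access to the clean coefficients.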

The set of wavelet functions implements:
The demonstration script is Init_Demo.m, and the functions for M-band dual-tree wavelets are provided in the directory TOOLBOX_DTMband_solo. For instance, the clean multispectral image (port of Tunis, only one channel):


The (very) noisy version:

The denoised one:








November 10, 2015

BRANE Cut: Biologically-Related Apriori Network Enhancement with Graph cuts

[BRANE Cut featured on RNA-Seq blog][Omic tools][bioRxiv preprint][PubMed/Biomed Central][BRANE Cut code]

Gene regulatory networks are somewhat difficult to infer. This first piece of an ongoing work (termed BRANE *, for Biologically Related Apriori Network Enhancement) introduces an optimization method (based on Graph cuts, borrowed from computer vision/image processing) to infer graphs based on biologically-related a priori (including sparsity). It is successfully tested on DREAM challenge data and an Escherichia coli network, with a specific work to derive optimization parameters from gene network cardinality properties. And it is quite fast.



Background: Inferring gene networks from high-throughput data constitutes an important step in the discovery of relevant regulatory relationships in organism cells. Despite the large number of available Gene Regulatory Network inference methods, the problem remains challenging: the underdetermination in the space of possible solutions requires additional constraints that incorporate a priori information on gene interactions.

Methods: Weighting all possible pairwise gene relationships by a probability of edge presence, we formulate the regulatory network inference as a discrete variational problem on graphs. We enforce biologically plausible coupling between groups and types of genes by minimizing an edge labeling functional coding for a priori structures. The optimization is carried out with Graph cuts, an approach popular in image processing and computer vision. We compare the inferred regulatory networks to results achieved by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge.
Results: Our BRANE Cut approach infers more accurately the five DREAM4 in silico networks (with improvements from 6 % to 11 %). On a real Escherichia coli compendium, an improvement of 11.8 % compared to CLR and 3 % compared to GENIE3 is obtained in terms of Area Under Precision-Recall curve. Up to 48 additional verified interactions are obtained over GENIE3 for a given precision. On this dataset involving 4345 genes, our method achieves a performance similar to that of GENIE3, while being more than seven times faster. The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html.

Conclusions: BRANE Cut is a weighted graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves three state-of-the art GRN inference methods. It is applicable as a generic network inference post-processing, due to its computational efficiency.
Keywords:  Network inference, Reverse engineering, Discrete optimization, Graph cuts, Gene expression data, DREAM challenge.
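For readers unfamiliar with weighted graph thresholding, the sketch below shows the plain baseline that such methods refine: keep every gene pair whose score exceeds a single global threshold, then score the result against a reference network with precision and recall. This is not BRANE Cut itself, whose edge labeling is optimized with Graph cuts under a priori penalties. The function names, gene labels and scores are made up for illustration.

```python
def threshold_edges(weights, lam):
    """Keep the gene pairs whose score is strictly above the threshold lam.

    weights : dict mapping an ordered pair (gene_a, gene_b) to a score
    """
    return {edge for edge, score in weights.items() if score > lam}


def precision_recall(predicted, reference):
    """Precision and recall of a predicted edge set against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy example with hypothetical genes and scores
scores = {("g1", "g2"): 0.9, ("g1", "g3"): 0.4, ("g2", "g3"): 0.7}
known = {("g1", "g2"), ("g1", "g3")}

network = threshold_edges(scores, lam=0.5)
precision, recall = precision_recall(network, known)
```

Sweeping `lam` over all score values and plotting precision against recall gives the precision-recall curve whose area is used as the performance measure in the abstract.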

Additional information on the BRANE Power page


September 19, 2015

Big data, fishes and cooking: fourteen shades of "V"

[In this short post, you can access the 14 "V" often glued to Big Data, including vacuity]

The following is often attributed to Lao Tzu (I cannot access the original meaning):
Govern a great nation as you would cook a small fish. Do not overdo it.
Today's wisdom could be:
Deal with Big data as you would process a small signal. Do not over-expect from it, do not over-fit it, do not over-interpret it.

Luckily, Big data does not exist, where Making The Most Of Small Data is advocated. This is a bit like teenage sex:
“Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”


In the What exactly is Big Data StackExchange question, I have listed all 14 "V" that could describe big data, including... vacuity. They are: Validity, Value, Variability/Variance, Variety, Velocity, Veracity/Veraciousness, Viability, Virtuality, Visualization, Volatility, Volume and Vacuity.