December 26, 2014

The law of excessive gardening in education

Once upon a time. the term proletarian (or a Latin equivalent) used refer to  a Roman citizen who was so poor that he only had his children considered as his property. The term proles stands here for descendants (or litter). The term evolved through Marxism, to workers without capital or production means, who should sell their own work-force. The term can be somehow extended to the loss of knowledge of the production tools. An example lies in the comparison between the craftsman or artisan, who masters his tools, and may even be able to repair them, and manages a series of processes, as opposed to the factory worker, whose work has been taylorized and who lacks of knowledge in the whole meaning of the chain, or the functioning of the robots he "controls".

The concept of proletarianization has been used more recently by Bernard Stiegler (Ars Indutrialis), to describe a pauperization, not in terms of wealth, but in terms of ability to do, to make and to live. An externalization and therefore a loss of knowledge and memory. 

I believe proletarianization, in that sense, is rampant, and pervades all strata of the society: at every level of work, at least in institutions or companies, there is a loss of sense and understanding about the way stuff works. A corollary to Peter's principle. To make a long story short, strategic visions have progressively been replaced by indicator-based management. In science evaluation, the infamous impact factor or h-index are examples of such indicators. Stock-value and quality-based management are other examples. 
Peter's principle: (success>advancement)^n>failure

The top of proletarianization spread was given by Alan Greenspan, the former chairman of the Federal Reserve, who once told the CNBC that he did not fully understand the scope of the subprime mortgage market until well into 2005 and could not make sense of the complex derivative products created out of mortgages. Alan Greenspan, the chief banker of the world, was mystified...

So what's got the maths to do with it?

About two years ago, in April 2012, there was a conference called Fixing mathematical education. There i learned the effective law of excessive learning in mathematics from Alexandre V. Borovik:
To be able to use maths at certain level it is necessary to learn it at the next level
A week ago, Alexandre Borovik sent me a link to his preprint, Calling a spade a spade: Mathematics in the new pattern of division of labour:
The growing disconnection of the majority of population from mathematics is becoming a phenomenon that is increasingly difficult to ignore. This paper attempts to point to deeper roots of this cultural and social phenomenon. It concentrates on mathematics education, as the most important and better documented area of interaction of mathematics with the rest of human culture.
    I argue that new patterns of division of labour have dramatically changed the nature and role of mathematical skills needed for the labour force and correspondingly changed the place of mathematics in popular culture and in the mainstream education. The forces that drive these changes come from the tension between the ever deepening specialisation of labour and ever increasing length of specialised training required for jobs at the increasingly sharp cutting edge of technology.
    Unfortunately these deeper socio-economic origins of the current systemic crisis of mathematics education are not clearly spelt out, neither in cultural studies nor, even more worryingly, in the education policy discourse; at the best, they are only euphemistically hinted at.
    This paper is an attempt to describe the socio-economic landscape of mathematics education without resorting to euphemisms. 
The pdf is here. Alexandre claims that "The communist block was destroyed by a simple sentiment: If they think they pay me let them think I am working. Mathematics education in the West is being destroyed by a quiet thought (or even a subconscious impulse): If they think they teach me something useful, let them think I am learning." My experience as a part-time teacher is that it is increasingly difficult to have students learn things. Most of them are especially disgusted by the concept of "learning by heart" (wait, everything is in the Internet, isn't it?). While they could just try to understand. 

For a long time,  people could have proudly claimed that "I have never been good at mathematics, but I live happily without it". The XXth century has been rich in discoveries that have both pervaded the society (basically, from the transistor to the mp3). The pitfall is many users have no idea about the way these technologies work, and happily put their agenda, thoughts, entertainment time and finally life (pictures, films) in those hands. 

The world seemingly grows a local era where data becomes important (a new gold, a novel oil?) and big data or data science are slowly emerging as potentially disruptive technologies. Then, Alexandre states that "the West is losing the ability to produce competitively educated workers for mathematically intensive industries". It is high times to rethink (mathematical) education, as its changes may be much slower than the present evolution of technologies. 

Nota bene: and Igor, just for you, there is a phase transition ("The crystallisation of a mathematical concept (say, of a fraction), in a child's mind could be like a phase transition in a crystal") and an aha moment ("An aha! moment is a sudden jump to another level of abstraction")

December 20, 2014

Learning meets compression: small-data-science internship (IFPEN)

Internship subject: [french/english]
Many experimental designs acquire continuous or salve signals or images. Those are characteristic of a specific phenomenon. One may find examples at IFPEN in seismic data/images, NDT/NDE acoustic emissions (corrosion, battery diagnosis) engine benches (cylinder pressure data, fast camera), high-thoughput screening in chemistry. Very often, such data is analyzed with standardized, a priori indices. Comparisons between different experiments (difference- or classification-based) are often based on the same indices, without resorting to initial measurements.

The increasing data volume, the variability in sensor and sampling, the possibility of different pre-processing yield two problems: the management and access to data (« big data ») and their optimal exploitation by dimension reduction methods, supervised or unsupervised learning (« data science »). This project aims at the analysis of the possibility of a joint compressed representation of data and the extraction of pertinent indicators, at different characteristic scales, and the relative impact of the first aspect (lossy compression degradation) over the second aspect (precision and robustness of extracted feature indicators).

The internship possesses a dual goal. The first aspect will be dealing with scientific research on sparse signal/image representations with convolution networks based on multiscale wavelet techniques, called scattering networks. Their descriptors (or footprints) possess fine translation, rotation and scale invariance. Those descriptors will be employed for classification and detection. The second aspect will carry on the impact of lossy compression on the preceeding results, and the development of novel sparse representations for joint compression and learning.

J. Bruna, S. Mallat. Invariant scattering convolution networks. IEEE Trans. on Patt. Anal. and Mach. Int., 2010
L. Jacques, L. Duval, C. Chaux, G. Peyré, A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity, Signal Processing, 2011
C. Couprie, C. Farabet, L. Najman, Y. LeCun, Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos, Journal of Machine Learning Research, 2014

A PhD thesis (Characteristic fingerprint computation and storage for high-throughput data flows and their on-line analysis, with J.-C. Pesquet, Univ. Paris-Est) is proposed, starting September 2015.

Information update:

November 29, 2014

Haiku (libre) : sémantique et général

La trahison des images, René Magritte
Les gens qui confondent
La carte et le territoire
Me fatiguent un peu

[En ce moment, une belle exposition Magritte thématique est à voir au centre national d'art et de culture Beaubourg-Georges Pompidou] 

Big data and Data science: LIX colloquium 2014

Sketch of the Hype Cycle for Emerging Technologies
Data science and Big data are two concepts at the tip of the tongue and the top of the Gartner Hype Cycle for Emerging Technologies. Close to the peak of inflated expectations. The Data science LIX colloquium 2014 at Ecole Polytechnique, organized by Michalis Vazirgiannis from DaSciM was held yesterday on the Plateau de Saclay, which may have prevented some to attend the event. Fortunately, it was webcast.

The talks covered a wide range of topics pertaining to Datacy (data + literacy). The community detection in graphs (with survey) keynote promoted local optimization (OSLOM, with order statistics). It was said than "We should define validation procedures even before starting developing algorithms", including negative tests; on random graphs, a clustering method should find non prominent cluster (except the whole graph), in other words no signal in noise. But there was no mention to phase transition in clustering. The variety of text data (SMS, tweets, chats, emails, news web pages, books, 100 languages, OCR and spelling mistakes) and its veracity was questioned with Facebook estimating that between 5% and 11% of accounts are fake, and 68.8 percent of the Internet is spam (how did they get the 3 figures precision?). News-hungry people would be interested in EMM News, a media watch tool aggregating 10000 RSS flux and 4000 news aggregators. With all these sources, some communities are concerned with virtual ghost town effects, and look for way to spark discussions (retweets and the likes) to keep social activity alive. Flat approaches or hierarchical grouping are still debated challenges in large-scale classification and web-scale taxonomies. Potentially novel graph structures (hypernode graphs, related to hypergraphs or maybe n-polygraphs) with convex stability and spectral theory are also proposed in the first part of the colloquium.

Big Data Cap Gap: the space between all and relevant data
While Paris-Saclay center for data science has opened its website, the unbalanced data was exposed around the HiggsML data-driven challenge. Less than 100 Higgs bosons (expected) to be detected in 10^10 yearly. Big-data analogs of the greek Pythia, as well as efficient indexing and mining methods would necessary to harness the data beast. More industrials talks concluded the colloquium, given by AXA, Amazon and Google representatives, which i could not attend, left with the so-called "crap gap" in mind, i. e. the gap between Relevant Data and Big Data. 

Innovation driven by large data sets still requires, at least, vague goals in mind. In Latin, "Ignoranti quem portum petat nullus suus uentus (ventus) est", wrote Sénèque in his 71th letter to Lucilius. A possible translation in English: "When a man does not know what harbour he is making for, no wind is the right wind". In German, "Dem weht kein Wind, der keinen Hafen hat, nach dem er segelt". And "Il n'y a point de vent favorable pour celui qui ne sait dans quel port il veut arriver" in French.

All of the information, and possibly information you need, may be found in the following program and videos. As the videos are not split into talks, the time codes are provided, thanks to the excellent suggestion (and typos corrections) by Igor Carron.

LIX colloquium 2014 on Data Science LIVE part 1
  • 00:00:00 > 00:22:22: Introduction and program
  • 00:22:22 > 01:22:18: Keynote speech: Community detection in networks, Santo Fortunato, Aalto University
  • 01:22:18 > 01:57:30: Text and Big Data, Gregory Grefenstette, Inria Saclay - Île de France
  • 01:57:30 > 02:29:23: Accessing Information in Large Document Collections: classification in web-scale taxonomies, Eric Gaussier, Université Joseph Fourier (Grenoble I)
  • 02:29:23 > 03:01:32: Shaping Social Activity by Incentivizing Users, Manuel Gomez Rodriguez, Max Planck Institute for Software Systems
  • 03:01:32 > 03:38:00: Machine Learning on Graphs and Beyond, Marc Tommasi, Inria Lille

LIX colloquium 2014 on Data Science LIVE part 2
  • 00:00:00 > 00:33:57: Learning to discover: data science in high-energy physics and the HiggsML challenge, Balázs Kégl, CNRS
  • 00:34:11 > 01:06:15: Big Data on Big Systems: Busting a Few Myths, Peter Triantafillou, University of Glasgow
  • 01:06:15 > 01:38:29: Big Sequence Management, Themis Palpanas, Paris Descartes University

LIX colloquium 2014 on Data Science LIVE part 3 
  • 00:00:00 > 00:38:11: Understanding Videos at YouTube Scale, Richard Washington, Google
  • 00:38:11 > 01:05:48: AWS's big data/HPC innovations, Stephan Hadinger, Amazon Web Services
  • 01:05:48 > 02:02:41: Big Data in Insurance - Evolution or disruption? Stéphane Guinet, AXA 
  • 02:02:41 > 02:06:35: Closing words on a word cloud (with time, series, graph and classification are the big four)

November 1, 2014

Cédric Villani : les mathématiques sont un art comme les autres (podcast)

Les mathématiques sont un art comme les autres, une série de cinq entretiens avec Cédric Villani (professeur de l'Université de Lyon et directeur de l'Institut Henri Poincaré), dans l'émission "Un autre jour est possible", sur France Culture. Sur la poésie, la musique, le design, les arts de la rue et le cinéma. Cette tête chercheuse fait beaucoup et bien pour la vulgarisation des mathématiques et leur transfert innovation vers des disciplines afférentes. Louable effort. "Nul ne peut être mathématicien s'il n'a une âme de poète", disait Sophie Kowalevskaia. Les 15 et 16 décembre 2014, le forum Horizon Maths a lieu à IFP Energies nouvelles à Rueil-Malmaison (thème : "Les mathématiques se dévoilent aux industriels"), le programme en pdf est ici.En plus détaillé, après la pause podcast sur Cédric Villani.

Session « Méthodes pour la chimie ab initio »
  • Pascal Raybaud (IFPEN) : « Enjeux de la performance numérique pour les calculs ab initio
  • en catalyse »
  • Thierry Deutsch (CEA Grenoble) « Les ondelettes, une base flexible permettant un contrôle fin de la précision et la mise au point des méthodes ordre N pour le calcul de la structure électronique via BigDFT »
  • Benjamin Stamm (UPMC) « A posteriori estimation for non-linear eigenvalue problems in the context of DFT- methods »
  • Filippo Lipparini (Universität Mainz) « Large, polarizable QM/MM/Continuum computations : ancient wishes and recent advances »Eric Cancès (Ecole des Ponts ParisTech) « Aspects mathématiques de la théorie fonctionnelle de la densité (DFT) »
Session « Optimisation sans dérivée »
  • Delphine Sinoquet (IFPEN) « Applications de l’optimisation sans dérivée dans le secteur pétrolier et le domaine des énergies marines renouvelables »
  • Emmanuel Vazquez (SUPELEC) « Nouvelles fonctions de perte pour l’optimisation bayesienne »
  • Serge Gratton (CERFACS) « Optimisation sans dérivée : algorithmes stochastiques et complexité »
  • Wim van Ackooij (EDF) « Optimisation sous contraintes probabilistes et applications en management d’énergies »
  • Marc Schoenauer (INRIA) « Optimisation continue à base de comparaisons : modèles de substitution et adaptation automatique »
  • Bilan de la session par Josselin Garnier (Université Paris Diderot)
Session « Maillages et Applications Industrielles »
  • Jean-Marc Daniel (IFPEN) « Besoins pour le maillage des milieux géologiques complexes »
  • Paul-Louis George et Houman Borouchaki (INRIA et UTT - INRIA respectivement) « Panorama des méthodes génériques de génération de maillages et méthodes spécifiques de maillage en géosciences »
  • Jean-François Remacle (UCL - Rice University) « An indirect approach to hex mesh generation »
  • Thierry Coupez (Ecole Centrale de Nantes) « Frontières implicites et adaptation anisotrope de maillage »
  • Pascal Tremblay (Michelin) « Les défis de la transition du maillage hexaédrique vers tétraédrique pour des applications industrielles »
  • Bilan de la session par Frédéric Hecht (UPMC)
Session « Visualisation »
  • Sébastien Schneider (IFPEN) « Courte introduction à la visualisation pour les géosciences à IFPEN »
  • Julien Jomier (Kitware) « Scientific Visualization with Open-Source Tools »
  • Emilie Chouzenoux (UPEM) « A Random block- coordinate primal-dual proximal algorithm with application to 3D mesh denoising »
  • Jean-Daniel Fekete (INRIA) « Visualisation de réseaux par matrices d’adjacence »
  • Marc Antonini (CNRS) « Compression et visualisation de données 3D massives »
  • Bilan de la session par Julien Tierny (CNRS - UPMC)

October 31, 2014

BEADS: Baseline Estimation And Denoising w/ Sparsity

BEADS paper : Baseline Estimation And Denoising w/ Sparsity
BEADS Matlab toolbox
BEADS Baseline toolbox at MatlabCentral
BEADS page: references, toolboxes and uses

Most signals and images can be split into broad classes of morphological features. There are five traditional classes, with potentially different names, although the dams are not fully waterproof:
  • smooth parts: trends, backgrounds, wanders, continuums, biases, drifts or baselines,
  • ruptures, discontinuities: contours, edges or jumps,
  • harmonic parts: oscillations, resonances, geometric textures,
  • hills: bumps, blobs or peaks,
  • noises: un-modeled, more or less random, unstructured or stochastic.
In analytical chemistry,  many types of signals (chromatography, mass spectroscopy, Raman, NMR) resemble Fourier spectra: a quantity of positive bump-like peaks, representing the proportion of chemical compounds or atoms, over an instrumental baseline, with noise.

Individual  isolated peaks

Superimposed peaks

Sum of individual peaks
The present work (termed BEADS) combines a Baseline Estimation And Denoising. It features the use of the Sparsity of the peaks themselves, but also that of their higher-order derivatives. It also enforces positivity, with sparsity-promoting asymmetric L1 norms, or regularization thereof. A first version of the BEADS Matlab toolbox is provided. 
1D chromatogram with increasing baseline

Chemometrics and Intelligent Laboratory Systems, December 2014
This paper jointly addresses the problems of chromatogram baseline correction and noise reduction. The proposed approach is based on modeling the series of chromatogram peaks as sparse with sparse derivatives, and on modeling the baseline as a low-pass signal. A convex optimization problem is formulated so as to encapsulate these non-parametric models. To account for the positivity of chromatogram peaks, an asymmetric penalty function is utilized. A robust, computationally efficient, iterative algorithm is developed that is guaranteed to converge to the unique optimal solution. The approach, termed Baseline Estimation And Denoising with Sparsity (BEADS), is evaluated and compared with two state-of-the-art methods [Vincent Mazet's BackCor and airPLS] using both simulated and real chromatogram data, with Gaussian and Poisson noises
Asymmetric L1 penalty for positive signals
Keywords: baseline correction; baseline drift; sparse derivative; asymmetric penalty; low-pass filtering; convex optimization
  • This paper jointly addresses the problems of chromatogram baseline correction and noise reduction.
  • The series of chromatogram peaks are modeled as sparse with sparse derivatives.
  • The baseline is modeled as a low-pass signal.
  • A convex optimization problem is formulated so as to encapsulate these non-parametric models and a computationally efficient, iterative algorithm is developed.
  • The performance is evaluated and compared with two state-of-the-art methods using both simulated and real chromatogram data.
I would like to thank Vincent Mazet for excellent suggestions.

September 29, 2014

Geophysics: Taking signal and noise to new dimensions

The journal (from SEG: Society of Exploration Geophysicists) Geophysics (SCImago journal ranking) has issued a call for papers for a special issue devoted to signal processing ("Taking signal and noise to new dimensions"), deadline end January 2015.
Original seismic stack
Large Gaussian noise corruption
Denoised with dual-tree wavelets

Taking signal and noise to new dimensions

The inherent complexity of seismic data has sparked, since about half a century, the development of innovative techniques to separate signal and noise. Localized time-scale representations (e. g. wavelets), parsimonious deconvolution, sparsity-promoted restoration and reconstruction are now at the core of modern signal processing and image analysis algorithms. Together with advances from computer science and machine learning, they shaped the field of data science, aiming at retrieving the inside structure of feature-rich, complex and high-dimensional datasets. This special issue is devoted to novel methodologies and strategies capable of tackling the large data volumes necessary to harness the future of subsurface exploration. A common trait resides in the possibility to seize at the same time both signal and noise properties along lower dimensional spaces, with dedicated metrics, to allow their joint use for seismic information enhancement. The traditional frontier between signal and noise is dimming, as incoherent seismic perturbations and formerly detrimental coherent wavefields, such as multiple reflections, are nowadays recognized as additional information for seismic processing, imaging and interpretation. We welcome contribution pertaining (but not limited) to:
  • emerging seismic data acquisition and management technologies
  • more compact, sparser and optimized multidimensional seismic data representations
  • filtering, noise attenuation, signal enhancement and source separation
  • advanced optimization methods and related metrics, regularizations and penalizations
  • artificial intelligence and machine learning applications for seismic data characterization
  • novel applications of signal and image processing to geophysics
  • hardware and software algorithmic breakthroughs
Timeline (tentative) for the Geophysics call of papers
  • Submission deadline: 31 Jan 2015
  • Peer review complete: 10 July 2015
  • All files submitted for production: 15 August 2015
  • Publication of issue: November-December 2015

Gravity survey @ LandTech

    September 13, 2014

    Cours : Radial basis functions

    Un message d'Albert Cohen annonce un mini-cours : "Interpolation et Quasi-Interpolation utilisant les fonctions radiales comme méthode d'approximation de plusieurs variables"
    Le professeur Martin Buhmann (Université de Justus-Liebig, Giessen), de l’Université de Giessen, fera le lundi 22 septembre au laboratoire J.L. Lions un mini-cours de 2h sur les bases de fonction radiales. Ces outils sont fréquement utilisés en approximation multivariée, en particulier en grande dimension, en statistiques (méthodes à noyaux), et parfois dans le traitement de l’image et l’approximation des EDP. Martin Buhmann est un expert reconnu dans ce domaine. Le cours aura lieu de 16:00 à 18:30 en salle 309 (salle de séminaire du LJLL) couloir 15-16, 3ème étage, Jussieu. 
    Radial version of the Laplacian of a Gaussian wavelet

    Le résumé (en français !)
    Les méthodes utilisant les fonctions radiales sont des façons d'approcher une fonction par une combinaison linéaire (finie ou infinie) de translatées d'une unique fonction, appelée fonction noyau. Cette fonction peut avoir, par exemple, la forme d'une exponentielle (noyau de Gauss ou de Poisson). Les coefficients de cette combinaison linéaire étant choisis, par exemple, en fonction des conditions d'interpolation. Beaucoup de propriétés typiques de l'approximation par des noyaux de type fonctions radiales proviennent de la symétrie radiale de ces noyaux. Les avantages de cette méthode liée aux splines à une dimension sont, d'une part sa généralisation naturelle à une dimension quelconque (les fonctions noyaux étant générées à partir d'une fonction d'une variable multidimensionnelle composée avec une norme – lorsque la norme est euclidienne on parle de fonctions radiales) et d'autre part, ses propriétés de convergence très rapide si les fonctions approximées sont assez lisses (souvent en convergence spectrale). De plus, en utilisant un grand choix de fonctions radiales, le problème d'interpolation est bien défini avec une unique solution indépendante de la dimension de l'espace et de la distribution des points d'interpolation. Cette situation optimale serait impossible par exemple dans le cas des polynômes en plusieurs dimensions. Entre autres les noyaux de Gauss, les noyaux multiquadriques, inverse multiquadriques, les noyaux de Poisson, ..., ont cette propriété intéressante qui permet de nombreuses applications. Dans ces deux exposés nous introduirons le concept de fonction radiale, nous présenterons des propriétés de ces fonctions approximantes et nous détaillerons les théorèmes de convergence qui montrent la puissance des méthodes d'approximation utilisant cette idée.

    Quelques pistes :
    Yafer Abu-Mostafa, Learning from data, introductory machine learning course, Caltech, 2012, lecture 16, Radial Basis Functions

    M. D. Buhmann : Radial basis function, Scholarpedia, 2010
    M. D. Buhmann : Radial Basis Functions: Theory and Implementations, 2003, Cambridge University Press
    J. B. Cherrie, R. K. Beatson, G. N.  Newsam,  Fast evaluation of radial basis functions: methods for generalised multiquadrics in R^n, SIAM Journal on Scientific Computation, 2002
    M. D. Buhmann : Radial basis functions, Acta Numerica, 2000
    Radial basis function methods are modern ways to approximate multivariate functions, especially in the absence of grid data. They have been known, tested and analysed for several years now and many positive properties have been identi ed. This paper gives a selective but up-to-date survey of several recent developments that explains their usefulness from the theoretical point of view and contributes useful new classes of radial basis function. We consider particularly the new results on convergence rates of interpolation with radial basis functions, as well as some of the various achievements on approximation on spheres, and the e cient numerical computation of interpolants for very large sets of data. Several examples of useful applications are stated at the end of the paper.
    Mark J. L. Orr, Introduction to Radial Basis Function Networks, April 1996
    D. H. Broomhead,  D. Lowe, Multivariable Functional Interpolation and Adaptive Networks, Complex Systems, 1988

    July 24, 2014

    Euclid in a Taxicab makes some SOOT : A smoothed l_1/l_2 norm ratio for sparsity enforcement and blind restoration

    [This post deals with a smoothed l1/l2 norm ratio as a proxy for sparsity (called continuous numerical sparsity in Discrete uncertainty principles and sparse signal processing by Afonso Bandeira, Megan Lewis, Dustin Mixon, 2015), applied to blind signal deconvolution with an example on seismic data]
    Matlab toolbox: SOOT Sparse Blind deconvolution toolbox or

    Smoothed $\ell_1/\ell_2$ norm ratio for $\ell_0$ approximation (paper fortune teller/cootie catcher)

    There are Taxicab services in the city of Euclid, OH, near Cleveland. There is a first Euclid Avenue, in Cleveland, was a beautiful and wealthy city a century ago, with a string of mansions known as Millionaire's Row.

    According to wikipedia, "Euclid Avenue is a street name used in multiple U.S. municipalities. Surveyors frequently named a street after Euclid (or Euclides) the father of geometry as it is the basis of their profession". 

    I wonder why i have seen so far so few "rue Euclide" in France, close to none. As least, no Euclid Street in Paris, as you can check from Mathematicians with Paris streets named after them.

    Euclid Avenue Station directs subway lines A and C from Brooklyn to Manhattan. Can one really draw a line between Euclid and Manhattan? You can use the Taxicab geometry, linked to the Manhattan distance (in red, blue or green lines), in the absolute value sense, or l_1 (ell-one norm or one-norm). Or you can do like Dutilleul (garou-garou), the "Passe-muraille" by Marcel Aimé, or J'onn J'onzz the Martian Manhunter, and cross through the wall, in the Euclidean way. With the square root of sums of squares, or l_2 (ell-two norm or simply two-norm).

    Most of the standard data analysis or signal processing tools we use in every day life are based on a Euclidean minimization of the l_2 norm, from the mean to the pseudo-inverse, via the Fourier transform. Not because it is important to minimize energies, just because the derivative of a square, i.e. a linear system, was for long the only kind of systems we could solve. In the case of outliers, the l_1-norm was shown to exhibit some robustness, hence the median estimator, and its descendants, the l_1 or robust-PCA, or even robust Fourier or time-frequency transforms. Many recents efficient algorithms have been proposed to obtain these transforms or estimators. L^1 also possesses some nice geometry, quite often attributed to Herman Minkowski. For a primer, see The nature of length, area, and volume in taxicab geometry, Kevin P. Thompson, International Electronic Journal of Geometry, Volume 4 No. 2 pp. 193-207 (2011)

    But the l1 is not enough in higher dimensional spaces, to capture the low-dimension structure of sparse data. One need the counting l_0 distance, the count of non-zero elements, or numerosity (the Hamming distance to the null vector). Alas, the l0 distance is barely differentiable, not even quite smooth. Hard to optimize. Luckily, under some conditions, solving a system under  a nicer l_1 norm sometimes yields the sparsest solution to the system. But not always, as shown in Minimum l_1-norm solutions are not always sparse

    Over the past years, several measures have been proposed to better measure or approximate the sparsity measure of a signal or an image. Taking into account the fact that the l_0 measure is homogeneous of degree 0, while standard norms are homogeneous of degree 1. One of those is simply the ratio of the l_1 and the l_2 norms for vectors of dimension n, which enjoys a standard inequality:

    l_1 and l_2 standard inequality
    With an appropriately scaled ratio of norms (through a simple affine transformation), one obtains a parsimony measure or a sparsity-sparseness index between 0 and 1 (also knwon as Hoyer sparsity measure):

    l_1 l_2 ratio Hoyer sparsity measure
    Yet it does not lend itself to easy optimization (due to nonconvex optimization), although several works have studies such a norm ratio. Some of them were publicized in Nuit Blanche:
    All this long zigzag path introduces a paper on a smoothing of the nonconvex l_1/l_2 norm ratio in a parametrized form and regularized norm formulation, to put Euclid in a Taxicab. The corresponding algorithm, termed SOOT for "Smoothed-One-Over-Two" norm ratio, has results on its theoretical convergence. Not used here, the ratio of the l1-norm to the l2-norm, when restricted to subspaces of given dimension, is provided  with a lower bound given by the Kashin-Garnaev-Gluskin Inequality.

    It is applied to sparse blind deconvolution (or deblurring), here for an example of sparse seismic data processing. In the same way the l_1 regularization was present in a 1973 geophysical paper by Muir and Claerbout, the l_1/l_2 norm ratio idea can be traced back to technical reports by Gray. It involves a logarithm, whose effect, in some way, is close to the l_0 index. Indeed, the logarithm is somehow a continuous limit, close to the discrete zeroth-power.

    Seismological reflectivity sequence and seismic wavelet recovery with sparse blind deconvolution

    The $\ell_1/ell_2$ ratio regularization function has shown good performance for retrieving sparse signals in a number of recent works, in the context of blind deconvolution. Indeed, it benefits from a scale invariance property much desirable in the blind context.However, the $\ell_1/ell_2$ function raises some difficulties when solving the nonconvex and nonsmooth minimization problems resulting from the use of such regularization penalties in current restoration methods.In this paper, we propose a new penalty based on a smooth approximation to the $\ell_1/ell_2$ function. In addition, we develop a proximal-based algorithm to solve variational problems involving this function and we derive theoretical convergence results. We demonstrate the effectiveness of our method through a comparison with a recent alternating optimization strategy dealing with the exact $\ell_1/ell_2$ term, on an application to seismic data blind deconvolution.
    [arXiv Link]
    [Nuit Blanche link]

    And oh, i forgot to tell you (princess Irulan, Dune introduction): there is a path between Euclid and Hardy (and Littlewood), in The Development of Prime Number Theory: From Euclid to Hardy and Littlewood. And there was a famous taxicab connection between Hardy and Ramanujan, the Hardy-Ramanujan number, or "Is 1729 a dull number?"
    Waiting for taxicab number 1729

    June 3, 2014

    Seismic Signal Processing (ICASSP 2014)

    There was a special session on "Seismic Signal Processing" at International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, Florence. Our talk was on simplified optimization techniques to solve multiple reflections via adaptive filtering techniques in wavelet frame domains.

    Random and structured noise both affect seismic data, hiding the reflections of interest (primaries) that carry meaningful geophysical interpretation. When the structured noise is composed of multiple reflections, its adaptive cancellation is obtained through time-varying filtering, compensating inaccuracies in given approximate templates. The under-determined problem can then be formulated as a convex optimization one, providing estimates of both filters and primaries. Within this framework, the criterion to be minimized mainly consists of two parts: a data fidelity term and hard constraints modelling a priori information. This formulation may avoid, or at least facilitate, some parameter determination tasks, usually difficult to perform in inverse problems. Not only classical constraints, such as sparsity, are considered here, but also constraints expressed through hyperplanes, onto which the projection is easy to compute. The latter constraints lead to improved performance by further constraining the space of geophysically sound solutions.
    This paper  has focused on the constrained convex formulation of adaptive multiple removal. The proposed approach, based on proximal methods, is quite flexible and allows us to integrate a large panel of hard constraints corresponding to a priori knowledge on the data to be estimated (i.e. primary signal and time-varying filters). A key observation is that some of the related constraint sets can be expressed through hyperplanes, which are not only more convenient to design, but also easier to implement through straightforward projections. Since sparsifying transforms and  constraints strongly interact [Pham-2014-TSP], we now  study the class of hyperplane constraints of interest as well as their inner parameters, together with the extension to higher dimensions

    May 30, 2014

    Sparse template-based adaptive filtering

    Significance index related to Student's t-test
    The phenomenon arises in several real-life signal processing contexts: acoustic echo-cancellation (AEC) in sound and speech,  non-destructive testing where transmitted waves may rebound at material interfaces (e.g. ultrasounds), or pattern matching in images. Here in seismic reflection or seismology. Weak signals (of interest) are buried under both strong random and structured noise. Provided appropriate templates are obtained, we propose a structured-pattern filtering algorithm (called Ricochet) through constrained adaptive filtering in a  transformed domain. Its generic methodology impose sparsity: in different wavelet frames (Haar, Daubechies, Symmlets) coefficients, using the L-1 or Manhattan norm, as well as on adaptive filter coefficients using concentration measures (for sparser filters in the time domain): L-1, the Frobenius norm squared, and the mixed L-1,2 norms). Regularity properties are constrained as well, for instance slow variation on the adaptive filter coefficients (uniform, Chebychev or L-infinity norm). Quantitative results are given with a significance index, reminiscent of the Student t-test.

    Abstract: Unveiling meaningful geophysical information from seismic data requires to deal with both random and structured ``noises''. As their amplitude may be greater than signals of interest
    Seismic data: primaries and multiples
    Lost in multiples: a creeping primary (flat, bootom-right)
    (primaries), additional prior information is especially important in performing efficient signal separation. We address here the problem of multiple reflections, caused by  wave-field bouncing between layers. Since only approximate models of these phenomena are available, we propose a flexible framework for time-varying adaptive filtering of seismic signals, using sparse representations,  based on inaccurate templates. We recast the joint estimation of adaptive filters and primaries in a new convex variational formulation. This approach allows us to incorporate plausible knowledge about noise statistics, data sparsity and slow filter variation in parsimony-promoting wavelet frames.  The designed primal-dual algorithm solves a  constrained  minimization problem that alleviates standard regularization issues in finding hyper-parameters. The approach demonstrates  significantly good performance in low signal-to-noise ratio conditions, both for simulated and real field seismic data.

    All the metrics here are convex. Wait a bit for something completely different with non-convex penalties, namely smoothed versions of the ratio of the L1 norm over the L2 norm: Euclid in a Taxicab: Sparse Blind Deconvolution with Smoothed $\ell_1/ell_2$ Regularization, covered in Nuit Blanche, and with arxiv page and pdf.

    SPOQ: norm ratio sparsity restoration

    So SPOQ means, in Swedish, "Svenskt Pediatriskt Ortopediskt Qvalitetsregister "  (Swedish Pediatric  Orthopaedic  Quality regist...