## July 24, 2014

### Euclid in a Taxicab makes some SOOT : A smoothed l_1/l_2 norm ratio for sparsity enforcement and blind restoration

[This post deals with a smoothed l1/l2 norm ratio as a proxy for sparsity (called continuous numerical sparsity in Discrete uncertainty principles and sparse signal processing by Afonso Bandeira, Megan Lewis, Dustin Mixon, 2015), applied to blind signal deconvolution with an example on seismic data]
Matlab toolbox: SOOT Sparse Blind deconvolution toolbox or http://lc.cx/soot

 Smoothed $\ell_1/\ell_2$ norm ratio for $\ell_0$ approximation (paper fortune teller/cootie catcher)

There are Taxicab services in the city of Euclid, OH, near Cleveland. There is a first Euclid Avenue, in Cleveland, was a beautiful and wealthy city a century ago, with a string of mansions known as Millionaire's Row.

According to wikipedia, "Euclid Avenue is a street name used in multiple U.S. municipalities. Surveyors frequently named a street after Euclid (or Euclides) the father of geometry as it is the basis of their profession".

I wonder why i have seen so far so few "rue Euclide" in France, close to none. As least, no Euclid Street in Paris, as you can check from Mathematicians with Paris streets named after them.

Euclid Avenue Station directs subway lines A and C from Brooklyn to Manhattan. Can one really draw a line between Euclid and Manhattan? You can use the Taxicab geometry, linked to the Manhattan distance (in red, blue or green lines), in the absolute value sense, or l_1 (ell-one norm or one-norm). Or you can do like Dutilleul (garou-garou), the "Passe-muraille" by Marcel Aimé, or J'onn J'onzz the Martian Manhunter, and cross through the wall, in the Euclidean way. With the square root of sums of squares, or l_2 (ell-two norm or simply two-norm).

Most of the standard data analysis or signal processing tools we use in every day life are based on a Euclidean minimization of the l_2 norm, from the mean to the pseudo-inverse, via the Fourier transform. Not because it is important to minimize energies, just because the derivative of a square, i.e. a linear system, was for long the only kind of systems we could solve. In the case of outliers, the l_1-norm was shown to exhibit some robustness, hence the median estimator, and its descendants, the l_1 or robust-PCA, or even robust Fourier or time-frequency transforms. Many recents efficient algorithms have been proposed to obtain these transforms or estimators. L^1 also possesses some nice geometry, quite often attributed to Herman Minkowski. For a primer, see The nature of length, area, and volume in taxicab geometry, Kevin P. Thompson, International Electronic Journal of Geometry, Volume 4 No. 2 pp. 193-207 (2011)

But the l1 is not enough in higher dimensional spaces, to capture the low-dimension structure of sparse data. One need the counting l_0 distance, the count of non-zero elements, or numerosity (the Hamming distance to the null vector). Alas, the l0 distance is barely differentiable, not even quite smooth. Hard to optimize. Luckily, under some conditions, solving a system under  a nicer l_1 norm sometimes yields the sparsest solution to the system. But not always, as shown in Minimum l_1-norm solutions are not always sparse

Over the past years, several measures have been proposed to better measure or approximate the sparsity measure of a signal or an image. Taking into account the fact that the l_0 measure is homogeneous of degree 0, while standard norms are homogeneous of degree 1. One of those is simply the ratio of the l_1 and the l_2 norms for vectors of dimension n, which enjoys a standard inequality:

With an appropriately scaled ratio of norms (through a simple affine transformation), one obtains a parsimony measure or a sparsity-sparseness index between 0 and 1 (also knwon as Hoyer sparsity measure):

Yet it does not lend itself to easy optimization (due to nonconvex optimization), although several works have studies such a norm ratio. Some of them were publicized in Nuit Blanche:
http://nuit-blanche.blogspot.fr/2012/04/compressive-sensing-this-week.html
http://nuit-blanche.blogspot.fr/2012/03/old-and-new-algorithm-for-blind.html
http://nuit-blanche.blogspot.fr/2008/05/cs-kgg-explanation-of-l0-and-l1.html
All this long zigzag path introduces a paper on a smoothing of the nonconvex l_1/l_2 norm ratio in a parametrized form and regularized norm formulation, to put Euclid in a Taxicab. The corresponding algorithm, termed SOOT for "Smoothed-One-Over-Two" norm ratio, has results on its theoretical convergence. Not used here, the ratio of the l1-norm to the l2-norm, when restricted to subspaces of given dimension, is provided  with a lower bound given by the Kashin-Garnaev-Gluskin Inequality.

It is applied to sparse blind deconvolution (or deblurring), here for an example of sparse seismic data processing. In the same way the l_1 regularization was present in a 1973 geophysical paper by Muir and Claerbout, the l_1/l_2 norm ratio idea can be traced back to technical reports by Gray. It involves a logarithm, whose effect, in some way, is close to the l_0 index. Indeed, the logarithm is somehow a continuous limit, close to the discrete zeroth-power.

 Seismological reflectivity sequence and seismic wavelet recovery with sparse blind deconvolution

[Abstract]
The $\ell_1/ell_2$ ratio regularization function has shown good performance for retrieving sparse signals in a number of recent works, in the context of blind deconvolution. Indeed, it benefits from a scale invariance property much desirable in the blind context.However, the $\ell_1/ell_2$ function raises some difficulties when solving the nonconvex and nonsmooth minimization problems resulting from the use of such regularization penalties in current restoration methods.In this paper, we propose a new penalty based on a smooth approximation to the $\ell_1/ell_2$ function. In addition, we develop a proximal-based algorithm to solve variational problems involving this function and we derive theoretical convergence results. We demonstrate the effectiveness of our method through a comparison with a recent alternating optimization strategy dealing with the exact $\ell_1/ell_2$ term, on an application to seismic data blind deconvolution.