
Showing posts from 2015

BRANE Cut: Biologically-Related Apriori Network Enhancement with Graph cuts

[BRANE Cut featured on RNA-Seq blog] [Omic tools] [bioRxiv preprint] [PubMed/BioMed Central] [BRANE Cut code] [BRANE Omics] Gene regulatory networks are somewhat difficult to infer. This first result from ongoing work on BRANE Omics (termed BRANE*, for Biologically Related Apriori Network Enhancement) introduces an optimization method, based on graph cuts borrowed from computer vision and image processing, to infer graphs from biologically related a priori knowledge (including sparsity). It is successfully tested on DREAM challenge data and on an Escherichia coli network, with specific work on deriving the optimization parameters from gene network cardinality properties. And it is quite fast. BRANE Cut: Biologically-Related Apriori Network Enhancement with Graph cuts for Gene Regulatory Network inference (doi, BRANE Cut webpage, preprint) Background: Inferring gene networks from high-throughput data constitutes an important step in the discovery of relevant regulat
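To give a feel for why graph cuts fit edge selection, here is a minimal sketch. It is not the BRANE Cut energy from the paper: it is a generic binary labeling of candidate edges $x_e \in \{0, 1\}$ (1 = keep), assuming hypothetical edge weights $w_e$, a uniform threshold $\lambda$, and a hypothetical set $P$ of coupled edge pairs, with energy $E(x) = \sum_e [w_e (1 - x_e) + \lambda x_e] + \eta \sum_{(e,f) \in P} |x_e - x_f|$. Because the pairwise term is submodular, this energy is minimized exactly by an s-t minimum cut, computed here with a plain Edmonds-Karp max-flow.

```python
from collections import deque


def graph_cut_labels(unary, pairs, eta):
    """Exactly minimise E(x) = sum_e unary[e][x_e] + eta * sum_{(e,f)} |x_e - x_f|
    over binary labels x_e in {0, 1}, via an s-t minimum cut.

    unary[e] = (cost_if_dropped, cost_if_kept), both non-negative.
    Returns (labels, min_energy).
    """
    n = len(unary)
    s, t = n, n + 1                      # source and sink node indices
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for e, (cost0, cost1) in enumerate(unary):
        cap[s][e] += cost0               # cut iff e ends on the sink side (x_e = 0)
        cap[e][t] += cost1               # cut iff e ends on the source side (x_e = 1)
    for e, f in pairs:
        cap[e][f] += eta                 # cut iff x_e = 1 and x_f = 0
        cap[f][e] += eta                 # cut iff x_f = 1 and x_e = 0

    max_flow = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break                        # no augmenting path left: flow is maximal
        # Bottleneck capacity along the path, then push that much flow
        path_flow, v = float("inf"), t
        while v != s:
            path_flow = min(path_flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= path_flow
            cap[v][parent[v]] += path_flow
            v = parent[v]
        max_flow += path_flow

    # Nodes still reachable from s form the source side of a minimum cut: x_e = 1
    labels = [1 if parent[e] != -1 else 0 for e in range(n)]
    return labels, max_flow


def energy(labels, unary, pairs, eta):
    """Evaluate E(x) directly, e.g. to check the cut against brute force."""
    data = sum(unary[e][x] for e, x in enumerate(labels))
    return data + eta * sum(abs(labels[e] - labels[f]) for e, f in pairs)


weights = [0.9, 0.35, 0.1, 0.6]          # hypothetical edge scores w_e
lam, eta = 0.4, 0.3                      # hypothetical threshold and coupling
unary = [(w, lam) for w in weights]      # dropping edge e costs w_e, keeping costs lambda
pairs = [(0, 1), (2, 3)]                 # hypothetical coupled edge pairs
labels, e_min = graph_cut_labels(unary, pairs, eta)
print(labels, round(e_min, 6))
```

Running it prints `[1, 1, 0, 0] 1.5`: edge 1 is kept although its weight is below $\lambda$, because it is coupled to a strong edge, while edge 3 is dropped because it is coupled to a weak one; plain thresholding at $\lambda$ would keep edges 0 and 3 instead. A dense capacity matrix and Edmonds-Karp keep the sketch short; large networks call for sparse graphs and faster max-flow algorithms such as Boykov-Kolmogorov.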

Big data, fishes and cooking: fourteen shades of "V"

[In this short post, you can access the 14 "V"s often glued to Big Data, including vacuity] Lao Tzu is often credited with the following (I cannot check the original meaning): Govern a great nation as you would cook a small fish. Do not overdo it. Today's wisdom could be: Deal with Big data as you would process a small signal. Do not over-expect from it, do not over-fit it, do not over-interpret it. Luckily, see "Big data does not exist", where "Making The Most Of Small Data" is advocated. This is a bit like teenage sex: "Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it." In the "What exactly is Big Data" StackExchange question, I have listed all 14 "V"s that could describe big data, including... vacuity. They are: Validity, Value, Variability/Variance, Variety, Velocity, Veracity/Veraciousness, Viability, Virtuality, Visualization, Volatility,

Hugo Steinhaus, or K-means clustering in French

Figure: Kernel clustering.

[Modern transcription of the 1956 Hugo Steinhaus paper (in French), at the source of k-means clustering algorithms, first published in a French-language post] Data clustering, or cluster analysis, belongs to statistical data analysis methods. It aims at forming groups of objects that are similar in some way. Those groups are named clusters. The word cluster is related to clot, a thick mass of coagulated liquid or of material stuck together. The whole set of objects contains heterogeneous data that ought to be grouped into subsets possessing a greater inner homogeneity. Such methods rely on similarity criteria or proximity measures. They are related to classification, machine learning, segmentation and pattern recognition, and have applications ranging from image processing to bioinformatics. One of the most popular clustering methods is known as K-means (k-moyennes in French), with a variation called dynamic clustering (beautifully called nuée
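To see the iteration in action, here is a minimal sketch of the usual alternating (Lloyd-style) k-means scheme. It is a generic textbook version, not a transcription of Steinhaus's 1956 procedure; the random initialization from data points, the convergence test and the toy points are assumptions chosen for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd-style k-means: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break                          # assignments are stable: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                    # keep the old center if a cluster empties
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```

On these two well-separated groups, the first three points end up in one cluster with center (1/3, 1/3) and the last three in the other with center (31/3, 31/3). Each step can only decrease the within-cluster sum of squared distances, which is why the loop terminates, though possibly at a local minimum that depends on the initialization.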

Hugo Steinhaus: k-means clustering and dynamic clusters (nuées dynamiques)

Figure: Kernel clustering.

[Making Hugo Steinhaus's 1956 article available, at the origin of the k-means clustering algorithm (available in English)] Data partitioning (data clustering or cluster analysis) is a "statistical" data analysis method that aims to group, within a heterogeneous data set, subsets of those data into more homogeneous clusters or bundles. Each subset should thus exhibit similar characteristics, quantified by similarity criteria or various proximity measures. These techniques belong to the families of classification, machine learning and segmentation, and are used in a phenomenal number of applications, from image processing to bioinformatics. One of the most popular partitioning or aggregation methods is k-means (k-moyennes), an optimization problem com
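The optimization problem mentioned above is usually written as follows, in its standard textbook form. The notation is generic and not taken from Steinhaus's article: data points $x_i$, a partition into $K$ clusters $C_1, \dots, C_K$, and centroids $\mu_k$.

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert_2^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i .
```

Minimizing over all partitions is a combinatorial problem (NP-hard in general), which is why alternating schemes that decrease this cost at each step are used in practice.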

Sparse seismic data restoration: a PhD defense

Figure: Smoothed $\ell_1/\ell_2$ function for a sparse $\ell_0$ surrogate.

Mai Quyen PHAM defended her PhD thesis on July 15th, 2015, at 10:00 am, on the topic of "Seismic wave field restoration using sparse representations and quantitative analysis" (manuscript in pdf), at Université Paris-Est, bâtiment Copernic, amphithéâtre Maurice Gross, 5 boulevard Descartes (RER A, Noisy-Champs), 77420 Champs-sur-Marne. Its focus is twofold: 1) sparse adaptive filtering with approximate templates in redundant and geometric wavelet frames (akin to echo cancellation in speech); 2) sparse blind deconvolution for parsimonious reflectivity signals with an $\ell_1/\ell_2$ norm ratio penalty. This work has notably been published in two journal papers: * Euclid in a Taxicab: Sparse Blind Deconvolution with Smoothed $\ell_1/\ell_2$, Audrey Repetti, Mai Quyen Pham, Laurent Duval, Émilie Chouzenoux, Jean-Christophe Pesquet, IEEE Signal Processing Letters, May 2015, Volume 22, Number 5, pages 539-543. http://
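To see why an $\ell_1/\ell_2$ ratio favors sparse signals, here is a small numerical sketch. For a nonzero vector of length $N$, the plain ratio is scale-invariant and lies between 1 (a single nonzero entry) and $\sqrt{N}$ (all entries of equal magnitude). The smoothed version replaces $|x_n|$ by $\sqrt{x_n^2 + \alpha^2} - \alpha$ and $\lVert x \rVert_2$ by $\sqrt{\lVert x \rVert_2^2 + \eta^2}$, so that it stays defined and differentiable at zero; this particular smoothing and the values of $\alpha$ and $\eta$ are assumptions for illustration, and the exact penalty used in the thesis and papers may differ.

```python
import math


def l1_l2_ratio(x):
    """Plain l1/l2 norm ratio of a nonzero vector: 1 <= ratio <= sqrt(len(x))."""
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return l1 / l2


def smoothed_l1_l2(x, alpha=1e-2, eta=1e-2):
    """Smoothed ratio (illustrative smoothing), defined and differentiable at 0."""
    l1_smooth = sum(math.sqrt(v * v + alpha ** 2) - alpha for v in x)
    l2_smooth = math.sqrt(sum(v * v for v in x) + eta ** 2)
    return l1_smooth / l2_smooth


sparse = [0.0, 0.0, 3.0, 0.0]      # one spike, like an isolated reflector
dense = [1.0, 1.0, 1.0, 1.0]       # energy spread over all samples
print(l1_l2_ratio(sparse), l1_l2_ratio(dense))          # 1.0 2.0
print(l1_l2_ratio([2 * v for v in sparse]))             # scale-invariant: 1.0
print(smoothed_l1_l2(sparse), smoothed_l1_l2([0.0] * 4))
```

The smoothed ratio of the spike stays close to 1, and it evaluates to 0 on the all-zero vector where the plain ratio is undefined, which is what makes it usable inside gradient-based deconvolution schemes.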

Facebook FAIR(ies) in Paris

Paris is buzzing about the announcement of Facebook's new European research center in Paris. This was already known around April/May. Six "fairies" are said to have joined Facebook FAIR, or Facebook Artificial Intelligence (AI) Research. They will do some magical data science (or dédoménologie), under the guidance of Yann LeCun. He hosted the "Data Science and Massive Data Analysis" day on the campus of ESIEE Paris, École des Ponts ParisTech and Université Paris-Est Marne-la-Vallée (Paris at large) on June 12th, 2014. It's not eerie, it's ESIEE. After Menlo Park and New York, this center, the third and the first outside the US, has been drawn to the City of Light, where they will bring their TORCH for our enlightenment. They were attracted like moths to a flame: by local talent, by excellent education facilities in artificial intelligence and computer science, and probably by substantial financial incentives. They are al

Dédoménologie: the science of data processing (signals, images, etc.)

[In which we propose the neologism dédoménologie to designate the technique, practice and science of signal processing and image analysis, at the heart of the emerging field of data science, by way of Euclid] "Words are what exist; what has no name does not exist. The word light exists; light does not exist." (Francis Picabia, or Francis-Marie Martinez de Picabia, Écrits) What image analyst, what signal processor has never struggled to describe their job? Not in detail, of course: "Actually, I'm interested in the cyclostationary properties of wavelet coefficients of fractional Brownian motions in texture images. Well, when I say cyclostationary, I mean periodically correlated, you know; I'm not talking about almost periodic processes." Signal, image or data processing very often calls for circumlocutions. Some telling examples:

Let data (science) speak

The Doctoral College at IFPEN (IFP Energies nouvelles) organizes seminars for PhD students. The next one, on 30 March 2015, is about data science: "Faire parler les mesures, de la capture (acquisition) aux premiers mots (apprentissage) : la science des données, une discipline émergente", or "Let data speak: from its capture (acquisition) to first words (learning): data science, an emerging discipline". The speakers are Igor Carron (Nuit Blanche), Laurent Daudet (Institut Langevin) and Stéphane Mallat (École normale supérieure). Abstracts and slides follow, with two musical interludes: "Un trait danger, deux traits sécurité" (One line, danger; two lines, safety) by Jeanne Cherhal, and "Tu dois finir ta thèse" (You must finish your thesis) by Simon Berjeaut (lyrics, France Culture version, Grantanfi). For those who could not attend, or for a second shot on video: Laurent Daudet, "La vérité si je m'embrouille, ou comment l'aléatoire nous aide à mesurer" (The truth, if I get muddled, or how randomness helps us measure), Journées Scientifiques annuelles de l'Institut Universi