## June 10, 2010

### Information overload - And no more trivia, fool!

[Update 2014/05/20 with Ann Blair publications] There is a recent concern about information overload. Or is there? According to the following independent sources, the problem is not so recent. Ann Blair already informed us in 2003 that there were Reading Strategies for Coping with Information Overload ca. 1550-1700:
The "multitude of books" was a subject of wonder and anxiety for authors who reflected on the scholarly condition in the sixteenth through the eighteenth centuries. In the preface to his massive project of cataloguing all known books in the Bibliotheca universalis (1545) Conrad Gesner complained of that "confusing and harmful abundance of books," a problem which he called on kings and princes and the learned to solve. By 1685 the situation seemed absolutely dire to Adrien Baillet, who warned:
"We have reason to fear that the multitude of books which grows every day in a prodigious fashion will make the following centuries fall into a state as barbarous as that of the centuries that followed the fall of the Roman Empire. Unless we try to prevent this danger by separating those books which we must throw out or leave in oblivion from those which one should save and within the latter between what is useful and what is not."
In this way Baillet claimed to have warded off barbarity itself with his collection of judgments on the learned in his nine-volume (and still only half-completed) Jugemens des sçavans.
The "information overload" or "scholar big data" theme is pushed further in Too Much to Know: Managing Scholarly Information before the Modern Age (2010):
The flood of information brought to us by advancing technology is often accompanied by a distressing sense of “information overload,” yet this experience is not unique to modern times. In fact, says Ann M. Blair in this intriguing book, the invention of the printing press and the ensuing abundance of books provoked sixteenth- and seventeenth-century European scholars to register complaints very similar to our own. Blair examines methods of information management in ancient and medieval Europe as well as the Islamic world and China, then focuses particular attention on the organization, composition, and reception of Latin reference books in print in early modern Europe. She explores in detail the sophisticated and sometimes idiosyncratic techniques that scholars and readers developed in an era of new technology and exploding information.
Listen to 23 minutes of Clay Shirky at Web 2.0 Expo NY, 19 September 2008, where you learn, along the movie narration flood, that "It's Not Information Overload. It's Filter Failure".


So apparently, the information overload problem is no novelty. It looks like information is riding an exponential wave, as in the standard chart (left), whose derivative is just about an exponential too. This reminds me of the following joke: $1$ and $e^x$ sit in an old favorite room of a restaurant. Waiting for food arrival - noontime. Suddenly, $1$ gets terrified and cries to $e^x$: "Hide me, hide me, here comes a derivative operator!" Proud and fierce, $e^x$ hides the constant behind her back and defies the operator: "I am $e^x$, I don't fear you." "Sure!" the operator replies, "I am $\frac{\partial}{\partial y}$".
As Clay Shirky says, "If you have a problem for a long time, it's not a problem... Maybe it's a fact!" IMHO, this fact is probably amplified by the Internet/media mode of "content creation": more often than not a mere duplication (pure redundancy) or basic distortion (jamming) of pre-existing content, with little added value, produced to fill the media tubes and pipes (forlorn media ovation). And since more and more people write, blog, tweet and buzz about IO, they further add low-valued load. IO might just be neither a true problem nor a false one. René Thom (in Paraboles et Catastrophes, Champs Flammarion, p. 127) reminds us that "Ce qui limite le vrai n’est pas le faux, mais l’insignifiant", roughly "What limits the true is not the false, but the insignificant" (quote courtesy of Olivier Rey, whose Itinéraire de l'égarement deserves close reading, admiration of no lover). IO as an inane vomit flood roar (sounds like a death metal song title, but only an anagram).

Yet still assuming that "more data = better decisions", some argue that the "real problem is the lack of efficient strategies to index, summarize, filter, cross-reference and archive information", or propose "A Framework for Information Overload Research in Organizations: Insights from Organization Science, Accounting, Marketing, MIS, and Related Disciplines" (Eppler, Mengis, 2003). But more insignificant data may just as well lead to zero decisions, as Gaussian disturbances vanish under averaging, like the inverse square root of the number of observations. On second thought, not so much with rounding: see Statistical Analysis for Rounding Data (Zhidong Bai, 2006). The current trend in signal or image processing is generally similar: acquire more data, at higher frequency, with more precision (watch out, formation on evil road), hoping that signal processing, statistics and data mining will cope with the flood and deliver precious information. An extreme example arises in seismic processing, where petabytes of data ("but also storage systems that can handle petabytes of data daily") are gathered. Yet, due to the computational burden and memory footprint, the relative time spent on "fine processing", compared with data reading, loading, handling and sorting, seems tiny.
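As a rough illustration of the square-root remark above, here is a minimal sketch that averages purely Gaussian observations and compares the empirical spread of the average with $\sigma/\sqrt{N}$. The function name `std_of_sample_mean`, the number of trials and the seed are my own illustrative choices, and rounding effects are deliberately left out.

```python
import random
import statistics


def std_of_sample_mean(n_obs, n_trials=2000, sigma=1.0, seed=0):
    """Empirical standard deviation of the mean of n_obs Gaussian samples.

    Hypothetical helper for illustration: zero-mean noise only, no rounding.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_obs))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)


# Empirical spread of the average next to the theoretical sigma / sqrt(N).
for n in (1, 4, 16, 64):
    print(n, round(std_of_sample_mean(n), 3), round(1.0 / n ** 0.5, 3))
```

Multiplying the number of observations by four only halves the residual disturbance, which is one way to read the diminishing return of "more data" when the data carry nothing but noise.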

Thanks to the availability of low-cost sensors and bandwidth, the data overload plague is spreading. With more and more data and less and less time to process it properly, massive low-quality batch filtering is favored. Signal and image processing enter the dark area of weak signals and information overlook. I pray every day (no variation of me, Lord) for my colleagues to tell me: next step, we are going to acquire far fewer signals (and favor no more dilation of disk space), to spend the remaining time on their processing.

1. Laurent,

I think I mentioned that to you a while back. For instance, in the oil business, the question seems to be "how can I get more data?" while the real question is "is there some oil there? and how much will it cost me to extract it?"

This line of questioning seems trivial on its own; by that I mean that most engineers will tell you that they want more data to find oil, so that they are answering the second question by answering the first. But I sometimes wonder if asking the right question does not lead you to a different trade-off. A little like Clay, if getting petabytes is your everyday problem, maybe your filter has broken somewhere down the line for structural reasons.

Cheers,

Igor.

2. I remember we already discussed that topic (and we probably should discuss it again). There is a huge gap to fill between "how can I get more data?" and "is there some oil there?". The "more data" answer is probably related to a "hammer-like" line of reasoning.