How Data Happened
08 Jan 2023
Book | How Data Happened: A History from the Age of Reason to the Age of Algorithms
Author | Chris Wiggins, Matthew L. Jones
Principles of Analysis
John Tukey’s principles of data analysis:
Data analysis must seek for scope and usefulness rather than security.
Data analysis must be willing to err moderately often in order that inadequate evidence shall more often suggest the right answer.
Data analysis must use mathematical argument and mathematical results as bases for judgment rather than as bases for proofs or stamps of validity.
History of Data Analysis
Data analysis is detective work.
In the atmosphere of Bell Labs, Tukey and his collaborators created a wide variety of statistical and computational tools needed to make data analysis a reality. Sixteen years later, in a practical textbook, he explained that “exploratory data analysis” (EDA) is “detective work–numerical detective work–or counting detective work–or graphical detective work.” EDA offered some “general understandings” useful across domains of detective work.
Data visualization is incredibly important for effective data analysis.
Tukey celebrated the creation of new tools for that craft.
Tukey’s 1978 textbook, whose draft had circulated for years in Bell Labs circles and beyond, offered a survey of the arts of exploring data through potent means of “reexpression.” “We have not,” he explained in bold type, “looked at our results until we have displayed them effectively.”
Effective display meant efficiency with the many developing forms of visualizing data; Tukey emphasized that “much more creative effort is needed to pictorialize the output” of data analysis.
For humans, the use of appropriate pictures offers the possibility of great flexibility all along the scale from broad summary to fine detail, since pictures can be viewed in so many ways.
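As a loose illustration (not from the book), here is a minimal Python sketch of the kind of “reexpression” and display the passage describes: the same skewed, synthetic data shown raw and after a log transform, with matplotlib standing in for Tukey’s graphical toolkit. The data and parameters are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical skewed measurements standing in for a real data set.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=3.0, sigma=1.0, size=500)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Raw scale: a long right tail hides most of the structure.
axes[0].hist(raw, bins=30)
axes[0].set_title("Raw values")

# Tukey-style "reexpression": a log transform spreads the bulk of the
# data out so the display becomes informative.
axes[1].hist(np.log10(raw), bins=30)
axes[1].set_title("log10 reexpression")

plt.tight_layout()
plt.show()
```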
The Big Data Moment
A similar observation–that large data sets gathered for one purpose may yield new kinds of scientific and commercial knowledge–would be made in a diversity of computational fields over the coming decades.
Financial data and their practical analysis would give rise to technical analysis, statistical arbitrage, and later, with more computational engineering, the field of high-frequency trading. Similarly, computational biology in the 1990s and 2000s exploded with analysis of differing genomes as well as high-throughput biological assays for understanding genetic networks, large-scale mining of electronic health records, and clinical informatics.
In industry, applied computational statistical methods changed the way companies recommended books and movies early in the rise of e-commerce; later, the same techniques would be applied to wine, shoes, and eventually information and communication.
Each of these fields had its own “data moment” as it discovered anew how large quantities of data, generated for purposes other than learning, could be valuable given a bit of statistical analysis surrounded by an infrastructure needed to gather, process, and productize insights from these data. Chambers, Tukey, and others argued that the statistical analysis was a mere part of this project–the mathematical nugget at the core of “greater” statistics. But they were also warning that academic statistics was doomed to irrelevance if it didn’t begin providing the tools for learning from these data.
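To make the “bit of statistical analysis” concrete, here is a rough Python sketch, not from the book and not any particular company’s method, of item-item similarity on a user–item ratings matrix, the flavor of analysis behind early e-commerce recommendations. All names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical user-by-item ratings matrix (0 = not rated), purely illustrative.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Cosine similarity between item columns: data gathered to record purchases
# is repurposed to learn which items behave alike.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Predict user 0's rating for item 2 as a similarity-weighted average
# of the items that user has already rated.
user, target = 0, 2
rated = R[user] > 0
pred = sim[target, rated] @ R[user, rated] / sim[target, rated].sum()
print(f"Predicted rating for user {user}, item {target}: {pred:.2f}")
```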