Thanks to innovations in computer software, in laboratory techniques, and in observational technology, scientists today can measure things on a scale inconceivable only a few years ago, and new computational methods can sift vast numbers of hypotheses to identify the few with a reasonable chance of being true. In the first part of a three-part series on the subject, Ashoka gives us unique insight into scientific revolutions, in his weekly column, exclusively in Different Truths.
Scientific revolutions are sometimes quiet. Despite a lack of public fanfare, there is mounting evidence that we are in the midst of such a revolution – premised on the automation of scientific discovery made possible by modern computers and new methods of acquiring data.
Consider, for example, the following developments:
- About eight years ago, using data from the 1970s, a team of data analysts working in Holland predicted that low-level lead exposure is more dangerous to children’s cognitive development than had previously been thought – a prediction confirmed by recent reanalyses of later observations;
- Using measurements of reflected solar energy (technically, the visible-near infrared spectrum), a computer identified minerals in rocks from a California desert lake as accurately as had a team of human experts at the site who had access both to the spectra and to the actual rocks;
- In Antarctica, a robot traversing a field of ice and stones picked out the rare meteorites from among the many rocks;
- Scientists at the Swedish Institute for Space Physics realised that an instrument aboard a satellite was malfunctioning and they recalibrated it from Earth;
- An economist working for the World Food Organisation found that current foreign aid practices have no impact on extreme poverty;
- Climate researchers traced the global increase in vegetation and its causes over the last twenty years;
- A team of biologists and computer scientists reported determinations of the genes in yeast whose function is regulated by any of a hundred regulator genes;
- A kidney transplant surgeon measured the behaviour of rat genes that had been aboard the space shuttle;
- A biologist reported a determination of (possibly) all of the human genes in cells lining the blood vessels that respond to changes in liquid flow across the cells.
All of these developments – and they are simply more or less random examples I happen to know – reflect a new way of learning about the world. Thanks to innovations in computer software, in laboratory techniques, and in observational technology, scientists today can measure things on a scale inconceivable only a few years ago. New laboratory and computational methods allow evaluation of vast numbers of hypotheses in order to identify those few that have a reasonable chance of being true, and simple oversights of human judgment can be corrected by computer. The change is from the textbook scientific paradigm in which one or very few hypotheses are entertained and tested by very few experiments to a framework in which algorithms take in data and use it to search over many hypotheses, as experimental procedures simultaneously establish not one but many relationships. While there are consequences even for small collections of data, the automation of scientific inquiry is chiefly driven by novel abilities to acquire, store, and access previously inconceivable amounts of data, far too much for humans to survey by hand and eye. The methodology has moved in consequence; in a growing number of fields, automated search and data selection methods have become indispensable.
This may not seem revolutionary, but it has all of the earmarks of the scientific revolution that Thomas Kuhn emphasised years ago: novel results, novel kinds of theory, novel problems, intense and often irrational hostility from parts of the scientific community. We can see the revolution at work by looking more closely at three of the examples I mentioned above.
Lead was long a component of paint, and tetraethyl lead was introduced into gasoline in the 1920s. From these and other sources, low-level lead exposure became common in the United States and elsewhere. Large doses of lead and other heavy metals were known to disrupt mental faculties, but the effects of low-level exposure were unknown. Moreover, low-level exposure was hard to measure: lead concentrations fluctuate in the blood and do not indicate how much lead the body has absorbed over time.
In the 1970s, Herbert Needleman found an ingenious way to measure cumulative lead exposure using the lead concentration in children’s baby teeth. He also measured the children’s IQ scores and many family and social variables that might conceivably be relevant to the children’s cognitive abilities. Reviewing the data by analysis of variance, a standard statistical technique introduced early in the twentieth century, Needleman concluded that lead exposure has a small but robust effect – it lowers children’s IQ scores.
Since a lot of money was at stake, criticism naturally followed, and, in 1983, a scientific review panel formed by Ronald Reagan’s Environmental Protection Agency asked Needleman to reanalyse the data with stepwise regression, a more modern statistical technique. The idea behind this technique is very simple even if the mathematics is not. Suppose any of several measured variables might influence IQ scores. Start with the assumption that none of them does. Relax that assumption by entertaining as a causal factor whichever variable is most highly correlated with IQ score, then keep adding causal factors according to a mathematical measure that takes account of the correlation already explained by previously considered factors. Stop when additional variables don’t explain anything more. (The procedure can also be run in reverse, starting with the assumption that all of the measured variables influence IQ scores and then discarding the least explanatory factors, one by one.) Needleman had measured about forty variables that might account for variations in his subjects’ IQ scores, and stepwise regression eliminated all but six of them. Lead exposure remained among the causal factors, and using a standard method (indeed, the oldest method in statistics, originating with Legendre’s 1805 essay on the orbits of comets) to estimate the dependence of IQ score on lead exposure, Needleman again found a small negative effect.
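The forward version of the procedure described above can be sketched in a few lines of code. This is an illustrative toy, not Needleman’s actual analysis: the variable names and the synthetic data are invented, and the stopping rule (a minimum gain in explained variance) is one simple choice among several.

```python
import numpy as np

def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedy forward selection: repeatedly add the candidate variable
    that most improves R^2, stopping when the gain falls below min_gain.
    (Illustrative sketch only; variable names here are hypothetical.)"""
    n = X.shape[1]
    selected = []
    best_r2 = 0.0
    total_ss = (y - y.mean()) @ (y - y.mean())
    while len(selected) < n:
        gains = []
        for j in range(n):
            if j in selected:
                continue
            cols = selected + [j]
            A = np.column_stack([np.ones(len(y)), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            gains.append((1 - resid @ resid / total_ss, j))
        r2, j = max(gains)
        if r2 - best_r2 < min_gain:
            break  # no remaining candidate explains enough additional variance
        selected.append(j)
        best_r2 = r2
    return [names[j] for j in selected]

# Synthetic illustration: "iq" depends negatively on "lead" and positively
# on "income", while "shoe_size" is pure noise.
rng = np.random.default_rng(0)
lead = rng.normal(size=200)
income = rng.normal(size=200)
shoe = rng.normal(size=200)
iq = 100 - 3 * lead + 2 * income + rng.normal(size=200)
X = np.column_stack([lead, income, shoe])
print(forward_stepwise(X, iq, ["lead", "income", "shoe_size"]))
```

On this synthetic data the procedure keeps the two genuine influences and discards the irrelevant one, mirroring how stepwise regression whittled Needleman’s forty candidate variables down to six.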
Many years after the confirmation of Needleman’s results had helped to eliminate lead from gasoline, two economists, Steven Klepper and Mark Kamlet, reanalysed Needleman’s data – with a difference. Reasonably, they assumed that the measured values Needleman reported were not perfectly accurate: IQ scores did not perfectly measure cognitive ability, lead concentrations in teeth did not perfectly measure lead exposure, and so on. Each of Needleman’s six remaining variables perhaps influenced cognitive ability, but the true values of those variables were not recorded in his data. The data consisted of measurements produced by the true value of each variable for each child, plus unknown measurement errors. Klepper proved an interesting theorem implying that for Needleman’s data, under these assumptions about measurement error, the true effect of lead exposure on cognitive ability could be positive, negative, or zero. The elimination of lead from gasoline, it seemed, had been based on a statistical mistake. The story doesn’t end here, however. But before continuing, a digression into the statistics of causality is necessary.
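The core of the errors-in-variables worry can be illustrated with a toy simulation. This is not Klepper and Kamlet’s actual analysis: the numbers below are invented, and with a single noisy regressor the bias merely shrinks the estimate toward zero, whereas with several noisy variables, as in Needleman’s data, even the sign can become ambiguous.

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect = -3.0                               # hypothetical true slope
exposure = rng.normal(size=5000)                 # true cumulative exposure (unobserved)
iq = 100 + true_effect * exposure + rng.normal(size=5000)

# We observe only a noisy proxy for exposure, e.g. tooth-lead concentration
measured = exposure + rng.normal(scale=1.0, size=5000)

# The least-squares slope of IQ on the noisy proxy is attenuated toward zero:
# with equal signal and noise variance, it is about half the true slope.
slope = np.polyfit(measured, iq, 1)[0]
print(round(slope, 2))
```

Here the fitted slope comes out near -1.5 rather than the true -3.0: measurement error alone halves the apparent effect, which is why Klepper’s theorem could leave the true effect of lead unidentified from the data.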
[To be continued]
©Ashoka Jahnavi Prasad
Photos from the internet
#MidweekMusing #ComputersToday #ModernComputers #RoleOfComputersToday #RevolutionsOfComputers #DifferentTruths