A recent article in Science (King et al., 2009), by computer scientists and biologists from Aberystwyth and Cambridge, describes a ‘Robot Scientist’ called Adam which has been designed to study the genetic basis of metabolism in yeast. Adam needs supervision from a technician to keep him topped up with reagents, and to tidy up after him, but otherwise may be left to himself as he formulates and tests hypotheses. Adam knows how to inspect databases on the genetics of yeast. He can select strains from a collection, use them to establish colonies in selected growth media, and measure the rate at which these colonies grow. In the course of the reported trial Adam undertook more than six million measurements of colony growth rates. More importantly, Adam decides which experiments to do, and structures them in a logical way, with replicated comparisons of particular strains designed to assess the importance of particular genes.
The undeniable advantage of Adam as a scientist over his human counterpart is that he can do six million measurements on yeast colonies without losing his sanity or suing the department for repetitive strain injury. Furthermore, the logical structure of his experiments, that is to say, the combination of simple logical experimental steps from an ‘ontology’ developed by his human programmers into appropriate protocols, is a useful way to ensure that costly effort in the lab is well-directed. But does Adam deserve to be called a scientist?
The authors admit that this question is not settled by their paper. (In fact one gets the impression that they are pragmatists, perhaps not enormously interested in this general question, but rightly proud of the tool that they have developed.) Essentially Adam explores a space of possible hypotheses with a set of possible procedures, both of which are tightly circumscribed by the human designers of the system. This is enormously useful, but it would seem to exclude the conceptual shifts, the new and creative ways of thinking about problems, which underlie any genuine scientific advance. For these, free will is essential.
‘Free will’ is the usual translation of liberum arbitrium, but it is perhaps better rendered as ‘free choice’ or ‘free decision’ (I follow Herbert McCabe’s recently published book on Aquinas here, being no Latinist). Without the assurance that a decision I make as a scientist is free, that is to say that nothing compels me to make this particular deduction from my observation, I have no grounds to assume that what I deduce is true. Furthermore, while I might program a machine to make what appear to be decisions (e.g. to identify molecules from their mass spectra), all it is doing is applying a rule that I have chosen in advance. It is really my decision. If some new phenomenon appears (e.g. a molecule whose mass is not recorded in the database), the machine will simply get stuck, whereas the human scientist would be able to speculate and hypothesize and perhaps arrive at an identification of a new molecule. Indeed, Adam appears to have run into difficulties due to flaws in the database with which he was provided.
I have further reservations about Adam, but these are more specialised, and are concerned with the statistical inferences that he makes. Adam automatically tests hypotheses by computing their P-values. This approach has its roots in the work of the statistician R.A. Fisher in the first half of the twentieth century. Consider a case where we wish to test the hypothesis that mice fed a supplemented diet A grow faster than those on diet B. We weigh several mice fed one diet or the other (in accordance with an appropriately designed experiment). All the mice within each treatment vary in the amount of weight that they gain (because biological material is inherently variable). Given this, can we regard the difference in mean weight gain between the treatments as evidence for a real effect of the supplement? In Fisher’s approach we advance a ‘null hypothesis’, here that there is no difference between the diets. In this case any observed difference simply reflects the inherent variation among the mice, which the experimental design allows us to treat as random. We then compute the probability, under this null hypothesis, of seeing a difference as large as or larger than the one we observe in our experiment. This is the P-value, and the smaller it is the more confident we can be in rejecting the null hypothesis.
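For the curious, here is a minimal sketch in Python of the kind of calculation this involves. The weight gains and group sizes are invented purely for illustration (this is not Adam's code, and the numbers come from no real experiment): under the null hypothesis the diet labels carry no information, so we shuffle them repeatedly and count how often a difference in means at least as large as the observed one arises by chance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weight gains (g) for mice on diets A and B; illustrative values only.
gain_a = np.array([11.2, 12.5, 10.8, 13.0, 12.1, 11.7, 12.9, 11.4])
gain_b = np.array([10.1, 11.0, 10.5, 11.8, 10.9, 10.3, 11.2, 10.7])

observed = gain_a.mean() - gain_b.mean()

# Under the null hypothesis the labels are arbitrary, so reshuffle them many
# times and see how often a difference at least this large arises by chance.
pooled = np.concatenate([gain_a, gain_b])
n_a = len(gain_a)
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:n_a].mean() - pooled[n_a:].mean()
    if diff >= observed:  # one-sided: diet A grows faster than diet B
        count += 1

p_value = count / n_perm
print(f"observed difference = {observed:.2f} g, one-sided P = {p_value:.4f}")
```

(A textbook analysis would use Student's t-test rather than this randomization test, but the logic is the same: compute the chance of so large a difference under the null hypothesis.)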
Computing P-values is a technological problem. Modern computers can do it easily; in Fisher’s time it was laborious. Fisher computed tables to obtain P-values for various test statistics. What he had to do was to compute the values of these statistics which correspond to particular P-values, such as 0.1, 0.05, 0.01 and 0.001. Unfortunately these came to be regarded as ‘thresholds’ for significance, and a mystical import was attached to 0.05 as the value which means you have found something interesting.
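To see what such a table contains, the snippet below (assuming SciPy is available, and using Student's t with 14 degrees of freedom simply as an example) prints the critical values of the statistic corresponding to those P-values; each number is the kind of entry Fisher had to compute by hand.

```python
from scipy.stats import t

# Two-sided critical values of Student's t for the P-values Fisher tabulated,
# here with 14 degrees of freedom chosen purely as an example.
df = 14
for p in (0.1, 0.05, 0.01, 0.001):
    crit = t.ppf(1 - p / 2, df)
    print(f"P = {p:<5} -> |t| >= {crit:.3f}")
```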
This threshold interpretation of statistical hypothesis testing was promoted by Neyman and Pearson, and many statistical textbooks now give the impression that P = 0.05 is some magic number that we select as the expected proportion of true null hypotheses that will be incorrectly rejected, that is, of false findings reported as true, in the course of a thesis, scientific programme or scientific career (Mistakes, I've made a few, but then again, on average only five percent of the time...). Neyman and Pearson proposed that no probabilistic inference can be made about a particular experiment, but rather that 'Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them, in following which we ensure that, in the long run of experience, we shall not often be wrong.' (Neyman and Pearson, 1933). This approach can be regarded as part of a broader project to restrict scientific activity to purely deductive tests of hypotheses (Goodman, 1998). It seems to be what Adam does: hypotheses are accepted if the P-value for the corresponding null hypothesis is less than or equal to 0.05.
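A small simulation (again a sketch in Python with NumPy and SciPy; the group sizes and distributions are arbitrary) makes the Neyman–Pearson promise concrete: if we follow the rule 'reject when P ≤ 0.05' over many experiments in which the null hypothesis happens to be true, we will wrongly reject it about five percent of the time, but the rule says nothing about any individual experiment.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)

# Many experiments in which the null hypothesis is TRUE (both groups drawn
# from the same distribution); apply the rule "reject if P <= 0.05" and
# count how often it leads us astray.
n_experiments = 10_000
false_rejections = 0
for _ in range(n_experiments):
    a = rng.normal(loc=10.0, scale=1.0, size=8)
    b = rng.normal(loc=10.0, scale=1.0, size=8)
    if ttest_ind(a, b).pvalue <= 0.05:
        false_rejections += 1

print(f"long-run false rejection rate = {false_rejections / n_experiments:.3f}")
```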
Fisher, however, would have none of it. He wrote: 'The concept that the scientific worker can regard himself as an inert item in a vast co-operative concern working according to accepted rules, is encouraged by directing attention away from his duty to form correct scientific conclusions, to summarize them and to communicate them to his scientific colleagues, and by stressing his supposed duty mechanically to make a succession of automatic "decisions"...The idea that this responsibility can be delegated to a giant computer programmed with Decision Functions belongs to a phantasy of circles rather remote from scientific research' (Fisher, 1973). His prescience here, as we contemplate Adam, is rather alarming!
For Fisher, science is not an automated method for processing and selecting among random hypotheses, but rather relies on the responsibility of the scientist to form appropriate, plausible and fruitful hypotheses, the evidence for which is then weighed by the P-value. It seems to me that this is something that cannot ever properly be delegated to a robot; the barrier is not merely technological but conceptual.
So all power to the elbows of the team that produced Adam: he will surely be a valuable tool for the development of modern systems biology. But we should resist the idea that Adam is a scientist. The idea that science can be emptied of its human component is one I must reject at the basic philosophical level, long before we get to any issues of ethics.
Fisher, R.A. 1973. Statistical methods and scientific inference. 3rd ed. Macmillan, New York.
Goodman, S.N. 1998. Multiple comparisons explained. American Journal of Epidemiology 147, 807–812.
King, R.D. et al. 2009. The automation of science. Science 324, 85–89.
Neyman, J. and Pearson, E. 1933. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A 231, 289–337.
Some of the final reflections on inference are culled from an article I wrote in Pedometron No. 24 (http://www.pedometrics.org/pedometron/pedometron24.pdf).