Last year the ENCODE Consortium, a big-data project involving 440 scientists from 32 laboratories around the world, announced with great fanfare that 80% of the human genome was functional, or, as the NYTimes put it less accurately but more memorably, “at least 80 percent of this DNA is active and needed.” What the NYTimes didn’t say was that this claim was highly controversial, to the point of implausibility. A fascinating, sharply-worded critique, On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE, has recently been published by Graur et al. Here is some of the flavor:
This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
Graur et al. make a number of key points. ENCODE essentially defined “functional” as sometimes being involved in a specified set of biochemical reactions (I am simplifying!). But every microbiological system is stochastic, and sooner or later everything is involved in some kind of biochemical reaction, even if that reaction goes nowhere and does nothing. In contrast, Graur et al. argue that “biological sense can only be derived from evolutionary context,” which in this case means that “functional” is defined as actively protected by selection.
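A toy simulation makes the stochasticity point concrete. This is my own hedged sketch, not ENCODE’s actual pipeline or anything from Graur et al.: if “functional” merely means “contains a common short motif” (here an arbitrary TATA-like 4-mer, scanned in 1 kb windows, both choices purely illustrative), then nearly all purely random DNA qualifies:

```python
import random

random.seed(0)

# Arbitrary, illustrative criterion: a window counts as "functional"
# if it contains this short promoter-like motif anywhere.
MOTIF = "TATA"
WINDOW = 1_000  # scan the random "genome" in 1 kb windows

# A 1 Mb genome of uniformly random bases, with no function whatsoever.
genome = "".join(random.choice("ACGT") for _ in range(1_000_000))

windows = [genome[i:i + WINDOW] for i in range(0, len(genome), WINDOW)]
flagged = sum(MOTIF in w for w in windows)

print(f"{flagged}/{len(windows)} random 1 kb windows contain {MOTIF}")
```

A random 4-mer appears roughly four times per kilobase on average, so the vast majority of windows get flagged even though the sequence is pure noise, which is exactly why a permissive biochemical criterion inflates estimates of functionality.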
There are a variety of ways of identifying whether a sequence is actively protected by selection. One method, for example, looks for sequences that are highly similar (conserved) across species. Once evolution has hit on the recipe for hemoglobin, it doesn’t want that recipe messed with; thus the chimp and human DNA that blueprints hemoglobin is very similar, and not that different from that of dogs. In other areas of the genome, however, even closely related species have different sequences, which suggests that those portions of the DNA aren’t being selected for; they are randomly mutating because there is no value to their conservation. When functional is defined using selection, most human DNA does not look functional.
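The conservation idea can be sketched with a crude percent-identity score over pre-aligned sequences. The snippets below are made-up illustrations, not real alignments (the first merely resembles the start of a β-globin coding sequence); real comparative genomics uses far more careful alignment and substitution models:

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical aligned snippets: a "conserved" region vs a "neutral" one.
human_conserved = "ATGGTGCACCTGACTCCTGAG"
chimp_conserved = "ATGGTGCACCTGACTCCTGAG"  # identical: selection resists change

human_neutral = "ATCGGATTACAGGCTTAAGCA"
chimp_neutral = "ATTGGCTTACCGGATTATGCA"    # diverged: mutations accumulate freely

print(percent_identity(human_conserved, chimp_conserved))  # 1.0
print(percent_identity(human_neutral, chimp_neutral))
```

A region under selection stays near 100% identity across species, while a neutrally evolving region drifts apart at roughly the background mutation rate; that contrast is what lets one classify sequence as protected or not.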
It’s also interesting to note that the size of the genome varies significantly across species, but in ways that appear to have little to do with complexity. The human genome, for example, is about 3GB, quite a bit more than the fruit fly at 170MB. But the onion is 17GB! (One of the reasons that onions are used in a lot of science labs, by the way, is that the onion genome is so big it makes onion nuclei large enough to see easily with low-powered microscopes.) Now one could argue that this lack of correlation between genome size and complexity is simply a result of an anthropocentric definition of complexity. Maybe an onion really is complex. Two points counter this view, however. First, similar species can have very different genome sizes. Second, we know why the genomes of some species are really large: they are filled with transposons, so-called jumping genes, sometimes analogized to viruses, that cut and paste themselves into the genome. Some of these transposons, like Alu in humans, are short sequences that repeat themselves more than a million times. It’s very difficult to believe that these boring repetitions are functional (n.b., this is not to say that they don’t have an effect). The fact that it’s this kind of repetitious, non-conserved DNA that accounts for a large fraction of the differences in genome size is highly suggestive of non-functionality.
Why discuss such an esoteric (for economists) paper on a (nominally!) economics blog? The Graur et al. paper is highly readable, even for non-experts. It’s even funny, although I laughed somewhat sheepishly since some of the comments are unnecessarily harsh. Many of the critiques, such as the confusion between statistical and substantive significance, arise in economics and many other fields. The Graur paper also makes some points which are going to be important in economics. For example, they write:
“High-throughput genomics and the centralization of science funding have enabled Big Science to generate “high-impact false positives” by the truckload…”
Exactly right. Big data is coming to economics, but data is not knowledge and big data is not wisdom.
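The significance-versus-magnitude confusion is easy to demonstrate with simulated data. In this hypothetical sketch (the sample size and effect size are arbitrary choices of mine), a true difference of one-hundredth of a standard deviation, negligible by almost any substantive standard, produces an enormous z-statistic simply because n is huge:

```python
import math
import random

random.seed(1)

# Two groups whose true means differ by a trivial 0.01 standard deviations.
n = 1_000_000
effect = 0.01  # arbitrary, substantively negligible true difference
a = [random.gauss(0.0, 1.0) for _ in range(n)]
b = [random.gauss(effect, 1.0) for _ in range(n)]

diff = sum(b) / n - sum(a) / n
se = math.sqrt(2 / n)  # standard error of the difference (unit variances)
z = diff / se

print(f"difference in means = {diff:.4f}, z-statistic = {z:.1f}")
```

The z-statistic lands far beyond any conventional significance threshold even though the estimated difference itself is tiny, which is precisely why reporting "significant" without reporting magnitude can mislead, in genomics and in economics alike.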
Finally, the Graur paper tells us something about disputes in economics. Economists are sometimes chided for disagreeing about the importance of such basic questions as the relative roles of aggregate demand and aggregate supply, but physicists can’t even find most of the universe, and microbiologists don’t agree on whether the human genome is 80% functional or 80% junk. Is disagreement a result of knaves and fools? Sometimes, but more often disagreement is just the way the invisible hand of science works.
Hat tip: Monique van Hoek.