# Results for “why most published research” 51 found

## Why Most Published Research Findings are False (2)

John Ioannidis’s argument that most published research findings are false has been getting some attention in the blogosphere because of a recent article in the WSJ.  In an earlier post I explained why most published research findings might be false using a simple diagram.

Hat tip and thanks to Steve Novella at Neurologica Blog and Mark H at Denialism both of whom refer to my analysis adding many excellent insights of their own.

## Why Most Published Research Findings are False

Writing in PLoS Medicine, John Ioannidis says:

There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims. However, this should not be surprising. It can be proven that most claimed research findings are false.

Ioannidis presents a Bayesian analysis of the problem which most people will find utterly confusing.  Here’s the idea in a diagram.

Suppose there are 1000 possible hypotheses to be tested.  There are an infinite number of false hypotheses about the world and only a finite number of true hypotheses so we should expect that most hypotheses are false.  Let us assume that of every 1000 hypotheses 200 are true and 800 false.

It is inevitable in a statistical study that some false hypotheses are accepted as true.  In fact, standard statistical practice guarantees that at least 5% of false hypotheses are accepted as true.  Thus, out of the 800 false hypotheses 40 will be accepted as “true,” i.e. statistically significant.

It is also inevitable in a statistical study that we will fail to accept some true hypotheses (Yes, I do know that a proper statistician would say “fail to reject the null when the null is in fact false,” but that is ugly).  It’s hard to say what the probability is of not finding evidence for a true hypothesis because it depends on a variety of factors such as the sample size but let’s say that of every 200 true hypotheses we will correctly identify 120 or 60%.  Putting this together we find that of every 160 (120+40) hypotheses for which there is statistically significant evidence only 120 will in fact be true or a rate of 75% true.

(By the way, the multiplying factors in the diagram are for those who wish to compare with Ioannidis’s notation.)
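For readers who prefer code to diagrams, here is a minimal sketch of the same arithmetic. The function and parameter names are my own labels for illustration, not Ioannidis’s notation; the inputs (200 true and 800 false hypotheses, 60% of true hypotheses detected, 5% of false ones accepted) are the numbers from the example above.

```python
def share_true_among_significant(n_true, n_false, power, alpha):
    """Fraction of statistically significant results that are actually true.

    n_true, n_false: counts of true and false hypotheses being tested
    power: probability a true hypothesis is correctly identified
    alpha: probability a false hypothesis is accepted as "true"
    """
    true_positives = n_true * power      # 200 * 0.60 = 120
    false_positives = n_false * alpha    # 800 * 0.05 = 40
    return true_positives / (true_positives + false_positives)


share = share_true_among_significant(n_true=200, n_false=800, power=0.60, alpha=0.05)
print(f"{share:.0%} of significant results are true")
```

Lowering `power` (small samples) or raising the ratio of `n_false` to `n_true` (little theory to exclude hypotheses) pushes the share down, which is the whole argument in two lines.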

Ioannidis says most published research findings are false.  This is plausible in his field of medicine where it is easy to imagine that there are more than 800 false hypotheses out of 1000.  In medicine, there is hardly any theory to exclude a hypothesis from being tested.  Want to avoid colon cancer?   Let’s see if an apple a day keeps the doctor away.  No?  What about a serving of bananas? Let’s try vitamin C and don’t forget red wine.  Studies in medicine also have notoriously small sample sizes.  Lots of studies that make the NYTimes involve fewer than 50 people – that reduces the probability that you will accept a true hypothesis and raises the probability that the typical published finding is false.

So economics does ok on the main factors in the diagram but there are other effects which also reduce the probability the typical result is true and economics has no advantages on these – see the extension.

Sadly, things get really bad when lots of researchers are chasing the same set of hypotheses.  Indeed, the larger the number of researchers the more likely the average result is to be false!  The easiest way to see this is to note that when we have lots of researchers every true hypothesis will be found to be true but eventually so will every false hypothesis.  Thus, as the number of researchers increases, the probability that a given result is true goes to the probability in the population, in my example 200/1000 or 20 percent.
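A rough sketch of the many-researchers effect, under an assumption that is mine rather than Ioannidis’s: each researcher tests every hypothesis independently, and a hypothesis counts as “supported” if at least one researcher finds a significant result. The function name is hypothetical and the default numbers are the ones from my example.

```python
def share_true_with_support(n_researchers, n_true=200, n_false=800,
                            power=0.60, alpha=0.05):
    """Share of hypotheses with at least one significant study that are true.

    Assumes independent tests by each researcher (an illustrative assumption).
    """
    # Probability that at least one of n researchers gets a significant result
    p_true_found = 1 - (1 - power) ** n_researchers
    p_false_found = 1 - (1 - alpha) ** n_researchers

    supported_true = n_true * p_true_found
    supported_false = n_false * p_false_found
    return supported_true / (supported_true + supported_false)


for n in (1, 5, 20, 100):
    print(n, round(share_true_with_support(n), 3))
```

With one researcher this reproduces the 75% figure; as the number of researchers grows the share falls toward 200/1000, because eventually every hypothesis, true or false, has a supporting study.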

A meta analysis will go some way to fixing the last problem so the point is not that knowledge declines with the number of researchers but rather that with lots of researchers every crackpot theory will have at least one scientific study that it can cite in its support.

The meta analysis approach, however, will work well only if the results that are published reflect the results that are discovered.  But editors and referees (and authors too) like results which reject the null – i.e. they want to see a theory that is supported not a paper that says we tried this and this and found nothing (which seems like an admission of failure).

Brad DeLong and Kevin Lang wrote a classic paper suggesting that one of the few times that journals will accept a paper that fails to reject the null is when the evidence against the null is strong (and thus failing to reject the null is considered surprising and important).  DeLong and Lang show that this can result in a paradox.  Taken on its own, a paper which fails to reject the null provides evidence in favor of the null, i.e. against the alternative hypothesis and so should increase the probability that a rational person thinks the null is true.  But when a rational person takes into account the selection effect, the fact that the only time papers which fail to reject the null are published is when the evidence against the null is strong, the publication of a paper failing to reject the null can cause him to increase his belief in the alternative theory!

What can be done about these problems?  (Some cribbed straight from Ioannidis and some my own suggestions.)

1)  In evaluating any study try to take into account the amount of background noise.  That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.

2) Bigger samples are better.  (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).

3) Small effects are to be distrusted.

4) Multiple sources and types of evidence are desirable.

5) Evaluate literatures not individual papers.

6)  Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.

7)  As an editor or referee, don’t reject papers that fail to reject the null.

## Why the New Pollution Literature is Credible

My recent post, Air Pollution Reduces Health and Wealth drew some pushback in the comments, some justified, some not, on whether the results of these studies are not subject to p-hacking, the garden of forking paths and the replication crisis. Sure, of course, some of them are. Andrew Gelman, for example, has some justified doubt about the air filters and classroom study. Nevertheless, I don’t think that skepticism about the general thrust of the results is justified. Why not?

First, go back to my post Why Most Published Research Findings are False and note the list of credibility checks. For example, my rule is trust literatures not papers and the new pollution literature is showing consistent and significant negative effects of pollution on health and wealth. Some might respond that the entire literature is biased for reasons of political correctness or some such and sure, maybe. But then what evidence would be convincing? Is skepticism then justified or merely mood affiliation? And when it comes to action should we regard someone’s prior convictions (how were those formed?) as more accurate than a large, well-published scientific literature?

It’s not just that the literature is large, however, it’s that the literature is consistent in a way that many studies in say social psychology were not. In social psychology, for example, there were many tests of entirely different hypotheses–power posing, priming, stereotype threat–and most of these failed to replicate. But in the pollution literature we have many tests of the same hypotheses. We have, for example, studies showing that pollution reduces the quality of chess moves in high-stakes matches, that it reduces worker productivity in Chinese call-centers, and that it reduces test scores in American and in British schools. Note that these studies are from different researchers studying different times and places using different methods but they are all testing the same hypothesis, namely that pollution reduces cognitive ability. Thus, each of these studies is a kind of replication–like showing price controls led to shortages in many different times and places.

Another feature in favor of the air pollution literature is that the hypothesis that pollution can have negative effects on health and cognition wasn’t invented yesterday along with the test (we came up with a new theory and tested it and guess what, it works!). The Romans, for example, noted the negative effect of air pollution on health. There’s a reason why people with lung disease move to the countryside and always have.

I also noted in Why Most Published Research Findings are False that multiple sources and types of evidence are desirable. The pollution literature satisfies this desideratum. Aside from multiple empirical studies, the pollution hypothesis is also consistent with plausible mechanisms and it is consistent with the empirical and experimental literature on pollution and plants and pollution and animals. See also OpenPhilanthropy’s careful summary.

Moreover, there is a clear dose-response effect–so much so that when it comes to “extreme” pollution few people doubt the hypothesis. Does anyone doubt, for example, that an infant born in Delhi, India–one of the most polluted cities in the world–is more likely to die young than if the same infant grew up (all else equal) in Wellington, New Zealand–one of the least polluted cities in the world?  People accept that “extreme” pollution creates debilitating effects but they take extreme to mean ‘more than what I am used to’. That’s not scientific. In the future, people will think that the levels of pollution we experience today are extreme, just as we wonder how people could put up with London Fog.

What is new about the new pollution literature is more credible methods and bigger data and what the literature shows is that the effects of pollution are larger than we thought at lower levels than we thought. But we should expect to be able to detect smaller effects with better methods and bigger data.  (Note that this isn’t guaranteed, there could be positive effects of pollution at lower levels, but it isn’t surprising that what we are seeing so far is negative effects at levels previously considered acceptable.)

Thus, while I have no doubt that some of the papers in the new pollution literature are in error, I also think that the large number of high quality papers from different times and places which are broadly consistent with one another and also consistent with what we know about human physiology and particulate matter and also consistent with the literature on the effects of pollution on animals and plants and also consistent with a dose-response relationship suggest that we take this literature and its conclusion that air pollution has significant negative effects on health and wealth very seriously.

## Why are women so prominent in vaccine development?

Here is my Bloomberg column arguing that they are prominent in vaccine development, excerpt:

Then there is the vaccine from Novavax, which is based in Gaithersburg, Maryland. The Novavax results are not yet published, but early word is that they are very promising. This vaccine also is based on new ideas, using an unusual moth cell system to crank out proteins in a highly innovative manner.

Novavax’s team is led by Nita Patel, an immigrant from Gujarat, India. Her vaccine team is identified as “all-female.” Patel is from a very poor family; her father almost died of tuberculosis when she was 4 years old, and she often had to beg for bus fare.

Immigrants too, and there is much more evidence at the link.  In fact women have been prominent in vaccine research for a long time.  But why vaccines?  What is the best hypothesis here?

## William Nordhaus and why he won the Nobel Prize in economics

These are excellent Nobel Prize selections, Romer for economic growth and Nordhaus for environmental economics.  The two picks are brought together by the emphasis on wealth, the true nature of wealth, and how nations and societies fare at the macro level.  These are two highly relevant picks.  Think of Romer as having outlined the logic behind how ideas leverage productivity into ongoing spurts of growth, as for instance we have seen in Silicon Valley.  Think of Nordhaus as explaining how economic growth interacts with the value of the environment.  Here is their language:

• 2018 Sveriges Riksbank Prize in Economic Sciences is awarded jointly to William D Nordhaus “for integrating climate change into long-run macroeconomic analysis” and Paul M Romer “for integrating technological innovations into long-run macroeconomic analysis”.

Both are Americans, and both have highly innovative but also “within the mainstream” approaches.  So this is a macro prize, but not for cycles, rather for growth and long-term economic prospects.  Here is the Prize committee citation, always well done.

Both candidates were considered heavy favorites to win the Prize, sooner or later, and these selections cannot come as a surprise.  Perhaps it is slightly surprising that they won the Prize together, though the basic logic of such a combination makes good sense.  Here are previous MR mentions of Nordhaus, you can see we have been mentioning him for years in connection with the Prize.

Here is the home page of Nordhaus.  Here is Wikipedia.  Here is scholar.google.com.  Here is Joshua Gans on Nordhaus.

Nordhaus is professor at Yale, and most of all he is known for his work on climate change models, and his connection to various concepts of “green accounting.”  To the best of my knowledge, Nordhaus started working on green accounting in 1972, when he published with James Tobin (also a Laureate) “Is Growth Obsolete?“, which raised the key question of sustainability.  Green accounting attempts to outline how environmental degradation can be measured against economic growth.  This endeavor is not so easy, however, as environmental damage can be hard to measure and furthermore gdp is a “flow” and the environment is (often, not always) best thought of as a “stock.”

Nordhaus developed (with co-authors) the Dynamic Integrated Climate-Economy Model, a pioneering effort to develop a general approach to estimating the costs of climate change.  Subsequent efforts, such as the London IPCC group, have built directly on Nordhaus’s work in this area.  The EPA still uses a variant of this model.  The model was based on earlier work by Nordhaus himself in the 1970s, and he refined it over time in a series of books and articles, culminating in several books in the 1990s.  Here is his well-cited piece, with Mendelsohn and Shaw, on how climate change will affect global agriculture.

Nordhaus also was an early advocate of a carbon tax and furthermore note that his brother Bob wrote part of the Clean Air Act, the part that gave the government the right to regulate hitherto-unmentioned pollutants in the future.  The Obama administration, in its later attempts to regulate climate, cited this provision.

I would say that much of Nordhaus’s work has its impact through being “done,” rather than through being “read.”  Few economists have read through this model, which has computer programs and spreadsheets at its core.  But virtually all economists read about the results of such models and have a general sense of how they work.  The most common criticism of such models, by the way, is simply that their results are highly sensitive to the choice of discount rate.
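To see why the discount rate matters so much, here is a minimal sketch of plain present-value discounting. This is not the DICE model or any part of it; the damage figure, the horizon, and the rates are hypothetical numbers chosen only for illustration.

```python
def present_value(future_amount, annual_rate, years):
    """Value today of an amount received (or a cost paid) `years` from now."""
    return future_amount / (1 + annual_rate) ** years


# Hypothetical: climate damages worth 100 (in any unit) arriving 100 years out
damage, horizon = 100.0, 100

for rate in (0.01, 0.03, 0.05):
    pv = present_value(damage, rate, horizon)
    print(f"discount rate {rate:.0%}: present value {pv:.2f}")
```

Over a century, moving from a 1% to a 5% rate shrinks the present value of the same future damage by a factor of more than forty, so the “right” amount to spend on mitigation today swings accordingly.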

In recent years, Nordhaus has shifted his emphasis to the risks from climate change, for instance in his book The Climate Casino: Risk, Uncertainty, and Economics for a Growing World.  Marty Weitzman offers a good review, as does Krugman.

Assorted pieces of information on Nordhaus:

Nordhaus was briefly Provost at Yale.  He also ended up being co-author on Paul Samuelson’s famous textbook in economics.

He co-authored a recent paper arguing we are not near the economic singularity; in this area his work intersects with Romer’s quite closely.

Bill Nordhaus, 72, a Yale economist who is seen as a leading contender for a Nobel Prize, came up with the idea of a carbon tax and effectively invented the economics of climate change. Bob, 77, a prominent Washington energy lawyer, wrote an obscure provision in the Clean Air Act of 1970 that is now the legal basis for a landmark climate change regulation, to be unveiled by the White House next month, that could close hundreds of coal-fired power plants and define President Obama’s environmental legacy.

Bob, Bill’s brother, once said: “Growing up in New Mexico, you’re aware of the very fragile ecosystem.”

Perhaps my personal favorite Nordhaus paper is on the returns to innovation.  Don Boudreaux summarized it well:

In a recent NBER working paper – “Schumpeterian Profits in the American Economy: Theory and Measurement” – Yale economist William Nordhaus estimates that innovators capture a mere 2.2% of the total “surplus” from innovation. (The total surplus of innovation is, roughly speaking, the total value to society of innovation above the cost of producing innovations.) Nordhaus’s data are from the post-WWII period.

The smallness of this figure is astounding. If it is anywhere close to being an accurate estimate, the implication is that “society” pays a paltry \$2.20 for every \$100 worth of welfare it enjoys from innovating activities.

There again you will see a complete intersection with the ideas of Romer.  Another splendid and still-underrated paper by Nordhaus is on the economics of light.  Nordhaus argues that gdp figures understate the true extent of growth, and shows that the relative price of bringing light to humans has fallen more rapidly than gdp growth figures alone might indicate.  Check out this diagram.  Here is a BBC summary of what Nordhaus did, in other words rates of price inflation have been lower than we thought and thus rates of real gdp growth higher.

Again, you will see Nordhaus and Romer intersecting on this key idea of economic growth.

Last but not least, Nordhaus was a pioneer on the theory of the political business cycle, namely the idea that politicians deliberately manipulate the economy, using monetary and fiscal policy, so as to boost their chances of reelection.  Dare I suggest that this idea might be making a comeback?

Addendum: From Margaret Collins by email: “I’d like to call your attention to Professor Nordhaus’ longstanding association with the International Institute for Applied Systems Analysis (IIASA), the international science and policy research institution located just outside Vienna.  He worked at IIASA shortly after the institute’s creation in 1972, and his work there is closely bound to the issues the Nobel Committee cites in the award  — he was employed for a year in 1974-75, doing pioneering work on climate as part of IIASA’s Energy Program, and producing a working paper entitled “Can We Control Carbon Dioxide?”.  That was perhaps the first economics treatment of climate change — and Nordhaus dates his work on climate as having begun there.  He has visited IIASA numerous times in the intervening years, and remains a close collaborator, particularly with Nebojsa Nakicenovic, the Institute’s Deputy Director.”

And, from the comments: “Nordhaus also helped pioneer the use of satellite imagery of night time lights as a tool for measuring economic growth, where we’ve played around with some of the publicly available tools to support various analysis.”

## Is Economic Research Biased by Partisanship?

The Washington Monthly, a magazine of ideas from the liberal-left, has a profile of me and my paper with Nathan Goldschlag, Is regulation to blame for the decline in American entrepreneurship? The profile ups the “libertarian says regulation not responsible for bad thing!” angle. My earlier paper, finding that more guns leads to more suicides, was also given the “even a libertarian says” angle. In both cases, I was treated fairly and well and since I wrote the papers to be read, I am happy for the publicity. But I am uncomfortable with these takes.

After all, I am not surprised that my research is not biased by partisanship. Why should other people be? Should I not be insulted? Moreover, I don’t think that I am special in this regard. I think that most academic research in economics is not biased by partisanship. Thus, while it’s nice to receive plaudits on twitter for honesty and bravery, they are undeserved. This is normal. Normal for me and normal for other economists. The public perception to the contrary likely comes from two failures–a failure to distinguish partisan commentary from academic research and a failure to consider that ideology influences topic more than findings.

Economic commentary in the media often does come from political partisans but that is a completely different role than publishing peer-reviewed research. Papers published in mainstream economics journals have passed a high bar and are much less likely to be infused with partisan bias–this is true even when the research leads to a blog post or op-ed that may be of partisan interest.

An economist’s ideology probably does influence the topics they choose to research. I’ve written on bounty hunters, privateers, and the private provision of public goods, topics surely influenced by my interest in how markets solve problems usually thought solvable only by governments. Choice of topic, however, does not necessarily determine the outcome. In the aforementioned three cases, my research can be read as broadly supportive of private solutions. The topics of dynamism and regulation, firearms and suicides, and private cities in India were probably also influenced by ideology but in these cases the research can be read as somewhat less supportive of private solutions.1 Let the chips fall where they may. I’ve learnt something in both sets of cases. My academic ideology, “a demand to know the truth,” trumps any narrow political ideology.

There’s another problem with praising a “libertarian”, or any researcher with strong beliefs, for honesty when their research conclusions don’t fit narrow priors. It puts their research that does fit narrow priors under a cloud. But only people with strong beliefs are put to this test. No one gets suspicious when a moderate democrat produces lots of research that fits moderate democrat priors. Why not? Do you assume reality is moderate?

I also wonder whether the people lauding me for my honest research–for which I thank them–will draw the correct conclusion. Namely, they should now be more receptive to my work on bounty hunters, privateers, and the private provision of public goods. Fingers crossed.

Let me conclude on a lighter note. There are many reasons why regulation could be costly outside of its effects on dynamism. Thus, for my friends who think that I have gone all-squishy, n.b.:

Not that Tabarrok himself has become a booster for regulation. He doesn’t think much of government’s ability to spark innovation through setting standards; the first thing he did when he last bought a new shower head, he said, was remove its federally mandated flow restrictor.

Addendum 1: I have also written many papers like Would the Borda Count have Avoided the Civil War? and Patent Theory versus Patent Law where the topic was driven out of some non-ideological interest or simply because I had an idea. Publish or perish!

## Direct Instruction: A Half Century of Research Shows Superior Results

What if I told you that there is a method of education which significantly raises achievement, has been shown to work for students of a wide range of abilities, races, and socio-economic levels and has been shown to be superior to other methods of instruction in hundreds of tests? Well, the method is Direct Instruction and I first told you about it in Heroes are Not Replicable. I am reminded of this by the just-published, The Effectiveness of Direct Instruction Curricula: A Meta-Analysis of a Half Century of Research which, based on an analysis of 328 studies using 413 study designs examining outcomes in reading, math, language, other academic subjects, and affective measures (such as self-esteem), concludes:

…Our results support earlier reviews of the DI effectiveness literature. The estimated effects were consistently positive. Most estimates would be considered medium to large using the criteria generally used in the psychological literature and substantially larger than the criterion of .25 typically used in education research (Tallmadge, 1977). Using the criteria recently suggested by Lipsey et al. (2012), 6 of the 10 baseline estimates and 8 of the 10 adjusted estimates in the reduced models would be considered huge. All but one of the remaining six estimates would be considered large. Only 1 of the 20 estimates, although positive, might be seen as educationally insignificant.

…The strong positive results were similar across the 50 years of data; in articles, dissertations, and gray literature; across different types of research designs, assessments, outcome measures, and methods of calculating effects; across different types of samples and locales, student poverty status, race-ethnicity, at-risk status, and grade; across subjects and programs; after the intervention ceased; with researchers or teachers delivering the intervention; with experimental or usual comparison programs; and when other analytic methods, a broader sample, or other control variables were used.

It is very unusual to see an educational method successfully replicate across such a long period of time and across so many different margins.

Direct Instruction was pioneered by Siegfried Engelmann in the 1960s and is a scientific approach to teaching. First, a skill such as reading or subtraction is broken down into simple components, then a method to teach that component is developed and tested in lab and field. The method must be explicitly codified and when used must be free of vagueness so students are reliably led to the correct interpretation. Materials, methods and scripts are then produced for teachers to follow very closely. Students are ability not age-grouped and no student advances before mastery. The lessons are fast-paced and feedback and assessment are quick. You can get an idea of how it works in the classroom in this Thales Academy promotional video. Here is a math lesson on counting. It looks odd but it works.

Even though Direct Instruction has been shown to work in hundreds of tests it is not widely used. It’s almost as if education is not about educating.

Some people object that DI is like mass-production. This is a feature not a bug. Mass-production is one of the few ways yet discovered to produce quality on a mass scale. Any method will probably work if a heroic teacher puts in enough blood, sweat and tears but those methods don’t scale. DI scales when used by mortals which is why it consistently beats other methods in large scale tests.

Many teachers don’t like DI when first exposed to it because it requires teacher training and discipline. Teachers are not free to make up their own lesson plans. But why should they be? Lesson plans should be developed by teams of cognitive psychologists, educational researchers and other experts who test them using randomized controlled trials; not made up by amateurs who are subject to small-sample and confirmation bias. Contrary to the critics, however, DI does leave room for teachers to be creative. Actors also follow a script but some are much better than others. Instructors who use DI enjoy being effective.

Quoting the authors of the meta-analysis:

Many current curriculum recommendations, such as those included within the Common Core, promote student-led and inquiry-based approaches with substantial ambiguity in instructional practices. The strong pattern of results presented in this article, appearing across all subject matters, student populations, settings, and age levels, should, at the least, imply a need for serious examination and reconsideration of these recommendations (see also Engelmann, 2014a; Morgan, Farkas, & Maczuga, 2015; Zhang, 2016). It is clear that students make sense of and interpret the information that they are given—but their learning is enhanced only when the information presented is explicit, logically organized, and clearly sequenced. To do anything less shirks the responsibility of effective instruction.

Hat tip: Robert Pondiscio at Education Next.

## Why did the British surpass China in matters military?

Here is an excerpt from the now published Tonio Andrade book, The Gunpowder Age: China, Military Innovation, and the Rise of the West in World History:

Part of the answer of course has to do with industrialization.  Steamships destroyed warjunks, towed long trains of traditional vessels into position, reconnoitered shallows and narrows, and, equally importantly, decreased communication times, allowing for minute, systematic coordination of the war effort.  Similarly, industrial ironworks made strong, supple metal for muskets and cannons, and steam power was used to bore cannons and mix, crumble, and sort gunpowder.

But industrialization isn’t the only answer.  Many of the innovations that most helped the British weren’t about steam power or the division of labor or mechanized factories.  They stemmed, rather, from the application of seventeenth- and eighteenth-century experimental science to warfare.  During the mid-1700s, new scientific discoveries enabled Europeans to measure the speed of projectiles, understand the effects of wind resistance, model trajectories, make better and more consistent gunpowder, develop deadly airborne missiles, and master the use of explosive shells.  These innovations as much as the use of steamship and industrial manufacturing techniques underlay the British edge in the Opium War.

Here is my previous coverage of the book.

For all the talk about recent advances in economics, you don’t hear much about one of the very biggest: how rapidly researchers are filling in the contours of Chinese economic history.

## Why is there no Milton Friedman today?

You will find this question discussed in a symposium at Econ Journal Watch, co-sponsored by the Mercatus Center.  Contributors include Richard Epstein, David R. Henderson, Richard Posner, Daniel Houser, James K. Galbraith, Sam Peltzman, and Robert Solow, among other notables.  My own contribution you will find here, I start with these points:

If I approach this question from a more general angle of cultural history, I find the diminution of superstars in particular areas not very surprising. As early as the 18th century, David Hume (1742, 135-137) and other writers in the Scottish tradition suggested that, in a given field, the presence of superstars eventually would diminish (Cowen 1998, 75-76). New creators would do tweaks at the margin, but once the fundamental contributions have been made superstars decline in their relative luster.

In the world of popular music I find that no creators in the last twenty-five years have attained the iconic status of the Beatles, the Rolling Stones, Bob Dylan, or Michael Jackson. At the same time, it is quite plausible to believe there are as many or more good songs on the radio today as back then. American artists seem to have peaked in enduring iconic value with Andy Warhol and Jasper Johns and Roy Lichtenstein, mostly dating from the 1960s. In technical economics, I see a peak with Paul Samuelson and Kenneth Arrow and some of the core developments in game theory. Since then there are fewer iconic figures being generated in this area of research, even though there are plenty of accomplished papers being published.

The claim is not that progress stops, but rather its most visible and most iconic manifestations in particular individuals seem to have peak periods followed by declines in any such manifestation.

## Why did Cuba become healthier during the economic meltdown of the 1990s?

One should interpret anything about Cuba, or coming out of Cuban data, with extreme caution.  Nonetheless I thought this was interesting enough to pass along:

The economic meltdown should logically have been a public health disaster. But a new study conducted jointly by university researchers in Spain, Cuba, and the U.S. and published in the latest issue of BMJ says that the health of Cubans actually improved dramatically during the years of austerity. These surprising findings are based on nationwide statistics from the Cuban Ministry of Public Health, together with surveys conducted with about 6,000 participants in the city of Cienfuegos, on the southern coast of Cuba, between 1991 and 2011. The data showed that, during the period of the economic crisis, deaths from cardiovascular disease and adult-onset type 2 diabetes fell by a third and a half, respectively. Strokes declined more modestly, and overall mortality rates went down.

This “abrupt downward trend” in illness does not appear to be because of Cuba’s barefoot doctors and vaunted public health system, which is rated amongst the best in Latin America. The researchers say that it has more to do with simple weight loss. Cubans, who were walking and bicycling more after their public transportation system collapsed, and eating less (energy intake plunged from about 3,000 calories per day to anywhere between 1,400 and 2,400, and protein consumption dropped by 40 percent). They lost an average of 12 pounds.

It wasn’t only the amount of food that Cubans ate that changed, but also what they ate. They became virtual vegans overnight, as meat and dairy products all but vanished from the marketplace. People were forced to depend on what they could grow, catch, and pick for themselves– including lots of high-fiber fresh produce, and fruits, added to the increasingly hard-to-come-by staples of beans, corn, and rice. Moreover, with petroleum and petroleum-based agro-chemicals unavailable, Cuba “went green,” becoming the first nation to successfully experiment on a large scale with low-input sustainable agriculture techniques. Farmers returned to the machetes and oxen-drawn plows of their ancestors, and hundreds of urban community gardens (the latest rage in America’s cities) flourished.

And this:

During the special period, expensive habits like smoking and most likely also alcohol consumption were reduced, albeit briefly. This enforced fitness regime lasted only until the Cuban economy began to recover in the second half of the 1990s. At that point, physical activity levels began to fall off, and calorie intake surged. Eventually people in Cuba were eating even more than they had before the crash. The researchers report that “by 2011, the Cuban population has regained enough weight to almost triple the obesity rates of 1995.”

That is by Richard Schiffman, the full article is here, and for the pointer I thank Jim Oliver.

## Tale of a published article

From Joshua Gans:

But people are wrong on the Internet all of the time. So what really annoyed me was how Cowen ended the post:

“This counterintuitive conclusion is one reason why we have economic models.”

…Now where did all that lead? Frustrated by the blog debate, I decided to write a proper academic paper. That took a little time. In the review process, the reviewers had great suggestions and the work expanded. To follow through on them I had a student, Vivienne Groves, help work on some extensions and she did such a great job she became a co-author on the paper. The paper was accepted and today was published in the Journal of Economics and Management Strategy; almost 5 years after Tyler Cowen’s initial post. And the conclusion: everyone in the blog debate was a little right but also wrong in the blanket conclusions (including myself).

The rest of the story is here.  Stephen Williamson serves up a different attitude, and here is Paul Krugman’s very good post on open science and the internet.

## Most Popular Marginal Revolution Posts from 2010

Here are the most popular Marginal Revolution posts from 2010 as measured by landing pages and page views.

1. Book lists were very popular as a category. The highest ranked post in terms of page views was Tyler's Books which have influenced me the most, which created a blogosphere avalanche. Links to other people's lists (of influential books) were also very popular, as were Books of the year, 2010 and, peculiarly, this post on The best-selling book of all time.

2. The number one linked post was What happened to M. Night Shyamalan? a one-liner and one-picturer.  Also very popular in the category of "quickies" were Barbados v. Grenada, the demand for own-goals, Dead Birds, Freak-onomics, Nazi-Nudging, and Yuck_markets in Everything.

3. One Game Machine per Child on the failure of a computer voucher program to raise grades (but it did increase gaming).

10. Why Did the Soviet Union Fall? (from 2007).

Other substantive posts with high popularity (in the top-50) were my posts Insiders, Outsiders and Unemployment and The Philosophical Cow and Tyler's posts How many children should you have?, Is there a case for a vat?, Does the Law Professor have cause to complain? and Why is Haiti so Poor?

Two of my posts from previous years were also popular in 2010, from 2008 What is New Trade Theory? on Paul Krugman's Nobel and from 2005 Why Most Published Research Findings are False.

Hope you have enjoyed this year's offerings. What have I missed?

## Why Botswana?

From 1965 to 1995, Botswana was the fastest growing country in the world. During this 30 year stretch, Botswana’s average rate of growth was 7.7% per year. Relative to other nations, Botswana rose from the third poorest nation in 1965 to an “Upper Middle Income” nation.
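To get a feel for what 7.7% a year compounds to over three decades, here is a minimal sketch. It assumes the average rate applied uniformly in every one of the 30 years, which the post does not claim, and the function names are hypothetical.

```python
import math


def compound_growth_factor(annual_rate: float, years: int) -> float:
    """Total growth multiple implied by a constant annual growth rate."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Assumption: a constant 7.7% rate in each of the 30 years from 1965 to 1995.
factor = compound_growth_factor(0.077, 30)
years_to_double = doubling_time(0.077)

print(f"Growth multiple after 30 years: {factor:.2f}")
print(f"Doubling time in years: {years_to_double:.2f}")
```

Under that assumption the economy roughly doubles every nine to ten years, so 30 years of such growth means about three doublings and then some.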

Of course the rest of Africa has not done nearly so well. The account of Acemoglu, Johnson, and Robinson, later published in Dani Rodrik’s In Search of Prosperity: Analytic Narratives on Economic Growth, suggests the following (summary taken from Beaulier):

1. Botswana possessed relatively inclusive pre-colonial institutions, placing constraints on political elites.

2. The effect of British colonialism on Botswana was minimal, and did not destroy inclusive pre-colonial institutions.

3. Following independence, maintaining and strengthening the institution of private property was in the economic interests of the elite.

4. Botswana is rich in diamonds. This resource wealth created enough rents that no group wanted to challenge the status quo at the expense of “rocking the boat.”

5. Botswana’s success was reinforced by a number of critical decisions made by the post-independence leaders, particularly Presidents Khama and Masire.

Scott Beaulier, a graduate student at GMU, attempts to amend this view. He argues that British colonial policy was not so beneficent toward market institutions and rule of law. Most of all, “Botswana’s success was the result of good post-colonial policy choices.”

In other words, countries are not trapped by their past. I don’t know enough history to judge this research, but I do know that topics such as Botswana, or Mauritius (another success story), are underexplored by economists.

Addendum: Abiola Lapite refers me to his interesting blog post; he suggests that the relative ethnic homogeneity of Botswana is a critical factor.

## The Danger of Reusing Natural Experiments

I recently wrote a post, Short Selling Reduces Crashes, about a paper which used an unusual random experiment by the SEC, Regulation SHO (which temporarily lifted short-sale constraints for randomly designated stocks), as a natural experiment. A correspondent writes to ask whether I was aware that Regulation SHO has been used by more than fifty other studies to test a variety of hypotheses. I was not! The problem is obvious. If the same experiment is used multiple times, we should be imposing multiple-hypothesis standards to avoid the green jelly bean problem, otherwise known as the false positive problem. Heath, Ringgenberg, Samadi and Werner make this point and test for false positives in the extant literature:

Natural experiments have become an important tool for identifying the causal relationships between variables. While the use of natural experiments has increased the credibility of empirical economics in many dimensions (Angrist & Pischke, 2010), we show that the repeated reuse of a natural experiment significantly increases the number of false discoveries. As a result, the reuse of natural experiments, without correcting for multiple testing, is undermining the credibility of empirical research.

.. To demonstrate the practical importance of the issues we raise, we examine two extensively studied real-world examples: business combination laws and Regulation SHO. Combined, these two natural experiments have been used in well over 100 different academic studies. We re-evaluate 46 outcome variables that were found to be significantly affected by these experiments, using common data frequency and observation window. Our analysis suggests that many of the existing findings in these studies may be false positives.
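The green jelly bean problem can be made concrete with a little arithmetic. The sketch below computes the chance of at least one false positive when many true null hypotheses are each tested at the 5% level, and the Bonferroni-adjusted threshold that would hold that chance at 5%. It assumes the tests are independent, which studies reusing one experiment will not strictly be, and Bonferroni is used here only as a familiar simple correction, not necessarily the method the authors apply. The function names are hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / num_tests


# Assumption: every tested effect is truly zero and the tests are independent.
for num_tests in (1, 20, 50, 100):
    uncorrected = familywise_error_rate(num_tests)
    per_test = bonferroni_threshold(num_tests)
    corrected = familywise_error_rate(num_tests, per_test)
    print(
        f"{num_tests:>3} tests: uncorrected {uncorrected:.3f}, "
        f"per-test threshold {per_test:.4f}, corrected {corrected:.3f}"
    )
```

With 20 tests, the jelly bean case, the uncorrected chance of at least one spurious "discovery" is already well above one half.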

There is a second, more subtle problem. If more than one of the effects is real, it calls into question the exclusion restriction. To identify the effect of X on Y1 we need to assume that X influences Y1 along only one path. But if X also influences Y2 that suggests that there might be multiple paths from X to Y1. Morck and Yeung made this point many years ago, likening the reuse of the same instrumental variables to a tragedy of the commons.

Solving these problems is made especially difficult because they are collective action problems with a time dimension. A referee that sees a paper throw the dice multiple times may demand multiple hypothesis and exclusion test corrections. But if the problem is that there are many papers each running a single test, the burden on the referee to know the literature is much larger. Moreover, do we give the first and second papers a pass and only demand multiple hypothesis corrections for the 100th paper? That seems odd, although in practice it is what happens as more original papers can get published with weaker methods (collider bias!).

As I wrote in Why Most Published Research Findings are False, we need to address these problems with a variety of approaches:

1) In evaluating any study try to take into account the amount of background noise. That is, remember that the more hypotheses which are tested and the less selection [this is one reason why theory is important: it strengthens selection, AT] which goes into choosing hypotheses the more likely it is that you are looking at noise.

2) Bigger samples are better. (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).

3) Small effects are to be distrusted.

4) Multiple sources and types of evidence are desirable.

5) Evaluate literatures not individual papers.

6) Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.

7) As an editor or referee, don’t reject papers that fail to reject the null.
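Rules 1 and 2 can be illustrated with the arithmetic from Why Most Published Research Findings are False, where 200 of every 1,000 hypotheses are true, true hypotheses are detected 60% of the time, and 5% of false hypotheses come up significant. The sketch below is a minimal version of that calculation; the function name and the alternative scenarios are hypothetical illustrations, not figures from the post.

```python
def share_of_significant_findings_true(
    prior_true: float, power: float, alpha: float = 0.05
) -> float:
    """Fraction of statistically significant findings that reflect true hypotheses."""
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return true_positives / (true_positives + false_positives)


# The post's illustrative case: 20% of hypotheses true, 60% power.
baseline = share_of_significant_findings_true(prior_true=0.20, power=0.60)

# Hypothetical scenario: less selection, so only 5% of tested hypotheses are true.
weak_selection = share_of_significant_findings_true(prior_true=0.05, power=0.60)

# Hypothetical scenario: small samples, so power falls to 20%.
small_samples = share_of_significant_findings_true(prior_true=0.20, power=0.20)

print(f"baseline: {baseline:.2f}")
print(f"weak selection: {weak_selection:.2f}")
print(f"small samples: {small_samples:.2f}")
```

The baseline reproduces the 75% figure from the earlier post; weakening selection or shrinking samples pushes the share of true findings down from there.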

## Interpreting Statistical Evidence

Betsey Stevenson & Justin Wolfers offer six principles to separate lies from statistics:

1. Focus on how robust a finding is, meaning that different ways of looking at the evidence point to the same conclusion.

In Why Most Published Research Findings are False I offered a slightly different version of the same idea:

Evaluate literatures not individual papers.

SW’s second principle:

2. Data mavens often make a big deal of their results being statistically significant, which is a statement that it’s unlikely their findings simply reflect chance. Don’t confuse this with something actually mattering. With huge data sets, almost everything is statistically significant. On the flip side, tests of statistical significance sometimes tell us that the evidence is weak, rather than that an effect is nonexistent.

That’s correct but there is another point worth making. Tests of statistical significance are all conditional on the estimated model being the correct model. Results that should happen only 5% of the time by chance can happen much more often once we take into account model uncertainty not just parameter uncertainty.
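The point in the second principle that huge data sets make almost everything statistically significant can be seen in a small calculation. The sketch below assumes a one-sample z-test with a known standard deviation and a tiny hypothetical effect of one hundredth of a standard deviation; the numbers and function names are illustrative assumptions only.

```python
import math


def z_statistic(effect: float, std_dev: float, sample_size: int) -> float:
    """z statistic for a sample mean equal to `effect`, tested against zero."""
    standard_error = std_dev / math.sqrt(sample_size)
    return effect / standard_error


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Assumption: the same tiny effect (0.01 standard deviations) at every sample size.
for sample_size in (1_000, 100_000, 1_000_000):
    z = z_statistic(effect=0.01, std_dev=1.0, sample_size=sample_size)
    print(f"n={sample_size:>9,}: z={z:.2f}, p={two_sided_p_value(z):.3g}")
```

The effect is equally unimportant in every row; only the sample size changes whether it clears the 5% bar.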

3. Be wary of scholars using high-powered statistical techniques as a bludgeon to silence critics who are not specialists. If the author can’t explain what they’re doing in terms you can understand, then you shouldn’t be convinced.

I am mostly in agreement, but SW and I are partial to natural experiments and similar methods, which generally can be explained to the lay public. Other econometricians (say, of the Heckman school) do work that is much more difficult to follow without significant background. While I would be wary of such work, I also wouldn’t reject it out of hand.

4.  Don’t fall into the trap of thinking about an empirical finding as “right” or “wrong.” At best, data provide an imperfect guide. Evidence should always shift your thinking on an issue; the question is how far.

Yes, be Bayesian. See Bryan Caplan’s post on the Card-Krueger minimum wage study for a nice example.
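As a minimal sketch of what "how far" means, Bayes' rule in odds form multiplies prior odds by the likelihood ratio of the evidence. The likelihood ratio of 3 and the three priors below are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability given evidence with the stated likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Assumption: the study's evidence is 3 times as likely if the claim is true as if it is false.
for prior in (0.10, 0.50, 0.90):
    posterior = posterior_probability(prior, likelihood_ratio=3.0)
    print(f"prior {prior:.2f} -> posterior {posterior:.2f}")
```

The same study moves a skeptic, an agnostic, and a believer by different amounts, but it moves all of them in the same direction.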

5. Don’t mistake correlation for causation.

Does anyone still do this? I know the answer is yes.  I often find, however, that the opposite problem is more common among relatively sophisticated readers–they know that correlation isn’t causation but they don’t always appreciate that economists know this and have developed sophisticated approaches to disentangling the two. Most of the effort in a typical empirical paper in economics is spent on this issue.

6. Always ask “so what?” …The “so what” question is about moving beyond the internal validity of a finding to asking about its external usefulness.

Good advice although I also run across the opposite problem frequently, thinking that a study done in 2001 doesn’t tell us anything about 2013, for example.

Here, from my earlier post, are my rules for evaluating statistical studies:

1)  In evaluating any study try to take into account the amount of background noise.  That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.

2) Bigger samples are better.  (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).

3) Small effects are to be distrusted.

4) Multiple sources and types of evidence are desirable.

5) Evaluate literatures not individual papers.

6)  Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.

7)  As an editor or referee, don’t reject papers that fail to reject the null.