What is the point of replication?

No experiment can ever be exactly replicated, so each attempted replication must assume that the things which differ don’t matter. The more things, and the more important the things, that we can plausibly assume don’t matter, the stronger the original study. Chemistry students have done the same experiments for hundreds of years, and that’s useful because we can plausibly assume that who conducts the experiment, and when, doesn’t matter. The recent brouhaha between Nosek et al. and Gilbert et al. illustrates a weaker case.

In their critique of Nosek et al., Gilbert et al. say that some of their replications failed because things were different.

An original study that asked Israelis to imagine the consequences of military service was replicated by asking Americans to imagine the consequences of a honeymoon.

Now that sounds like two very different studies, but Nosek provides important context. The original study in question wasn’t about military service or honeymoons; it was about the conditions that promote reconciliation between victims and injurers after an injustice has been committed. The original study asked Israelis what they would do and how they would feel about a specific injustice. Namely: you and a co-worker have been working on a project for a long time, but just before submission you are called away for reserve duty [male]/maternity leave [female]. Your co-worker takes credit for all the work and gets promoted, while you later get demoted. The study then went on to ask questions about the conditions necessary for reconciliation. The reserve duty/maternity leave bit was just the story element needed to explain the situation, not the focus of the study.

Nosek et al. tried to replicate the study in the United States, where being called up for reserve duty is less common than in Israel and where being demoted for taking maternity leave could raise legal issues, so they substituted ‘had to leave for honeymoon’. Everything else was the same. One of the original authors approved the new design.

Nosek et al. were not able to replicate the original findings. Is this because they didn’t replicate the study or because the study failed to replicate? Gilbert et al. say Nosek et al. failed to replicate the study.

In my view, Gilbert et al. are caught on the horns of a dilemma. If the studies don’t replicate, they aren’t interesting; and if the studies replicate but only under extremely precise conditions, they also aren’t interesting. We are interested in general features of the human condition, not in descriptions of the choices that 75 female and 19 male Israeli students made at a particular point in time. Moreover, if changes in wording matter, then surely so does the fact that the original study was run on Israelis in 2008 and the replication on Americans in 2013 (a lot has changed over those years!), and so must a hundred other differences. But if so, what’s the point?

Hat tip: Andrew Gelman who has more to say.


Strangely enough, I am a chemist.

I am team Gilbert on this. When you test something similar, that is a new, though related, experiment. When you get a divergent result, that can't be called a failed replication; it should be called revealed complexity.

If it is revealed, for instance, that humans make different decisions about barbecue sauce and retirement funds, that shouldn't be too surprising. It should not even be surprising if barbecue sauce and cola trigger different decision networks.

I am not a chemist, not that it matters, and not that real science is at all relevant to the politicking called social science research. Here is the basic logic being presented in the social science studies that validate your leftist politics: X1 = Y -> any X = Y. I really hope that you can see that this logical statement is false. Will your tribalism demand that you deny basic logic?

Poe's Law?

Only a crazy right-winger would suggest that X1 = Y -> all X = Y. In reality, when social science extrapolates tiny results from tiny populations to all people, it is absolutely correct science, so long as it suggests that right wingers are pathological. John makes an excellent point.

Pro tip: don't act pathological

You attribute superhuman irrationality to your ideological opposites. More likely, they are just motivated by different concerns/values and don't really care if you show them something that demonstrates the costs within YOUR ideological framework. On the Ideological Turing Test, you get somewhere in the range of F minus minus minus.

The term "Replication Crisis" conflates two different things:

-- Social science findings that never really existed

-- Valid social science findings that vary over time and/or space

It can be hard to tell the difference.


"When you test something similar, that is a new, though related, experiment. When you get a divergent result that can’t be called failed replication, it should be called revealed complexity."

Sure, but quite often these exact studies are used to form a generalization. Here is a comment from the gender stereotype study that Tyler linked to: "How have gender stereotypes changed in the last 30 years? ... Applying these findings to politics and the 2016 presidential campaign in particular, the researchers also recommended that voters be vigilant about the influence of gender stereotypes on their decisions. "


I will readily agree that improper generalization is a thing as well.

That gender stereotype paper is a pretty funny example of the Replication Crisis' mirror image: the Repetition Crisis.

"Revealed complexity" sounds great to me, but the original study clearly purports to relate to humans in general, not just Israelis.
I looked it over, and it reads as though it is only incidental that the participants were Israeli.

Michael Clemens has a nice paper on the meaning of "replication." http://www.cgdev.org/sites/default/files/CGD-Working-Paper-399-Clemens-Meaning-Failed-Replications.pdf

And this:

"An influential psychological theory, borne out in hundreds of experiments, may have just been debunked. How can so many scientists have been so wrong?"


This is a foundational study from with scores of follow up studies. But it appears there's very little evidence of any significant effect.

It really appears to me that social science is mostly junk science and the researchers just reach the conclusion that they want to reach.

"This is a foundational study with scores of follow up studies."

Can we get a Preview button or an Edit button?

Different country, different year, different question. And someone had the nerve to call this "replication"? I mean, it can still be valuable for building a literature in an area of enquiry, but this does not remotely pass the sniff test. "Replication" would be same country, likely a different year, and the same question, in which case a failure to replicate the findings would lead you to immediately look for which social changes over time might have caused the failure to replicate findings.

It depends on what you're trying to replicate. If you want to replicate the results of a specific experiment, then yes, this isn't replication.

However, a lot of psychology experiments try to discover "universal" effects that apply beyond the limited circumstances of a specific experiment. (See all the literature on willpower as a muscle that can become exhausted.) Alex's example is a valid (but imperfect) attempt to replicate the claimed universal effects. As Alex says, it's really the claimed universal effects that are interesting, so it's really the claimed universal effects that need to be validated.

IMHO, many attempts to extract universal effects from specific experiments are totally bogus -- even if the effect is supported by multiple small experiments. In large part, this is due to a pretty large bias where null results (that don't support a claimed universal effect) are less likely to get published than some hot, new claimed universality.
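That publication-bias mechanism can be made concrete with a toy simulation (plain Python; the effect size, sample size, and crude z-test are all made-up illustrative assumptions, not taken from any real study):

```python
import random
import statistics

random.seed(0)

def run_study(n=20, true_effect=0.2):
    """One small two-group study: return (estimated effect, 'significant?')."""
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = ((statistics.variance(control) + statistics.variance(treated)) / n) ** 0.5
    return diff, abs(diff / se) > 1.96  # crude z-test at the 5% level

all_effects, published = [], []
for _ in range(5000):
    diff, significant = run_study()
    all_effects.append(diff)
    if significant:  # journals mostly print the "significant" results
        published.append(diff)

print("mean effect across all studies:  %.2f" % statistics.mean(all_effects))
print("mean effect, 'published' subset: %.2f" % statistics.mean(published))
```

Filtering on significance systematically inflates the apparent effect, which is one way a literature built on small studies can look much stronger than it is.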

I think the trouble in this case was too many important variables changed. Even if both studies found real effects we don't know what caused those effects.

I think if they had replicated using something like jury duty that would clean up a lot of the ambiguity.

That's a valid point. Still, I think that a lot depends on context that we don't know. It's telling that one of the original authors approved the new design; i.e., one of the original authors thought the effect was so universal that it would apply with all those variables changed.

There's a blurry line between "replication" and "testing the theory", and a lot depends on how strong the theory is believed to be. If the theory was thought to be super general, I say this is a replication. If the theory was thought to be weaker, I say this study just tested its limits.

For example, if I'm trying to replicate the predictions of General Relativity, then I am under no constraints whatsoever to reproduce an experiment that has been done so far. The theory is supposed to be so universal that any test of it counts as replication. However, if I want to replicate measurements of the thermal conductivity of supported, exfoliated graphene at 273 K, then I have a lot less freedom in setting up my experiment; by necessity, my experiment will be very similar to ones that came before it.

+1. Nobody is interested in the specifics of this case, just whether it generalizes to other situations.

In the best social science research -- for example, Robert Zajonc's research on the mere-exposure effect -- the researcher collected data from many different areas and over the years ran multiple experiments to validate the general theory (i.e., that the more you are exposed to something, the more you tend to like it). Other researchers also provided supporting evidence.

This is a far cry from running an experiment or two and then rushing to publication and press release.

Possibly, but part of the problem might be with the reader. If the reader wants unrealistic universality, that reader might be frustrated with tests showing complexity.

Here's an obituary of Zajonc that engagingly explains his many contributions:


I think Alex's thought process here is spot on, but the specific example may not best illuminate it. Personally I find it very credible that the conditions necessary for reconciliation would be dependent on the perceived culpability of the actors involved.

In both studies the "stayed at work" partner behaves badly. But in the first study the "not at work" partner has been called away in service of a clear "greater good". In the second study the asshole went on vacation and didn't finish his shit before he left.

It's not as clear a line of "perpetrator" and "victim" so even if there are different psychological needs of the two roles, the roles themselves are blurred. People are complex and highly attuned to social context, so a barrier of "applicable in different contexts or not interesting" is a useful one.

Related to this blog, we have Behavioral Economics experiments blowing up in two senses. Experiments are much more common, as are arguments about reproducibility.

FWIW, I think all those experiments are showing complexity, that human decision-making is not simple. Perhaps this is a disturbing result for some at the meta level, if they are looking for simpler, more uniform pursuits of utility, etc.

Perhaps you are right that “perpetrator” and “victim” is a thing that can be generalized, but perhaps not. Perhaps esprit de corps, and putting on a uniform, or being viewed as a mom, changes things.

I think there is a tendency in economics to want to take 2-3 variables and draw a clear (and purportedly causal) statistical relationship. Humans, of course, do not often comply so readily.

The "asshole goes on holiday" line of thinking seems like a pretty big deal, when put that way.

"If the studies don’t replicate they aren’t interesting and if the studies replicate but only under extremely precise conditions they also aren’t interesting."

Exactly. Small alterations in the study can still demonstrate a failed replication when the claims of the original research are overbroad. This issue will split down the left/right dichotomy because the left is the most guilty of performing bad "conclusion in search of data" social science and so the left will show up in these comments to defend all bad science.

That's very funny, Thomas. Re-read what you wrote and see if you can find "conclusion in search of data." Particularly, if it isn't obvious, where you supported that "politics of investigator" are the dividing factor.

Who would have expected a leftist to respond negatively to my hypothesis, after he and another leftist contributed evidence in favor of it. Very funny.

Anyone who disagrees with you is a leftist?

Is that what I said? Why are you so apt to lie?

Weaponized science: The left has been using 'science' to derive pre-determined conclusions in order to support claims that the right is pathological and that researchers (who discriminate against would-be right-leaning researchers) should inform government policy.

Look Thomas, if your ideology tells you that government is an utter failure, counterproductive in anything it does, then probably people who are searching for solutions which are not solved by the free market will not support your research. Because the findings are even more pre-determined on that side: the answer is invariably "no, it's a dumb waste of money, and here are some negative side effects I can dream up."

While there is absolutely some high quality "right wing research" out there (research which provides evidence which promotes "right wing policy"), you should not find it surprising that academics in a two-party system tend to align NOT with the party that promotes creationism as science and engages in AGW denial. They will align with the alternative, no matter what ideological affiliation that party tends to have.

Your blanket statements would be more credible if you could mention some specific examples.

"if the studies replicate but only under extremely precise conditions they also aren't interesting."

In such a case, assuredly you have homed in on SOMETHING, and the fact of non-replication in other conditions will help to broaden the story of how this plays out in more realistic conditions.

But then don't we have to ask the same question about the point of numerous, numerous social science studies, even if they are not experimental. Every time we collect social science data, even if it is not classically experimental (i.e., with an experimental and a control group), we are doing so in circumstances that differ, no matter how many control variables we include in the regression.

And should we stop suggesting macroeconomic policies for country A based on what happened in country B?

It seems to me that they failed to replicate the Israelis. Did they conduct their study on American Jews?

Did the original study's conclusion state that it only pertained to Jews?

Israeli Jews tend to be strikingly more brusque than American Jews.

Israelis are more brusque than Americans, full stop. Always da jooz with you folks.

Always the dismissive question dodging with you guys (which is to say hard left fake moderates). Science works by standardizing as many factors as possible. Most American Jews have a lot more in common genetically and culturally with Israeli Jews than would any other group of Americans.

The point is that these researchers (who are overwhelmingly leftists) make claims not supported by the evidence of their studies. Alex's point about the horns of a dilemma is largely being ignored by the familiar leftist commenters. What can they say? Either their religion of bad science is rife with useless studies that falsely generalize, and they are revealed for the fools they are for "fucking loving science", or the studies don't replicate.

Thomas - what's your opinion about the "science" of marijuana usage or gun control (woops, can't fund it, obviously nothing to see here ... )?

People who think that the presence of low grade research is a speciality of one side of the political spectrum but not at all the other ... well, come on, you're not dumb ... think about it, it means you yourself are biased. You don't have to give up your values or political views to recognize that the glass is half empty for research which offends your priors and half full for research which affirms your priors.

That having been said, it is entirely possible that credible research can be performed which will be more readily accepted by certain elements across the diversity of the political environment. E.g., gains of free trade and capitalism (inconvenient to Marxists), gains from helping the poor to achieve their potential (inconvenient to Darwinist-minded capitalists). As in most human things, there are a million dead ends or shoddy/suboptimal efforts to break new ground before we actually stumble across something truly stunning.

People who ignore the intense leftist bias in social psychology in order to pretend that this issue is "50/50" across political lines are liars or fools.

"That having been said, it is entirely possible that credible research can be performed"

Not when academics admit to discrimination against research and researchers who don't agree with their politics, and not when research influence coincides with government action on that research ("doing something" versus "not doing something"), and not when academic status and income is political.

Do you want power, prestige, and wealth as a researcher? Deliver results that enable government interventions. Of course, you'll want to deliver those results to people who are predisposed to government intervention. Perhaps deliver results that claim that opposition to an expanded welfare state stems from racism? There is an entity that spends 3.8 trillion dollars that would be pleased to hear that research.

The Zionist movement and the Israeli state have worked hard over quite a few generations to socially construct for Israelis a national persona that is less genteel than that of assimilated diaspora Jews.

Thomas - if you're involved in research that lends itself to market solutions, you will leave academics to start your own company or work for a company that can earn significant profits from this research. The problem may indeed be more a matter of self selection than any sort of effort to drive the right wing out of universities. Why would you spend your life fighting for piddling little grants if you had stumbled across something that suggested there were millions or billions of dollars to be made?

No, I don't think it's 50/50 in universities. People who work on issues with free market solutions will quickly leave the academy and go make their millions. People who are not interested in issues conducive to government intervention will go work in the private sector (which, conveniently, usually pays a lot more at high levels of research).

The replication crisis is the most important social sciences story in a decade, and the most important story developing right now in all of science, period. (And I say this as a professional mathematician with full appreciation of the significance of the AlphaGo development.)

Decades' worth of "findings", possibly an entire field, cast into doubt. Should make us think deeply about incentive structures and good old human dumbness. What Gelman calls the "garden of forking paths", and others call "researcher degrees of freedom", is utterly obvious once explained, yet precious few --- even among humans at the top of the intelligence scale --- seem to have noticed the risks.
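The "garden of forking paths" can be sketched with a minimal simulation (plain Python; the subgroup choices and sample size are hypothetical, chosen only to illustrate the mechanism): analyze pure noise, but allow the analyst to report whichever of several defensible analyses "works".

```python
import random

random.seed(1)

def z_stat(xs, ys):
    """Two-sample z statistic, assuming unit variance (true by construction here)."""
    return (sum(xs) / len(xs) - sum(ys) / len(ys)) / (1 / len(xs) + 1 / len(ys)) ** 0.5

def forked_study(n=40):
    """One study on pure noise. The analyst may report the full sample,
    men only, or women only -- three defensible paths through the same data.
    Returns True if ANY path clears |z| > 1.96."""
    data = [(random.random() < 0.5,   # treated?
             random.random() < 0.5,   # male?
             random.gauss(0, 1))      # outcome: no true effect at all
            for _ in range(n)]
    subsets = [data,
               [row for row in data if row[1]],       # men only
               [row for row in data if not row[1]]]   # women only
    zs = []
    for subset in subsets:
        xs = [y for t, m, y in subset if t]
        ys = [y for t, m, y in subset if not t]
        if xs and ys:
            zs.append(z_stat(xs, ys))
    return any(abs(z) > 1.96 for z in zs)

trials = 2000
rate = sum(forked_study() for _ in range(trials)) / trials
print("nominal false-positive rate: 0.05")
print("rate when any path may be reported: %.3f" % rate)
```

Even with only three forks and no dishonesty anywhere, the false-positive rate climbs well above the nominal 5%; real analyses offer far more forks than this.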

This term, “researcher degrees of freedom,” is even more useful if we recognize that just as analysts can overfit models that therefore won’t be replicable, they can also underfit by not being allowed adequate intellectual degrees of freedom to offer “controversial” explanations, driving them into endless repetitions of aging mantras about racism and sexism. The issue for Student was that data were expensive while potential explanatory factors were cheap. Today, the mirror image often reigns: Data are readily available, but honest explanatory factors can cost you your job.

Too many researcher degrees of freedom permit trickery; but too few cause stupidity.

Knowing what you're actually getting at, I think it's more reasonable to portray it as "the freedom to explore controversial or unpopular ideas" and not "honest explanatory factors". Anyone who enters their research knowing which "honest explanatory factor" they are looking for is going to face superhuman hurdles in overcoming their biases and engaging in proper self critique and consideration of shortcomings and/or alternative explanations.

In the field you're most likely talking about, Cliff recently shared a link to what is, I think, the first study I've seen by someone in the field which illustrated, yes, prior views, but also made genuine (if not perfect) efforts to tackle head-on a great number of potential sources of criticism in his work; the end result was an estimate that biological factors explained a 1% difference, not a 15-IQ-point average difference.

In lending genuine credibility to many counterarguments, and avoiding giving any particularly notable credibility to any "conclusions" drawn from simplistic statistics techniques, it would be next to impossible to fault such a researcher for his "inconvenient" research. That piece was written by a truly curious mind who wanted to know the truth, but already had an opinion that the "inconvenient" outcome would indeed be found, which should not be damning if the rest of the work is fairly rigorous.

"Anyone who enters their research knowing which “honest explanatory factor” they are looking for is going to face superhuman hurdles in overcoming their biases and engaging in proper self critique and consideration of shortcomings and/or alternative explanations."

Right. So as you and the left in general approach IQ research from the position of not wanting to believe IQ is genetic because that is racist, I'd suggest that you'd find great success in getting government grants to find evidence for your conclusions and you'll definitely get tenure.

To get funding, you need to demonstrate a public interest in your research. What value is IQ research to the public? What would we use the findings for, in the respective cases of finding a) no difference, b) very small difference or c) very large difference?

Promotion of eugenics? Promoting abolition of the social safety net?

In researching intelligence, I think a MUCH stronger case could be made for exploring the benefits and failures of standardized testing, and how standardized testing can be improved to reflect a greater diversity of skill-sets. Why? I make a clear case for the value to the public. A standardized testing method which can reflect a greater diversity of skill-sets, perhaps dropping the 1, 2 or 3 worst areas, will be much better at streaming candidates than one which erects barriers against students who do not satisfy more narrow definitions of "intelligence" - this would help to unleash additional economic potential of a greater number of individuals, by keeping open the doors which are relevant to their specific skill sets.

"We are interested in general features of the human condition not in descriptions of the choices that 75 female and 19 male Israeli students made at a particular point in time."

Both seem interesting to me.

The social sciences currently suffer from both a Replication Crisis and a Repetition Crisis. Much of what is common in general to human beings has already been discovered and/or isn't very interesting.

Differences among human groups are very interesting, but you are only supposed to attribute differences to white racism or male chauvinism or white male racist chauvinism or a handful of other tired, repetitious ideas.


As Pinker remarked (and I see via Google search to recover this quote that you previously highlighted it, as well): "Irony: Replicability Crisis in Psych DOESN'T Apply to IQ: Huge N's, Replicable Results. But People Hate the Message."

Bunch of hate-fact noticing shitlords, these Steve(n)s.

This is really what this is about. When science delivers answers that support the dogma, skeptics are called 'deniers' (of the faith), when science delivers answers that don't support the dogma, researchers are fired.

Perhaps because they are pushing counter-dogma as evidence with flawed research? Since you mention "deniers", I imagine you might have some interesting anti-AGW links to share (if so, most likely published by scientists whose research specialization is not in fact in climate science).

I'm a "denier" because I don't believe the solution to AGW is to spend 10 trillion dollars on Democrat party sycophants, global wealth redistribution, and climate researchers. You're an IQ denier because you'd rather believe correlation = causation social "science" that gives your ideology political power. There's something of a difference there, wouldn't you agree?

No one is proposing to spend $10 trillion on anything. Having given up on explaining the not-very-complicated-but-economically-superior cap-and-trade, carbon taxes are the most commonly proposed solution, and can easily be made revenue neutral. Is it worse for the economy to tax ingenuity and effort (corporate and personal income tax) or wasteful consumption of a finite resource? Oh, but there exist some people who don't want it to be revenue neutral, and prefer investments in social programs to increase the future productive capacity of the population, THEREFORE any carbon tax must be viewed as a conspiracy to trick us into communism (that's the running alt-right theory on carbon taxes, ya?).

If your concern is corruption, then let's talk about corruption. In particular, the elections financing, campaign financing and lobbying rules which make it virtually certain to happen.

And, on IQ ... wow man, you're the one engaging in correlation=causation thinking. Are you really arguing sincerely? You seem to have very angry ideas about "leftist research" only supporting their priors, yet out of hand dismiss anything you don't like for the fact that you don't like it, and easily accept anything that confirms your priors. Are you aware of how incredibly strong your own biases are, or the level of your own hypocrisy in running around accusing people of things you yourself do? It seems not ...

I recently wrote about Nosek's latest paper, which he entitled: "An Unintentional, Robust, and Replicable Pro-Black Bias in Social Judgment."


I don't think this replication was good enough. The original study was about factors that an individual cannot control (at least not after abortion is no longer an option), but leaving on a honeymoon is indeed a choice a person makes. You're not going to jail for refusing military service, or failing to care for an infant, if you're not going on a honeymoon. You could always postpone it, no?

A few other links that might add interesting elements to this post:

This is a great post and is exactly what I thought when I read the response. If you are saying that they failed to replicate a study because they used different subjects than the first study, then all you have really shown is that the effect the study finds holds only when applied to a specific group of subjects. If you can't generalize your findings, then you haven't really told us anything about human behavior.

But how Israeli behavior differs from, say, Japanese behavior is also an interesting subject.

For example, I'd love to see a study of how Israelis and Japanese differ on average in notifying you of something embarrassing in public, like you coming out of the bathroom having a piece of toilet paper stuck to the bottom of your shoe.

In general, knowledge doesn't exist without contrasts. You can reduce data down to 1s and 0s, but not down to just 1s. Knowledge about human universals can only exist in contrast to either human variables or non-human universals.

Within the "Replication Crisis," there are two general things going on.

Some social science findings can't be replicated because they never existed in the first place and were only conjured up due to p-hacking, the garden of forking paths, publication bias and the like.

Other social science findings don't replicate because people sometimes differ over time and/or space. Thus the social sciences fall in between hard sciences like chemistry and marketing research.

Marketing research is a legitimate trade (I was in the business for many years) and employs many scientific methods. But the central distinction between science and marketing research is that the latter doesn’t strive to discover permanent truths: That, say, Bill Cosby was good at advertising Jell-O Pudding Pops in 1982 was good enough for marketing research in 1982. (If you want to know whether to hire Cosby in 2016, marketing researchers would be happy to take your money to run a test.)

"Other social science findings don’t replicate because people sometimes differ over time and/or space. "

Isn't that just an excuse? If you can't replicate broadly then your hypothesis is wrong. To say it was right at a given time, in a given spot with a given group is pretty useless scientific information.

But it can be pretty useful marketing research.

Or as political propaganda.

To give a trivial example, one researcher could go watch caribou for a month, come back and say "they just mill around a lot." Another could seek to confirm that, and say "no, they migrate!"

It would be silly to say we learned nothing. And while disagreeing, they both might be right.

Why should we assume humans are easier to understand than caribou?

Take the example of people called up for military service. If you repeated the study over time, you may find that the reaction depends on external circumstances. If Israel is at war, would the results change?

If a scenario is tested in different places at different times with different results, intuitively it isn't surprising. People respond differently.

I would suggest that the conclusion is not replicable. The data are the data; they tell you something. If what they tell you is different every time you run the survey, they tell you that fact. The challenge lies in what conclusion to draw, and in what direction to take further research.

The problems of replication and other problems in the social sciences are precisely why Richard Feynman called them "Cargo Cult Sciences". They have the trappings of science without the discipline that is the actual backbone of real science.

The Social Sciences should be renamed Social Philosophy. That includes macroeconomics. I mean, we're talking about a 'science' in which the most dominant theories ebb and flow with the political winds, which has a predictive track record no better than throwing darts at a dartboard, and which is divided into factions that completely contradict each other, even about the fundamentals of how an economy works.

Whatever that is, it's not science no matter how many mathematical tools you use. Sometimes I think these fields are nothing more than exercises in motivated reasoning carried out by smart people who have figured out how to use 'studies' to bolster an essentially unfalsifiable, unprovable set of beliefs.

Personally, I've been a big fan of the social sciences since I was 13 in 1972.

- You just have to read them very closely.

- You can't expect the social sciences to make accurate forecasts that are both exciting and true. When read closely, the social sciences can help you make forecasts -- e.g., students will have higher test scores in Palo Alto than in East Palo Alto -- that are true but boring and depressing.

Disagreement is a sign of a healthy scientific environment. You don't have to have discovered the truth yet to be doing "science". A failure to replicate does not mean that science is not being done, just that the area of research has not achieved consensus. In the social sciences, you have the added complication that you're not investigating things which are determined as a function of eternal truths about physics and chemistry; you are investigating a moving target.

Imagine an effort to prove gravity if the size of gravitational effects also varied with the weather, the political environment and people's moods. It would be much harder, but that would not make it "not science". That having been said, there is clearly "research" on a great diversity of social issues which cannot really be considered science, because it systematically ignores weaknesses in ways that conveniently help to achieve the desired result.

Do you have an example of a falsifiable theory in macroeconomics? Or in sociology? Or political science? Or any other of the 'soft' sciences? You say we haven't discovered the truth 'yet'. Just when might that be, and what might that truth look like? What are you even looking FOR?

Can you describe an objective test for a popular economic theory that, if passed, would result in widespread agreement among economists despite their political beliefs or other prejudices?

My contention is that you cannot, because these are not real sciences. They are collections of observations about opaque complex systems that when looked at through the lens of motivated reasoning allow people to build up chains of argument that support the beliefs they already hold, or at least the beliefs that are most comfortable to hold. They are not attempts to find objective truth, because in a complex system there is no objective truth - just behaviors that mutate and change as the system adapts. Being able to predict the future state of such a system after an input is applied is like trying to predict how many rabbits there will be in Yellowstone five years after you introduce a hundred wolves, or trying to predict the size and shape of an anthill a year in the future. It can't be done because these systems are ever-changing and driven by stochastic processes and high sensitivity to initial conditions.

There is an almost exact analogy between trying to manipulate an economy by tinkering with aggregate values and trying to bring steel birds back to your island by building runways out of palm fronds and radio towers out of bamboo - both are attempts to manipulate a system by modifying the few emergent properties you can observe, despite the fact that those properties tell you next to nothing about what's actually going on inside the system.

Good questions.

Falsifiable in macro: reducing interest rates increases investment. Increasing fiscal spending positively impacts aggregate demand and can reduce the extent of fluctuation in a business cycle.

In sociology I plead ignorance. I skimmed an intro text one time, and know essentially nothing about the field as a matter of general practice. I think one falsifiable statement that might come out is something like "if you treat people like garbage, they will not turn out so well."

In political science, while there is absolutely a strongly empirical wing, a lot of the work is essentially in the form of case studies, where you learn about many similar cases, the differences in events and contexts, and try to draw some broader lessons which MIGHT apply to other similar contexts. Political science does not often purport to have predictive power; rather, it strives to group together categories of cases and historical knowledge in a way that can provide us with intellectual ammunition to, at the very least, "not do stupid shit". In places where political science DOES purport to have predictive power, such as predicting election outcomes, it is worth highlighting that this branch is interesting to journalists but essentially inconsequential to the vast majority of what is studied in the field.

I think an important distinction needs to be made in recognizing the differences between cases where a single falsification is fatal and cases where a falsification just leads us to look deeper.

In my studies in political science, in several classes we were openly encouraged to doubt political science as a "science" in any traditional meaning of the word, and profs were not shy to accept that defenses of it as "science" were full of holes. However, given the complexity of social systems, the failure to go from observation and hypothesis to unfalsified theory need not imply that at least SOME researchers are not TRYING to respect certain principles of the scientific method. It would be better if more social sciences encouraged such open self-critique.

"Falsifiable in macro: reducing interest rates increases investment. Increasing fiscal spending positively impacts aggregate demand and can reduce the extent of fluctuation in a business cycle."

So if these are falsifiable theories, are you saying that if you found a case where reducing interest rates didn't increase investment, you would have falsified the concept? My guess is that what macroeconomists would actually say is that this case was special, or that there's something going on they don't understand, and we just need more studies to figure out the complete picture.

In your cases above, I can construct a pretty easy hypothetical for when those relationships might not hold true. For example, it might be the case that if the problem with the economy was related to easy credit or to a bad mismatch between short-term and long-term investment, lowering interest rates might just make the problem worse. Or lower interest rates might signal to the people that the economy is in trouble and cause them to retrench further.

Likewise, I can imagine a fiscal 'stimulus' constructed in a way that it pulls workers and capital from existing projects into less efficient projects, hurting the economy as a whole. This could happen even with slack demand if the stimulus was targeted at regions or industries where local demand isn't so bad. Or, if the chief problem in the economy is high taxes or high debt, borrowing money to spend as stimulus might just spook the economy further, or even cause a financial panic.

I'm not saying these relationships are likely, or even true. My point is that in a complex system you can always come up with a hypothetical for why your prediction didn't come true -this time-. Post-hoc reasoning is pretty easy when there are millions of variables and an unlimited number of ways you can slice and dice them. So I don't think that believers in those theories would consider them falsified if their predictions didn't come true - they'd just dig around and find some more correlations that they can use to show why the theory was correct but the specific situation wasn't normal. And it seems that economists of all persuasions have been doing that for a long time.

For that matter, most economists still cling to equilibrium theories that were cribbed straight from physics textbooks in the 1800's. We know now that these models are incorrect - complexity theory says that an economy isn't in equilibrium - it's an adapting, changing system that can have areas of local meta-stability and temporary equilibria, but those can vanish or appear at any time. Inputs to the economy don't cause it to divert then return to the status quo - they are COMPUTED by the economy, and the original equilibrium may or may not be re-established. And the results may be different each time, as these systems are unpredictable and sensitive to tiny changes in initial conditions (which themselves are always changing). This makes the whole idea of predicting an economy by studying what it did in the past a very suspect activity, because the economy you're dealing with now is different from the one you're inspecting in an unknown number of ways. Aside from the fairly new field of complexity economics, I don't see a lot of attention being paid to this.

In the hard sciences, you don't see this kind of behavior. You don't see scientists lining up along political lines to argue endlessly over whether the Higgs boson exists or not. You don't see a group of Newtonians fighting against Relativists, with the prevailing opinion ebbing and flowing with the political winds or the stature of some member of one group. This doesn't happen because real science is anchored in hard, testable reality. Disputes therefore have objective answers that will satisfy all parties.

My hypothesis is correct. Of all left-wing commenters who responded to this post, 100% of them commented in opposition to the claim that these studies can't be reproduced. If they were instead right-wing commenters, I could probably get this published in a social psychology journal and delivered directly to the DNC and to said left-wing commenters' Facebook feeds.

?? What would you get published? What's your thesis? What would your methods be? What do you expect to find given existing knowledge?

Also, have you ever studied any natural or social sciences beyond the first year level?

Hi, I don't know how many people are still reading after comment #63 or whatever, but . . .

To those of you who say that the two studies really are different: Yes, I agree they're different. But two points:

1. They're not nearly as different as Gilbert et al. make them out to be. The military service / maternity leave / honeymoon aspect was only a tiny part of the study. Neither one was a study about military service / maternity leave / honeymoon. So in my opinion Gilbert et al. muddied the waters by characterizing the differences as they did.

2. More importantly, there's no good data-based reason to believe the original study anyway! Under a deterministic mode of thinking, it's natural to say that the first experiment succeeded, the second experiment failed, thus the effect works under certain conditions but not others. But actually the only reason for believing the first study is that it has statistical significance; in statistics jargon, "p less than .05." And as Uri Simonsohn and many others have discussed in recent years, it's easy-peasy to get p less than .05, just by choosing comparisons in light of your data. Which Simonsohn and his colleagues called p-hacking and researcher degrees of freedom. In my writings on the topic, I've emphasized that these degrees of freedom arise even if the researchers only perform one test on their particular dataset; this is what Eric Loken and I have called the garden of forking paths.

So, yes, the two studies differ. Any two studies of humans will necessarily differ. The second study has a controlled preregistered design and nothing came of it. The first study had statistically significant p-values but that happens from the garden of forking paths. I don't consider the first study evidence of anything--and I'd still say that even if the second study had never been performed! To me, the replication is fine, but the first paper was never presenting useful evidence in the first place.
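To see how easy it is to get p less than .05 from pure noise, here is a minimal simulation (my own toy sketch, not from any of the studies discussed; it uses a normal approximation rather than a proper t-test, and the choice of five outcome measures is arbitrary). A "researcher" measures several outcomes and reports significance if any one of them clears the .05 bar:

```python
import random
import math

def two_sample_p(x, y):
    """Approximate two-sided p-value for a difference in means (z-test)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > z) under the null

random.seed(1)
TRIALS, N, K = 2000, 30, 5  # K = outcomes the researcher is free to test
false_positives = 0
for _ in range(TRIALS):
    # Pure-noise data: K outcome measures, no real group difference at all.
    group_a = [[random.gauss(0, 1) for _ in range(N)] for _ in range(K)]
    group_b = [[random.gauss(0, 1) for _ in range(N)] for _ in range(K)]
    # Forking path: declare success if ANY of the K comparisons has p < .05.
    if any(two_sample_p(a, b) < 0.05 for a, b in zip(group_a, group_b)):
        false_positives += 1

print(false_positives / TRIALS)  # well above the nominal 5% rate
```

With five independent comparisons the family-wise false-positive rate is roughly 1 - 0.95^5, about 23%, even though every individual test is conducted "correctly" at the .05 level.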

Dr. Gelman and I have been kicking around in his blog's comments sections for awhile whether or not non-replicating experiments could sometimes be excused with my idea that, well, maybe it really did work back then, but times (or places or whatever) have changed. He tends to be more cynical about his fellow academics than I am!

But, while he admits the theoretical possibility of my excuse for non-replication being possible, he always seems to defeat me soundly in arguing over specific cases. In the examples of junk science he's chosen over the years, I've yet to come up with one example where my "different times, different places" sympathy is more plausible than his "Nah, there was never anything there" skepticism.

I think it's totally fair to say that in some cases replication fails because the system has changed since the original study was done. But if that's the case, it says something rather profound about our ability to predict the future of that system by studying its past behaviour.

A lot of economics is built on models that are tested by applying sequestered historical data to a model to see if it accurately predicts what happened. And if those models pass, they are then used to extrapolate what the future might hold.

However, the primary flaw there is that even the sequestered data is describing the specific reaction to an input by the same system, so even if the model correctly describes the behaviour of one system, it doesn't mean it will be correct the next time - or even that it will be correct if you could re-run the entire experiment on the same system, since sensitivity to initial conditions might cause it to behave very differently if you could do it all over again.

My Straussian reading of this post is that it is a justification of Alex's continuing membership in the Alice Goffman fanclub.

This post reminds me of a passage from Italo Calvino's "Invisible Cities":

Marco Polo describes a bridge, stone by stone.

"But which is the stone that supports the bridge?" Kublai Khan asks.

"The bridge is not supported by one stone or another," Marco answers, "but by the line of the arch that they form."

Kublai Khan remains silent, reflecting. Then he asks: "Why do you speak to me of the stones? It is only the arch that matters to me."

Polo answers: "Without the stones there is no arch."

In doing social science, one is ultimately interested in learning something about human nature and the forces that shape the sweep of history. But to do so, one needs to study the particular: "without the stones there is no arch."

Yet, as Kublai says, the stones are not interesting in themselves.
