Market Anomalies Fail to Replicate

It’s now well known that many findings in social psychology fail to replicate. Social psychologists have often discovered noise rather than fundamental aspects of behavior. A new paper suggests that many market anomalies also fail to replicate. Hou, Xue and Zhang write:

The anomalies literature is infested with widespread p-hacking. We replicate the entire anomalies literature in finance and accounting by compiling a largest-to-date data library that contains 447 anomaly variables. With microcaps alleviated via New York Stock Exchange breakpoints and value-weighted returns, 286 anomalies (64%) including 95 out of 102 liquidity variables (93%) are insignificant at the conventional 5% level. Imposing the cutoff t-value of three raises the number of insignificance to 380 (85%). Even for the 161 significant anomalies, their magnitudes are often much lower than originally reported. Out of the 161, the q-factor model leaves 115 alphas insignificant (150 with t < 3). In all, capital markets are more efficient than previously recognized.


"In all, capital markets are more efficient than previously recognized." Of course, capital markets can be efficient and full of excesses. I suppose a problem arises if central banks and governments don't allow markets to correct the excesses. Would Tabarrok allow markets to correct the excesses if the correction were very painful? Would you?

I must be missing something in the definition of 'anomaly' - isn't it by definition something that fails to replicate?

Which is the reason the term exists, one would have assumed. Whether one prefers Webster's first or second definition -

'1 : something different, abnormal, peculiar, or not easily classified : something anomalous They regarded the test results as an anomaly.

2 : deviation from the common rule : irregularity'

"I must be missing something in the definition of ‘anomaly’"

Yes, you are.

In financial markets research anomalies are deviations from what one would expect if markets were efficient. That's the sense in which it is a "deviation from common rule" or "irregularity".

Call it an alleged anomaly unmasked as likely no anomaly at all.

"I must be missing something ..."

Well, Alex does seem rather evasive in his wording on the topic.

These "market anomalies" are subjective labels/assertions made by humans (often economists). The abstraction of a "market" ain't making these non-replicable conclusions -- it's specific flesh & blood people.
"Market Anomalies {don't} Fail to Replicate" -- faulty research/conclusions fail-to-replicate.

The statement that " The anomalies literature is infested with..." again attempts to distance the bad effect from its human cause. If "literature" is infested -- the people who wrote that stuff are the real problem. Who are they?

Pointing specifically to "Social Psychology" as a symbol of bad research... also seems an attempt to deflect criticism from the Economics profession, which is also loaded with faulty research and analysis.

I wouldn't say Alex is being evasive. Anomaly is the term used in the finance literature.

Although I think it's obvious that new data should reduce anomalies because of data snooping, the framing of this paper says a lot more about statistical groupthink than about the anomalies literature. A p-value of .049 means the result is celebrated, and a p-value of .051 means there is nothing to see here. "Not statistically significant" is not an affirmative argument against something.
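The gap between the two cutoffs is easy to illustrate with a toy simulation (all numbers below are invented, not taken from the paper): test 447 pure-noise strategies and count how many clear each bar by luck alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_strategies, n_months = 447, 600  # ~50 years of monthly returns

hits_196, hits_3 = 0, 0
for _ in range(n_strategies):
    r = rng.normal(0.0, 0.04, n_months)  # pure noise: true mean return is zero
    t = r.mean() / (r.std(ddof=1) / np.sqrt(n_months))
    hits_196 += abs(t) > 1.96
    hits_3 += abs(t) > 3.0

# Under the null, ~5% of strategies clear |t| > 1.96 but only ~0.3% clear |t| > 3
print(hits_196, hits_3)
```

The point is not the exact counts but the order-of-magnitude drop the stricter cutoff buys, which is one argument that has been made for demanding t > 3 in this literature.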

Cross-read with:

Revenge of the Humans: How Discretionary Managers Can Crush Systematics

One bit:

Picture the normal funnel to becoming a PM running a $500M long/short equity book. You grew up in a wealthy family in a wealthy town, usually in the New York metropolitan area, parts of Silicon Valley, Chicago or Michigan. You went to Harvard, Yale or Princeton. You took an IB analyst position at Goldman or another bulge bracket. You spent a few years there learning how to build a financial model before a hedge fund picked you up for an analyst spot. You made friends with your PM, who if you were lucky did well, and 5 years later when the firm had more capital than it knew what to do with, your PM told the firm to give you $200M to play with.

At no point in this process did you ever have to exhibit a lick of skill for the job that you’ve just been given. Yes, you are probably a very smart individual, and you worked hard, but we all know that smart does not equal good in the investment world. Every step along the way you were selected not for the trait which would make you the best qualified to do that job, you were selected because you jumped through the hoops which lead to the correct selection bias. The sad truth is that hedge funds are run by white dudes who grew up in Greenwich, and they like (and trust) working with white dudes who grew up in Greenwich and look like them.

How could that produce anomalies, am I right?

Put it this way. We know the indexers are always doing exactly average. If the Greenwich active managers are losing, who do you think they are losing to?

The Greenwich dudes may not be losing in terms of average gross earnings. Their clients are losing, on average, after paying 2+20.
The larger point Leigh makes is quite sound: Hedge Fund selection is based mostly on attributes that have little to do with portfolio management skill.

Anyone who has ever seen the actual performance of an investment strategy deviate from a backtest understands the limits of empirical research and statistical inference. These types of empirical studies of historical or population data, whether in financial, social science, medical, nutrition science, or climate science research, are nothing like controlled laboratory experiments. (Randomized controlled trials like those in medical research would not be included in that list.) We should stop treating empirical studies in these fields as analogous to laboratory experiments in the hard sciences simply because both involve analyzing numerical data.

Right on. I'm constantly amazed by academics discussing anomalies and inefficiencies without any idea what they are talking about. There's nothing better than finding an anomaly in data and then watching it fall apart the moment you try to actually capture it. Noise rules everywhere. Information is extremely hard to get.

This is an important paper, but there are two things that make the results slightly less stark than they appear. First, this failure to replicate largely comes from an unusual treatment of microcap stocks -- not unusual because it's wrong, but unusual because the literature has largely not used it. The authors motivate this change, but a perhaps slightly less sensational characterization of the result would be that a proper treatment of microcaps makes a lot of the results in the literature go away. Researchers can hardly be accused of "p-hacking" or of following what Gelman and Loken call the "garden of forking paths" if they used the standard way of testing for anomalies in the literature, i.e., equal-weighted portfolios. I find the authors' replications utterly convincing, but their treatment of the incentive to produce false results somewhat less so.
Second, the replications are all conducted on a standardized database covering 1967-2014. Some of the studies were published in the earlier parts of this period -- many in the '90s. Even if these anomalies existed, their publication may have invited arbitrage that eliminated them. Consequently, results over the entire period will be attenuated by including years in which the anomaly no longer exists, even though it actually did exist at the time the study was written. Again, this to some extent undermines their meta-story of p-hacking.
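The microcap-weighting point can be sketched with a deliberately stylized example (the market caps and returns below are invented): an "anomaly" that lives only in tiny stocks survives equal weighting but nearly disappears under value weighting.

```python
import numpy as np

rng = np.random.default_rng(1)

# One mega-cap with no abnormal return, 99 microcaps with a 2%/month "anomaly".
caps = np.concatenate([[500e9], np.full(99, 100e6)])
rets = np.concatenate([[0.0], rng.normal(0.02, 0.05, 99)])

ew = rets.mean()                     # equal-weighted: every stock counts once
vw = np.average(rets, weights=caps)  # value-weighted: the mega-cap dominates

print(f"equal-weighted {ew:.2%}, value-weighted {vw:.2%}")
```

Here the microcaps are 99% of the names but under 2% of total market cap, so the same cross-section supports opposite conclusions depending on the weighting scheme.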

I am not sure how The Great Quant Makeover, described at my link above, fits in, but I think it has something to do with past anomalies and the arbitrage that eliminated them. Perhaps leading to new anomalies, etc.

Interesting piece. It certainly fits in nicely with my second point above. It also suggests that lots of the searching for strategies that work uses better techniques than the ones the paper's authors are discussing.

Wouldn't we expect various financial market anomalies to disappear shortly after the publication of reports of their existence?

Probably... but there are a number of issues. If the anomaly is smaller than the transaction costs to exploit it, then no. If the risks of exploiting it are too large, then no. The piece anonymous linked to above suggests that lots of anomalies were tested systematically and found to be either spurious or arbitrageable only at too high a price.

Except the unpublished studies, aka industry research, show exactly the same pattern. Ask any hedge fund manager (I'm one) how well purely statistical backtests, especially, work in actual application.

I'm all for replication, but are there really 447 financial anomalies? Fama-French has 3 factors, and even the newest models only have 4 or 5. I'd rather make sure the analysis is right on those than on the other ~400 that no one really cares about.

Also, careful meta-studies of financial anomalies compare the performance from before publication with the performance after publication. In finance when someone uncovers something as profitable, others try to take advantage of it. Fama says there's a value factor? Well, then Dimensional launches a Value mutual fund. This bids up the prices of value stocks and reduces the factor return going forward.
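That pre/post-publication comparison amounts to splitting the sample at the publication date. A minimal sketch on simulated data (the premium sizes and dates are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
months = 360
# Invented premium: 0.5%/month for 20 years, arbitraged to zero after "publication".
premium = np.where(np.arange(months) < 240, 0.005, 0.0)
r = premium + rng.normal(0.0, 0.04, months)

pre, post = r[:240], r[240:]
print(12 * pre.mean(), 12 * post.mean())  # annualized mean return, pre vs post
```

A full-sample average over all 360 months would blend the two regimes and understate the premium that genuinely existed before publication, which is the attenuation point made above.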

The paper also ignores the fact that a cross-sectional regression can be performed with weighted least squares to put less weight on the microcaps.
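A rough sketch of that weighted-least-squares idea on simulated data (the cap distribution, signal, and volatility are all invented), using the standard rescaling by the square root of the weights:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
cap = rng.lognormal(mean=20, sigma=2, size=n)  # skewed market caps, many microcaps
signal = rng.normal(size=n)                    # hypothetical anomaly signal
ret = rng.normal(0.0, 0.10, n)                 # returns unrelated to the signal

# WLS with weights proportional to market cap, via the standard trick:
# minimizing sum(w_i * e_i^2) equals OLS after multiplying rows by sqrt(w_i).
X = np.column_stack([np.ones(n), signal])
sw = np.sqrt(cap / cap.sum())
coef, *_ = np.linalg.lstsq(X * sw[:, None], ret * sw, rcond=None)
print(coef[1])  # cap-weighted slope on the signal; the true effect here is zero
```

Downweighting microcaps this way keeps them in the cross-section rather than discarding them outright via breakpoints, which is the alternative the commenter is pointing to.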

I'm certainly no expert, but among quant practitioners there has for years been skepticism towards academic papers uncovering newer factors. They're viewed more as prompts to "try this" with the expectation that it likely won't work in usage. For example, long-short portfolios as typically studied can be difficult and costly to implement in practice. The difficulty in arbitraging away an anomaly may help explain its existence.

It's also important, and challenging, to differentiate between an anomaly as a mispricing, in which case it is corrected once discovered (the hedge fund guys constantly read these papers and try to implement them), and an anomaly as a factor, which continues to explain returns even once discovered, because it identifies some sort of risk premium associated with increased risk.

A small/large cap factor is going to be, theoretically, loading on increased risk from small firms. Whereas a pricing anomaly may have no reason to exist anymore once discovered and arbed away.

It's likewise not impossible that other social science papers that fail to replicate, such as some priming experiments, might have at one time been valid, but then the effect disappeared due to factors such as boredom or changes in fashion.

Might be true, but is operationally meaningless: how could you ever tell?

Campbell Harvey at Duke has been talking about this for years, and it's obvious to anyone with some common sense and understanding of how the research is done.

And Andrew Gelman of course...

Except, unlike in Gelman's studies, there's no way of fixing it. There's no way to increase the power of your tests by increasing sample size. In finance, information, beyond a certain window, does not accumulate. You are stuck with small samples, and there's no escape.

Interesting argument for using Liquidity as an asset class, for those of you interested in this subject beyond signaling by attacking markets for being efficient or not:

The paper is intentionally trying to nuke anomalies by using value-weighted rather than equal-weighted portfolios. The original authors intentionally construct the anomalies using equal-weighted portfolios. By changing the methodology, they're testing a completely different hypothesis. This is especially pedantic when considering liquidity-based anomalies. Of course large-cap stocks don't have illiquidity premia. Yet by using value weights, you're making the vast majority of your illiquidity portfolio mega-cap stocks.

This would be like me proving that Mexican food doesn't taste that good, because Taco Bell's the biggest Mexican restaurant in the United States.

