Data mining is not infinitely powerful

To say the least:

Suppose that asset pricing factors are just data mined noise. How much data mining is required to produce the more than 300 factors documented by academics? This short paper shows that, if 10,000 academics generate 1 factor every minute, it takes 15 million years of full-time data mining. This absurd conclusion comes from rigorously pursuing the data mining theory and applying it to data. To fit the fat right tail of published t-stats, a pure data mining model implies that the probability of publishing t-stats < 6.0 is ridiculously small, and thus it takes a ridiculous amount of mining to publish a single t-stat. These results show that the data mining alone cannot explain the zoo of asset pricing factors.

That is from a new paper by Andrew Y. Chen at the Fed.
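To get a sense of the scale, the headline number can be turned back into a count of mined factors. The sketch below is only a rough calculation and is not Chen's model. It assumes that "full-time" means 40 hours a week for 50 weeks a year, which may not match the paper's convention. It also treats the "more than 300" published factors as exactly 300. The resulting "implied publication probability" is just total published divided by total mined under those assumptions. It is not a parameter estimated in the paper.

```python
# Rough unpacking of the abstract's headline figure.
# ASSUMPTION: "full-time" = 40 hours/week for 50 weeks/year;
# the paper may define a working year differently.
ACADEMICS = 10_000
FACTORS_PER_MINUTE = 1           # per academic, as stated in the abstract
YEARS = 15_000_000
PUBLISHED_FACTORS = 300          # "more than 300", rounded down to 300

MINUTES_PER_YEAR = 40 * 60 * 50  # hypothetical full-time schedule: 120,000 minutes


def factors_mined(years: int) -> int:
    """Total factors mined by all academics over `years` of full-time work."""
    return ACADEMICS * FACTORS_PER_MINUTE * MINUTES_PER_YEAR * years


total = factors_mined(YEARS)
per_published = total / PUBLISHED_FACTORS

print(f"total factors mined:        {total:.2e}")              # 1.80e+16
print(f"mined per published factor: {per_published:.2e}")      # 6.00e+13
print(f"implied publication prob.:  {1 / per_published:.2e}")   # 1.67e-14
```

Under these assumptions, each published factor corresponds to roughly 60 trillion mined ones. That is the kind of ratio the abstract calls absurd.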

