The AEA’s New Data Policy

The AEA has long had a data repository, but no one was responsible for examining the data or replicating a paper’s results, and confidential data was treated as an exception. All that is about to change. The AEA has hired a Data Editor, Lars Vilhuber, who will be responsible for verifying that an author’s code produces the claimed results from the given data. In some cases Vilhuber will even verify results from the raw data all the way to the table output.

The new data policy is a significant increase in the requirements to publish in an AEA journal. It takes an immense amount of work to document every step of the empirical process in a replicable way. It’s all to the good, of course, but it is remarkable how little economists train our students in these techniques. Make no mistake: writing code to be replicable from day one is both an art and a science, and it needs to be part of the econometrics sequence. All hail Gentzkow and Shapiro!

Here’s more information:

On July 10, 2019, the Association adopted an updated Data and Code Availability Policy, which can be found at https://www.aeaweb.org/journals/policies/data-code. The goal of the new policy is to improve the reproducibility and transparency of materials supporting research published in the AEA journals by providing improved guidance on the types of materials required, increased quality control, and more review earlier in the publication process.

What’s new in the policy? Several items of note:

  • A central role for the AEA Data Editor. The inaugural Data Editor was appointed in January 2018 and will oversee the implementation of the new policy.

  • The policy now clearly applies to code as well as data and explains how to proceed when data cannot be shared by an author. The Data Editor will regularly ask for the raw data associated with a paper, not just the analysis files, and for all programs that transform raw data into those from which the paper’s results are computed. Replication archives will now be requested prior to acceptance, rather than during the publication process after acceptance, providing more time for the Data Editor to review materials.

  • Will the Data Editor’s team run authors’ code prior to acceptance? Yes, to the extent that it is feasible. The code will need to produce the reported results, given the data provided. Authors can consult a generic checklist, as well as the template used by the replicating teams.

  • Will code be run even when the data cannot be posted? This was once an exemption, but the Data Editor will now attempt to conduct a reproducibility check of these materials through a third party who has access to the (confidential or restricted) data. Such checks have already been successfully conducted using the protocol outlined here.
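The pipeline the policy describes, from raw data through transformation programs to the files behind a paper's tables, can be sketched as a single "master script." The directory layout (raw/, clean/, output/) and every file name below are illustrative assumptions, not anything the AEA prescribes; this is a minimal Python sketch using only the standard library:

```python
"""Hypothetical replication-package master script: raw data -> cleaned
analysis file -> published table. All names here are invented for
illustration, not taken from the AEA policy."""
import csv
import pathlib
import statistics

BASE = pathlib.Path("replication_demo")

def build_raw():
    # Stand-in for the raw data deposited with the journal.
    (BASE / "raw").mkdir(parents=True, exist_ok=True)
    with open(BASE / "raw" / "survey.csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "wage"])
        w.writerows([[1, "21.5"], [2, "19.0"], [3, ""], [4, "30.2"]])

def clean():
    # Transform raw data into the analysis file, documenting every
    # step in code: here, dropping observations with missing wages.
    (BASE / "clean").mkdir(exist_ok=True)
    with open(BASE / "raw" / "survey.csv") as f:
        rows = [r for r in csv.DictReader(f) if r["wage"]]
    with open(BASE / "clean" / "analysis.csv", "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["id", "wage"])
        w.writeheader()
        w.writerows(rows)

def make_table():
    # Compute the statistic reported in the paper from the analysis file.
    with open(BASE / "clean" / "analysis.csv") as f:
        wages = [float(r["wage"]) for r in csv.DictReader(f)]
    mean = statistics.mean(wages)
    (BASE / "output").mkdir(exist_ok=True)
    (BASE / "output" / "table1.txt").write_text(f"Mean wage: {mean:.2f}\n")
    return mean

if __name__ == "__main__":
    build_raw()
    clean()
    make_table()
```

Running the script end-to-end regenerates the "table" from the raw data, which is exactly what a replicating team would attempt to do.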

Comments

When I was a young academic it seemed potty to me that there was typically no way of checking computed results; one was simply expected to accept them. Of course I was so young that my concern was with honest mistakes; it was some time before I learnt that liars and cheats infested the universities.

Are they liars and cheats if they truly believe their (cherry-picked, data mined) numbers are "valid" based on the fact the numbers "prove" their ideologies and narratives?

A bigger problem among intellectuals is they believe stuff that no normal person ever could.

'The new data policy is a significant increase in the requirements to publish in an AEA journal.'

And a real surprise, considering that one had assumed after that little Reinhart-Rogoff contretemps that people were already paying more attention to the data.

Though possibly that explains this web site's reaction to Piketty.

'It takes an immense amount of work to document in a replicable way every step of the empirical process.'

Yet oddly, that seems to be generally considered normal in fields that actually handle empirical data, not merely empirical 'processes.'

'it is remarkable how little economists train our students in these techniques'

No, it really is not remarkable at all.

"Will code be run even when the data cannot be posted? This was once an exemption, but the Data Editor will now attempt to conduct a reproducibility check of these materials through a third party who has access to the (confidential or restricted) data. Such checks have already been successfully conducted using the protocol outlined here."

Expect to see the availability of detailed company data diminished by this. At three of the insurance companies I've worked for, 3rd party data verification was a non-starter for papers that weren't sponsored by one of three industry organizations. Even those had all sorts of hoops to jump through and the data-scrubbing process applied before allowing the data to be used for studies on behalf of the industry was both rigorous (as it should be) and confidential (arguable--I'd argue against).

"it is remarkable how little economists train our students in these techniques"
To tack onto what clockwork_prior said above, I don't find it remarkable at all. A little sad, but I think if you look at other university departments, you'll find yourself in good company on that point. At least, that what I saw in grad school and several of my colleagues from other schools reported the same. It's one of the reasons most of the managers at my employer are biased against hiring PhDs. Too much theoretical training, not enough practical background.

Now we just need the same for Greenhoax Effect research.


today we gonna make a poll of how many people at princeton university (united states) actually believe that developmentally disabled people were surgically modified by the government and used in ufo experiments in area 51 as described by a princeton university historian in the book area 51!
we looked at the evidence and it seems really thin for such a bold historical claim by princeton

Do ya' wanna car pool to Area 51?

I'll bring the beer. You bring the pretzels.

Crikey!
don't need to car pool
actually don't need a car
s.u. area 51 sept.!
princeton is in new jersey right?

Forensic accounting.
If you accuse a group based on circumstantial evidence then you got to have your evidence in order.

Reminds me of Don Knuth taking a little time to ensure the math papers in the math society journals were correctly printed, along with ensuring his books were error free, not just from his own errors but at all steps along the way. He then took time off, a year or so, to develop tools.

Prompted, and enabled, by essentially the first computer-controlled laser printer.

His time off for this side project ended up being years, with large blocks of work over decades.

He produced TeX (Leslie Lamport produced LaTeX built on top).

METAFONT to create suitable font sets and glyphs.

CWEB, when he was forced to change programming languages for porting to Stanford's new computer.

User guides, for each, plus publishing the code so it was readable and logical.

Then, as he wrote TAoCP, he wrote code to document each algorithm, plus code for many problems, plus published papers as he found new things. Then books that collected his papers on what he had been doing, how he did it, and general observations.

And the book that started it in the 60s, which grew to a planned seven volumes by the time he published the first edition circa 1970, is now five volumes, with three published in multiple editions, volume 4 looking to be at least four books, and volume 5 perhaps just one book. And the process has resulted in 20-30 other books.

TeX math mode is replicated or copied in style in most document systems capable of doing math equations correctly.
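For readers who have never seen it, this is roughly the math-mode syntax the comment refers to; the same `$...$` and `\[...\]` delimiters (or close variants) are recognized by MathJax, Jupyter, and many Markdown dialects. The regression formula is just an illustrative example:

```latex
% Inline math uses $...$; display math uses \[...\].
The OLS estimator is $\hat{\beta} = (X^\top X)^{-1} X^\top y$,
which solves
\[
  \min_{\beta} \; (y - X\beta)^\top (y - X\beta) .
\]
```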

But the dream was of paper authors supplying everything to editors and reviewers, with the author's final work flowing unchanged into the code that was sent to the publisher for printing. Others did projects to create ebooks, online web pages, etc.

But the discipline required of authors and volunteer editors and reviewers is too great, so I wonder whether anyone under the age of 30 has heard of LaTeX, or anyone under 40 has used it.

The cost/benefit is too high. Is the AEA going to charge the author $5-10k, or his employer $100k, to accept the paper, to pay stipends to the volunteers for their time lost to the process rules?

I recently used LaTeX for my bachelor's and still use it for my resume. I believe most of my fellow comp. sci. students did as well.

Why does the government publish the policy? How do they carry it out? What's the result of it?

(1) In math, aren't TeX and LaTeX universal? I think some journals may require submissions to be in it. I thought it was universal in econ too, but maybe I'm wrong. Of course, many economists use some kind of WYSIWYG interface.
(2) Science is much worse than economics in terms of authors providing data for replication. In economics, this has been standard courtesy, even before journals started requiring it. In science, my impression is that it is treated as extraordinary self-sacrifice. See Climategate-- half of that was the leaking of the abominable code of the leading temperature data providers.
(3) The new requirement is like my making my seniors do their problem sets. They know they ought to do them anyway to pass the tests, but it's hard to do without a bump. We all know we'd be happier in the end if we organized our data and code better, so we ourselves could replicate what we did, but human nature is such that we usually don't. So I don't think our effort will increase; I think it will fall, if we are required by the journal to get it so that at least we give them some data and code that generates what's in our tables.

TeX and LaTeX are wonderful in many ways, but they are not solutions to the problem, as they are images of the math, rather than the actual instructions to the computer.

Paul Romer recommends Jupyter Notebooks for reproducible research. See https://paulromer.net/jupyter-mathematica-and-the-future-of-the-research-paper/

"It takes an immense amount of work to document in a replicable way every step of the empirical process. "

While that's true, it's also fairly routine in the engineering world.

Because, you know, engineers who can't replicate their work for the customer stop getting paid.

Granted, I'm talking about third party engineering. It's not uncommon for internal engineers to not document the process thoroughly, though it would still be considered substandard.

I wonder about the rate of software defects for projects that are more than a couple of Excel tabs of analysis. I'd be curious whether it's common practice to use code built up by several generations of grad students for this type of work.

It's been a while since I read Capers Jones, but the data indicates that even pretty modest-sized commercial software (say 50 KLOC) will have a significant number of bugs, particularly software developed without the benefit of systematic defect prevention (design reviews, inspections, etc.). Best-in-class development efforts have fewer, but still significant, numbers. Defect repairs have a fair chance (~10-30%) of injecting new defects.

Defects in delivered code are notoriously expensive to detect. The Data Editor is going to be busy.

Here are some numbers from actual industry experience:

https://swreflections.blogspot.com/2011/08/bugs-and-numbers-how-many-bugs-do-you.html
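Given defect rates like those above, one cheap defense is to embed sanity checks directly in analysis code, so data-handling bugs fail loudly instead of silently corrupting a table. This is only a sketch; the column names and plausibility bounds below are invented for illustration:

```python
def check_analysis_file(rows):
    """Cheap defensive checks that catch many data-handling bugs
    before they propagate into a paper's tables. `rows` is a list of
    dicts with hypothetical "id" and "wage" columns."""
    assert rows, "analysis file is empty"
    ids = [r["id"] for r in rows]
    assert len(ids) == len(set(ids)), "duplicate observation ids"
    for r in rows:
        wage = float(r["wage"])  # also fails fast on non-numeric values
        assert 0 <= wage < 1000, f"implausible wage: {wage}"
    return True

# Usage on a toy analysis file:
sample = [{"id": 1, "wage": "21.5"}, {"id": 2, "wage": "19.0"}]
check_analysis_file(sample)
```

The point is not the specific checks but the habit: every assumption the later analysis relies on gets asserted where the data enters the pipeline.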

I'm an engineer working for a large consulting firm. Customers don't really care how reproducible our work is, but it is a desirable feature for quality and efficiency. A checker needs to be able to follow the designer's line of reasoning, unless she is doing a fully independent check by performing all the calculations herself. Making work reproducible requires greater effort when doing it the first time, but decreases work for similar future tasks.

Difficult, quantitative, useful task -> Hire a man for success
Political, organizational, useless task -> Hire anyone or anything

Keep in mind that most academic papers are flops, even if they get published. For the typical paper, nobody cares enough to replicate data analysis or check the proofs, and that's fine, because the paper is not important enough. So the new policy is important because (a) for papers that become influential, replication *is* important, (b) it's nice to have the files available as teaching examples (part of the MIT econometrics program in my day (1982) was a take-home where you critiqued a published paper), and (c) to incentivize the author to be careful for his own sake.

+1 Keeping the data in a replicable format and checked for errors will also permit later researchers, with permission, to use the data for other purposes as well, such as meta-analysis, or running the same study in a different period with different data for that period.

I once tried to replicate a study which clearly stated it used Census SIC data, which you would think would be easy, but the author had combined SIC codes without disclosing how he did it. Data cleanup and how you handle missing data can affect outcomes, so anything that makes the process clearer is to be applauded.

How about extending the policy to theory papers, including econometric theory? I'd like having the proofs laid out step-by-step in an online appendix, with an AER special editor making the author clarify all the hardest steps where the author is tempted to say "Obviously,..." or "Using standard methods,..." or even just "It follows that..." when it doesn't clearly follow.

I do agree, but this problem goes all the way down to undergrad math textbooks where being clear and explicit is supposed to be a primary function. Baby Rudin is a classic example, but many view the book's lack of clarity as a *positive*, as some sort of (idiotic) rite of passage.

I wrote a paper with a math professor, but it was rejected by econ journals, so we decided to send it to a math journal. My co-author told me we had to cut out most of the explanations, though, because the referees would be touchy about any implication that they might need a little help. See:

Christopher Connell and Eric B. Rasmusen, "Concavifying the Quasi-Concave," Journal of Convex Analysis, 24(4): 1239-1262 (December 2017). We show that if and only if a real-valued function f is strictly quasi-concave except possibly for a flat interval at its maximum, and furthermore belongs to an explicitly determined regularity class, does there exist a strictly monotonically increasing function g such that g of f is concave. We prove this sharp characterization of quasi-concavity for functions whose domain is any Euclidean space or even any arbitrary geodesic metric space. http://rasmusen.org/papers/quasi-short-connell-rasmusen.pdf or in the longer working paper draft with more explanation, http://rasmusen.org/papers/quasi-connell-rasmusen.pdf

"The AEA has long had a data repository but no one was responsible for examining the data or replicating a paper’s results and confidential data was treated as an exception. "

This is not entirely true. It was my job from roughly 2011-2016 or so to review the submitted materials for reproducibility. To this end, I would try to run the code submitted on the data submitted and see if the output matched the paper. If I didn't have the right software tool or the data was proprietary, I was obviously unable to do that and so had to rely on a review of the code to see if it looked like it was doing what was described in the paper. I also was responsible for logging the data sources, the methodology, etc. as described in the methods section of the paper. I caught a fair few mistakes this way, but it seems like the Data Editor's mandate will be broader and they will be given more resources to truly ensure reproducibility (i.e. software licenses, access to confidential/proprietary data).

Also, that the archives will be requested prior to acceptance is obviously a huge difference.

I would like to suggest Mathematica as a computational tool that simultaneously documents your work. It shows which files you read in (and shows the data in-line if you want), how you cleaned and summarized the data, which stats you used, and your plots and tables, all in a Mathematica notebook that can be saved as a pdf for others to read.

Comments for this post are closed