Motivation

What is It and Why do We Need It?

For many years, some of us in programming languages and software engineering have been concerned by the insufficient respect paid to the artifacts that back papers. We find it especially ironic that areas so centered on software, models, and specifications would not want to evaluate these artifacts as part of the paper review process, or to archive them with the final paper. Not examining artifacts enables everything from mere sloppiness to, in extreme cases, dishonesty. More subtly, it also penalizes people who take the trouble to rigorously implement and test their ideas.

In 2011, Andreas Zeller, the program chair for ESEC/FSE, decided to institute a committee to address this problem. Andreas asked Carlo Ghezzi and Shriram Krishnamurthi to run this process.

An aside on naming. Shriram had long wanted to create such a committee and call it the “Program Committee” (ha, ha). However, not only was that name taken, we also wanted to be open to all sorts of artifacts that are not programs (not only models but also data sets, etc.). We therefore called this the Artifact Evaluation Committee (AEC). We hoped that someone would eventually come up with a better name, but that has yet to happen, and this name seems to be increasingly entrenched.

Design Criteria

For several months before the deadline, Carlo and Shriram consulted several software engineering community leaders about the wisdom of having an AEC. Most responded positively; a few were tepid; a small number were negative but still gave constructive feedback. Here are some of the most prominent issues people raised, and how the design responded to them:

It became clear that there was a strong desire to be conservative in the design of this process, at least initially. They therefore decided that, in addition to treating artifacts with the same confidentiality rules as papers (as they had always intended), artifacts did not need to be made public: it was sufficient if only the AEC saw them. (Naturally, they encouraged authors to upload artifacts to the supplement section of the ACM DL and/or make them public on their own sites.)

They also made two especially important decisions:

  1. They erected a Chinese Wall between the paper and artifact evaluation processes: the outcome of artifact evaluation would have no bearing at all on the paper's decision. The simplest way to assure the public of this was temporal ordering: artifact submission and evaluation began only after paper decisions had been published. Confident authors could, of course, provide artifact links in their papers, as some already do, but they were not required to do so.
  2. The outcome of artifact evaluation for individual papers would be reported by the authors, who could choose to suppress this information in the case of a negative evaluation.

These decisions seemed to reassure several people that the process would truly be a conservative extension of current review mechanisms and would not adversely affect the conference. These design elements have persisted through several iterations of the AEC at numerous conferences, but they should be understood in their historical context, as a means of addressing the desire for a strictly conservative extension to existing processes, rather than as driven by a desire for particular outcomes for artifacts.