One thing I really like about large-n empirical papers is their ability to run robustness checks. Statistical models only produce the ideal results if the author captures the correct data generating process. For example, suppose a researcher theorizes that states with larger economies have higher chances of winning wars. If the size of economies and regime type are the only things that matter for winning a war, then you would want your model of war winning to just include economic size and regime type.

However, someone might object under the belief that industrial capacity matters as well. It might be difficult to know who is correct, but fortunately there is an easy solution: just run both models. If the size of the economy is positively correlated with winning wars in both cases, then the objection is irrelevant, the scholars can agree to disagree, and we can go back to focusing on the economy.
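To make the "just run both models" step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data are simulated rather than real war outcomes, the coefficients in `simulate` are made up, the function names are hypothetical, and a linear probability model fit by ordinary least squares stands in for whatever estimator an actual paper would use.

```python
import random


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=2000, seed=0):
    """Simulated (NOT real) wars: economy, regime type, industry, outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        economy = rng.gauss(0, 1)
        regime = 1.0 if rng.random() < 0.5 else 0.0
        # Assumption: industrial capacity is correlated with economic size.
        industry = 0.6 * economy + rng.gauss(0, 1)
        # Assumption: made-up effect sizes, clipped to a valid probability.
        p_win = 0.45 + 0.15 * economy + 0.10 * regime + 0.05 * industry
        p_win = min(max(p_win, 0.0), 1.0)
        win = 1.0 if rng.random() < p_win else 0.0
        data.append((economy, regime, industry, win))
    return data


def economy_coefficients(data):
    """Fit the model with and without industry; return both economy slopes."""
    y = [w for (_, _, _, w) in data]
    # Baseline specification: intercept, economy, regime type.
    base_rows = [[1.0, e, r] for (e, r, _, _) in data]
    # Objector's specification: add industrial capacity.
    full_rows = [[1.0, e, r, i] for (e, r, i, _) in data]
    return ols(base_rows, y)[1], ols(full_rows, y)[1]


if __name__ == "__main__":
    base, full = economy_coefficients(simulate())
    print(f"economy coefficient, baseline model:      {base:.3f}")
    print(f"economy coefficient, with industry added: {full:.3f}")
    print("sign agrees across models:", (base > 0) == (full > 0))
```

If the economy coefficient keeps its sign (and, in a real paper, its statistical significance) under both specifications, the disagreement over industrial capacity does not change the substantive conclusion. A real check would also report standard errors, which this sketch omits.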

So despite only focusing on one model, empirical researchers usually include a few robustness checks within a paper and often include a much larger online appendix with yet more robustness checks. This should be applauded, as it gives us more confidence that the result is correct.

Yet formal theorists often fail to make such robustness checks, even though the problem is the same one empiricists face. Indeed, formal theorists tend to give us the result of *one* model. But that model is essentially a knife-edge case of a greater family of models, one in which the order of moves is reversed, players have additional moves, and information operates differently. And just like the empirical problem, it is very difficult for formal theorists to “know” that their version of the model is the correct game that real world actors play. Why, then, should we privilege the single version that the author presents? This is an especially important question given that authors have an incentive to present the model with the most interesting results even if those results completely disappear if the author tweaks the assumptions slightly. (Note, again, that empiricists have the same incentives.)

The answer, of course, is that we should not. We should expect formal theorists to think long and hard about the models they create and whether they actually represent a broad, robust finding.

For example, consider my work on nuclear nonproliferation agreements. I show that potential nuclear powers are always willing to accept nonproliferation settlements. That is an extremely broad and strong claim. And, dangerously, it comes from a *very* simple bargaining game. Why should you trust my results?

If that is all I presented to you, you probably should not. Observers of American/Iranian negotiations know that there are a lot of complexities to this type of bargaining. Consequently, I have received a large number of questions about how the results would change if I tweaked certain assumptions. I collected most of these in the back of my brain in case someone were to ask me those questions again.

But then my presentation at the annual Peace Science Society conference approached. For those unaware, Peace Science tends to slant very empirical. So as I was thinking about how to present the paper, I began considering how a strict empiricist would present the paper. She would probably start with a bit of theory, show the results of the main model, demonstrate robustness (or at least say “this is robust to…”), and then perhaps talk about a couple of cases if time permitted.

Theorists and empiricists may be different in many ways, but I think it is telling that the previous sentence could apply equally to both a theoretical and an empirical presentation. Yet theorists almost always completely skip the robustness step. This time, I did not. So instead of saying “bargaining works in the model I constructed,” I said “bargaining works in the model I constructed *and* in models with the following different assumptions.” I then showed a slide with the following:

- Prior investment in nuclear research
- Prestige
- Punishment for reneging
- Negative externalities
- Non-binary power shifts
- Nondeterministic proliferation
- Sanctions
- Bargaining over objects that influence future bargaining power
- Non-common discount factors
- Imperfect monitoring

This is much, *much*, **much** stronger than just saying “hey, bargaining works.” I think it won over a few people in the crowd, and I received many comments after the presentation about how it was a nice touch.

All of this is to say that we really should be making robustness checks of our formal models both in our papers and in our presentations. Why isn’t this commonplace already? There are two restricting factors. First, solving alternative model specifications is a time-consuming task. An empiricist can just add a few robustness check variables, press a button, and be done. (This assumes that such data already exist. If not, they are in trouble.) Theorists often have to re-solve the entire model, which can take days.

Second, it is space-consuming. I mentioned ten robustness checks above. The paper takes ten and a half pages addressing all of them. I can get away with this because I am writing a book; I would be in deep trouble if this were a journal article and I only had 10,000 words to work with.

Still, I don’t think either of these is a particularly good excuse. Regarding time: yes, spending time doing these things is annoying. But it is an investment in getting your result right, and you should be willing to pay it. Moreover, if the central result you are finding is decent, then the logic should intuitively carry over in many of the cases. For example, Maya Sen and I have a working paper on judicial nominations that shows that, under certain conditions, Senators randomly reject nominees despite not having any good reason to do so. We use a very simple model, yet (much to our surprise) the results immediately carry over to much more complicated setups, and we can explain why without having to do any more math.

Regarding space: this is a poor excuse. Empiricists solve the space problem by creating online appendices. When was the last time you ever saw an online appendix for a formal article that wasn’t just a proof of the model in the paper? There is no reason theorists can’t copy this solution.

Bottom line: empiricists and theorists face similar robustness challenges. We need our models to be robust, and the only way we can effectively communicate that in scholarly work is to conduct robustness checks. Empiricists do a good job here; theorists have a lot of room for improvement.