Losing the Popular Vote Doesn’t Make Trump Illegitimate. It’s Irrelevant.

As more returns come in from California, it looks like Trump is going to lose the popular vote despite having secured a majority of electoral votes. In the coming days, if the 2000 election was any indication, I suspect we will see Democrats arguing that this somehow makes Clinton the “rightful” president and that Trump wouldn’t be president if we had a “more sensible” electoral system.

These arguments are silly: the popular vote tells us virtually nothing about what an election would have looked like if the popular vote mattered.

The basic idea is that elections are strategic; campaigns adopt particular tactics given the rules of the game. Consequently, we cannot judge whether Clinton would have won in a popular vote contest given the results of an electoral vote contest.

Here’s an analogy to make the idea more concrete. Baseball games are decided by runs. Teams strategize accordingly, sometimes sacrificing outs to get a man across the plate. This occasionally results in games where the winner gets fewer hits than the loser.

If you change the rules of the game, you change the strategic incentives. Award wins based on hits, and suddenly those sacrifice strategies would never happen. As such, we can’t retroactively award wins based on hits for games where the teams were strategizing for runs.

Similarly, if only the popular vote mattered, campaign incentives change. Candidates choose which policies they support based on the pivotal voter in the election. With an electoral vote, this is the median of the median voters of each state. With a popular vote, this is simply the median voter of the country.
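To make the difference concrete, here is a toy sketch. The three states and the voter ideal points are made up for illustration, and the sketch takes the simplification above literally by weighting every state equally rather than by its electoral votes.

```python
from statistics import median

# Hypothetical ideal points on a left (-1) to right (+1) scale.
# State "A" is large and lopsided; "B" and "C" are small.
states = {
    "A": [-0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3],
    "B": [0.1, 0.2, 0.3],
    "C": [0.4, 0.5, 0.6],
}

# Electoral-style pivot: the median of the state medians
# (assumption: every state counts equally).
state_medians = {name: median(voters) for name, voters in states.items()}
electoral_pivot = median(state_medians.values())

# Popular-vote pivot: the median of all voters pooled together.
all_voters = [v for voters in states.values() for v in voters]
popular_pivot = median(all_voters)

print(electoral_pivot, popular_pivot)
```

In this made-up electorate the two pivots land on opposite sides of zero, so a candidate chasing one would campaign differently than a candidate chasing the other.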

Individual-level incentives change as well. With an electoral vote, a voter in California has less incentive to go to the polls than a voter in Pennsylvania; the result in California is a foregone conclusion, whereas the result in Pennsylvania is in doubt and could sway the electoral college. With a popular vote, each individual’s incentives are identical.

Thus, we don’t know how the election would have turned out under a different electoral system. Given the high concentrations of Latinos in otherwise uncompetitive states (California, Texas), it’s extremely unlikely that Trump would have been as ardent in his anti-immigration policy if the popular vote mattered. And that alone means that we can’t use Tuesday’s returns to judge how a popular vote would have played out.

Bottom line: Trump won with the system we are playing with, and that’s all that matters.

Fun with Incentives: Baseball Contracts Edition

Continuing in the long line of “why do people structure these things in such a crazy way” posts, we have the sad story of Phil Hughes. Hughes is a pitcher for the Minnesota Twins. Like many other players, Hughes’ contract has specific benchmarks that reward bonuses. One in particular gives him $500,000 if he pitches 210 innings this year.

209 2/3s innings? Worthless! Who needs someone who pitches 209 2/3s innings?

But 210 innings? Yep! Definitely worth a half million dollars.

You can see where this is going. The Twins were rained out on Friday. He pitched in a doubleheader today. However, this pushes his next start back a day, his start after that by another day, and so forth. Due to some unfortunate timing, this will ultimately mean he will (probably) end up with one fewer start than he otherwise would have. Extrapolating a reasonable expectation of the number of innings per start, losing this one start will likely mean he will not reach the 210 inning threshold and thus not receive a $500,000 bonus.

For completeness, this post might all be for nothing. If Hughes averages 7 2/3s innings per start for the remainder of the season, he will reach 210 innings and the point will be moot. But it seems doubtful that this will happen for two reasons. First, the Twins have him under contract for two more years; with the team eliminated from the playoffs, it makes little sense to stretch him out when a younger pitcher in greater need of MLB experience could get those innings. Second, if you were the owner of the team and could reasonably limit his innings for the rest of the season, why wouldn’t you save yourself a half million dollars?

So why oh why are contracts structured in this way? I don’t have a good answer. It would be exceedingly easy to simply structure contracts so that the incentive pays a pitcher a fixed amount per inning. This ensures that teams will use pitchers for the number of innings that is economically worthwhile and do not face the incentive-twisting discontinuity between 209 2/3s innings and 210 innings.[1] Transaction costs could conceivably force actors to accept these discontinuities, but that does not seem to be a problem here. Instead, agents and players seemingly accept these contractual terms despite the obvious conflicts of interest they create.

[1] To be fair, the contract has something like this built-in. Hughes receives quarter million dollar bonuses for 180 innings and 195 innings. But there is still no good reason to create these discontinuities.
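As a rough sketch of the discontinuity, the snippet below compares the step schedule described in this post with a per-inning alternative. The linear rate is hypothetical: it simply spreads the same $1,000,000 evenly over 210 innings, which is not a term from any actual contract.

```python
# Innings are tracked in outs (3 outs = 1 inning) so that 209 2/3 is an exact integer.
OUTS_PER_INNING = 3

# Thresholds as described in the post: (innings required, bonus in dollars).
THRESHOLD_BONUSES = [(180, 250_000), (195, 250_000), (210, 500_000)]

# Hypothetical linear alternative: the same $1,000,000 spread evenly over 210 innings.
PER_OUT_RATE = 1_000_000 / (210 * OUTS_PER_INNING)


def threshold_bonus(outs: int) -> int:
    """Total bonus earned under the step schedule."""
    return sum(bonus for innings, bonus in THRESHOLD_BONUSES
               if outs >= innings * OUTS_PER_INNING)


def linear_bonus(outs: int) -> float:
    """Total bonus earned under the hypothetical per-inning schedule."""
    return PER_OUT_RATE * outs


just_short = 209 * OUTS_PER_INNING + 2   # 209 2/3 innings
at_threshold = 210 * OUTS_PER_INNING     # 210 innings

# What the single out that reaches 210 innings costs the team under each schedule.
step_cost_of_last_out = threshold_bonus(at_threshold) - threshold_bonus(just_short)
linear_cost_of_last_out = linear_bonus(at_threshold) - linear_bonus(just_short)

print(step_cost_of_last_out, round(linear_cost_of_last_out, 2))
```

Under the step schedule that one out costs the team the full half million; under the linear one it costs about as much as any other out, so there is no particular out the team would want to prevent.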

The Game Theory of MPSA Elevators

TL;DR: The historic Palmer House Hilton elevators are terribly slow because of bad strategic design, not mechanical issues or overcrowding.

Midwest Political Science Association’s annual meeting–the largest gathering of political scientists–takes place at the historic* Palmer House Hilton each year. While the venue is nice, the elevator system is horrible. And with gatherings on the first eight floors, the routine gets old really fast.

Interestingly, though, the delays are not the result of an old elevator system or too many political scientists moving at once.** Rather, the problem is shoddy strategic thinking.

Each elevator bay has three walls. The elevators along each wall have different tasks. Here’s the first one:


Elevators on this wall go from the ground floor to the 12th floor.

Here’s the second:


These go from the ground floor to the eighth floor or the 18th floor to the 23rd floor.

And the last wall:


These go from the ground floor to the eighth floor and the 13th floor to the 17th.

Now suppose you are on the ground level and want to go to the 7th floor. What’s the fastest way to get there? For most elevator systems, you press a single button. The system figures out which elevator will most efficiently take you there and dispatches that elevator to the ground level.

But the historic Palmer House Hilton’s elevators are not a normal system. Each wall runs independently of the others, with three separate buttons to press. So if you really want to get to the seventh floor as fast as possible, you have to press all three–after all, you do not know which of the three systems will most quickly deliver an elevator to your position.

Unfortunately, this has a pernicious effect. Once the first elevator arrives, the calls to the other two systems do not cancel. Thus, they will both (eventually) send an elevator to that floor. Oftentimes, this means an elevator wastes a trip by going to the floor and picking no one up. In turn, people on other floors waiting for that elevator suffer some unnecessary delay.

This is why (1) the elevator system takes forever and (2) you often stop at various floors and pick up no one. We would all be better off if people limited their choice to a single system, but a political scientist running late to his or her next panel does not care about efficiency.

(Let this sink in for a moment. The largest gathering of political scientists has yet to overcome a collective action problem that plagues it on an everyday basis.)

Given the floor restrictions for the elevators, the best solution I can think of would be to install an elevator system where you press the button of the floor you want outside the elevator, and the system chooses which to send from the three walls. This would be mildly inconvenient but would stop all the unnecessary elevator movements.
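Here is a minimal sketch of what such a dispatcher might look like. The floor ranges come from the descriptions of the three walls above; the numbering of the ground floor as 1, the wall names, and the "send the nearest car" rule are all assumptions made for illustration, not a description of any real elevator controller.

```python
# Floors each wall serves, per the descriptions above.
# Assumption: the ground floor is numbered 1.
BANKS = {
    "wall_1": set(range(1, 13)),
    "wall_2": set(range(1, 9)) | set(range(18, 24)),
    "wall_3": set(range(1, 9)) | set(range(13, 18)),
}


def dispatch(origin: int, destination: int, car_floor: dict[str, int]) -> str:
    """Pick exactly one wall to answer a call.

    car_floor maps each wall to the current floor of its nearest free car
    (a hypothetical input; a real system would track far more state).
    """
    eligible = [wall for wall, floors in BANKS.items()
                if origin in floors and destination in floors]
    if not eligible:
        raise ValueError("no single wall serves both floors")
    # Assumed rule: send the eligible car that is closest to the rider.
    return min(eligible, key=lambda wall: abs(car_floor[wall] - origin))


cars = {"wall_1": 10, "wall_2": 3, "wall_3": 15}
print(dispatch(1, 7, cars))   # all three walls qualify; the closest car wins
print(dispatch(1, 15, cars))  # only one wall reaches the 15th floor
```

Because only one wall is ever summoned per rider, nobody has a reason to call all three, and the wasted trips disappear.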


*Why is it historic? I have no clue. But everyone says it is.

**The latter undoubtedly contributes to the problem, however.

Bell Curve Madness, Or How Not to Run Your Business

From a Vanity Fair article on Microsoft’s lost decade via Slate:

At the center of the cultural problems was a management system called “stack ranking.” Every current and former Microsoft employee I interviewed—every one—cited stack ranking as the most destructive process inside of Microsoft, something that drove out untold numbers of employees. The system—also referred to as “the performance model,” “the bell curve,” or just “the employee review”—has, with certain variations over the years, worked like this: every unit was forced to declare a certain percentage of employees as top performers, then good performers, then average, then below average, then poor…
For that reason, executives said, a lot of Microsoft superstars did everything they could to avoid working alongside other top-notch developers, out of fear that they would be hurt in the rankings. And the reviews had real-world consequences: those at the top received bonuses and promotions; those at the bottom usually received no cash or were shown the door…
“The behavior this engenders, people do everything they can to stay out of the bottom bucket,” one Microsoft engineer said. “People responsible for features will openly sabotage other people’s efforts. One of the most valuable things I learned was to give the appearance of being courteous while withholding just enough information from colleagues to ensure they didn’t get ahead of me on the rankings.” Worse, because the reviews came every six months, employees and their supervisors—who were also ranked—focused on their short-term performance, rather than on longer efforts to innovate…

This strikes two of my pet peeves. First, it shows a fundamental misunderstanding as to how normal distributions work. Despite many of my high school English teachers insisting otherwise, not everything is normally distributed. In fact, there is very good reason to expect Microsoft employees to not be normally distributed. Microsoft should be hiring the very best of the best programmers in the world. Thus, they are sampling from the right tail of the normal distribution–there are a lot of very good programmers, fewer very very good programmers, and yet fewer very very very good programmers.
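A quick sketch of that last point, under two loud assumptions: programmer skill in the general population is standard normal, and Microsoft only hires above a bar of two standard deviations. Both numbers are hypothetical.

```python
from statistics import NormalDist

skill = NormalDist(mu=0.0, sigma=1.0)  # hypothetical population of programmers
HIRING_BAR = 2.0                        # hypothetical cutoff, in standard deviations

tail = 1.0 - skill.cdf(HIRING_BAR)      # share of the population above the bar

# Share of *hires* falling in each band above the bar.
very_good = (skill.cdf(2.5) - skill.cdf(2.0)) / tail
very_very_good = (skill.cdf(3.0) - skill.cdf(2.5)) / tail
very_very_very_good = (1.0 - skill.cdf(3.0)) / tail

print(round(very_good, 2), round(very_very_good, 2), round(very_very_very_good, 2))
```

Under these assumptions roughly three quarters of the hires sit in the lowest band and the shares fall off quickly from there. That is a one-sided, skewed shape, not a bell curve centered on an "average" employee.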

Second, the system institutes perverse incentives. It is fully reasonable to believe that everyone within a division is a productive employee. Yet the system assumes otherwise. As a formal theorist, I am all for making assumptions when they serve a purpose. There is no purpose here. Managers should be evaluating everyone’s performance over the hypothetical replacement employee. Instead, a certain percentage are forced to be bad by assumption. This leads to stupid infighting–employee rankings are zero sum, so some might need to step on others to reach a higher rank.

It also wouldn’t surprise me if many employees bolted to other companies in search of fairer wages. If only the top five people are receiving large bonuses but the sixth-best contributed a great deal to the company, that sixth person is also worth an equitable share of the bonus. If he can’t get it from Microsoft, he’s going to go to a place that will give it to him.

Are Weapons Inspections about Information or Inconvenience?

Abstract: How do weapons inspections alter international bargaining environments? While conventional wisdom focuses on informational aspects, this paper focuses on inspections’ impact on the cost of a potential program–weapons inspectors shut down the most efficient avenues to development, forcing rising states to pursue more costly means to develop arms. To demonstrate the corresponding positive effects, this paper develops a model of negotiating over hidden weapons programs in the shadow of preventive war. If the cost of arms is large, efficient agreements are credible even if declining states cannot observe violations. However, if the cost is small, a commitment problem leads to positive probability of preventive war and costly weapons investment. Equilibrium welfare under this second outcome is mutually inferior to the equilibrium welfare of the first outcome. Consequently, both rising states and declining states benefit from weapons inspections even if those inspections cannot reveal all private information.

If you are here for the long haul, you can download the chapter on the purpose of weapons inspections here. Being that it is a later chapter from my dissertation, here is a quick version of the basic “butter-for-bombs” model:

Imagine a two period game between R(ising state) and D(eclining state). In the first period, D makes an offer x to R, to which R responds by accepting, rejecting, or building weapons. Accepting locks in the proposal; R receives x and D receives 1-x for the rest of time. Rejecting locks in war payoffs; R receives p – c_R and D receives 1 – p – c_D. Building requires a cost k > 0. D responds by either preventing–locking in the war payoffs from before–or advancing to the post-shift state of the world.

In the post-shift state, D makes a second offer y to R, which R accepts or rejects. Accepting locks in the offer for the rest of time. Rejecting leads to war payoffs; R receives p’ – c_R and D receives 1 – p’ – c_D, where p’ > p. Thus, R fares better in war post-shift and D fares worse.

As usual, the actors share a common discount factor δ.

The main question is whether D can buy off R. Perhaps surprisingly, the answer is yes, and easily so. To see why, note that even if R builds, it only receives a larger portion of the pie in the later stage. Specifically, D must offer p’ – c_R to appease R and will do so, since provoking war leads to unnecessary destruction. Thus, if R ever builds, it receives p’ – c_R for the rest of time.

Now consider R’s decision whether to build in the first period. Let’s ignore the reject option, as D will never be silly enough to offer an amount that leads to unnecessary war. If R accepts x, it receives x for the rest of time. If it builds (and D does not prevent), then R pays the cost k and receives x today and p’ – c_R for the rest of time. Thus, R is willing to forgo building if:

x ≥ (1 – δ)x + δ(p’ – c_R) – (1 – δ)k

Solving for x yields:

x ≥ p’ – c_R – (1 – δ)k/δ

It’s as simple as that. As long as D offers at least p’ – c_R – (1 – δ)k/δ, R accepts. There is no need to build if you are already getting all of the concessions you seek. Meanwhile, D happily bribes R in this manner, as it gets to steal the surplus created by R not wasting the investment cost k.
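For readers who want the intermediate algebra behind that cutoff, here it is step by step. The only assumption beyond what is stated above is that δ > 0, so dividing by δ preserves the direction of the inequality.

```latex
\begin{align*}
x &\ge (1-\delta)x + \delta(p' - c_R) - (1-\delta)k \\
x - (1-\delta)x &\ge \delta(p' - c_R) - (1-\delta)k \\
\delta x &\ge \delta(p' - c_R) - (1-\delta)k \\
x &\ge p' - c_R - \frac{(1-\delta)k}{\delta}
\end{align*}
```

Note that as δ approaches 1 the last term vanishes: a very patient R cares little about the one-time cost k, so D has to offer nearly the full post-shift amount p’ – c_R.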

The chapter looks at the same situation but with imperfect information–the declining state does not know whether the rising state built when it chooses whether to prevent. Things get a little hairy, but the states can still hammer out agreements most of the time.

I hope you enjoy the chapter. Feel free to shoot me a cold email with any comments you might have.

GRE Family Feud? The Game Theory of Standardized Test Grading

I can tell you when I began losing faith in standardized testing: August 11, 2009. That was the date of my GRE. My writing prompt was as follows:

Explain the causes of war.

Wow! This could not have been any more perfect. Here I was taking the GRE so I could go to grad school and write a dissertation on the causes of war. ETS threw me a softball!

Then I received my score: 4.5. While not terrible, the 4.5 corresponded to the 63rd percentile. According to ETS (roughly), more than a third of GRE takers could write my dissertation better than I could!

Maybe the essay was not that great. Maybe I am a terrible writer. Maybe I don’t understand the causes of war. The academic job market will likely be the ultimate arbiter of my abilities. But until then, the 4.5 seems silly.


Years later, I came across this revealing article on standardized testing graders’ incentive structure. Many tests use some sort of consensus method. Begin by giving the test to two graders. If the marks are close, average the grades and move to the next exam. If the marks are not close, bring in a third grader and average the three grades in some pre-defined way.
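The consensus method can be sketched in a few lines. The tolerance for "close" and the plain three-way average are assumptions of mine; the article does not say what any particular testing company actually uses.

```python
from statistics import mean
from typing import Callable

TOLERANCE = 1.0  # hypothetical: how far apart two marks can be and still count as "close"


def consensus_score(first: float, second: float,
                    third_grader: Callable[[], float]) -> tuple[float, bool]:
    """Return (final score, whether a third grader had to be called in)."""
    if abs(first - second) <= TOLERANCE:
        return mean([first, second]), False
    # Assumption: the "pre-defined way" is a plain average of all three marks.
    return mean([first, second, third_grader()]), True


print(consensus_score(7, 6.5, lambda: 0))   # close enough: no third read
print(consensus_score(10, 6, lambda: 8))    # too far apart: third read needed
```

The second element of the result is what matters for the company: every `True` is an extra read it has to pay for.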

Standardized testing companies are not in the business of giving correct grades–they are in the business of grading tests as quickly as possible. The potential for a third grader merely appeases a school’s desire to have some semblance of legitimacy. For the company, the third grader is a speed bump. Every test that reaches a third reader requires 50% more labor–and a non-negligible decrease in profitability for that particular test.

Realizing this problem, testing companies pay careful attention to their graders’ agreement rate. A low agreement rate is the mark of a bad employee. Supervisors might creatively limit the number of tests such an employee can grade (thus inflating the supervisor’s overall agreement rate for his or her team), while managers might fire him or her.

The testing companies deserve respect for creating mechanisms to keep employees in line with company goals. Unfortunately, those company goals do not comport with what we as consumers want out of the standardized testing scores we buy.


I have loved game shows all my life. One of my earliest memories is of watching Family Feud. The game is simple: producers survey 100 people with a variety of questions. They then tally the responses. Contestants on the show must then guess which answer was most frequently given.

Note the emphasis here is on matching, not correctness. For example, suppose the prompt was “name a cause of war.” As a contestant, I would say “greed” or “irrationality” way before I said “private information” or “commitment problems.” The first two responses are terrible, terrible answers. The second two are fantastic. Yet, because your average survey taker has not read “Rationalist Explanations for War,” I would not expect many people to give sensible responses to the survey. So commitment problems would yield a smaller score than greed. In turn, I as a contestant pander to their ignorance and say the silly thing.


Suppose you are a test grader. Better yet, say you are the test grader. God has endowed you with absolute authority in test grading matters. You know a 10 essay is a 10 essay and a 1 essay is a 1 essay. Whereas others struggle to see the difference, your observations are perfect.

But you are also broke and need a job. I hand you a test. You recognize it is a clear 10. What grade do you give it?

In a world of justice, your answer is 10. But in a world where you need to eat, you take a different route. Perhaps the essay did something strange, something you have never seen before–something like argue that bargaining problems cause war. You recognize that such an argument reflects scholarly consensus and would be the baseline for tenure at Stanford. But you also know that other graders will think that the argument is just plain bizarre. So you credit the writer for having decent organizational structure but not much substance and turn in a 7.

The other grader gives it a 6.5. The system counts it as a match and you do not get into trouble.

Grading systems are not grading systems–they are coordination games with multiple equilibria. As in Family Feud, a reader should not give the grade the writer actually deserves but rather what he or she thinks other readers will give it.

But this leads to perverse equilibria. For example, if we all created the rule that essays beginning with a vowel are 10s and essays beginning with consonants are 2s, no one would want to break the system and risk the wrath of being labeled inefficient. So substance goes out the window. Graders instead look for focal points to coordinate their scores.
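To see the multiple equilibria directly, here is a small sketch that brute-forces the pure-strategy Nash equilibria of a two-grader matching game. The menu of grades and the payoffs (1 for a match, 0 for a mismatch) are assumptions chosen to keep the example tiny.

```python
from itertools import product

GRADES = [2, 7, 10]  # hypothetical menu of scores a grader can assign


def payoff(mine: int, other: int) -> int:
    """Assumed payoff: 1 if the two marks match, 0 if a mismatch flags the grader."""
    return 1 if mine == other else 0


def is_equilibrium(a: int, b: int) -> bool:
    """True if neither grader can gain by unilaterally switching grades."""
    a_ok = all(payoff(a, b) >= payoff(alt, b) for alt in GRADES)
    b_ok = all(payoff(b, a) >= payoff(alt, a) for alt in GRADES)
    return a_ok and b_ok


equilibria = [(a, b) for a, b in product(GRADES, GRADES) if is_equilibrium(a, b)]
print(equilibria)  # [(2, 2), (7, 7), (10, 10)]
```

Notice that the essay’s true quality appears nowhere in the payoff function. Any grade is an equilibrium as long as both graders pick it, which is exactly why arbitrary focal points can take over.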

Those cues need not reflect anything of substance. To wit, consider the following review of a supervisor’s team:

[A] representative from a southeastern state’s Department of Education visited to check on how her state’s essays were doing. As it turned out, the answer was: not well. About 67 percent of the students were getting 2s.

That’s when the representative informed Farley that the rubric for her state’s scoring had suddenly changed.

“We can’t give this many 1s and 2s,” she told him firmly.

The scorers would not be going back to re-grade the hundreds of tests they’d already finished—there just wasn’t time. Instead, they were just going to give out more 3s.

3s magically appeared out of nowhere–partially because the testing company wanted a higher average, and partially because graders feared that giving a 1 or 2 would result in a mismatch and disciplining from the supervisor.

The article is full of other lovely anecdotes and worth the read. And it is also terrifying to think about.


In retrospect, I should not have written about how bargaining problems cause war. From the point of view of a grader, it is just too bizarre. I deserve my 4.5 for not properly playing the game.

(I suppose three years of grad school have made me more cynical about life. Perhaps a better subtitle for this article is “How I Learned to Stop Worrying and Love Pandering.”)

Of course, that is precisely the problem. The incentive system for grading is perverse and rewards students for writing safe–and not particularly insightful–essays. In contrast, the academic job market rewards the opposite approach. Write a dissertation that has been done a thousand times before, and you won’t find employment. Do something revolutionary, and have your pick of the top jobs.

The test companies do not have final say over the method of standardized testing grading. We do. It’s time we demand change in the system.

How Uncertainty about Judicial Nominees Can Distort the Confirmation Process

In standard bargaining situations, both parties understand the fundamentals of the agreement. For example, if I offer you a $20 per hour wage, then I will pay you $20 per hour; if I propose a 1% sales tax increase, then sales tax will increase by 1%. But not all deals are so transparent. Senate confirmation of judicial nominees is particularly troublesome: the President has a much better idea of the nominee’s true ideology than the Senate does. Indeed, as the Senate votes to confirm or reject, the Senate may very well be unsure what it is buying.

This situation is the center of a new working paper from Maya Sen and myself. We develop a formal model of the interaction between the President and the Senate during the judicial nomination process. At first thought, it might seem as though the President benefits from the lack of information by occasionally sneaking in extremist justices the Senate would otherwise reject. However, our main results show that this lack of information ultimately harms both parties.

To unravel the logic, suppose the President could nominate a moderate or an extremist. Now imagine that the Senate is ideologically opposed, so it only wants to confirm the moderate. The choice to reject is not so simple, though, because the Senate cannot directly observe the nominee’s type but rather must make inferences based on a noisy signal. Specifically, the Senate receives a signal with probability p if the President chooses an extremist. (This signal might come from the media uncovering a “smoking gun” document.) The President suffers a reputation cost if he is caught in this manner. If the President selects a moderate, the Senate receives no signal at all. Thus, upon not receiving a signal, the Senate cannot be sure whether the President nominated a moderate or extremist.

With those dynamics in mind, consider how the President acts when the signal is weak. Can he only nominate an extremist? No–the Senate would obviously always reject regardless of its signal. Can he only nominate a moderate? No–the Senate would respond by confirming the nominee despite the lack of a signal, but the President could then gamble by selecting an extremist and hoping that the weak signal works in his favor. As such, the President must mix between nominating a moderate and nominating an extremist.

Similarly, the Senate must mix as well. If it were to always confirm, the President would nominate extremists exclusively, but that cannot be sustainable for the reasons outlined above. If the Senate were to always reject, the President would only nominate moderates to avoid smoking guns. But then the Senate could confirm the moderates it was seeking.

Thus, both parties mix. Put differently, the President sometimes bluffs and sometimes does not; the Senate sometimes calls what it perceives as bluffs and sometimes lets them go.

These devious behaviors have an unfortunate welfare implication–both parties are worse off than if they could agree to appoint a moderate. Since the Senate mixes, it must be indifferent between accepting and rejecting. The indifference condition means that the Senate receives its rejection payoff in expectation, which is worse than if it could induce the President to appoint a moderate. Meanwhile, the President is also mixing, so he must be indifferent between nominating a moderate and nominating an extremist. But whenever he nominates a moderate, the Senate sometimes rejects. This also leaves the President in a worse position than if he could credibly commit to appointing moderates exclusively.
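A numerical sketch may help here. The payoffs below are hypothetical and are not the paper’s parameterization; they are just one simple setup that reproduces the logic described above, in which the Senate always rejects after a smoking gun and mixes after seeing nothing.

```python
# Hypothetical payoffs (not the paper's parameterization).
p = 0.2   # chance a smoking gun surfaces when the nominee is an extremist
r = 0.5   # President's reputation cost when caught
m = 0.4   # President's payoff from a confirmed moderate (a confirmed extremist pays 1)
a = 1.0   # Senate's payoff from confirming a moderate
b = 2.0   # Senate's loss from confirming an extremist (rejecting pays 0)

# Weak-signal condition: if the Senate always confirmed absent a signal,
# gambling on an extremist would beat a sure moderate.
assert (1 - p) - p * r > m

# q: probability the President nominates an extremist, set so the Senate is
# indifferent after seeing no signal: (1 - q) * a == q * (1 - p) * b.
q = a / (a + (1 - p) * b)

# s: probability the Senate confirms after no signal, set so the President is
# indifferent: s * m == (1 - p) * s - p * r.
s = p * r / (1 - p - m)

president_payoff = s * m
senate_payoff = (1 - q) * s * a - q * (1 - p) * s * b  # equals the rejection payoff, 0

print(round(q, 3), round(s, 3), round(president_payoff, 3), round(senate_payoff, 3))
```

In this toy version the President ends up with less than the m he would get from a guaranteed moderate confirmation, and the Senate ends up with its rejection payoff rather than the a it would get from confirming a known moderate. Both would prefer the commitment outcome.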

Further, we show that the President and Senate can only benefit from more information about judicial nominees when they are ideologically opposed. And yet there seems to be little serious effort to change the current charade of judicial nominee hearings. (During Clarence Thomas’s hearing, when asked whether Roe v. Wade was correctly decided, he unconvincingly replied that he did not have an opinion “one way or the other.”) Why not?

The remainder of our paper investigates this question. We point to the potential benefits of keeping nominee ideology secret when the Senate is ideologically aligned with the President. Under these conditions, the President can nominate extremists and still induce the Senate to accept. Keeping the process quiet allows the President to nominate such extremists without worrying about suffering reputation costs as a result. Consequently, the current system persists.

Although our focus is on judicial nominations, the same obstacles are likely present in other nomination processes. And coming from an IR background, I have been thinking about similar situations in interstate bargaining. In any case, please check out the paper if you have a chance. We welcome your comments on it.