In a word, yes.

That is not what I thought I would say when I first considered writing this blog post. Authors frequently complain that a book of theirs has sold very well yet has few reviews on Amazon. In fact, that is what inspired me to look into this: my Rationality of War has sold better than I expected for the past two years or so, yet it still has zero reviews. Stranger still, Game Theory 101: The Basics is by far my best seller, yet Game Theory 101: The Complete Textbook has the most reviews despite selling only a fraction as many copies.

*Aside: I’m being intentionally vague about the exact number of sales any particular book has made because that is a part of my contract with Amazon. The graphs that follow will also be unlabeled. Apologies.*

I understand that readers choose to review books for different reasons, so we should not expect reviews to be consistent across genres or even otherwise very similar books. But it seems weird that the difference is so big. Right?

Well, I figured doing some math could help out here. One obstacle to a large study on this is getting data on a large number of books. Outside of New York Times best sellers, we simply do not know much about sales figures. And looking only at NYT best sellers is problematic because they are all very similar: they have all sold a tremendous number of copies.

I can provide a partial solution. I have twelve books up on Amazon and have kept extensive sales records on all of them. While twelve is not a huge number, it still provides a useful picture of the connection between sales and reviews.

My first thought was to plot the number of sales and the number of reviews each book had. This was not particularly helpful:

The diagonal line is the OLS (ordinary least squares) “best-fit” line. Sales do increase the number of reviews, but the graph is not particularly informative because the points bunch up on the left side. This is common for data of this type. Book sales are heavily skewed, roughly exponentially distributed: many, **many** books sell only a handful of copies, while very few sell a substantial number. My library follows this distribution too.
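For readers who want to reproduce a best-fit line like this one, here is a minimal Python sketch of an ordinary least squares fit with a single predictor. The function name `ols_fit` and the numbers in the usage lines are hypothetical illustrations, not the code or sales figures behind the graph.

```python
def ols_fit(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squared x deviations (the 1/n factors cancel in the ratio).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    # The OLS line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy numbers, NOT the author's sales data.
sales = [10, 20, 30, 40]
reviews = [1, 3, 5, 7]
slope, intercept = ols_fit(sales, reviews)
print(slope, intercept)
```

Because the fit minimizes squared vertical distances, a few very large sales figures far to the right can pull the line around, which is one reason the bunched, unlogged graph is hard to read.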

To fix this, I took the logarithm of my sales figures and recreated the same graph:

Ah, much better. We now see a clear trend: more sales lead to more book reviews, though the relationship becomes murkier for the best-selling books.
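A minimal sketch of the log transform, assuming natural logs (the post does not say which base was used). The helper `log_sales` and the sample figures are hypothetical. A book with zero sales would need special handling, since the log of zero is undefined.

```python
import math


def log_sales(sales):
    """Natural log of each sales figure; every value must be positive."""
    if any(s <= 0 for s in sales):
        raise ValueError("log transform needs positive sales counts")
    return [math.log(s) for s in sales]


# Hypothetical skewed sales figures, NOT the author's data.
sales = [5, 12, 30, 80, 2500]
print([round(v, 2) for v in log_sales(sales)])
```

The base does not affect the trend: switching bases multiplies every logged value by the same constant, which rescales the axis but leaves the shape of the scatter and the correlation unchanged.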

It shouldn’t be surprising that more sales lead to more reviews; more readers means more potential reviewers, after all. However, I was surprised by just how strong the relationship is: the correlation between logged sales and reviews is .896! (A positive correlation ranges between 0 and 1, so it is difficult to get much higher than this.) Even the unlogged data have a strong correlation of .757. The number of sales really does seem to drive the number of reviews.
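The post does not say which correlation measure was used; assuming it is Pearson's r, the calculation can be sketched as follows. The function `pearson_r` and the toy data are hypothetical. In the toy data, sales grow tenfold per step while reviews rise by one, so logged sales line up perfectly with reviews and raw sales do not.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, NOT the author's figures.
sales = [10, 100, 1000, 10000]
reviews = [1, 2, 3, 4]

raw_r = pearson_r(sales, reviews)
logged_r = pearson_r([math.log(s) for s in sales], reviews)
print(f"raw r = {raw_r:.3f}, logged r = {logged_r:.3f}")
```

On these made-up numbers the logged version gives r = 1 (up to floating-point rounding) while the raw version comes out around 0.82, an exaggerated version of the gap between .896 and .757 above: a few huge sales values dominate the raw calculation.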

**TL;DR**

- Book sales are very strongly correlated with book reviews.
- Variance in the number of reviews increases as books become better sellers.

Thanks for the data and analysis, William. More useful than the usual speculation and ignorance that abound on this subject. The question I would really like answered is: does the number of book reviews determine sales?