What happens to books when they come out of copyright?

In the United States, 1922 is the cut-off year for the public domain: books published in 1922 or earlier are out of copyright.

Here is more from Eric Crampton, drawing upon Paul Heald.


Are those sales or merely offerings?

I doubt they are sales, but even if they are offerings, the y-axis numbers sound very low. Are they somehow normalized?

For example, 350 books for the 2000-2010 period is merely 35 books a year. That sounds very low for Amazon, unless "Amazon Warehouse" is some subset I do not understand.

From the slides in the video of his talk, it looks like they took a random sample of 2500 new books in the warehouse and tallied the decade each book is from.
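A minimal sketch of that sample-and-tally procedure, assuming we already know each sampled book's original publication year (which, per Paul Heald, had to be checked by hand). The function names `tally_by_decade` and `sample_and_tally` and the toy years are hypothetical; the real sample had 2500 books.

```python
import random
from collections import Counter

def tally_by_decade(years):
    """Count how many books fall in each decade of original publication."""
    # Floor each year to the start of its decade, e.g. 1937 -> 1930.
    return Counter((year // 10) * 10 for year in years)

def sample_and_tally(population_years, sample_size, seed=0):
    """Draw a simple random sample (without replacement) and tally it by decade."""
    rng = random.Random(seed)
    sample = rng.sample(population_years, sample_size)
    return tally_by_decade(sample)

# Hypothetical toy data: original publication years of a few books.
years = [1851, 1883, 1911, 1922, 1925, 1937, 1998, 2003, 2005, 2009]
print(sorted(tally_by_decade(years).items()))
# [(1850, 1), (1880, 1), (1910, 1), (1920, 2), (1930, 1), (1990, 1), (2000, 3)]
```

If the chart's 350 for the 2000s is a count within the 2500-book sample, it means 350 / 2500 = 14% of sampled books, not 35 books a year across all of Amazon, which would explain why the y-axis numbers look so low.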

Interestingly, there's a huge mismatch if one does an advanced search on Amazon for "Books published during a year": 31,000 books were published in 1917, 29,000 in 1918 and 33,000 in 1919. 1920 was indeed a boom year, with 45,000 publications.

There's a fall in 1922, but nowhere near as drastic as the sampling in this post indicates. 1925-1927 had about 37,000 books every year, and by 1935 we are already at the pre-1922 45,000-book mark. The graph in this post is more frightening, making it seem like we have only recovered from the 1920s decline in the last decade.

In fact, 1955 shows 78,000 publications and 1965 shows 152,000 titles.

I guess there's some double-counting and other bookkeeping issues, but I still tend to view the data in this post with some skepticism. Or there's some systematic bias between what appears on Amazon's website versus its warehouses.

Or there’s some systematic bias between what appears on Amazon’s website versus its warehouses.

There is. Amazon's web site is also a marketplace for used books sold by third parties. The fact that a book is listed certainly doesn't mean that Amazon is stocking it in its warehouses (or that the book is in print and that new copies are available from anybody at all).


You are right. The new-books-only correction makes the plot look a lot different.


It's closer to the warehouse sample, yet not as drastic.

As Paul Heald explains below, Amazon only records the publication date for the edition it's selling. He manually checked the publication dates for all 2500 books he sampled at random.
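To illustrate why the edition date skews the picture, here is a small sketch that tallies the same books two ways: by the year of the edition being sold and by the year of first publication. Treasure Island and its 2002 edition date come from Paul Heald's example; the other entries, the `Listing` type and `tally_by_decade` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple

class Listing(NamedTuple):
    title: str
    edition_year: int     # year of the edition being sold (what Amazon records)
    first_published: int  # year of original publication (had to be checked by hand)

def tally_by_decade(years):
    """Count books per decade, flooring each year to its decade (1883 -> 1880)."""
    return Counter((year // 10) * 10 for year in years)

# Treasure Island comes from the comment; the other entries are hypothetical.
listings = [
    Listing("Treasure Island", 2002, 1883),
    Listing("Hypothetical book A", 2005, 1911),
    Listing("Hypothetical book B", 2008, 2008),
]

by_edition = tally_by_decade(b.edition_year for b in listings)
by_first_pub = tally_by_decade(b.first_published for b in listings)
print("by edition year:", sorted(by_edition.items()))
print("by first published:", sorted(by_first_pub.items()))
```

Counted by edition year, all three books land in the 2000s; counted by first publication, they spread across the 1880s, 1910s and 2000s. Any chart built from the edition date alone will pile old books into recent decades.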

The point of the extension of copyright was partly about protecting a few decades-old works that have value to their copyright holders (Mickey Mouse). But it also serves to protect newly created works from competition from the long tail of older out-of-print but still copyrighted material (as well as 'remixes' of the same). Which really sucks. This, I believe, is also why publishers fought Google so hard about making 'orphaned' works available -- the publishers weren't worried about losing the ability to make money on orphaned works by bringing out new editions some time in the future. No -- they just want to keep as much as possible buried and out-of-print to reduce competition.

Are books made in 1930 really a good substitute for books made in 2010?

I can't imagine why not. Have books objectively improved in the past 80 years?

The only exception would be the Hunger Games/Twilight kinds of books, where the entire appeal is that everyone else in your age group is reading them too.

So what about the books made in 1910? Go look at the NYT best-seller list. Why are people buying all these new books when so many books from 1910 are available for free? Have they just read all of them?

Lots of reasons. Advertising is a big one. Bookstores, authors, and publishers have obvious reasons for pushing new books instead of out-of-copyright books. The Twilight effect I mentioned earlier is a big one too. Books that get on the NYT best seller list stay there because, hey, they're NYT best sellers! I'd better buy one! And, although you were trying to be snarky, "they already own it" is a good answer too. Stephenie Meyer sold more books this week, but Dickens has sold more books over the past 20 years. The NYT Best Seller list, by definition, only includes those books sold in the last week. But no one repurchases a book they already own.

But the simplest answer might be that the NYT Best Seller list, by definition, only refers to books that are *sold* (or so we guess; their method of determining that list is famously opaque), not those that are read for free.

You're right that I'm only looking at the big sellers so there's a selection bias. (Although Meyer is not on the list.)

But if 1910 books are such good substitutes, there would be almost no new books being sold.

Tyler posts his "what I'm reading now" every once in a while. I know he has classics in there, but how come so many are new?

I don't think I've been all that straightforward so let me sum it up like this: if 1930 books were good enough substitutes to wreck the market for 2010 books, then 1910 books would have already done it.

Well Tyler posts a lot of nonfiction books. Obviously 1910 nonfiction books will be largely obsolete.

But even limiting it to fiction, why does it matter whether they'll "wreck the market"? I think the question is whether they will *affect* the market, not whether they'll wreck it.

Yes, books have improved objectively in the past 80 years, in the sense that years past had many fewer good books (the top few books of each year are probably around the same level).

I suspect the best new books are objectively better than the best old books (to the very limited extent that such a thing could be objectively determined). My reasoning is pretty simple: the population size for "books from 2010" is vastly larger than that for "books from 1910." So if you took the top five books from 1910 and the top five books from 2010, the 2010 books would likely be better, simply because the competition is more intense.

It's like looking for the smartest people for the respective years. There are about seven times as many people now.
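The "bigger pool, better top five" argument can be checked with a quick simulation. This is a sketch under a strong simplifying assumption: every book's "quality" is an independent draw from the same normal distribution, in both years, which ignores the self-publishing caveat raised in the next comment. The pool size of 1000 and the function names are hypothetical; only the 7x ratio comes from the comment.

```python
import random
import statistics

def mean_top_k(n_books, k, rng):
    """Average 'quality' of the k best books out of n_books random draws."""
    # Assumption: quality ~ Normal(0, 1), independently for every book.
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_books)]
    return statistics.mean(sorted(scores, reverse=True)[:k])

def compare_pools(small_n, ratio, k=5, trials=100, seed=42):
    """Average top-k quality for a small pool vs. a pool `ratio` times larger."""
    rng = random.Random(seed)
    small = statistics.mean(mean_top_k(small_n, k, rng) for _ in range(trials))
    large = statistics.mean(mean_top_k(small_n * ratio, k, rng) for _ in range(trials))
    return small, large

small, large = compare_pools(small_n=1000, ratio=7)
print(f"top-5 mean, small pool: {small:.2f}; 7x larger pool: {large:.2f}")
```

Under this assumption the larger pool's top five score higher on average, but the gap grows only slowly with pool size, since the extremes of a normal distribution stretch out gradually.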

On the other hand, 2010 also sees a lot more self-published books, which are probably of generally lower quality. But I don't think that alters the picture too much.

I regularly hear music that came out 50 or 60 years ago. To a degree, the popularity of certain media has been stagnating along with copyright law. Should changes in how long works stay popular be considered when deciding how long a copyright lasts?

If anything, that seems like an argument for shorter copyright terms.

"Number of misprints per page" also peaks in 1922, I suppose.
Amazon needs to stop selling such shoddy OCR work.

Hi, I just wanted to note that Amazon does not know when a book it sells was first published. It only knows the date of publication of the volume that it is selling, e.g. Treasure Island could have a date of 2002, if that's the edition Amazon is selling. I had to check each of the 2500 books at the Library of Congress to determine the actual initial publication date. This is why stats based on Amazon's "year of publication" don't match up. Cheers, Paul Heald

Cheers to you. This is one of the most interesting charts of the year, IMO.
