How do numbers begin?

In many data series a surprising number of entries begin with the digit 1, and the digit 2 is also more common than a uniform distribution of leading digits would suggest.  This is called Benford’s Law.  For instance, about one third of all house numbers start with one.  That may be a quirk of bureaucratic numbering psychology, but the principle also applies to the Dow Jones index history, the sizes of files stored on a PC, the lengths of the world’s rivers, and the numbers in newspapers’ front-page headlines.  It does not apply to lottery-winning numbers; see the graph at the above link.  Here is an exact statement of the law:

Besides the number 1 consistently appearing
about 1/3 of the time, number 2 appears with a frequency of 17.6%,
number 3 at 12.5%, on down to number 9 at 4.6%.  In mathematical terms,
this logarithmic law is written as F(d) = log[1 + (1/d)], where F is
the frequency and d is the digit in question.
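
As a quick check on the quoted figures, here is a minimal Python sketch of the formula. It assumes the logarithm is base 10, and the function name benford_frequency is my own label, not something from the quoted source.

```python
import math


def benford_frequency(d: int) -> float:
    """Expected share of numbers whose first digit is d, using a base-10 log."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_frequency(d):.1%}")
```

Digit 1 comes out at about 30.1%, so "about 1/3" is a rounding, and the nine shares sum to one.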

I feel as if someone is pulling my leg.  And I keep thinking of nominal interest rates being bounded from below at zero.  Yes this has practical implications:

…because a year’s accounting data of a company
should fulfill the law, economists can detect falsified data, which is
very hard to manipulate to follow the law. (Interestingly, scientists
found that numbers 5 and 6, rather than 1, are the most prevalent,
suggesting that forgers try to “hide” data in the middle.)

The law was first discovered by an economist (and astronomer), Simon Newcomb.  Here is Wikipedia on the law.  Here is more startling data on where the law applies.  From a completely orthogonal but I suspect not totally irrelevant direction, here is Tim Harford on price stickiness.

This whole topic makes me feel like an idiot for even bringing it up, with apologies to Pythagoras. 

Comments

Hey Tyler,

It is a really interesting phenomenon. Check out Walter Mebane's work on using Benford's Law to detect fraud in elections.

Charlie

Sadly though, my country's lottery numbers do not occur according to Benford's Law.

I checked through a book which claimed to contain the jackpot numbers for the last 30 years, and sadly, it looks like a random number table.

Ah, I'd have been so psyched if it did follow Benford's Law

I rediscovered this after reading Mario Livio's book about PHI, where it is nestled away in an appendix if I recall correctly. Amazing how much of everything in the world seems to be governed by PI, PHI, e, or some other transcendental number.

Wow:

“In 1961, Pinkham discovered the first general relevant result, demonstrating that Benford’s law is scale invariant and is also the only law referring to digits which can have this scale invariance,” the scientists wrote in their letter. “That is to say, as the length of the rivers of the world in kilometers fulfill Benford’s law, it is certain that these same data expressed in miles, light years, microns or in any other length units will also fulfill it.”
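
The scale-invariance claim can be tried out with a rough Python sketch. The "river lengths" here are synthetic, not real data: I am assuming lengths spread evenly on a log scale across four orders of magnitude, which is a convenient stand-in for Benford-like data. The helper names first_digit and digit_shares are made up for this illustration.

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """Leading digit of a positive number, read off its scientific notation."""
    return int(f"{float(x):.15e}"[0])


def digit_shares(values):
    """Share of values starting with each digit 1 through 9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


random.seed(0)  # fixed seed so repeated runs give the same sample

# ASSUMPTION: synthetic lengths between 1 and 10,000 km, uniform on a log scale.
lengths_km = [10 ** random.uniform(0, 4) for _ in range(100_000)]
KM_TO_MILES = 0.621371
lengths_miles = [km * KM_TO_MILES for km in lengths_km]

km_shares = digit_shares(lengths_km)
mile_shares = digit_shares(lengths_miles)

# Columns: digit, share in km, share in miles, Benford prediction.
for d in range(1, 10):
    print(d, f"{km_shares[d]:.3f}", f"{mile_shares[d]:.3f}",
          f"{math.log10(1 + 1 / d):.3f}")
```

Under that assumption the two columns of shares should track each other and the Benford prediction closely; changing KM_TO_MILES to any other positive factor should not change the picture.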

Should I start believing in God?

Also, I think this is closely related to the problem about envelopes with money in them where one has half the money of the other. Pick one, say it has $x -- should you switch?

It's actually enough if the street is equally likely to contain 10-20 homes and 50-100 homes. Then, the probability of the first digit being 1 (10-20 homes) is equal to the sum of probabilities of the first digit being 5, 6, 7, 8, 9 (50-100 homes) and larger than any of those probabilities alone.

This explains Dow Jones index case very well. Over long periods of time, the index should be spending the same amount of time between 1000 and 1999 (first digit 1) as between 2000 and 3999 (first digit 2 or 3) or between 5000 and 9999 (first digit 5, 6, 7, 8 or 9). As a result, the first digit 1 is much more likely.
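
The Dow Jones argument above can be sketched in a few lines of Python. This is a toy model, not real index data: I am assuming an index that grows by the same small percentage every day, from 100 up to 1,000,000, and counting how many days it spends on each leading digit. The growth rate and the function names are my own choices for illustration.

```python
from collections import Counter


def first_digit(x: float) -> int:
    """Leading digit of a positive number, read off its scientific notation."""
    return int(f"{float(x):.15e}"[0])


def leading_digit_shares(start=100.0, stop=1_000_000.0, daily_growth=1.0003):
    """Share of days a steadily growing index spends on each leading digit."""
    counts = Counter()
    value = start
    while value < stop:
        counts[first_digit(value)] += 1
        value *= daily_growth  # ASSUMPTION: constant percentage growth per day
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


shares = leading_digit_shares()
print("digit 1:", f"{shares[1]:.3f}")
print("digits 2-3:", f"{shares[2] + shares[3]:.3f}")
print("digits 5-9:", f"{sum(shares[d] for d in range(5, 10)):.3f}")
```

Each of the three printed shares covers one doubling of the index, so under steady percentage growth they should all land near log10(2), roughly 0.30.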

Grant Gould: Yes I realized that this morning, and said a big "D'oh" for me. This is the problem with typing anything or in fact trying to think at 3AM with half a bottle of grape juice in you that seems to be turning to vinegar. Or I suppose I could have argued that the square root of five is transcendental in some other logic system, the derivation of which is left as an exercise for the reader.
But I have often wondered about the prevalence of certain types of numbers. I know rationally it doesn't really mean anything, and asking questions like "Why is the fine structure constant so close to 1/137?" will yield no reason. There are very good combinatorial reasons for Benford's law, but if there weren't that sense of wonder, I doubt Tyler would have bothered with the post in the first place.

Phoebe: Only if you already did believe, in which case all such phenomena would generally profit from being ascribed to some extrasensory force, rather than chance or mathematics, which would tend to make one not believe in such things.

o: F(1) = log(2) = 0.301.
F(1) /= ln(2) = 0.69.

Base 10 logs, not natural logs.

Phoebe,

I believe in God, but I don't think that the physical applications of Benford's law support His existence.

The end conclusion from Benford's law merely says that nearly all sets of numbers follow a logarithmic distribution. According to the articles posted, it has been rigorously proven that the "mixing" of many different distributions will approach a logarithmic distribution. Therefore, a logarithmic distribution of, say, the lengths of rivers only says that the lengths of rivers depend on many different variables which, in turn, have many different types of distributions.

As far as I can see, Benford's law is only useful for detecting a non-logarithmic distribution in data that should depend on many different variables with different distributions. For example, numbers in a company's annual report depend on many different factors, so data with a uniform distribution of leading digits points to fraud.
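
One hedged way to turn that kind of check into code is a chi-square comparison of observed first digits against the Benford shares. The sketch below is my own illustration, not a method taken from the linked articles, and both data sets are invented: one is spread evenly on a log scale, the other is simply every whole number from 100 to 999, so each leading digit appears equally often.

```python
import math
from collections import Counter

# Expected Benford share for each leading digit (base-10 log).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """Leading digit of a positive number, read off its scientific notation."""
    return int(f"{float(x):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits against Benford's law."""
    counts = Counter(first_digit(v) for v in values)
    n = sum(counts.values())
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


# HYPOTHETICAL data: 1,000 values spaced evenly on a log scale from 1 to 10.
benford_like = [10 ** (i / 1000) for i in range(1000)]
# HYPOTHETICAL data: every leading digit equally common.
uniform_digits = list(range(100, 1000))

print("log-spaced data:", round(benford_chi_square(benford_like), 3))
print("uniform digits: ", round(benford_chi_square(uniform_digits), 1))
```

With eight degrees of freedom, the usual 5% cutoff for this statistic is about 15.5. The log-spaced data should sit far below it and the uniform-digit data far above, though a large statistic on real accounts is a reason to look closer, not proof of fraud.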

This is because we write in base ten. It's really that simple.

Person's post is correct. I can't understand if you all really thought an economist found a previously unknown law that is a definition about how counting works.

From Wikipedia: "and similarly for [longer numbers] without leading zeros"

Just as I suspected - zeros would be in the lead, but for their being actively suppressed - no doubt at the behest of the digit that likes to think of itself as "the One."

Nothingness is everywhere!

Note that made-up numbers are going to fail to follow Benford's law, but that numbers created by a crooked process that follows the same sort of pattern as a real process, as far as scaling and such, will follow it just fine. That is, if you just make up the number of votes for each candidate in your county, you can be caught, but not (by Benford's law, anyway) if you just randomly switch every tenth Gore vote to be a Bush vote instead.

This is because we write in base ten. It's really that simple.

Note that, in base two, every number starts with 1. In base infinity, exactly one number starts with 1.
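
The base-two remark matches the usual generalization of the formula to other bases, where the share of leading digit d in base b is the base-b log of 1 + 1/d. Here is a small Python sketch of that; the function name benford_in_base is my own, and the large base is just a stand-in for "base infinity".

```python
import math


def benford_in_base(d: int, base: int) -> float:
    """Expected share of leading digit d when numbers are written in `base`."""
    if base < 2 or not 1 <= d < base:
        raise ValueError("need base >= 2 and 1 <= d < base")
    return math.log(1 + 1 / d, base)


print("base 2, digit 1: ", benford_in_base(1, 2))
print("base 10, digit 1:", round(benford_in_base(1, 10), 3))
print("base 10**6, digit 1:", round(benford_in_base(1, 10**6), 3))
```

In base two the only possible leading digit is 1, so its share is one. As the base grows, the share of leading 1s keeps shrinking toward zero.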

This effect pretty much mirrors the fact that the numbering system we use is itself logarithmic.

As suggested, it is very much like the two-envelope puzzle mentioned above. Every random number comes from a distribution. And every random distribution comes from a distribution of distributions. The net result is logarithmic.

Thanks for the link, Michael Sullivan.

Anyone noticed Microsoft's assertions of patents violated show a preponderance of 5s and 6s?

Lord: made me guffaw.

Which is an excellent demonstration of the theory (very popular among mathematicians) that vocabulary limits conceptualization.
