I had never thought of this:
In a sense, base 3 is the best of the integer bases because 3 is the integer closest to e…Suppose you are creating one of those dreaded telephone menu systems — press 1 to be inconvenienced, press 2 to be condescended to, and so forth. If there are many choices, what is the best way to organize them? Should you build a deep hierarchy with lots of little menus that each offer just a few options? Or is it better to flatten teh structure into a few long menus? In this situation a reasonable goal is to minimize the number of options that the wretched caller must listen to before finally reaching his or her destination. The problem is analogous to that of representing an integer in positional notation: the number of items per menu corresponds to the radix r, and the number of menus is analogous to the width w. The average number of choices to be endured is minimized when there are three items per menu.
I have no idea if it is correct. It is from the often quite interesting Group Theory in the Bedroom, and Other Mathematical Diversions, by Brian Hayes.















My guess is that it’s true only if all choices are equally likely.
Mmmm. Interesting.
As John S said, it shouldn’t be a symmetrical tree, but depend on frequency of choices, with infrequent choices further down the tree. However, this does NOT effect whether then span should be 3 at each level, ceteris paribus. At any given level, all choices (through their summed descendents) should have similar probabilities.
The objective function should be to minimise user time, not choices. In practise, dimensionally alike options will have to be put together, and complex parsing may have to be split apart. Was there a time cost for answering?
Overall, individual system variance will be so large that I suspect this finding is true but irrelevant.
The proof is easy. Maximize x^y (number of possible destinations) subject to (x*y) being some constant C (number of choices). This is equivalent to the problem of minimizing x*y subject to x^y fixed — the math works out the same either way.
Okay, so y=C/x. Maximize x^(C/x). First order condition: x^(C/x) * (C/x^2 – log (x) C/x^2) = 0. In other words, log(x)=1; x=e.
Mmmmm… Huffman coding applied to lists…
Alternate way of reaching of Alex F’s result:
Define the cost function C(r,k) to be the number of items listened to, when there are k total items and r items per menu. Then C(r,k) = r * log_r k. and setting its derivative equal to zero, after a bunch of rearranging, gives (ln k / ln r)(1 – 1/ln r) = 0, to which r = e is the only solution.
(I assume that callers listen to every option on each menu they reach; if you relax that assumption and let them push a button as soon as they hear the one they want, it introduces a constant factor of 1/2 to C(r,k), and the minimizing value of r is unchanged.
A few corrections:
1) Arithmetic coding is more efficient than Huffman coding. They converge to the same thing if all your probabilities are powers of two.
2) Alex F’s proof has a mistake. He’s forgotten to add options for additional menu options. To take the extreme view, put two options at each level. Option 1 is do A. Option 2 is hear more options.
I’ve got a ways to go before I get to thinking about that.
I’m still pondering why I have to press 1 for English. And no, I don’t think 1 should be for Spanish or Chinese.
“Why not just use base e”
Well, it’s fine if you don’t mind expressing integers with infinite non-repeating ‘e’-themal expansions…
Gef — I didn’t mean to make any sort of normative statement about how to set up menus, or to defend the relevance of the math to the actual real-world problem. You can take this up with Brian Hayes. But Brian says “e” is the answer to a question, so I figured I’d state a mathematical version of the question and show why “e” is the answer. (I haven’t read the book, but I assume the argument behind Brian’s result is essentially the one that James and I made.)
I agree that the problem as I’ve stated it is really about minimizing the maximum number of choices to endure rather than the average number, and that these are only interchangeable under very specific assumptions — again, take this up with Brian Hayes.
Lester Telser had an article in which he argued that the denominations of currency and coinage ought to be at multiples of 3: 1,3,9,27,81,…
The reasoning is analogous to the above. However, Van Hove argues for the “Principle of Least Effort”, i.e., base 2.
Telser, L.G. 1995. Optimal denominations of coins and currency. Economics Letters 49, 425-427.
Van Hove, L. 2001. Optimal denominations for coins and bank notes: in defense of the principle of least effort. Journal of Money, Credit, and Banking 33 (4), 1015-1021.
SB7, thanks for the clarification.
I thought the standard Huffman tree would not suit our needs, but it could. We’d have to assume that the tree is binary: i.e., you always hear options of the form: “If blah blah blah, press 1, otherwise press 2.”
If we want to relax this assumption we run into trouble. As you say, ternary or quaternary coding is not a big deal (just pick the three, or four, least frequent subtrees in the recursive step, instead of picking two).
But with variable branch factors it is trickier.
We can’t assume that the greedy property still holds. Suppose we did assume this: we would be left with the following (wrong) recursive step:
Recursive step: With S subtrees remaining, try all values of n from n=2 up to S. Calculate the “loss” from placing the final ‘n’ subtrees underneath a new node. Keep track of the step with the least loss.
This doesn’t work, since you want the lowest total loss, not the lowest loss at each step. The proposed algorithm is biased against values of n>2.
You may be able to do this using dynamic programming, where the state of the dynamic program is the number of remaining subtrees. I haven’t checked, but that may work.
mk, you’re absolutely right. Getting a greedy method working with variable branching wouldn’t be easy (and may not be possible). I had some heuristics in mind when I suggested it, but nothing close to provable optimal.
I really expected to see something about a threesome here.
With the caveat in place that I have no idea to what this small selected quote is referring…
Binary options work best, period.
Binary options remove 50% of the options immediately and have the goal of getting to the right response in as short a matter of time as possible.
Let’s say we have a populace (and we almost always do) where 50% of the populace need A, 25% need B, 12.5% need C and so on, until we get to the cranky (and/or maverick) ones who fall into a host of different requirements… each of which are required by some very minute amount of the populace. By forcing “A people” to listen to THREE options rather than the binary “do you require A or something else?” we are wasting time (1/3 of it) on 50% of our populace. And so on, and so on.
As mentioned however, I have no idea as to what the writer is referring to in the quoted selection, but I believe that I’ve best addressed the problem that I like to imagine was the one in play.
mnuez
P.S. I rock.
John S. is completely correct.
To my knowledge, what Hayes is talking about has to do with Shannon entropy.
http://www.bearcave.com/misl/misl_tech/wavelets/compression/shannon.html
Comments on this entry are closed.