Probability fact of the day

The average class size experienced by students is almost always larger than the average class size experienced by professors.

More at God Plays Dice. I recall a JEP piece (?) many years ago which used the same idea to explain the “road curse”: given two equally good roads, most people will always choose the more crowded one!


Likewise, the number of times you will find yourself in a group with five other guys and one woman is much larger than the number of times you will be the only guy in a group with six women.

The parallel situation is that if half of all groups are 6 men and 1 woman, and the other half are 1 man and 6 women, then both women and men will spend much more of their time in groups dominated by their own sex, and things will seem skewed to both.

Since groups are not randomly chosen this doesn't seem like a very good example of this effect.

Neither the classes nor roads we choose to take are random. Probabilities do not often have randomness at their core. Assuming randomness exists.

Diversity is hard.

Your friends have more friends than you do.

You will quit the clubs who accept you as a member.

Everyone around you is an idiot.

This is a perception fact rather than a probability fact.

No, this is a probability fact. It has to do with what you're weighting by when averaging.

One question is: "Pick a random professor. What size class does she teach?" The other question is "Pick a random student. What size class does he have?" (It works the same if you assume that each person only has one class; the principle is the same if you average the classes that professors or students have.)

Those are different questions, probabilistically, because they put different measures on the events. One is a uniform distribution over professors, one is a uniform distribution over students.

Imagine that there are 50 students, two professors, and two classes, one taught by Prof A with 10 students, one taught by Prof B with 40 students.

1) Pick a random class or professor. How many students do we expect it to have? The answer is (10 + 40) / 2 = 25.

2) Pick a random student. How many students do we expect in the class that the student takes? The answer is (10 * 10/50) + (40 * 40/50) = 34. (10/50 chance of choosing a student in the class with 10, 40/50 chance of choosing a student in the class with 40.)
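The two averages above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the example; the function names are made up for illustration.

```python
from fractions import Fraction

# Class sizes from the example: Prof A has 10 students, Prof B has 40.
class_sizes = [10, 40]

def mean_per_class(sizes):
    """Average size when every class (or professor) counts once."""
    return Fraction(sum(sizes), len(sizes))

def mean_per_student(sizes):
    """Average size when every student counts once.

    A class of size s is reported by s students, so it is weighted by s.
    """
    return Fraction(sum(s * s for s in sizes), sum(sizes))

print(mean_per_class(class_sizes))    # 25
print(mean_per_student(class_sizes))  # 34
```

The only difference between the two functions is the weight each class gets: 1 in the first, its own size in the second.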

This is an important principle of probability that people don't understand well.

Both are important averages, both have real meaning. In different situations one or the other will be more important.

Good post, John. I did pretty well in my stats class 2 decades ago and I remember boggling at this and other 'counterintuitive' concepts.

Another similar statistical principle is that even if, as I saw the other day, 50% of trips are 0-6 miles, whereas only 5% are 30 miles or over, you'll cover more distance in the 30 mile or over trips (and potentially spend more time on them).

The "road curse" is easily explained by changing the tense. Given two equally good roads, more people will have chosen the more crowded one.

A trickier phenomenon to understand is why the other lane always seems to move faster on a busy highway.

Does this take us back to the zipper merge argument?

The speed at which your lane goes is at least partially dependent on your choice of lanes. Some choices might reflect your need to exit or merge. Some choices reflect your expectations about lane speeds. Some choices depend on your comfort with having traffic on one side, the other, or both. Choice might depend on the distance you have to travel and your willingness to allow others to merge. Part of the choice depends on your level of risk aversion. To the degree that there are people just like you, your lane will be slower.

People who play 1, 2, 3, 4, 5, 6 in the lottery will split the prize with 300 other 'smart' people if they win. Anything that coordinates people's behavior is going to put strain on resources.

The point I had in mind was from the book Traffic by Tom Vanderbilt. It talks about a car that had the same number of "passing" and "overtaking" events i.e. overall it maintains the same relative position with respect to the other lane.

Problem is you spend more time being passed than you do in passing the other cars.

A plane crashes on the US-Mexican border. Where do they bury the survivors?

Thank you. I'll be here all week.

You don't bury survivors. Very old joke.....

You don't, maybe; but those Tijuana drug cartels...

I recall that JEP article. It was in the puzzles section in the 90s when Nalebuff was writing it. Schools always advertise the student-faculty ratio or the average class size! Not the average for a student, but the average over classes.

Good point. I understand the different averages just fine, but this implication never occurred to me.

This distinction is critical in calculating time-differentiated electricity costs as well. Electricity prices are higher at times when demand is higher, so the average electricity price per kilowatt-hour calculated by averaging 8760 hourly prices (which is what a generator would receive for generating at a constant rate all year) is considerably lower than the average price per kilowatt-hour paid by the average consumer (which is demand-weighted, taking into account that the highest demands occur at times of peak price). As John Thacker said, which is relevant depends on the question you're trying to answer.
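A toy version of the electricity point, assuming four hypothetical hours instead of 8760. The prices and demands below are invented purely for illustration and are not real market data.

```python
# Hypothetical hourly data (NOT from any real market):
# price in $/kWh and total demand in kWh for each hour.
prices = [0.05, 0.05, 0.10, 0.20]
demand = [100, 100, 200, 400]

def time_weighted_mean(prices):
    """Average price seen by a generator running at a constant rate."""
    return sum(prices) / len(prices)

def demand_weighted_mean(prices, demand):
    """Average price per kWh actually paid, weighting each hour by its demand."""
    total_cost = sum(p * d for p, d in zip(prices, demand))
    return total_cost / sum(demand)

print(time_weighted_mean(prices))
print(demand_weighted_mean(prices, demand))
```

Because demand is highest in the expensive hours, the demand-weighted figure comes out higher than the simple hourly average, which is the same effect as the class-size example.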

This is also related to the epic Girl Named Florida puzzle...

You walk up to a house that you know has two children. The father opens the door. You ask whether he has a daughter. He says yes. What is the probability the father has two daughters?

Now you walk to the house next door that you also know has two children. A daughter opens the door. What is the probability the father has two daughters?

Is the twist that "daughter" is broadly-defined, so that in the second puzzle "daughter" might actually mean the mom (who, after all, must be somebody's daughter)? Or the "daughter" may be the girlfriend of one of the father's two sons. Or it could be a stepdaughter, meaning that while the *house* has two children, the *father* has none.

Either way it sounds like the setup of a blue joke.

No, nothing tricky -- except in the probability sense.

It goes to what John Thacker was talking about above: Are you counting classes or counting students? Here the question is are you counting families or counting children?

Either way it sounds like the setup of a blue joke.

Good point. Next time they'll be farmhouses.

Well not to be dense but can you explain it to me? I understand the OP fine, but this?

If the father answers the door and answers the question "Do you have a daughter?" with "Yes", then the probability the family has two daughters is 1/3.

If a daughter answers the door, without asking any question whatsoever, you know the probability the family has two daughters is 1/2.

The reasoning on the first is that the ways the father can have two children are BB, BG, GB, GG, so 1/3 of those with at least one girl have two girls.

The reasoning on the second is that seeing a daughter open the door tells you nothing whatsoever about her sibling. Therefore the prior probability of the sex of the sibling stands: 1/2.

How it relates to the OP is that the first statement averages across families, while the second statement averages across children. Thus while only a third of two-child families with a girl in them will have two girls in them, half of girls from two-child families will come from families with two girls in them.

You're welcome.

The fun of this problem is that, to answer the 1/2 question, you have to untrain all the intuition you retrained in answering the 1/3 question.

The fact that a daughter answered; doesn't it make it more likely that there are daughters in there? Can't this be factored in a Bayesian sort of way? Say, you went there 10 times and each time a daughter answered; what's our expectation now?

Indeed, that's where the symmetry is broken.

There is only one father who can open the door. But if there are two children, it is twice as likely that a daughter opens the door if there are two girls than if there is one girl and one boy.

In Bayesian terms...

P(GG|G opens door) = P(G opens door|GG)*P(GG)/P(G opens door) = 1 * (1/4) / (1/2) = 1/2

Put another way, the 1/3 case in the first statement has two chances to occur in the second statement because there are two "girl opens door" cases to count when there are two girls, changing the final probability to 1/2.
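Both answers can be checked by brute-force enumeration. This sketch assumes boys and girls are equally likely, the two children's sexes are independent, and (for the second puzzle) each child is equally likely to be the one who opens the door; the variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families: BB, BG, GB, GG.
families = list(product("BG", repeat=2))
two_girls = ("G", "G")

# Puzzle 1: the father confirms he has at least one daughter.
# Every family with a girl counts once.
with_girl = [f for f in families if "G" in f]
p_father_says_yes = Fraction(
    sum(f == two_girls for f in with_girl), len(with_girl)
)

# Puzzle 2: a daughter opens the door.
# Now the equally likely outcomes are (family, child who opens) pairs.
openings = [(f, i) for f in families for i in (0, 1)]
girl_opens = [(f, i) for f, i in openings if f[i] == "G"]
p_daughter_opens = Fraction(
    sum(f == two_girls for f, _ in girl_opens), len(girl_opens)
)

print(p_father_says_yes)  # 1/3
print(p_daughter_opens)   # 1/2
```

The GG family appears once in the first list but twice in the second, which is the "two chances to occur" described above.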

Wow, this blows my mind a lot more than students and professors, which I got right away.

Another way is to say that after the father says that he has one daughter, the two children come out one after another. Either a boy or girl comes out first. If the boy comes out, then the next child has to be a girl. If a girl comes out first, only then does the probability become 50%.

I think it's tough to wrap my mind around it because, when the father says he has one daughter, I mentally think of the first child as a daughter. In reality, both the first and second child can be either a boy or a girl.

It's the difference, if you had two coin tosses, between knowing your first coin toss is heads and knowing there will be at least one heads.

Yes, coins are probably the most approachable variant of the problem. The difference between "the first" is heads and "one of them" is heads is that the former imposes a total ordering on the coins that means you must average across the coins individually rather than across the coins as a pair. And coins provide so many ways to offer that total ordering -- first coin tossed, first coin looked at, the nickel, etc. -- all of them right before you on the floor and not hidden in some random house.

My god the fights I've seen over this one. Much worse than the Monty Hall puzzle, and you wouldn't think it at first glance.

Both problems make my head spin each time I think about them.

Yup, Alex covered the girl named Florida one here

No, Alex gave a correct solution to the wrong problem there. To see why, let's tweak the original, non-Florida version of problem a bit. BTW, this is based on a cautionary tale about the incorrect way to solve conditional probability problems like this, first published in 1889. Joseph Bertrand's warning is usually ignored for the Two Child Problem when the answer "1/3" is desired, but utilized by the same people - such as Leonard Mlodinow - for the Monty Hall problem when they want to dismiss the answer "1/2." Yet the two problems are identical with respect to this logic, differing only in the numbers used.

Suppose you recall that a family has two children. What are the chances the children share the same gender? Yes, this one is meant to be easy - it's 1/2.

But suppose you later remember that the family includes a girl. Do the chances change? It is tempting to say they do. Of the four possible combinations, three include a girl, and only one of those includes two. So it seems the answer should be 1/3. But if that is true, if you had later recalled a boy instead of a girl, the chances would also change to 1/3. And if it changes to 1/3 no matter what you recall about one child, the Law of Total Probability says the answer to the question in the last paragraph is also 1/3. So we seem to have a paradox (and it is known worldwide as Bertrand's Box Paradox).

The resolution of the paradox is that, when the family has a boy and a girl, you could recall either gender. So you can't count all of the families that *have* a girl as ones where you *recall* that one is a girl. You have to multiply the count by the probability you would recall a girl. And since we have no reason to assume girls are easier to recall than boys, that probability is 1/2. So the answer to the second question is not GG/(GG+BG+GB)=1/3, it is GG/(GG+(BG+GB)/2)=1/(1+2/2)=1/2. And the paradox goes away.

If you add the Florida bit, the logic is the same. There are two children, and one is not a girl named Florida. You had an equal chance to recall her, and that makes the answer exactly 1/2.

The same reasoning applies to the Monty Hall problem. When you would lose by switching, that means Monty Hall had the choice of two doors to open. When you would win, he had no choice. Since he only opened one, you must multiply the original probability the car is behind each door by the probability Monty would open the door he did. So the answer is not P(switch&win)/(P(switch&win)+P(switch&lose)) = (1/3)/(1/3+1/3)=1/2, it is P(switch&win)/(P(switch&win)+P(switch&lose)/2) = (1/3)/(1/3+1/6)=2/3.
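The Monty Hall weighting can be written out explicitly. This is a sketch under the standard assumptions (the car is equally likely to be behind any door, Monty never opens your door or the car's door, and he picks at random when he has two doors to choose from); the function name and door numbering are made up for illustration.

```python
from fractions import Fraction

def switch_win_probability(picked=0, opened=2):
    """P(switching wins | you picked `picked` and Monty opened `opened`)."""
    doors = [0, 1, 2]
    weights = {}
    for car in doors:
        # Doors Monty is allowed to open: not yours, not the car's.
        options = [d for d in doors if d != picked and d != car]
        if opened in options:
            # Prior 1/3 for the car's position, times the chance
            # that Monty opens this particular door.
            weights[car] = Fraction(1, 3) * Fraction(1, len(options))
        else:
            weights[car] = Fraction(0)
    remaining = next(d for d in doors if d not in (picked, opened))
    return weights[remaining] / sum(weights.values())

print(switch_win_probability())  # 2/3
```

Dropping the `Fraction(1, len(options))` factor reproduces the mistaken 1/2, which is the same omission as counting every boy-girl family as one where you recall the girl.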

But what if you ask instead whether he has a daughter born on Tuesday, and he says yes? :) What is the probability the father has two daughters? (This is related to the Girl Named Florida puzzle.)

Also, in parallel to your question above, suppose you walk up to the house next door and a daughter born on Tuesday answers the door. What is the probability the father has two daughters?

Assuming above that all families have two children and that the probability of any child being born a girl is 0.5 and that you randomly selected the father and the house.

13/27. Very good puzzle!

And 1/2, of course.
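Both Tuesday answers can be confirmed by counting. This sketch assumes, as stated above, two children per family with each sex equally likely, and additionally that each day of the week is equally likely and that either child is equally likely to open the door; the names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# A child is a (sex, weekday) pair; day 2 stands in for Tuesday.
children = list(product("BG", range(7)))
families = list(product(children, repeat=2))  # 14 * 14 = 196 families
tuesday_girl = ("G", 2)

def both_girls(family):
    return all(sex == "G" for sex, _ in family)

# Father says yes to "do you have a daughter born on Tuesday?"
yes_families = [f for f in families if tuesday_girl in f]
p_father = Fraction(sum(both_girls(f) for f in yes_families), len(yes_families))

# A daughter born on Tuesday opens the door: count (family, opener) pairs.
openings = [(f, i) for f in families for i in (0, 1) if f[i] == tuesday_girl]
p_door = Fraction(sum(both_girls(f) for f, _ in openings), len(openings))

print(p_father)  # 13/27
print(p_door)    # 1/2
```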

This is also why most people think that they are below average drivers. They compare themselves to the other cars on the road, instead of to other people. The people who drive the most tend to be more skilled drivers, so the average vehicle-mile (or vehicle-minute) is driven by someone who is a better driver than the average person.

It's an elegant little argument, which unfortunately has a false conclusion since in fact most people think they're above average drivers.

....the importance of empiricism.

This is also why most people think that they are below average drivers

They do? I thought most people thought they were above average.

Ah, here we go:

Didn't make it all the way through a fairly short post, did you?

Apparently not.

This post made me think of this article by a sociologist:

Most people at the gym will be better trained than you. Most of your Facebook friends will have more Facebook friends than you do. Most people finding your geocache will have a higher find count than you do.

All of these will be true even if you are, in all of these cases, an average gym-goer, average Facebook-user and average geocacher.
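The Facebook version is easy to see on a tiny network. The four people and their friendships below are invented for illustration; the sketch compares the average number of friends per person with the average number of friends that a friend has.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical friendships (each pair is mutual).
friendships = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat")]

# Number of friends per person.
friend_count = Counter()
for a, b in friendships:
    friend_count[a] += 1
    friend_count[b] += 1

# Average over people: everyone counts once.
mean_friends = Fraction(sum(friend_count.values()), len(friend_count))

# Average over "a friend of someone": each person is counted once
# for every friend list they appear in, so popular people count more.
friend_of_friend_total = sum(
    friend_count[a] + friend_count[b] for a, b in friendships
)
mean_friends_of_friends = Fraction(friend_of_friend_total, 2 * len(friendships))

print(mean_friends)             # 2
print(mean_friends_of_friends)  # 9/4
```

Ann shows up in three friend lists and Dan in only one, so the second average leans toward Ann's count, just as the student average leans toward the big class.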

Statistics is hard.

This also means that people's perceptions of the number of criminals in society are probably overstated. A random person is more likely to have been the victim of a crime than to have victimized someone.

There are other examples that make the probability lesson extremely easy to understand: The average number of YouTube videos watched by a random person far exceeds the average number of YouTube videos a random person has made. A random YouTube video maker will watch videos whose average number of views far exceeds the average for his/her own videos.

A contemporary example:
