Probability fact of the day

February 29, 2012 at 10:55 am in Economics

The average class size experienced by students is almost always larger than the average class size experienced by professors.

More at God Plays Dice. I recall a JEP piece (?) many years ago which used the same idea to explain the “road curse”–given two equally good roads most people will always choose the more crowded!

mobile February 29, 2012 at 11:06 am

Likewise, the number of times you will find yourself in a group with five other guys and one woman is much larger than the number of times you will be the only guy in a group with six women.

John Thacker February 29, 2012 at 11:10 am

The parallel situation is that if half of all groups are 6 men and 1 woman, and the other half are 1 man and 6 women, then both women and men will spend much more of their time in groups dominated by their own sex, and things will seem skewed to both.

Andy February 29, 2012 at 6:18 pm

Since groups are not randomly chosen this doesn’t seem like a very good example of this effect.

Daniel Dostal March 1, 2012 at 1:49 pm

Neither the classes nor roads we choose to take are random. Probabilities do not often have randomness at their core. Assuming randomness exists.

Willitts February 29, 2012 at 9:09 pm

Diversity is hard.

Erik February 29, 2012 at 11:08 am

Your friends have more friends than you do.

tkehler February 29, 2012 at 11:31 am

You will quit the clubs who accept you as a member.

jrm February 29, 2012 at 5:42 pm

Everyone around you is an idiot.

Bill Rich February 29, 2012 at 11:11 am

This is a perception fact rather than a probability fact.

John Thacker February 29, 2012 at 11:16 am

No, this is a probability fact. It has to do with what you’re weighting by when averaging.

One question is: “Pick a random professor. What size class does she teach?” The other question is “Pick a random student. What size class does he have?” (It works the same if you assume that each person only has one class; the principle is the same if you average the classes that professors or students have.)

Those are different questions, probabilistically, because they put different measures on the events. One is a uniform distribution over professors, one is a uniform distribution over students.

John Thacker February 29, 2012 at 11:19 am

Imagine that there are 50 students, two professors, and two classes, one taught by Prof A with 10 students, one taught by Prof B with 40 students.

1) Pick a random class or professor. How many students do we expect it to have? The answer is (10 + 40) / 2 = 25.

2) Pick a random student. How many students do we expect in the class that the student takes? The answer is (10 * 10/50) + (40 * 40/50) = 34. (10/50 chance of choosing a student in the class with 10, 40/50 chance of choosing a student in the class with 40.)

This is an important principle of probability that people don’t understand well.
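The two averages in the example above can be checked with a minimal Python sketch. The function names are made up for illustration; the numbers are the 10-student and 40-student classes from the comment.

```python
from fractions import Fraction

def class_average(sizes):
    """Average size with one vote per class (or per professor)."""
    return Fraction(sum(sizes), len(sizes))

def student_average(sizes):
    """Average size with one vote per student: a class of size s is counted s times."""
    return Fraction(sum(s * s for s in sizes), sum(sizes))

sizes = [10, 40]  # Prof A's class and Prof B's class
print(class_average(sizes))    # 25
print(student_average(sizes))  # 34
```

The two averages coincide only when every class has the same size; any spread in sizes pushes the student-weighted figure above the class-weighted one.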

John Thacker February 29, 2012 at 11:20 am

Both are important averages, both have real meaning. In different situations one or the other will be more important.

msgkings February 29, 2012 at 2:17 pm

Good post, John. I did pretty well in my stats class 2 decades ago and I remember boggling at this and other ‘counterintuitive’ concepts.

John Thacker February 29, 2012 at 11:13 am

Another similar statistical principle is that even if, as I saw the other day, 50% of trips are 0-6 miles, whereas only 5% are 30 miles or over, you’ll cover more distance in the 30 mile or over trips (and potentially spend more time on them).

John Thacker February 29, 2012 at 11:21 am

The “road curse” is easily explained by changing the tense. Given two equally good roads, more people will have chosen the more crowded one.

Rahul February 29, 2012 at 12:12 pm

A trickier phenomenon to understand is why the other lane always seems to move faster on a busy highway.

Willitts February 29, 2012 at 9:19 pm

Does this take us back to the zipper merge argument?

The speed at which your lane goes is at least partially dependent on your choice of lanes. Some choices might reflect your need to exit or merge. Some choices reflect your expectations about lane speeds. Some choices depend on your comfort with having traffic on one side, the other, or both. Choice might depend on the distance you have to travel and your willingness to allow others to merge. Part of the choice depends on your level of risk aversion. To the degree that there are people just like you, your lane will be slower.

People who play 1, 2, 3, 4, 5, 6 in the lottery will split the prize with 300 other ‘smart’ people if they win. Anything that coordinates people’s behavior is going to put strain on resources.

Rahul March 1, 2012 at 8:19 am

The point I had in mind was from the book Traffic by Tom Vanderbilt. It talks about a car that had the same number of “passing” and “overtaking” events i.e. overall it maintains the same relative position with respect to the other lane.

Problem is you spend more time being passed than you do in passing the other cars.

charlie February 29, 2012 at 11:27 am

plane crashes on the US-Mexican border? Where do they bury the survivors?

Thank you. I’ll be here all week.

Joshua February 29, 2012 at 12:33 pm

You don’t bury survivors. Very old joke…..

Urso February 29, 2012 at 1:16 pm

You don’t, maybe; but those Tijuana drug cartels…

Aaron February 29, 2012 at 11:54 am

I recall that JEP article. It was in the puzzles section in the 90s when Nalebuff was writing it. Schools always advertise the student-faculty ratio or average class size! Not the average for a student but the average over classes.

byomtov February 29, 2012 at 4:31 pm

Good point. I understand the different averages just fine, but this implication never occurred to me.

JF February 29, 2012 at 12:16 pm

This distinction is critical in calculating time-differentiated electricity costs as well. Electricity prices are higher at times when demand is higher, so the average electricity price per kilowatt-hour calculated by averaging 8760 hourly prices (which is what a generator would receive for generating at a constant rate all year) is considerably lower than the average price per kilowatt-hour paid by the average consumer (which is demand-weighted, taking into account that the highest demands occur at times of peak price). As John Thacker said, which is relevant depends on the question you’re trying to answer.
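A toy Python sketch of the same distinction follows. The four hourly prices and demands are invented for illustration, not real market data, and the function names are hypothetical.

```python
def time_weighted_price(prices):
    """Plain average over hours: what a constant-output generator earns per kWh."""
    return sum(prices) / len(prices)

def demand_weighted_price(prices, demand):
    """Average over kilowatt-hours consumed: what the average consumer pays per kWh."""
    return sum(p * d for p, d in zip(prices, demand)) / sum(demand)

# Hypothetical four-hour day: price in cents/kWh, demand in kWh.
prices_cents = [5, 5, 10, 20]
demand_kwh = [100, 100, 200, 400]

print(time_weighted_price(prices_cents))                # 10.0
print(demand_weighted_price(prices_cents, demand_kwh))  # 13.75
```

Because the expensive hours are also the high-demand hours in this made-up data, the demand-weighted price comes out higher than the simple hourly average.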

MikeP February 29, 2012 at 12:56 pm

This is also related to the epic Girl Named Florida puzzle…

You walk up to a house that you know has two children. The father opens the door. You ask whether he has a daughter. He says yes. What is the probability the father has two daughters?

Now you walk to the house next door that you also know has two children. A daughter opens the door. What is the probability the father has two daughters?

Urso February 29, 2012 at 1:24 pm

Is the twist that “daughter” is broadly-defined, so that in the second puzzle “daughter” might actually mean the mom (who, after all, must be somebody’s daughter)? Or the “daughter” may be the girlfriend of one of the father’s two sons. Or it could be a stepdaughter, meaning that while the *house* has two children, the *father* has none.

Either way it sounds like the setup of a blue joke.

MikeP February 29, 2012 at 1:29 pm

No, nothing tricky — except in the probability sense.

It goes to what John Thacker was talking about above: Are you counting classes or counting students? Here the question is are you counting families or counting children?

“Either way it sounds like the setup of a blue joke.”

Good point. Next time they’ll be farmhouses.

Urso February 29, 2012 at 3:28 pm

Well not to be dense but can you explain it to me? I understand the OP fine, but this?

MikeP February 29, 2012 at 4:38 pm

If the father answers the door and answers the question “Do you have a daughter?” with “Yes”, then the probability the family has two daughters is 1/3.

If a daughter answers the door, without asking any question whatsoever, you know the probability the family has two daughters is 1/2.

The reasoning on the first is that the ways the father can have two children are BB, BG, GB, GG, so 1/3 of those with at least one girl have two girls.

The reasoning on the second is that seeing a daughter open the door tells you nothing whatsoever about her sibling. Therefore the prior probability of the sex of the sibling stands: 1/2.

How it relates to the OP is that the first statement averages across families, while the second statement averages across children. Thus while only a third of two-child families with a girl in them will have two girls in them, half of girls from two-child families will come from families with two girls in them.
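Both answers can be reproduced by brute-force counting. The sketch below assumes, as the puzzle does, that each child is equally likely to be a boy or a girl, and additionally assumes that either child is equally likely to open the door; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=2))  # BB, BG, GB, GG, all equally likely
two_girls = ("G", "G")

# Father says "yes, I have a daughter": count families with at least one girl.
with_girl = [f for f in families if "G" in f]
p_father = Fraction(sum(f == two_girls for f in with_girl), len(with_girl))

# A daughter opens the door: count (family, child who opens) pairs where that child is a girl.
girl_opens = [f for f in families for child in f if child == "G"]
p_door = Fraction(sum(f == two_girls for f in girl_opens), len(girl_opens))

print(p_father)  # 1/3
print(p_door)    # 1/2
```

The first count has one entry per family, the second has one entry per girl, so the GG family appears twice in the second list.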

Urso February 29, 2012 at 4:47 pm

Thank you.

MikeP February 29, 2012 at 7:08 pm

You’re welcome.

The fun of this problem is that, to answer the 1/2 question, you have to untrain all the intuition you retrained in answering the 1/3 question.

Rahul February 29, 2012 at 8:22 pm

The fact that a daughter answered; doesn’t it make it more likely that there are daughters in there? Can’t this be factored in a Bayesian sort of way? Say, you went there 10 times and each time a daughter answered; what’s our expectation now?

MikeP February 29, 2012 at 9:47 pm

Indeed, that’s where the symmetry is broken.

There is only one father who can open the door. But if there are two children, it is twice as likely that a daughter opens the door if there are two girls than if there is one girl and one boy.

In Bayesian terms…

P(GG|G opens door) = P(G opens door|GG)*P(GG)/P(G opens door) = 1 * (1/4) / (1/2) = 1/2

Put another way, the 1/3 case in the first statement has two chances to occur in the second statement because there are two “girl opens door” cases to count when there are two girls, changing the final probability to 1/2.

Matt Waters February 29, 2012 at 11:07 pm

Wow, this blows my mind a lot more than students and professors, which I got right away.

Another way is to say that after the father says that he has one daughter, the two children come out one after another. Either a boy or girl comes out first. If the boy comes out, then the next child has to be a girl. If a girl comes out first, only then does the probability become 50%.

I think it’s tough to wrap my mind around it because, when the father says he has one daughter, I mentally think of the first child as a daughter. In reality, both the first and second child can be either a boy or a girl.

It’s the difference, if you had two coin tosses, between knowing your first coin toss is heads and knowing there will be at least one heads.

MikeP March 1, 2012 at 12:05 pm

Yes, coins is probably the most approachable variant of the problem. The difference between “the first” is heads and “one of them” is heads is that the former imposes a total ordering on the coins that means you must average across the coins individually rather than across the coins as a pair. And coins provide so many ways to offer that total ordering (first coin tossed, first coin looked at, the nickel, etc.), all of them right before you on the floor and not hidden in some random house.

Dan Weber February 29, 2012 at 1:30 pm

My god the fights I’ve seen over this one. Much worse than the Monty Hall puzzle, and you wouldn’t think it at first glance.

Rahul February 29, 2012 at 2:13 pm

Both problems make my head spin each time I think about them.

Stuart February 29, 2012 at 1:58 pm

Yup, Alex covered the girl named Florida one here

http://marginalrevolution.com/marginalrevolution/2008/07/a-girl-named-fl.html

JeffJo March 3, 2012 at 12:06 pm

No, Alex gave a correct solution to the wrong problem there. To see why, let’s tweak the original, non-Florida version of the problem a bit. BTW, this is based on a cautionary tale about the incorrect way to solve conditional probability problems like this, first published in 1889. Joseph Bertrand’s warning is usually ignored for the Two Child Problem when the answer “1/3” is desired, but utilized by the same people – such as Leonard Mlodinow – for the Monty Hall problem when they want to dismiss the answer “1/2.” Yet the two problems are identical with respect to this logic, differing only in the numbers used.

Suppose you recall that a family has two children. What are the chances the children share the same gender? Yes, this one is meant to be easy – it’s 1/2.

But suppose you later remember that the family includes a girl. Do the chances change? It is tempting to say they do. Of the four possible combinations, three include a girl, and only one of those includes two. So it seems the answer should be 1/3. But if that is true, if you had later recalled a boy instead of a girl, the chances would also change to 1/3. And if it changes to 1/3 no matter what you recall about one child, the Law of Total Probability says the answer to the question in the last paragraph is also 1/3. So we seem to have a paradox (and it is known worldwide as Bertrand’s Box Paradox).

The resolution of the paradox is that, when the family has a boy and a girl, you could recall either gender. So you can’t count all of the families that *have* a girl as ones where you *recall* that one is a girl. You have to multiply the count by the probability you would recall a girl. And since we have no reason to assume girls are easier to recall than boys, that probability is 1/2. So the answer to the second question is not GG/(GG+BG+GB)=1/3, it is GG/(GG+(BG+GB)/2)=1/(1+2/2)=1/2. And the paradox goes away.

If you add the Florida bit, the logic is the same. There are two children, and one is not a girl named Florida. You had an equal chance to recall her, and that makes the answer exactly 1/2.

The same reasoning applies to the Monty Hall problem. When you would lose by switching, that means Monty Hall had the choice of two doors to open. When you would win, he had no choice. Since he only opened one, you must multiply the original probability the car is behind each door by the probability Monty would open the door he did. So the answer is not P(switch&win)/(P(switch&win)+P(switch&lose)) = (1/3)/(1/3+1/3)=1/2, it is P(switch&win)/(P(switch&win)+P(switch&lose)/2) = (1/3)/(1/3+1/6)=2/3.
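The Monty Hall arithmetic above can be sketched in Python. The setup is an assumption for illustration: the player picks door 0, Monty opens door 2, and when Monty has two goat doors to choose from he picks either with probability 1/2. The flag `weight_by_host_choice` is a made-up name that switches between the two calculations in the comment.

```python
from fractions import Fraction

def switch_win_probability(weight_by_host_choice=True):
    """P(switching wins | player picked door 0 and Monty opened door 2)."""
    pick, opened = 0, 2
    win = lose = Fraction(0)
    for car in range(3):
        prior = Fraction(1, 3)  # car is equally likely behind each door
        # Doors Monty may open: not the player's pick, not the car.
        options = [d for d in range(3) if d != pick and d != car]
        if opened not in options:
            continue  # Monty could not have opened door 2 in this case
        # Probability Monty opens door 2 given where the car is (assumed uniform).
        host = Fraction(1, len(options)) if weight_by_host_choice else Fraction(1)
        if car == pick:
            lose += prior * host
        else:
            win += prior * host
    return win / (win + lose)

print(switch_win_probability(weight_by_host_choice=False))  # 1/2
print(switch_win_probability(weight_by_host_choice=True))   # 2/3
```

Dropping the weight reproduces the 1/2 figure; including it gives 2/3.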

Brian C February 29, 2012 at 9:29 pm

But what if you ask instead whether he has a daughter born on Tuesday, and he says yes? :) What is the probability the father has two daughters? (This is related to the Girl Named Florida puzzle.)

Also, in parallel to your question above, suppose you walk up to the house next door and a daughter born on Tuesday answers the door. What is the probability the father has two daughters?

Assuming above that all families have two children and that the probability of any child being born a girl is 0.5 and that you randomly selected the father and the house.

MikeP February 29, 2012 at 10:25 pm

13/27. Very good puzzle!

And 1/2, of course.
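Both figures can be confirmed by enumeration under the stated assumptions (two children, each sex and each weekday equally likely), plus the extra assumption that either child is equally likely to answer the door. The day index used for Tuesday is arbitrary, and the names are illustrative.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 2  # arbitrary label for one of the seven weekdays
children = list(product("BG", range(7)))       # 14 equally likely (sex, weekday) types
families = list(product(children, repeat=2))   # 196 equally likely two-child families
target = ("G", TUESDAY)

def both_girls(family):
    return family[0][0] == "G" and family[1][0] == "G"

# Father says "yes" to "do you have a daughter born on Tuesday?": one entry per family.
says_yes = [f for f in families if target in f]
p_father = Fraction(sum(both_girls(f) for f in says_yes), len(says_yes))

# A Tuesday-born daughter opens the door: one entry per (family, opening child) pair.
opens_door = [f for f in families for child in f if child == target]
p_door = Fraction(sum(both_girls(f) for f in opens_door), len(opens_door))

print(p_father)  # 13/27
print(p_door)    # 1/2
```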

Dan February 29, 2012 at 2:29 pm

This is also why most people think that they are below average drivers. They compare themselves to the other cars on the road, instead of to other people. The people who drive the most tend to be more skilled drivers, so the average vehicle-mile (or vehicle-minute) is driven by someone who is a better driver than the average person.

It’s an elegant little argument, which unfortunately has a false conclusion since in fact most people think they’re above average drivers.

Rahul February 29, 2012 at 2:44 pm

….the importance of empiricism.

Dan Weber February 29, 2012 at 6:01 pm

“This is also why most people think that they are below average drivers”

They do? I thought most people thought they were above average.

Ah, here we go: http://en.wikipedia.org/wiki/Illusory_superiority#Driving_ability

Careless March 1, 2012 at 9:42 am

Didn’t make it all the way through a fairly short post, did you?

Rahul March 1, 2012 at 1:39 pm

:)

Dan Weber March 2, 2012 at 3:27 pm

Apparently not.

Jacob Felson March 1, 2012 at 1:00 am
Gunnar Tveiten March 1, 2012 at 1:15 am

Most people at the gym will be better trained than you. Most of your Facebook friends will have more Facebook friends than you do. Most people finding your geocache will have a higher find count than you do.

All of these will be true even if you are, in all of these cases, an average gym-goer, average Facebook-user and average geocacher.

Statistics is hard.
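The Facebook version of this can be illustrated on a tiny made-up friendship network. This is a sketch with hypothetical data (one well-connected person and three people who know only that person), and it compares averages rather than proving the "most" claim in general.

```python
from fractions import Fraction

# Hypothetical network: A is friends with everyone, B, C and D only with A.
friends = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}

def mean_friend_count(graph):
    """Average number of friends, one vote per person."""
    return Fraction(sum(len(v) for v in graph.values()), len(graph))

def mean_friend_of_friend_count(graph):
    """Average number of friends that a friend has, one vote per friendship link."""
    counts = [len(graph[f]) for v in graph.values() for f in v]
    return Fraction(sum(counts), len(counts))

print(mean_friend_count(friends))            # 3/2
print(mean_friend_of_friend_count(friends))  # 2
```

Popular people show up in many friend lists, so they are counted more often when averaging over friends than when averaging over people.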

Stephen March 1, 2012 at 3:41 am

This also means that people’s perceptions of the number of criminals in society are probably overstated. A random person is more likely to have been the victim of a crime than to have victimized someone.

Stephen March 1, 2012 at 3:51 am

There are other examples that make the probability lesson extremely easy to understand: The average number of YouTube videos watched by any random person far exceeds the average number of YouTube videos any random individual has made. The random YouTube video maker will watch YouTube videos that have an average number of views that far exceeds the average of his/her own.

GiT March 1, 2012 at 10:22 pm
