A Girl Named Florida

I’ve been reading Leonard Mlodinow’s The Drunkard’s Walk: How Randomness Rules Our Lives. The book covers the Monty Hall problem, Bayes’s Theorem, availability bias, the illusion of control and so forth. If these are unfamiliar, look no further for an entertaining account.

On the other hand, I can’t say that I learned much I didn’t already know.  Nevertheless, I still enjoyed reading the book – it’s well written and filled with interesting nuggets (Did you know that the great mathematician Paul Erdos refused to believe that you should switch doors?).  If you teach probability theory or intro stats you will find lots of good examples to brighten up your lectures. 

One problem did intrigue me. Suppose that a family has two children. What is the probability that both are girls? OK, easy. The probability of a girl is one half and the births are independent, so the probability of two girls is 1/2 * 1/2 = 1/4.

Now what is the probability of having two girls if at least one of the children is a girl? A little bit harder. The temptation is to say that if one is a girl the probability of the other being a girl is 1/2, so the answer is 1/2. That’s wrong because you are not told which of the two children is a girl, and that makes a difference. A better approach is to note that without any additional information there are four equally likely possibilities for the sexes of two children: (B,B), (G,B), (B,G), (G,G). If we know that at least one is a girl we can remove (B,B), so three equally likely possibilities, (G,B), (B,G), (G,G), remain, and of these one has two girls, so the answer is 1/3.
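The counting argument can be checked by brute force. This is only a minimal sketch, assuming each child is equally likely to be a boy or a girl and that the two births are independent; the helper name `prob` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered (older, younger) sex pairs, assumed equally likely.
families = list(product("BG", repeat=2))

def prob(event, given=lambda f: True):
    """P(event | given), found by counting equally likely families."""
    pool = [f for f in families if given(f)]
    return Fraction(sum(1 for f in pool if event(f)), len(pool))

two_girls = lambda f: f == ("G", "G")
at_least_one_girl = lambda f: "G" in f

p_unconditional = prob(two_girls)
p_given_a_girl = prob(two_girls, at_least_one_girl)
print(p_unconditional, p_given_a_girl)  # 1/4 1/3
```

Using exact fractions rather than floats keeps the 1/4 and 1/3 visible as such.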

Ok, now here is the stumper.  What is the probability of a family having two girls if one of the children is a girl named Florida?

At first it seems impossible that knowing the name should make a difference. Surely, the answer is 1/3 just as before? After all, every child has a name. But knowing the name does make a difference. Here’s a hint: Florida is a rare name.

Comments

Is it that, because Florida is a rare name, the girl named Florida is more likely to be the second girl? This makes (G,G) more likely than (B,G) or (G,B).

I hope I am not embarrassing myself with this argument.

Half of one of course.

But I'm going to name both of my girls Florida just to spite him!

actually i'm sure theres something missing there...heh =)

What if the family's last name is 'Florida'?

P(G=2 | G=Florida) = P(G=Florida | G=2) * P(G=2) / P(G=Florida). I guess the probability is pretty big if P(G=Florida) is small.

Does her dad have a scar on his cheek and an evil eye?

My colleagues and I have spent lots of time arguing over this question as one of us insists on using it in interviews. I claim the problem is ill-defined, i.e. is not yet a well-posed mathematical problem, because even though the English sounds correct, the "preparation procedure" for the situation is ambiguous.

For example, is "Florida" a constant or a random variable? If I searched for a family with a girl of this name, making "Florida" a constant, then the answer is 1/2. If I took any family such that there is at least one girl and asked for a name and reported the answer "Florida" (making "Florida" a random variable), then this name provides no new information (after all every child DOES have a name) and the answer is 1/3.
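The two preparation procedures can be written out side by side. This is a rough sketch under assumptions that are not in the comment itself: each girl independently gets the name with probability x, and in the second procedure the name reported is that of a randomly chosen girl in the family. The function names are hypothetical.

```python
from fractions import Fraction

def searched_for_name(x):
    """The family was found by searching for a girl with the fixed name.

    Likelihood of containing at least one such girl, per family type.
    GG, GB and BG have equal prior weight, so the priors cancel.
    """
    like = {"GG": 1 - (1 - x) ** 2, "GB": x, "BG": x}
    return like["GG"] / sum(like.values())

def name_reported_afterwards(x):
    """A family with at least one girl is picked, then one girl's name is reported.

    The chance that the reported name is the given one is x for every
    family type, so hearing it does not shift the odds.
    """
    like = {"GG": x, "GB": x, "BG": x}
    return like["GG"] / sum(like.values())

x = Fraction(1, 1000)  # assumed rarity of the name
print(searched_for_name(x), name_reported_afterwards(x))  # 1999/3999 1/3
```

Under these assumptions the first procedure gives a value just under 1/2 and the second gives exactly 1/3, whatever x is.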

Rarity of name plays no role here. We may as well assume all names are unique, for example perhaps names are DNA sequences.

In fact, one can extend this reasoning to the "easy" part of the problem, the first part. How did I choose this family before presenting the problem? Perhaps I chose the family to have EXACTLY one girl or EXACTLY two girls. Then the probabilities are 0 and 1 respectively. The reason that Part 1 seems unambiguous is that there seems to be a "canonical" preparation procedure that everyone assumes, i.e. the family is uniformly randomly chosen from all families having two children and at least one girl.

Although I am just an organic chemist, and we never use statistics or probability (thank God), I agree with AntiAntiCamper. Everyone is rare along some axis, so what is true of the Florida girl is true of any sibling for a slightly different wording of the problem. In other words, as AntiAntiCamper points out, since rare is not numerically defined, one can use DNA data to arrive at the limit of uniqueness (or close to it, say 1 out of 80 billion). Since every human who is not an identical twin is genetically unique, we can make a "Florida" like statement about any sibling.

I have a comment/question for those here who are more versed in probability than I am.

I claim that it does not matter if you know or do not know which child (older/younger) was revealed to you UNLESS which child is to be revealed is part of the conditions of the test. It does you no good to be told afterward which child was revealed to you. Here is my reasoning:

You have a family with "at least one girl." As we have seen above, this means that there is a 1/3 possibility of the other child being a girl. The reason for this is that only one of the four cases can be absolutely rejected (BB). The other three remain and are all equally likely. This appears to me to be totally sound.

However, if we are now told "the older child is a girl," the claim is that the probability changes to 1/2 that the other child is also a girl because we can now also reject BG, leaving two equally likely possibilities.

But if this claim is true, then I believe that the following line of reasoning is valid:

At least one of the children is a girl and therefore there must be a girl child that is the older or younger child. We have no information about which child she may be, so she is 50% likely to be the older child and 50% likely to be the younger child. If the above claim is true, then if we know that she is the older child then the younger child is a 50% shot to be another girl and if we know she is the younger child then the older child is a 50% shot to be another girl. Since 50% * 50% + 50% * 50% = 50%, I claim that even before we know whether the child is the older or younger child, the probability is 50% that the unknown child is a girl. But this is a contradiction, since it has been established that the probability before we know which child it is is only 1/3. Therefore, I claim that it does no good to know which child is the girl, UNLESS that requirement is stipulated as part of the test.

In other words, if you consider all cases where the older child is a girl, half are immediately rejected because the older child is a boy and the probability is 1/2 that the other child is a girl. However, if either the older or younger child or both could be a girl and you are simply told which it is after knowing that the family has "at least one girl," meaning that you could be told either "older" or "younger," you have not actually acquired any new information and the probability remains 1/3.

I feel that I am wrong about this. I feel that this is likely because I am moving from "at least one child" to positively selecting one of the children to be the "at least one child," and that this is probably an error. Nonetheless, I would like feedback. I'm not scared of math, so if you need to use it, by all means, do so.

If a family already has a girl named Florida, why would they also name their other girl Florida? You'd think this is a trivial point but it completely affects the general case of Mlodinow's solution, which he gave in a comment in the WSJ link above.

He writes,

"According to the rules of conditional probability the chances a family has two girls if it has a girl named Florida are thus:

(Total Probability of GF-GN, GN-GF, and GF-GF) / (Total Probability of all 5 of the above events) =
= [.25x*x + 2*.25x*(1-x)] / [.25x*x + 2*.25x*(1-x) + 2*.25x]
= [2x - x*x] / [4x - x*x]"

x is the probability that a girl would be named Florida. So if x is very small, then the solution approaches 1/2. However, as x increases, the solution decreases.

But what if the question was,

"What is the probability of a family having two girls if one of the girls is named Jen?"

Assuming that 5% of girls are named Jen, the answer would be 49.4%. Close to 50%, but not a trivial difference.

If 10% of girls are named Jen, then the answer would be 48.7%. But this assumes, per Mlodinow's solution, that 1% of all families with two girls would name both of their girls Jen!
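These figures can be reproduced directly from the quoted expression. A small sketch, assuming (as the quoted solution does) that names are assigned independently with probability x; the function names are invented for illustration.

```python
from fractions import Fraction

def p_two_girls(x):
    """Simplified form of the quoted expression: (2x - x^2) / (4x - x^2)."""
    return (2 * x - x * x) / (4 * x - x * x)

def from_events(x):
    """Same quantity built from the event probabilities in the quote."""
    numerator = x * x / 4 + 2 * (x / 4) * (1 - x)  # GF-GF, GF-GN, GN-GF
    denominator = numerator + 2 * (x / 4)          # plus GF-B and B-GF
    return numerator / denominator

for x in (Fraction(1, 20), Fraction(1, 10)):
    p = p_two_girls(x)
    both_named = x * x  # share of two-girl families naming both girls alike
    print(f"x = {float(x):.2f}: P(two girls) = {p} = {float(p):.1%}, "
          f"both share the name in {float(both_named):.2%} of two-girl families")
```

For x = 1/20 this gives 39/79 and for x = 1/10 it gives 19/39, which round to the 49.4% and 48.7% above.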

So Mlodinow's solution doesn't really work for cases like "girls named Florida" where names aren't given out randomly. But it still works if "girls named Florida" is replaced with a trait that is truly random, even if the solution is somewhat counterintuitive.

*And yes, regardless of whether a family would name both of their daughters Florida the answer is still approximately 1/2, but the way Mlodinow arrives at his solution assumes names are given out randomly.

One half. Because families with two girls have two chances to name one "Florida" while those with only one girl have only one chance. It doesn't quite work out because there is some chance that both will be named "Florida" (in a strict probability sense) but that's why it's important that the name is rare, so the chance of that is negligible.

I have Bruce Schechter's book on Erdos, and he says that Erdos was convinced a few days later by Ron Graham at Bell Labs.

Yes, I agree that this question is ill-posed. At the WSJ, they phrased it like this:

You know that a certain family has two children, and you remember that at least one is a girl with a very unusual name (that, say, one in a million females share), but you can’t recall whether both children are girls. What is the probability that the family has two girls — to the nearest percentage point?

But as others have suggested, we can rephrase it thus:

You know that a certain family has two children, and you remember that at least one is a girl with a combination of genes (that, say, one in a million females share), but you can’t recall whether both children are girls. What is the probability that the family has two girls — to the nearest percentage point?

This is now true of anyone, since you can always just use any arbitrarily large gene set (or sequence) till you come up with one that is sufficiently unique.

John Lynch,

You're double counting. Consider the three cases with at least one girl: GG, GB, BG. Assume the older child is a girl: then there is a 50% chance that the younger is a girl. Assume the younger child is a girl: again, there is a 50% chance the older is a girl.

But you cannot then add .5*.5 + .5*.5. Why? You need to subtract out the intersection. You're adding up "older child a girl times younger child a girl" to "younger child a girl and older child a girl" which is the same thing.

Mathematically, if one kid must be a girl, where YG is younger girl and OG is older girl: P(GG) = P(OG)P(YG|OG) + P(YG)P(OG|YG) - P(OG and YG) = 2/3*1/2 + 2/3*1/2 - 1/3 = 1/3.
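The overlap can be seen by enumerating the three families. A minimal sketch, assuming GG, GB and BG are equally likely once (B,B) is excluded; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# (older, younger) pairs with at least one girl, assumed equally likely.
pool = [f for f in product("BG", repeat=2) if "G" in f]

def p(event):
    return Fraction(sum(map(event, pool)), len(pool))

p_og = p(lambda f: f[0] == "G")        # older child is a girl
p_yg = p(lambda f: f[1] == "G")        # younger child is a girl
p_both = p(lambda f: f == ("G", "G"))  # both are girls

via_older = p_og * (p_both / p_og)     # P(OG) * P(YG | OG)
via_younger = p_yg * (p_both / p_yg)   # P(YG) * P(OG | YG)

# Each route already equals P(both), so adding them counts GG twice.
combined = via_older + via_younger - p_both
print(p_og, via_older, via_younger, combined)  # 2/3 1/3 1/3 1/3
```

Note that the older child is a girl with probability 2/3 here, not 1/2, which is where the 50/50 split in the earlier argument goes astray.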

Following up on my earlier comment/question:

From AntiCamperCamper:
"For example, is "Florida" a constant or a random variable? If I searched for a family with a girl of this name, making "Florida" a constant, then the answer is 1/2. If I took any family such that there is at least one girl and asked for a name and reported the answer "Florida" (making "Florida" a random variable), then this name provides no new information (after all every child DOES have a name) and the answer is 1/3."

This makes me feel that my earlier hypothesis is correct, but ACC knows the terminology better than I do. In my earlier problem, if the identity of the girl (older or younger) is a constant, then the probability of the other child being a girl is 1/2, but if it is a random variable then it does not affect the outcome because each child has to be either older or younger.

Think of it as a game where two coins are each hidden under a cup. At the start, you know nothing, so the probability of HH under the cups is 1/4. If the person running the game looks under the cups and then tells you that at least one of them is H, then the probability of HH is now 1/3. At this point, if you tell the person running the "game" to reveal an H to you and he agrees, it cannot be the case that the probability that the unrevealed coin is H is now 1/2. This is because one coin HAD to be revealed, and it HAD to be an H, and an H HAD to be hidden, and it was equally likely which coin was going to be revealed (assuming that in the HH case, the revealed coin is chosen randomly). If the probability changed to 1/2, then by my reasoning above, it must also have been 1/2 prior to the revelation (since each possible revelation was equally likely).

This is massively different from the case where the person running the game must reveal the coin under a specific, predetermined cup even after telling you that at least one coin is an H. In this case, if it turns out to be an H, then the probability of the other coin being an H is 1/2. If it is a T, then the probability of the other being an H is exactly 1, since at least one must be an H. When we talk about the probabilities changing when we are told that the older child is a girl, this is only true if the older child HAD to be revealed to us, boy or girl.
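The two versions of the cup game can be compared by exact enumeration. This is a sketch under the assumptions stated in the comment: fair coins, the host has already confirmed at least one H, and in the HH case the revealed coin is chosen at random. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Coin pairs (cup 0, cup 1) consistent with "at least one is H", equally likely.
pairs = [c for c in product("HT", repeat=2) if "H" in c]

def reveal_some_head():
    """Host reveals an H, picking at random when both coins are H.

    Returns {revealed cup: P(hidden coin is H | that cup was revealed)}.
    """
    result = {}
    for cup in (0, 1):
        total = hidden_h = Fraction(0)
        for c in pairs:
            heads = [i for i in (0, 1) if c[i] == "H"]
            if cup in heads:
                weight = Fraction(1, len(pairs)) * Fraction(1, len(heads))
                total += weight
                if c[1 - cup] == "H":
                    hidden_h += weight
        result[cup] = hidden_h / total
    return result

def reveal_fixed_cup(cup=0):
    """Host must reveal the predetermined cup, whatever it shows.

    Returns {revealed face: P(hidden coin is H | that face was shown)}.
    """
    result = {}
    for face in "HT":
        shown = [c for c in pairs if c[cup] == face]
        result[face] = Fraction(sum(c[1 - cup] == "H" for c in shown), len(shown))
    return result

print(reveal_some_head())  # 1/3 whichever cup is lifted
print(reveal_fixed_cup())  # 1/2 after an H, 1 after a T
```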

In any case, I now feel that my original hypothesis is right. Feedback still welcome, of course.

The skeptics are off-base, the question is fine as is.

The relevant assumptions are these, there is nothing unusual about them:

1) 50% of all births in two-child families are male, 50% female.
2) For any two distinct births X,Y:
Gender[X] is conditionally independent of Gender[Y];
Gender[X] is conditionally independent of Name[Y];
Name[X] is conditionally independent of Name[Y];
3) P(Name[Y]=Florida | Gender[Y]=Female) = 0.001 or something else small.

Now here's the proof:

There are nine possible events in a two-child family.
The probabilities below follow directly from the premises (1), (2) and (3).

P(Female-Florida, Female-Florida) = 0.5*0.001 * 0.5*0.001
P(Female-Florida, Female-NonFlorida) = 0.5*0.001 * 0.5*0.999
P(Female-Florida, Male) = 0.5*0.001 * 0.5

P(Female-NonFlorida, Female-Florida) = 0.5*0.999 * 0.5*0.001
P(Female-NonFlorida, Female-NonFlorida) = 0.5*0.999 * 0.5*0.999
P(Female-NonFlorida, Male) = 0.5*0.999 * 0.5

P(Male, Female-Florida) = 0.5 * 0.5*0.001
P(Male, Female-NonFlorida) = 0.5 * 0.5*0.999
P(Male, Male) = 0.5 * 0.5

Work it out: find the aggregate probability mass of:

a) All events that contain two females at least one of whom is named Florida,
b) All events that contain a female named Florida.

You will find that (a)/(b) is about equal to 0.5. QED
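For anyone who would rather not add up the nine terms by hand, here is a short sketch that does it with exact fractions. It assumes premises (1) to (3) exactly as stated, with 0.001 as the name probability; the variable names are made up.

```python
from fractions import Fraction
from itertools import product

x = Fraction(1, 1000)  # P(Name = Florida | Female), premise (3)

# Probability of each kind of child, from premises (1) and (3).
child = {
    "Female-Florida": Fraction(1, 2) * x,
    "Female-NonFlorida": Fraction(1, 2) * (1 - x),
    "Male": Fraction(1, 2),
}

# Premise (2): the two births are independent, so probabilities multiply.
events = {(a, b): child[a] * child[b] for a, b in product(child, repeat=2)}
assert len(events) == 9 and sum(events.values()) == 1

has_florida = lambda e: "Female-Florida" in e
two_females = lambda e: "Male" not in e

mass_a = sum(p for e, p in events.items() if has_florida(e) and two_females(e))
mass_b = sum(p for e, p in events.items() if has_florida(e))

print(mass_a / mass_b, float(mass_a / mass_b))  # 1999/3999, just under 0.5
```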

It is hard to tell now that I know what is actually meant, but I was originally taken in by the "one child is x, what is the probability of the other being x" question. The way you put it ("at least one child is x") makes it clear that you are picking among them. However, when I first encountered the problem, it just noted the status of one child, without letting on that the questioner had both children to search through to find said "x".

I think an intuitive way to see it is that in the three allowed cases (G,B), (B,G), (G,G) knowing that one of the girls is named Florida reduces the probability of (G,B) and (B,G) almost twice as much as the probability of (G,G) because the possibility of finding a Florida in one of the spots of (G,G) is almost twice as much as in the other pairs.

If you are picking people randomly to find Floridas, you are almost twice as likely to find her in a (G,G) pair than in either a (G,B) or (B,G) pair. Also, from a random population with an equal number of (G,B), (B,G), (G,G), in the process of filtering out all pairs not containing a Florida, you will keep a (G,G) pair almost twice as often as the other kinds of pairs, as you have two chances of falling on a Florida in the (G,G) pair.

if the frequencies of Floridas is 1/1000
f:named Florida
nf: not named Florida

Knowing a (G,B) or (B,G) pair:
P(Gf, B)=1/1000 * 1
the probability of a Florida is 1/1000

P(Gnf,B)=999/1000 * 1
the probability of no Florida is 999/1000

Knowing a (G,G) pair:
(Gf,Gnf) = 1/1000*999/1000 ~1/1000
(Gnf,Gf) =999/1000*1/1000 ~1/1000
(Gf,Gf)=1/1000*1/1000 ~ 0
the probability of a Florida is ~ 2/1000

(Gnf,Gnf)=999/1000*999/1000 ~ 998/1000
the probability of no Florida is ~ 998/1000

2/1000 is twice 1/1000
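The filtering process can also be tried as a quick simulation. This is only a sketch: it assumes independent births and names, and it uses a name frequency of 0.02 rather than 1/1000 purely so that a modest number of simulated families yields enough Floridas. The function name and parameters are invented.

```python
import random

def simulate(n_families=200_000, f=0.02, seed=0):
    """Generate two-child families and keep only those containing a Florida.

    Returns how many kept families are of each type.
    """
    rng = random.Random(seed)
    kept = {"GG": 0, "GB": 0, "BG": 0}
    for _ in range(n_families):
        kinds = []
        has_florida = False
        for _ in range(2):
            if rng.random() < 0.5:       # this child is a girl
                kinds.append("G")
                if rng.random() < f:     # and she is named Florida
                    has_florida = True
            else:
                kinds.append("B")
        if has_florida:                  # (B,B) can never pass this filter
            kept["".join(kinds)] += 1
    return kept

kept = simulate()
share_gg = kept["GG"] / sum(kept.values())
print(kept, round(share_gg, 3))
```

The (G,G) count should come out at roughly twice each of the other two, so its share of the kept families lands near one half.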

This throws the entire answer to the first question (1/3) into doubt for me. Why? Well, let's say we have our family with at least one girl. That girl has to have a name, yes? Let's call that name "N" and any others "X". So our choices are: (GN, GX), (GX, GN), (GN, B), (B, GN), and (GN, GN). It is very rare for someone to name both their daughters the same name! Therefore the answer to the original question is ~50%.

Aren't you all making this more complex than it needs to be?
There are three cases

1. you know that one of the children is a girl. So the possibilities are:
GB BG GG.
As Tyler said, only 1 of 3 gives us two girls

2. you know the first (or second) is a girl. So the possibilities are:
GB GG
So with more information than the first instance, 1 of 2 gives us two girls

3. you know one of the girls is unique (has the name Florida). Note: we ignore the case of both children having the same name or one being transgendered or a hermaphrodite etc.

So the possibilities are

BG(named Florida) G(named Florida)B G(named Florida)G GG(named Florida)

Which gives us 2 of 4 or 1 of 2 chances that both are girls

I have been pleasantly surprised how cordial and respectful everyone has been toward one another. That was until Radford Neal implied this is actually "endless unprofitable debate with people who refuse to think".

OK. I admit I was a bit too preemptively ornery. The underlying truth here, though, is that some people do get locked into one, incorrect, way of thinking, and it's very hard to dislodge them from this.

There are wider implications to these puzzles. In AI, some people spent decades debating questions like "You're told Twiggy is a bird, do you assume that Twiggy can fly?", but "Now you're told that Twiggy is a Penguin, do you still think that Twiggy can fly?", etc. These all are isolated from any context that would allow one to assign probabilities. That may be a problem with the "Florida" puzzle, though I think the intended interpretation is fairly obvious. In my puzzle, the context is established by narrative to avoid such problems.

Is the original question affected by the fact that it is assumed (by the 1/3 proponents) that the questioner has complete knowledge and a possible fixed answer in mind ("I will find a girl")?
If the situation were such that there were two rooms containing either a boy or a girl. The questioner instead of looking at both, only looks in one and says "one child is x" or "at least one child is x". Is this not a reasonable situation for a person to assume? In this case, I don't see how one can affect the other, just as one coin toss does not affect the next.
On the other hand, if the questioner was looking for an "at least one girl" family, then by default you are ruling out 1/4 of chances, not by probability, but by choice.
If the questioner was not looking for a certain "x", but did look into both rooms and chooses a random child to describe, does that change the answer from random to "unbalanced"?
Part of the reason I question it is that the "Monty Hall" problem works because there is total knowledge which affects which door is removed. If "Monty" removed a random other door, possibly making the prize unattainable, then it would make no difference to change.

I am not sure we should be treating the events as independent. We are assuming, for example, that BG and GB are independent possibilities. Since they will be "selected" simultaneously for observation, then the two possibilities are not independent.

If I introduce you to two of my children, and they are both boys, then the possibility of an unrelated child being a boy is 50%. The two possibilities are simply BBG and BBB.

If the third child happens to be related, it does not expand the possibilities to include GBB and BGB. The age of the children was never discussed. The only two independent possibilities are:
1. Two boys and a girl. Or
2. Three boys.

The order is not relevant.

Holy crap, I've just skimmed most of the comments, but I think you guys are overthinking this. Work through it just like the first problem.

Here are all of the possibilities for two children if we allow three "types" (B-boy, F-girl named Florida, G-girl not named Florida):

BB, BF, BG
GB, GF, GG
FB, FF, FG

The problem says we need at least one F, which leaves us with:

BF, GF, FB, FG, FF

We need to make one assumption: that Florida is a rare name, and the probability of two Floridas in one family is nearly zero, so now we have four equally likely possibilities:

BF, GF, FB, FG

Two of these are 2 girls, so the probability is 2/4 or 1/2.

It may be more accurate to say the probability is ever so slightly greater than 1/2... and assuming nothing about the probability of the name Florida, we can say the probability of two girls lies somewhere between 1/2 and 3/5.

The difference here is that pinning down one of the names changes the GG case, in that the order of the two girls now matters. In the general case, a girl is a girl, it doesn't matter how they're ordered, we can say it's symmetrical. Stephen, in your case where you say "whatever her name is, call it N" you still have a symmetrical case, because you can say that of both girls, so the GG case just becomes NN and there's no change (p=1/3). The probability only changes if you single out ONE of the girls as somehow special.

Just because "Florida" is a rare name doesn't mean it's rare for a particular family. If dad says, "If I have a daughter I want to name her after my grandmother Florie," and mom is down with that, then P(Florida|Girl) = 1. If Florida is a special name, and not a flippant choice, it's much more likely that the first-born daughter will take the name.

So I don't think it's the rarity of the name that matters, but knowing anything identifying about the girl (Florida, the older one, the blond one, etc.) changes the odds to 1/2, as RobbL put so succinctly.

David,
I think the rarity helps in that it distinguishes between what is likely to be shared between any siblings (or within sexes). In other words, if it is likely that any girls will share the same characteristic (both parents are blond, therefore you would expect most or all of the children to have blond hair), then it doesn't tell you anything about how the questioner made his/her distinguishing observation.

Mike Sackton has the best explanation of the fundamental difference between the two questions (with no messy probability calculations).

Tac-Tics, your second paragraph there is wrong. The thing about the (B,B), (G,B), (B,G), (G,G) approach is that each of those pairs is equally likely. If you take the set approach that (B,G) = (G,B), then sure you have three states, but that one is twice as likely as the other two, and the numbers still work out to 1/3.

But I have to admit the Florida version of the question gives me headaches. I understand the logic of the nine states approach giving you an answer of 1/2; but I have a hard time convincing myself that approach is the correct one to take...

It's easy to see that if the older child is a girl then the probability of both girls is 1/2, ie, the probability that the second child is a girl. More generally, if some specific child is a girl -- the older child, or in this case the one with the memorable name -- then the probability that the other child is a girl is 1/2.

This also works if you meet the parents pushing one of their kids in a stroller and the kid is a girl. You've learned slightly more than "at least one is a girl". You've learned that one specific, essentially randomly chosen child is a girl. So the probability that the other one is a girl is 1/2.

It's like putting the kids in an urn and pulling one out and seeing it's a girl.

I don't think stating the order of birth necessarily helps you, as it could be post hoc information retrieval. If you had assurances beforehand that the questioner was going to report the sex of the first born, then that helps you. If you don't know whether they are going to report either the first or second born, even if they tell you after the fact, it doesn't help you.

This is very similar to another well-known problem, which perhaps makes it easier to understand:
I have a deck of cards, shuffled randomly (all permutations equally likely). I deal out the top two cards, face down.
a) What is the probability both are red?
b) I look at the cards, and truthfully tell you one is red. What is the probability both are red?
c) The first card I dealt, I flip over, and you see it is red. What is the probability both are red?
d) I look at the cards, and truthfully tell you one is the ace of hearts. What is the probability both are red?

Here we can use approximate probabilities, so we can just say that the answer to (a) is 1/4, not (26*25)/(52*51).

This avoids all the confusion about "choosing girls vs choosing families", etc. The key point, of course, is whether (d) is like (c) or like (b)!
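The exact answers to (a) through (d) can be found by listing every ordered two-card deal. A minimal sketch, assuming "one is red" in (b) means "at least one is red" and likewise for the ace of hearts in (d); the card encoding and names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Cards as (rank, suit); hearts (H) and diamonds (D) are red.
deck = [(rank, suit) for rank in range(1, 14) for suit in "HDCS"]
deals = list(permutations(deck, 2))  # ordered (first, second), equally likely
ace_of_hearts = (1, "H")

def is_red(card):
    return card[1] in "HD"

def p_both_red(given):
    """P(both cards red | given), by counting equally likely deals."""
    pool = [d for d in deals if given(d)]
    both = sum(1 for d in pool if is_red(d[0]) and is_red(d[1]))
    return Fraction(both, len(pool))

p_a = p_both_red(lambda d: True)
p_b = p_both_red(lambda d: is_red(d[0]) or is_red(d[1]))
p_c = p_both_red(lambda d: is_red(d[0]))
p_d = p_both_red(lambda d: ace_of_hearts in d)
print(p_a, p_b, p_c, p_d)  # 25/102 25/77 25/51 25/51
```

In the approximate terms used above these are about 1/4, 1/3, 1/2 and 1/2, so under these assumptions (d) comes out exactly the same as (c).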

anobdemus,

Any unambiguously feminine name, even one as common as "Jen," is enough to uniquely identify one child and change the problem. Requiring P(2 Jens) > 0 is nonsensical.

In the first case, the information you have is that there are two children. You don't know there's even one girl. P = 1/4.

In the second case you know there is one girl, eliminating B-B. P = 1/3.

In the third case you learn there is one girl and she has a name. Names are unique within a family and so you have Florida-B, Florida-G. P = 1/2.

I think in the second case you can and should make the assumption that the girl has a name even if you don't know it, and therefore the second case always degenerates to the third case.

Ah, I see my mistake. When you're guaranteed to have at least one girl as one of your two children, the events are no longer independent of each other (since having a boy first determines the sex of your second child).

Carry on.

In my universe there are 3 families. I'm going to tell you my labels for them but I won't tell you their REAL names. However I WILL tell you that they all have unique names. M = male, F = female.

Family 1: M1 F1

Family 2: M2 F2

Family 3: F3 F4

I've uniformly randomly chosen a family and put them in Room A. We all agree that there is at least one girl in Room A and the probability that there are two girls in Room A is 1/3, right?

OK, I've just spoken with a girl in Room A and her name is Hawaii.

What is the probability NOW that there are two girls in the room?

I hope "mk" and Benoit are still lurking about. They gave the best challenges to my early post.

Observations:

1. If all names are equally rare, then discovering a name cannot change the probabilities as it is clear we get no new information.

2. These probability puzzlers ultimately must be solved by using Bayes Theorem. Informal probabilistic reasoning is simply not sufficient for these subtleties.

3. The Bayes Theorem solution is itself extremely subtle in this case.

4. This is a really great problem to think about.

I think MikeP has it.

When you can't tell the girls apart, you could double-count them.

Jordan and Hawaii and you happen to know Jordan is a girl, but you don't know which one she is.

Jordan and Hawaii and you happen to know Hawaii is a girl, but you don't know which one she is.

It's one possibility whichever one of them you know about. Two chances that the one you don't know about is a boy, one chance the one you don't know about is a girl.

But if you can tell them apart then there are only two choices for the one you don't know about.

The statistics turns weirder for the cases where you can't tell things apart. We assume we can't tell electrons apart or protons apart etc. And we get weird statistics concerning elementary particles. Coincidence? You decide.

That is to say, you need to make the problem about individuals (microstates), rather than members of a group (macrostates).

Rarity does play a role. ...assuming that a family would not name both girls Florida.

These statements are not both correct. Rarity does not necessarily play a role if the appearances of the trait are not independent. If, as you explicitly mention, a family does not give both siblings the same name, then the rareness of the name is utterly immaterial.

In particular, even if all girls were named Florida except those with a sister named Florida, then the probability is still 1/2.

Intuitively, rarity of the name is a proxy for uniqueness of the trait among the siblings. But if the trait is already unique -- e.g., a name -- then rareness does not matter.

Let's go back and rephrase the exact same problem a different way.

A trickster has a cabinet with three drawers. In each drawer he has two coins. In two of the drawers he has one gold coin and one silver coin. In the other drawer he has two gold coins.

He picks a drawer at random and shows you one coin from it, also picked at random. It turns out to be a gold coin. What's the chance the other coin is also gold?

That chance is 1/2. There was a 2/3 chance that he picked a drawer that had a silver coin in it, in the first place. If he did that, there was a 1/2 chance he'd show you the silver coin and then you'd know for sure that it wasn't the drawer with two gold coins. But that didn't happen. So of the four remaining possibilities, two of them are with the drawer with two gold coins.

Now suppose that the trickster picks a drawer at random, and then shows you a gold coin from it. Now the chance is 1/3 that the other coin is gold. It was a 1/3 chance to begin with, and there's always a gold coin for him to show you so that doesn't change the odds at all.
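Both versions of the trick are small enough to enumerate. A sketch under the assumptions that the drawer is chosen uniformly and that, in the first version, the coin shown is chosen uniformly from that drawer; the function names are hypothetical.

```python
from fractions import Fraction

drawers = [("gold", "silver"), ("gold", "silver"), ("gold", "gold")]

def random_coin_shown():
    """Random drawer, random coin; condition on the shown coin being gold."""
    shown_gold = other_gold = Fraction(0)
    for drawer in drawers:
        for i in (0, 1):
            weight = Fraction(1, len(drawers)) * Fraction(1, 2)
            if drawer[i] == "gold":
                shown_gold += weight
                if drawer[1 - i] == "gold":
                    other_gold += weight
    return other_gold / shown_gold

def gold_always_shown():
    """Random drawer, then the trickster deliberately shows a gold coin.

    Every drawer holds a gold coin, so seeing one rules nothing out.
    """
    two_gold = sum(1 for d in drawers if d == ("gold", "gold"))
    return Fraction(two_gold, len(drawers))

print(random_coin_shown(), gold_always_shown())  # 1/2 1/3
```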

If the father could be talking about either child, then there are two different ways for the other to be a boy. G1B G2B G1G2. If the father is definitely talking about one of them then there's only one way for the other to be a boy. G1B.

Similarly with for example the double-slit experiment from physics. If you know which slit the photon or the electron went through, then the result is very different from the case in which you don't know.

That should be: P(A|B,C) = P(A|B) = 1/3.

MikeP,

In your example, why would P(Z) not equal P(Y,T), the joint probability of at least one girl and the likelihood of Florida?

If P(Z) = P(Y,T), then P(Z) = P(T)P(Y|T).

And since the probability of at least one girl is completely independent of the popularity of the name "Florida," P(Y|T) = P(Y).

I think P(Z|X) ~ 2P(T) is fallacious, because you are changing X's meaning from, "chance of family of 2 girls," to "looked at any 2 girls."

If you try to use Bayes by saying, "If I look at enough N i'll find R," or P(R|N) = P(N|R)P(R)/P(N), you find that P(N) has no logical meaning. It's 1. So P(R|N) = P(R).

I think you have to consider Z shorthand for both Y and T, and determine P(X|Y,T), which remains 1/3.

"In your example, why would P(Z) not equal P(Y,T), the joint probability of at least one girl and the likelihood of Florida?"

Y contains the following possibilities:

G B
G G

Z contains the following possibilities:

GT B
GT G
GT GT

I don't think there is any way you can distribute T across Y to get Z.

In particular, the second and third cases of Z are vastly different and are completely uncaptured by saying that P(Z)=P(Y)P(T). And it is exactly the second case that contributes all the weight of probability that turns the 1/3 of the generalized girl into the 1/2 of the specialized girl, while the problem is interesting only because P(GT GT) is insignificant or zero.

"I think P(Z|X) ~ 2P(T) is fallacious, because you are changing X's meaning from, 'chance of family of 2 girls,' to 'looked at any 2 girls.'"

I don't agree. The probability that at least one girl has trait T given there are two girls is the ambient probability one has the trait plus the ambient probability the other has the trait minus the ambient (or forced) probability both have the trait.

Mike P Says:

You could do this with any trait, so long as the negation of the trait is not correlated with sex.

And then gives some examples of traits.

I do not think that "any" trait is sufficient. I believe that the trait needs to allow one to specify what happened in the original random event. I.e., two separate children were born; does the trait allow us to specify which of these two children was female?

Knowing that the oldest was female does allow us to specify what happened in the original random event.

Knowing that the right-handed baby is female does not, however, allow us to specify. Even if we know that only one baby is right handed then we still can't make inferences about what happened in the original sample space. The situation would be different, however, if we knew in advance that there was going to be one right-handed and one left-handed baby. But this is not the case.

When we know in advance that there is one child called Florida, and she is female, then we can specify which baby was female and the probability that the other is female is 1/2.

"Even if we know that only one baby is right handed then we still can't make inferences about what happened in the original sample space."

Exactly. And since we cannot make inferences about the other sibling from this meager sex-independent information, we are stuck with the a priori probability that a child is a girl: 1/2. It is exactly the lack of information conveyed by calling out traits of one child that are not possessed by the other that prevents us from lumping cases together and complicating the probability the other is a girl.

Put another way, I can, with probability 1, tell you something about one of my children that is not true of the other child. If the negation of that something has nothing to do with sex, I am giving you absolutely no information about the sex of my other child. The probability it is a girl remains the a priori 1/2.

"However, if we meet a girl on the street who tells us that her name is Florida, and that she has one sibling, the probability that her sibling is female is 1/3."

Actually, this is the very simplest case. You meet a girl. She tells you she has a sibling. She can tell you anything and everything about herself and, so long as she does not implicate her sibling in doing so, the probability that her sibling is a girl is 1/2.

Nope.

It is true that only a third of all two-child families with girls in them have two girls in them. But two-girl families have twice as many girls to contribute to walking down streets. So half the girls from two-child families you'll meet walking down the street come from families with two girls in them.
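
The difference between counting families and counting girls can be made concrete with a short enumeration. This is a sketch under the usual assumptions (boys and girls equally likely, births independent); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Four equally likely (older, younger) two-child families.
FAMILIES = list(product("BG", repeat=2))

# Sampling families: among families with at least one girl, how many are GG?
with_girl = [f for f in FAMILIES if "G" in f]
p_gg_by_family = Fraction(sum(f == ("G", "G") for f in with_girl), len(with_girl))

# Sampling girls: each individual girl is equally likely to be the one you meet.
girls = [(f, i) for f in FAMILIES for i in range(2) if f[i] == "G"]
p_sister_by_girl = Fraction(sum(f[1 - i] == "G" for f, i in girls), len(girls))

print(p_gg_by_family)    # 1/3
print(p_sister_by_girl)  # 1/2
```

The two-girl family appears once in the first list but contributes two entries to the second, which is where the shift from 1/3 to 1/2 comes from.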

See my Jul 09, 2008 at 08:57 PM comment upthread for how the problem statement decides for us whether we are counting families or counting girls.

Your distinction between counting girls and counting families is not correct.

I toss two coins. I show you one coin and it is a head. The probability that the other coin is a head is 1/3. Try this at home.

A couple has two kids. You meet one child and she is female. The probability that the other is female is 1/3.

P(HH) = 1/4. P(HH|H) = 1/3. Notice that the coin you looked at is a 1943 steel penny, but that doesn't affect its chance of coming up heads.

You are using P(A|B) = P(B|A)P(A)/P(B) and creating a hybrid B = HeadsIsSteel or GirlIsFlorida. Your P(B|A) defines A as "N=2", but your P(A) and P(A|B) define A as TwoHeads.

I'd like to hear your explanation why you shouldn't use

P(A|B,C) = P(C|A,B) P(A|B) P(B) / [P(B|C) P(C)]

Where B,C is the joint probability of the first coin you look at being both heads (B) and steel (C).

Since P(Steel|HH,H) = P(Steel) and P(H|Steel) = P(H), P(HH|H,Steel) = P(HH|H). The first coin being steel does not tell you about both coins being heads. You are being tricked into throwing out your prior: P(HH|HIsSteel) -> P(H), or P(GG|GIsFlorida) -> P(G).

It totally violates my intuition, but the answer is 1/2 (in the limit of Florida being a very unlikely name). I think the simplest way to see this is by making a list of mutually exclusive and exhaustive possibilities, assigning probabilities to each.

To start with I'll assume that names are assigned independently, so that it is possible for siblings to have the same name.

I'm defining the probability to be: "Probability that a family with two children has two girls, given that they have at least one girl, and her name is Florida" (*)

Here are the possibilities (note these are mutually exclusive and exhaustive, and you can check that the probabilities sum to 1):

P(BB) = 1/4
P(BF) = 1/4 T
P(BN) = 1/4 (1-T)
P(FB) = 1/4 T
P(NB) = 1/4 (1-T)
P(FN) = 1/4 T (1-T)
P(NF) = 1/4 (1-T) T
P(NN) = 1/4 (1-T)^2
P(FF) = 1/4 T^2

(BB means two boys, BF means the first child is a boy and the second a girl named Florida, FF means two girls both named Florida, etc. N means a girl not named Florida. T is the probability that a girl is named Florida.)

Our desired probability is ...

P(FN + NF + FF) / P(FN + NF + FF + BF + FB)
= [(1/4) (T(1-T) + (1-T)T + T^2)] / [(1/4) (T(1-T) + (1-T)T + T^2 + T + T)]
= [(1/4) (2T-T^2)] / [(1/4) (4T-T^2)]

So the answer here is (2T-T^2)/(4T-T^2). Note that if *all* girls are named Florida, this reduces to 1/3, so we have the correct limit. For T<<1, we get the 1/2 answer.
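
The table above can be checked mechanically. The sketch below rebuilds it with exact fractions and conditions on "at least one girl named Florida"; the function name is illustrative, and it keeps the comment's assumption that names are assigned independently (so two Floridas are possible).

```python
from fractions import Fraction

def p_two_girls_given_florida(T):
    """Exact P(two girls | at least one girl named Florida), names independent.

    T is the probability that a girl is named Florida (0 < T <= 1).
    Keys use B = boy, F = girl named Florida, N = girl with another name.
    """
    T = Fraction(T)
    q = Fraction(1, 4)
    table = {
        "BB": q,
        "BF": q * T, "BN": q * (1 - T),
        "FB": q * T, "NB": q * (1 - T),
        "FN": q * T * (1 - T), "NF": q * (1 - T) * T,
        "NN": q * (1 - T) ** 2, "FF": q * T ** 2,
    }
    assert sum(table.values()) == 1  # the cases are exhaustive and exclusive
    has_florida = {k: v for k, v in table.items() if "F" in k}
    two_girls = sum(v for k, v in has_florida.items() if "B" not in k)
    return two_girls / sum(has_florida.values())

print(p_two_girls_given_florida(1))                  # 1/3
print(p_two_girls_given_florida(Fraction(1, 1000)))  # 1999/3999
```

For T = 1/1000 the result is just under 1/2, consistent with (2T-T^2)/(4T-T^2) = (2-T)/(4-T).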

Finally, in the comments to the WSJ article linked above, Leonard Mlodinow gives the same calculation. He explains why my intuition is wrong:

"if you know the family has a child with a rare girl’s name, then you might suspect that they have two girls because that doubles the odds that one of their children will have the rare name."

In other words, he's working with an assumption in which all parents pull names from the same distribution. This assumption is, of course, debatable, which is why bayesians view *all* probabilities as conditional and think it's important to be clear about priors.

David,

Everyone is assuming that we know they only have two children.

By your P(B) you must mean "parents have named daughter number 1 Florida". As you state, this is independent of how many daughters they have. But Mlodinow and I are saying that the appropriate P(B) for this problem is "parents have named one of their daughters Florida" ... and clearly, the more daughters you have, the more likely it is that one of them is named Florida!

If the problem stated "the first child is a girl", then the answer would be 1/2 regardless of whether or not we know that this first girl is named Florida.

weichi, the only way your 2nd table adds up to 1 is if T = T^2.

weichi,

Thanks for laying out the exhaustive probabilities. When I tried I missed that P(FN) soaks up the P(FF) that is being excluded.

It does appear that it is not perfectly 1/2, and that therefore rareness matters in how close it is to 1/2.

But if I may offer a conjecture... If the ambient probability of a boy being named Florida is equal to the ambient probability of a girl being named Florida, and if no two siblings have the same name, then the answer to...

What is the probability of a family having two girls if one of the children is a girl named Florida?

...is exactly 1/2, regardless of how rare the name Florida is.

In other words, if the trait "is named Florida" is not correlated with gender, then it will offer not even the slightest information on gender, and the sibling's probability of being a girl is the a priori 1/2.

Mike P

1. "You meet a girl. You learn she has one sibling. What is the probability the sibling is a girl?"

2. "A couple has two kids. You meet one child and she is female. The probability that the other is female is 1/3"

Your reason for these cases being different requires the introduction of a probabilistic mechanism governing how you meet the kids. This requires additional assumptions about the probabilities concerned. You could also assume the child you meet is determined in a non-random, yet unknown fashion.

When you make your rather strong assumptions you are correct, but if you don't make them, the two cases are identical and each has probability 1/3.

"However, if we meet a girl on the street who tells us that her name is Florida, and that she has one sibling, the probability that her sibling is female is 1/3"

No. But if some person in a chatgroup whose handle is xlerb tells you that its parents were drugged-out hippies who liked the name Florida and sure enough they have at least one daughter named Florida out of two children, then the chance that the other is a girl is 3/5.

The independent choices are:

Older girl named Florida, younger boy.
Older boy, younger girl named Florida.
Older girl named Florida, younger girl.
Older girl, younger girl named Florida.
Older girl named Florida, younger girl named Florida.

:->

Seriously, this is an important problem. Not about the sisters, but the fact that none of us have an answer that's convincing and easy to understand.

It isn't enough to work the formulas, any more than knowing how to work the formulas will solve your calculus word problems. Figuring out when to apply the different formulas is the point.

Some of us are confused about this. And the ones who aren't confused are incoherent to the point they can't explain it sensibly.

Let's try looking at it in terms of information. If it's a family with two children and you don't know any more than that, the chance of 2 girls is 1/4 and the chance of some other outcome is 3/4.

If you know that there's at least one girl then the chance of 2 girls is 1/3. You've lost information. If you predict repeated cases like this you'll be right less often.

If it's a family with one child then the chance of a girl is 1/2. You know even less. With only two choices you can't know less than this.

I don't know whether that's productive.

You start out with four choices. When you find out that there's a girl, you remove one of the four choices.

When you find out that a particular one of them is a girl, you remove two of the four choices.

How do you tell in general whether you're finding out there is one, versus finding out about a particular one?

I sort of think in general if you can say "This one is a girl" then you've reduced it to two choices, and if you can only say "At least one of them is a girl" then you still have three choices.

So if the father says "One of my children is a brunette girl named Stephanie who plays the viola and votes Libertarian and likes chili" then the chance the other is a girl too is 1/2.

But if the father says "At least one of my children is a brunette girl named Stephanie etc" then the chance that the other is also a girl is 1/3. That doesn't sound right because it seems like there's such a tiny chance they'd have the same name, haircolor, instrument, party, and food preference.

So, you're in his living room and you see a Unicorn Princess toy. You know there's a girl. The chance of 2 girls is 1/3 because all the toy shows you is it isn't 2 boys.

You see the toy and he tells you one child is 4 and the other is 14. The toy belongs to the 4-year-old girl and the chance of 2 girls is 1/2 because 14-year-old girls don't have Unicorn Princesses any more than 14-year-old boys or 4-year-old boys.

weichi,

"weichi, the only way your 2nd table adds up to 1 is if T = T^2."

Really? I just tried it again, and I get 1. Can you tell me how to change it to get it to add up to 1?

I'm sorry, your table is correct. (It was late...)

...Where did I go wrong? Is one of the probabilities in my table incorrect?

I'm really not sure. I'm also really not sure if maybe I'm the one who is wrong. But I believe it is incorrect to combine unrelated traits in the conditional probability without using the joint probability of those traits, i.e. Girl WITH name Florida vs. Girl AND name Florida. It concerns me that when you use AND, and substitute that joint probability into Bayes, the unique trait cancels out and gives you 1/3.

I mean, you can keep adding traits to make T vanishingly small.
Girl WITH name Florida WITH pet snake WITH owns amulet WITH breeding a dwarf, etc. On the one hand you say the trait Florida is about the girl, but then you say "A family with 2 daughters is twice as likely to name a girl Florida." So is the trait really about the family?

I think something about the wording of the problem is convincing us to shift viewpoint from the family to the unnamed sibling.

The solutions we've seen to the Florida problem do seem to be biased by the fact that we presume that only girls are named Florida.

Following weichi's model, I enumerated all possibilities of children presuming that girls are named Florida with probability p and boys are named Florida with probability q.

The answer to...

What is the probability of a family having two girls if one of the children is a girl named Florida?

...is (2-p)/(4-p-q).

When q is zero (no boys are named Florida), then we see the solution offered by Mlodinow. But surely there is at least one boy somewhere named Florida, so this cannot be the whole story.

When q=p (the name Florida is uncorrelated with sex), then the solution indeed collapses to the conjectured 1/2.

Now here's the real wild result...

What is the probability of a family having two girls if one of the children is a girl named Michael?

It's greater than 1/2!
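
The comment does not spell out the naming model behind (2-p)/(4-p-q). As a sketch only, here is one hypothetical rule that happens to reproduce it: each girl "wants" the name with probability p and each boy with probability q, and if both siblings want it a fair coin decides who gets it, so no two siblings share the name. That tie-break rule and the function name are assumptions for illustration, not something stated in the thread.

```python
from fractions import Fraction

def p_gg_given_girl_florida(p, q):
    """P(two girls | one child is a girl named Florida).

    ASSUMED naming rule (hypothetical): a girl wants the name with
    probability p, a boy with probability q; if both siblings want it,
    a fair coin decides, so the name is never repeated in a family.
    """
    p, q = Fraction(p), Fraction(q)
    want = {"G": p, "B": q}
    num = den = Fraction(0)
    for older in "BG":
        for younger in "BG":
            kids = (older, younger)
            for i in range(2):  # event: child i ends up named Florida
                me, sib = kids[i], kids[1 - i]
                if me != "G":
                    continue
                # child i wants the name and either the sibling does not,
                # or the sibling does and child i wins the coin flip
                w = Fraction(1, 4) * want[me] * ((1 - want[sib]) + want[sib] / 2)
                den += w
                if sib == "G":
                    num += w
    return num / den

p = Fraction(1, 1000)
print(p_gg_given_girl_florida(p, 0))                 # 1999/3999
print(p_gg_given_girl_florida(p, p))                 # 1/2
print(p_gg_given_girl_florida(p, Fraction(1, 100)) > Fraction(1, 2))  # True
```

Under this rule the three cases in the comment come out as described: q = 0 gives Mlodinow's (2-p)/(4-p), q = p gives exactly 1/2, and q > p gives more than 1/2.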

MikeP,

If your claim is true, then P is never 1/3 because P(girl has some unique trait, even if unobserved) = 1. No one is arguing you can't make the outcome 1/2 by changing A from "family has 2 girls" to "Florida's sibling is a sister." I'm not sure that change is justified.

David,

"If your claim is true, then P is never 1/3 because P(girl has some unique trait, even if unobserved) = 1."

Nope. The reason is that your metastatement can be said of either sibling. You need to actually say something about the first sibling that cannot be true of the other sibling.

"I think something about the wording of the problem is convincing us to shift viewpoint from the family to the unnamed sibling."

Yes! I'm starting to see it.

When you don't know anything else about a child, the chance is about 1/2 that it's a girl.

When you do know something more, why should you throw away the extra information and settle for the more ignorant view?

We started with 2 children and the chance was 1/4 they were both girls.

Then we found out at least one of them was a girl. That reduced the knowledge we cared about -- the chance went from 1/4 to 1/3 they were both girls. If you know one of them is a girl, the chance the other is also a girl has gone from 1/2 to 1/3 -- you know *more* about that child than before, even while you now know less about the pair.

Are we sampling families or are we sampling girls? To find out, check whether you're sampling families or sampling girls. If a father tells you he has two children and at least one of them is a girl, it's 1/3 the other is too. Of all the fathers with 2 children, 3/4 of them will have at least one girl and 2/3 of those will have only one girl.

If a woman tells you she has one sibling, it's 1/2 her sibling is a girl. You aren't sampling families, you're sampling women. You could just as easily have met her sister as herself.

If a father tells you he has two children and one of them is a girl named Steppenwolf, it's 1/3 that the other is a girl. You haven't lost the information you got from knowing at least one of them -- you don't know which one -- is a girl.

If a father tells you he has two children and one of them is a girl named Steppenwolf who plays the accordion, shaves her head, and eats artichokes, it's still 1/3 that the other is a girl. You don't need to throw away the information you get by knowing there are two children and at least one of them is a girl, just because you found out more about one who's a girl.

Here's a tricky case. A father tells you he has two children and the older one is a girl. Then the chance the younger one is a girl is 1/2.

Say he tells you he has two children and the younger one is a girl. Then the chance the older one is a girl is 1/2.

But if he didn't tell you which one was a girl the chance would be 1/3. Why the difference?

If both his children are girls, he'll be in both samples. Both the 1/2 that can tell you about the older daughter and the 1/2 that can tell you about the younger daughter. So he inflates the chance for both samples. If he wasn't saying whether it was older or younger, he'd answer once. When he says then he answers twice, once for each survey.

Look at the sampling. When each daughter gets sampled, count the daughters. When each family gets sampled, count the families. If you can't tell who's getting sampled then you can't make a good estimate.

"If a father tells you he has two children and one of them is a girl named Steppenwolf, it's 1/3 that the other is a girl."

It's 1/2.

"You haven't lost the information you got from knowing at least one of them -- you don't know which one -- is a girl."

Oh, but you do know which one: the one named Steppenwolf. You now have lost information about the other one because only one is named Steppenwolf. By giving you more information about this child, he took away anything he could tell you about the other child.

Oh, I do agree that the girl's name must come concurrently with the fact that there is at least one girl before the probability is 1/2. If the father says he has two children and at least one is a girl, then asking for a name does not change the probability because it provides no new information: He can give you such a name with probability 1.

But if you were told that he has at least one girl, and you were later introduced to a girl randomly, say by her walking in, then the probability the other is a girl does immediately become 1/2. You cannot ignore the possibility that a boy might have walked in instead.

... and you were later introduced to a girl randomly, say by her walking in, ...

Mike P, your analyses always involve another random event. Choosing the probabilities for this other random event allows you to get either 1/3 or 1/2 for your sibling probability.

Try looking at the problem where the ONLY randomness is the joint sample space representing the birth of the two children. And then think about information that provides conditioning on this (and only this) sample space. You will understand the problem(s) better if you do this.

Mike P you're still not on the right track:

The key is which event(s) on the original sample space the information you are given allows you to resolve.

The connection to Monty Hall is this: in the sibling problem most people try to treat the problem as a series of random events (e.g. birth, meeting a child, etc.), when there is only one random event: the original births. In the Monty Hall problem most people try to treat it as a single event, but it is really a sequence of random events (i.e. your choice, then Monty's choice).

MikeP,

You just changed all the circumstances, if I read your first paragraph correctly. It seems you now agree that knowing the girl's name does not add information. You've been pounding the table that it does.

You also wrote "then the probability the other is a girl does immediately become 1/2." That's the whole point. We're not calculating the probability the other is a girl, we're calculating the probability of a two-girl family.

The way I've been reading the problem all along is: you learn there's one girl, then you learn her name is Florida.

Have you been saying all along that it's really (AtLeastOneGirl AND OneGirlNamedFlorida) not necessarily the same girl? I really don't know how that affects things.

Your last example is instructive. If A = "girl who walks in is same girl father mentioned," and B = "father says at least one girl," then your answer for P(A|B) depends entirely on your treatment of P(B|A).

P(B) is 3/4, and P(A) is 5/6. If you believe the posterior A adds information and alters P(B), you might choose P(B|A) = 1. If you believe the girl who walks in is just a 2nd observation of "at least one girl," then P(B|A) = P(B) = 3/4.

The difference is, the latter gives you an 83% chance it's the same girl, while the former gives you a 111% chance, which is unphysical.

"The way I've been reading the problem all along is: you learn there's one girl, then you learn her name is Florida."

Well, I've been reading it as: you learn that there is (at least) one girl whose name is Florida.

If A = "girl who walks in is same girl father mentioned,"

No. A should be "girl walks in". I don't want to start the Schrödinger's cat question of whether the man saying "at least one girl" with a specific girl in mind is information carrying to you.

But how on earth did you get your P(A) = 5/6?

You can furthermore make that a penny named Steppenwolf and a nickel named George.

You are welcome to treat the child's naming as a random event. To yield a 1/2 sibling probability you need, however, the probability of that name to go to zero, see Mlodinow's analysis.
I am not sure that we know enough about how the parents choose the name Steppenwolf to be sure that it was a random event.

In my coin tossing example the naming was clearly not random.

It is interesting that treating the name as deterministic or random yields different results. The issue of treating an unknown as a random unknown or as a deterministic unknown is treated in depth in Van Trees, "Detection, Estimation, and Modulation Theory".

"To yield a 1/2 sibling probability you need, however, the probability of that name to go to zero, see Mlodinow's analysis."

Any name will get close to 1/2.

And if the probability of a girl having that name is equal to the probability of a boy having that name, you will get exactly 1/2.

"Any name will get close to 1/2."

This is not true. Read Mlodinow's analysis. If you think it is incorrect please let me know where. As he says at the end of his analysis:

" .. note that this approaches ½ as x approaches 0 (i.e., the odds approach ½ if the name is extremely rare, which is what the problem assumed).note also that the answer approaches 1/3 as x approaches 1 ... "

Oh, and since the most popular boy's name is Jacob at a little over 1%...

P(GG|G named Jacob) = (2-epsilon)/(4-epsilon-.01) = 0.5013.

Mikep,

Ah yes, I was assuming that you could name both children the same name. Doing it your way (the realistic way!) I get the same result.

Yes, provided we allow it to be slightly different from 1/2, we use a more common name, and we test it against either the entire population of two child families or a randomly selected or constructed subset of that population.

But before we go further on this, please note that half the girls named Jane who live in two-child families have a brother and half the girls named Jane who live in two-child families have a sister. If that isn't a sufficient test, I frankly don't know what we're testing.

mark P, what is your answer to the King's sibling problem, which I will state as follows:

The king has one sibling. What is the probability that the king's sibling is male?

Your answers to the King's sibling question do not correspond to the well-known answers given in basic probability texts. A couple of introductory texts that discuss this problem that I found online include:

1. Bertsekas and Tsitsiklis

http://www.athenasc.com/probbook.html

See problem 1.28 given in the link to "first chapter". The answer is given in the link to "problem solutions".

2. The Pleasures of Probability by Richard Isaac

The link to the page after the King's sibling problem is:

http://books.google.com/books?id=a_2vsIx4FQMC&pg=PA24&lpg=PA24&dq=king's+sibling+probability&source=web&ots=fUHgGKURMg&sig=8zQYBbFC5mtEpxjlsTup2Esmi18&hl=en#PPA24,M1

MikeP, here's how I'd construct the test.

First, we start with families with 2 children. These will be approximately 1/4 BB, 1/2 BG, 1/4 GG.

The father says he has at least one girl. So we throw out the BB case and now we have 2 cases, 2/3 BG and 1/3 GG.

Now the father says he has a daughter named Jane. We throw out all the cases where there's no girl named Jane. The thing that might make this result somewhat different from 1/3 is that when there are 2 girls there are 2 chances to name them Jane, and that will distort the odds a little bit.

But it ought to be close to 1/3.

Now let me present the opposed reasoning. When we need to guess the chance of gender of one single person, without more information the chance is around 1/2. So in this case if we know the gender of one we might say that the remaining question is to find the gender of the other, it's a single case, and so the chance should again be 1/2.

But I say, why should you throw away the information you already have and choose your minimum-information estimate?

"But before we go further on this, please note that half the girls named Jane who live in two-child families have a brother and half the girls named Jane who live in two-child families have a sister. If that isn't a sufficient test, I frankly don't know what we're testing."

We're testing guys who truthfully say they have two children and at least one of them is a girl named Jane.

Let's imagine that 0.1% of girls are named Jane, and that two sisters are never both named Jane. Then the table goes roughly:

1/4 BB
1/2 * 999/1000 BG
1/4 * 999/1000 GG

1/2 * 1/1000 BJ
1/4 * 1/1000 GJ

When the father says he has a girl, we throw out the BB case.

When the father says he has a girl named Jane we throw out all but the BJ and GJ cases.

Would you propose a different protocol?

I contend that if the king is elected from the male children without consideration of birth order or is elected from the first-born without consideration of sex, then the probability the king's sibling is male is 1/2.

When there is a male king if there's a son, then it comes out 1/3.

Ignoring infanticide etc, the roughly equal choices are MM MF FM FF. Throw out the FF case. We have 3 cases left, all of which give one king. 1/3 the king has a brother. It doesn't matter whether the older or the younger son becomes king, it's still 1/3.

If an older female becomes queen and there's no king, then it ought to be 1/2. Throw out all the cases with queens, and the remainder have an older king and one younger sibling who can be male or female. We throw out two cases instead of one.

I want to point out that you aren't the only one confused about this. We can't solve this kind of problem with deep philosophical distinctions about which individuals are distinguishable. That way leads to unending confusion.

To make it work you have to look first at the distribution in reality -- in this case GG 1/4 GB 1/2 BB 1/4. And then look at how your sampling affects the observed results. The father announces he has a daughter equally whether he has one daughter or two. If he was 50% likely to say that when he had one daughter, but 100% likely to say it when he had two, the result would come out different.
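
The "distribution, then sampling" recipe can be written as a small Bayes update. In this sketch the two parameters (how likely the father is to announce a daughter when he has one versus two) are hypothetical knobs added for illustration; the prior is the GG 1/4, GB 1/2, BB 1/4 distribution from the comment.

```python
from fractions import Fraction

def p_gg_given_announcement(say_if_one, say_if_two):
    """P(two girls | father announces "I have a daughter").

    say_if_one / say_if_two: probability he makes the announcement when he
    has exactly one / exactly two daughters (hypothetical parameters).
    """
    prior = {"GG": Fraction(1, 4), "GB": Fraction(1, 2), "BB": Fraction(1, 4)}
    likelihood = {
        "GG": Fraction(say_if_two),
        "GB": Fraction(say_if_one),
        "BB": Fraction(0),  # he never announces a daughter he doesn't have
    }
    joint = {k: prior[k] * likelihood[k] for k in prior}
    return joint["GG"] / sum(joint.values())

print(p_gg_given_announcement(1, 1))               # 1/3
print(p_gg_given_announcement(Fraction(1, 2), 1))  # 1/2
```

With equal announcement rates the answer is 1/3; with the 50%/100% rates mentioned above it becomes 1/2, which is the sampling bias at work.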

Distribution. Then sampling bias.

I would say that the difference between the king's sibling and the girl named Florida is that you are forced to crown one of your boys king. Thus the fact that one of them is king adds no information. You are not forced to name one of your girls Florida. The fact that one of them is Florida does add information.

P(boy is King) must be very small. Maybe even more rare than girls named Florida.

Why, if a father has two boys, he has twice the odds of crowning a son!

David,

P(boy is King) must be very small. Maybe even more rare than girls named Florida.

Except that, by the problem statement, we are stuck inside the family of the man who would be king. In this case "one of the children is king" correlates exactly with "one of the children is a boy". We don't get any information that distinguishes between boy children, so the answer 1/3 is unchanged.

And now once more unto the breach...

King Henry has one sibling. What is the probability that his sibling is male?

Provided birth order doesn't matter in the king's election, I will again offer 1/2 as the answer.

There's one king. If there's two boys it's one or the other. There's one Henry. Same thing.

But, a priori, king and Henry are (mostly) independent. By saying King Henry, I have nailed the trait king to the trait Henry and changed the statement of the problem.

So, if boys have the name Henry with probability .001, in a thousand random royal families...

1/4 GG
1/2 * 999/1000 BG
1/4 * 998/1000 BB
1/4 * 1/1000 HG
1/4 * 1/1000 GH
1/4 * 1/1000 HB
1/4 * 1/1000 BH

Since the king in the problem is named Henry, we are dealing solely with the last four lines. The king's sibling is male with probability 1/2.

I'm trying to generalize some of the ideas discussed, so please let me know if you agree with my probabilities below.

You are on the phone with a father who says, "I have at least one daughter." To prove this he puts his daughter on the phone. Below are what I think the probabilities are that both children are female when the daughter on the phone says various things:

Daughter says:                               P(2 sisters)

"Hi"                                         1/3
"I am the eldest"                            1/2
"I am the eldest daughter"                   1/3
"My name is Jane"                            1/3
"I have black hair"                          1/3
"I have black hair but my sibling doesn't"   1/3
"I am the smartest"                          1/2

Assuming intelligence is independent of gender.

"Hi" 1/3

1/3

"I am the eldest" 1/2

1/2

"I am the eldest daughter" 1/3

1
Assuming she won't say she's the oldest daughter when she's the only daughter.

"My name is Jane" 1/3

1/3 If we'd started out by picking families that had a Jane it would be 1/2. But everybody has a name.

"I have black hair" 1/3

p(boy) = 0.5
p(black-hair girl) = p
p(nonblack-hair girl) = 0.5-p

p(2 boys) = .25
p(boy+black) = 0.5p + p*0.5 = p
p(boy+nonblack) = 0.5(0.5-p) + 0.5(0.5-p) = 0.5-p
p(2 nonblacks) = (0.5-p)(0.5-p)
p(black+nonblack)= p(0.5-p) + (0.5-p)p = 2p(0.5-p)
p(black+black)=p*p

remove impossible cases
2 boys boy+nonblack 2nonblack

[p(black+black) + p(black+nonblack)] /
[1 - p(2 boys) - p(boy+nonblack) - p(2 nonblacks)]

= [p*p + 2p*(0.5-p)] / [1 - 0.25 - (0.5-p) - (0.5-p)(0.5-p)]

= [p*p + p - 2p*p] / [0.75 - 0.5 + p - 0.25 + p - p*p]

= (p - p*p) / (2p - p*p)

= (1-p)/(2-p)

Not what I would have expected. Likely I made a mistake somewhere.
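
For what it's worth, the algebra can be checked by brute force. The sketch below enumerates the three child types used above (boy, black-haired girl, non-black-haired girl) and computes the chance of two girls given that at least one child is a black-haired girl. The function name is illustrative; p is the comment's p(black-hair girl), so 0 < p <= 1/2.

```python
from fractions import Fraction

def p_gg_given_some_black_haired_girl(p):
    """P(two girls | at least one child is a black-haired girl).

    p is the chance that a child is a black-haired girl (0 < p <= 1/2),
    matching p(black-hair girl) = p in the comment.
    """
    p = Fraction(p)
    kinds = {"boy": Fraction(1, 2), "black": p, "nonblack": Fraction(1, 2) - p}
    num = den = Fraction(0)
    for a, pa in kinds.items():
        for b, pb in kinds.items():
            if "black" in (a, b):          # at least one black-haired girl
                den += pa * pb
                if "boy" not in (a, b):    # and both children are girls
                    num += pa * pb
    return num / den

print(p_gg_given_some_black_haired_girl(Fraction(1, 2)))   # 1/3
print(p_gg_given_some_black_haired_girl(Fraction(1, 10)))  # 9/19
```

Both values agree with (1-p)/(2-p), so the derivation is at least internally consistent for that event.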

Sampling bias. I calculated the chance of getting two girls, and at least one of them has black hair. But the particular girl we talk to has black hair. It doesn't help if her sister does instead!

particular girl has black hair, + other girl /
girl has black hair, + girl or boy

0.5p / p

0.5

"I have black hair but my sibling doesn't" 1/3

Remove both black from the numerator. The denominator is black + nonblack girl, plus black + nonblack boy.

P(nonblack boy) (0.5-p)

p(0.5-p) / (p(0.5-p) + p(0.5-p))

1/2

"I am the smartest" 1/2

This has got to be wrong. When we found out there was a girl, that made it 1/3. She tells you she's smarter, or she tells you she isn't smarter, whichever is true, and that changes the odds? There has to be something wrong with that.

Start over.

"Hi" 1/3

So far she hasn't given any information except to confirm that there's a girl. One girl or two, either way he can put one on the line.

"I am the eldest" 1/2

This has got to be wrong. Ten seconds ago we thought it was 1/3. She says she's the older and now it's 1/2. If she said she's the younger it would be 1/2 too. How can it be that anything she says makes us throw away information?

Let's pretend we're writing a computer program to model this. First it computes two random numbers for girls or boys. If it's two boys it throws away the numbers and starts over.

Then it picks a girl and prints "Hi".

Then it computes another random number to say whether the girl is younger or older. It prints "I'm older" or "I'm younger".

What's the chance now that the other number is a girl?

1/3.

OK, how about if when the girl is younger it throws that away and starts over. Does that change the odds? No. It was 1/3 before it had a 50% chance to start over, and it's still 1/3 after it randomly chose not to.

Say the father said "First I had a daughter, and then another child." Then it's 1/2. What's the difference? It's selected differently. The program would first put in the number for daughter and then randomly compute another number.

But when he puts a girl on the line and she turns out to be older (or younger) that doesn't change the odds. Of the two chances to get a brother, both of them had a 1/2 chance she'd be older. Of the one chance to get a sister, it was 1/2 the older sister would be at the phone. It doesn't change anything.
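For anyone who would rather run the program described above than imagine it, here is a rough Python sketch. It enumerates the cases exactly instead of drawing random numbers. It assumes boys and girls are equally likely, that birth order is independent of sex, and that when there are two daughters either one is equally likely to come to the phone. The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def girl_on_phone_says_older():
    """P(two girls | a randomly chosen daughter comes to the phone and says she is older)."""
    num = den = Fraction(0)
    # Each (older, younger) combination has probability 1/4.
    for kids in product("BG", repeat=2):
        weight = Fraction(1, 4)
        girl_positions = [i for i, kid in enumerate(kids) if kid == "G"]
        if not girl_positions:
            continue  # two boys: throw the numbers away and start over
        for pos in girl_positions:
            # Assumption: each daughter is equally likely to be put on the line.
            w_pick = weight / len(girl_positions)
            if pos == 0:  # position 0 is the older child
                den += w_pick
                if len(girl_positions) == 2:
                    num += w_pick
    return num / den


def father_says_first_child_is_daughter():
    """P(two girls | the father says "first I had a daughter, and then another child")."""
    num = den = Fraction(0)
    for older, younger in product("BG", repeat=2):
        weight = Fraction(1, 4)
        if older == "G":
            den += weight
            if younger == "G":
                num += weight
    return num / den


print(girl_on_phone_says_older())             # 1/3
print(father_says_first_child_is_daughter())  # 1/2
```

The only difference between the two functions is how the case gets selected, which is the point of the argument above.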

"I am the eldest daughter" 1/3

1, depending on how she uses the language.

"My name is Jane" 1/3

1/3. There was every reason to think she had a name before she got on the phone. You don't learn anything about the odds when she tells you what the name is.

"I have black hair" 1/3

This does nothing to change the odds. You knew she had some hair color, and her brother or sister has some hair color. Finding out which this particular sister has doesn't tell you about her sibling.

"I have black hair but my sibling doesn't" 1/3

Ditto. Unless boys and girls tend to have different hair colors it tells you nothing useful about the sibling.

"I am the smartest" 1/2

Makes no difference.

If the child who talks to you on the phone was *chosen* for any of these qualities then they might make some sort of difference. But she wasn't. She was chosen because she was a daughter. If she was the only daughter then it was her. If there were two daughters I assume she was chosen randomly. Details about her don't matter except details that change the odds what gender her sibling is.
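Here is a small sketch of that distinction for the black hair case, using the same notation as the calculation earlier (p is the chance that a child is a black-haired girl, so p is at most 1/2). The first function treats the family as chosen because it has a black-haired daughter. The second treats the daughter as picked at random to come to the phone, which is an assumption about the father's behavior and not something the problem states.

```python
from fractions import Fraction
from itertools import product


def child_types(p):
    """Boy, black-haired girl, non-black-haired girl, with their probabilities."""
    half = Fraction(1, 2)
    return {"B": half, "Gb": p, "Gn": half - p}


def chosen_for_black_haired_daughter(p):
    """P(two girls | the family has at least one black-haired daughter)."""
    types = child_types(p)
    num = den = Fraction(0)
    for a, b in product(types, repeat=2):
        weight = types[a] * types[b]
        if "Gb" in (a, b):
            den += weight
            if a != "B" and b != "B":
                num += weight
    return num / den


def random_daughter_has_black_hair(p):
    """P(two girls | a daughter picked at random turns out to have black hair)."""
    types = child_types(p)
    num = den = Fraction(0)
    for a, b in product(types, repeat=2):
        weight = types[a] * types[b]
        girls = [c for c in (a, b) if c != "B"]
        for girl in girls:
            # Assumption: with two daughters, either is equally likely to be picked.
            w_pick = weight / len(girls)
            if girl == "Gb":
                den += w_pick
                if len(girls) == 2:
                    num += w_pick
    return num / den


p = Fraction(1, 4)
print(chosen_for_black_haired_daughter(p))  # 3/7, which equals (1-p)/(2-p)
print(random_daughter_has_black_hair(p))    # 1/3
```

Under these assumptions the second number stays at 1/3 for any p, while the first moves between 1/3 and 1/2 as the trait gets rarer.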

"I am the eldest"
Given that the eldest child is a girl, the probability of two girls is clearly 1/2. I don't think many people will agree with J Thomas's latest response to this question.

"If the original condition was that a girl is the oldest, that's different. Then there are only 2 choices."

Yes! There is only one "smartest" too. And, like oldest/youngest, intelligence is independent of gender. Notice that I didn't have the girl saying "I am smart"

Would you be happier if the father said "the eldest child is a girl" rather than the girl saying "I am the eldest child"??

Would you be happier if the father said "the eldest child is a girl" rather than the girl saying "I am the eldest child"??

Yes.

When you know there's a girl, you have 2 choices -- BG 2/3 and GG 1/3. If he tells you the oldest child is a girl then that will be true half the time for the BG case and all the time for the GG case. So it turns to BG 1/2 GG 1/2.

But if he shows you a girl at random, it's 1/2 she's the oldest child when the other one is a boy and 1/2 she's the oldest child when the other one is a girl. So then the chance is still 1/3.

Let me rephrase my original questions then

You are on the phone with a father who says, "I have at least one daughter." He lists the following facts, in no particular order. Below are what I think the probabilities are that both children are female in the light of these facts:

1. "My eldest child is female" 1/2
2. "I have a daughter named Jane" 1/3
3. "I have a daughter who has black hair" 1/3
4. "I have a daughter with black hair but her sibling doesn't"
5. "The smartest child is female " 1/2

Sheetwise,
"So, it's 1/2 whether she's the oldest or the youngest! Given that those are the only two possibilities -- shouldn't the probability should be 1/2 whether she has a name or not!"

That's because your alternatives overlap in the GG combination where both youngest and oldest are girls. GG does double duty. Anyway, if you were allowed to ask both questions then there would be no uncertainty.

Identifying one girl by name only gives 1/2 if the choice of which girl is identified in the GG case is not arbitrary. If someone volunteers the information that "this family includes a girl with SS# 123-45-6789", the probability of GG is still 1/3 because for the GG case the choice of which SS# is revealed is - for all we know - chosen at random.

1. "My eldest child is female" 1/2

1/2

2. "I have a daughter named Jane" 1/3

1/2

3. "I have a daughter who has black hair" 1/3

Somewhere between 1/3 and 1/2

4. "I have a daughter with black hair but her sibling doesn't"

1/2

5. "The smartest child is female " 1/2

1/2

To go back and refine something I wrote upthread at Jul 10, 2008 5:29:24 AM, I think the key is that, if the trait ascribed to the girl is (a) sex-independent and (b) exclusive or is (c) so rare that it is approximately sex-independent and exclusive, then the probability the other child is a girl is 1/2.

1, 4, and 5 meet the criteria for 1/2. Note that 4 can be written like 1 and 5 to make the isomorphism clearer.

2 is the Mlodinow argument. 3 is the Mlodinow argument when the probability of the trait is high.

"He tells you the older child is a girl. When it's GB the girl will be older only half the time."

In my line of thinking, once he tells you that the older child is a girl, she now becomes unique. Much more unique than if he just gives you a name. There can be only one eldest daughter.

I think you meant to write "When it's GG the girl will be older only half the time" -- I may be wrong. But still -- the girl, now being further identified as the eldest, becomes unique.

MikeP, I am confused about when your reasoning applies and when it doesn't. I believe you are confused about that too.

It's only natural to say "We start out with three equal possibilities, so the chance for choice A is 1/3. We see that choice B is not right. Therefore A and C are now equally likely." But everybody knows that it's wrong to apply that reasoning to the Monty Hall problem. When does it apply?

Set it up like Monty Hall. You have three doors. Behind one door is older-boy younger-girl. Behind another is older-girl younger-boy. Behind the one you have chosen is older-girl younger-girl. Monty says "The older is a girl!" He opens the older-boy younger-girl door and shows you that answer is wrong.

Is it now true that the remaining two doors are equally 1/2 or is the older-boy younger-girl door now 2/3 and the older-girl younger-girl door still 1/3?

What reasoning would we use to decide?

J Thomas,

All of my reasoning is based on learning the father has a girl and the girl has a trait simultaneously. You have me wondering now whether, when the facts come as events in a sequence, the way we learn the subsequent facts has a bearing on whether the probability is 1/3 or 1/2. Your Monty Hall example makes me think it does, since Monty's goal is to force data on you after he knows what door you chose.

Nonetheless, your posing of our wager is acceptable to me because we chose the trait Jane, so no information will be forced upon us.

While I find it interesting to speculate on what happens when the father can premise his later information on his earlier information, I wonder if your framing of the problem that way means that you believe that, when the trait comes simultaneously with girl, the probability is 1/2.

That is, after all, the original statement of the problem.

It's not too long, but typepad has mangled the tabs.
------------------
use strict;

my $NUM_FAMILIES = 100000;
my $KIDS_PER_FAMILY = 2;

srand();

my $fams_with_florida_a = 0;
my $fams_with_florida_b = 0;
my $fams_with_florida_and_all_girls_a = 0;
my $fams_with_florida_and_all_girls_b = 0;
my $fams_with_one_or_more_girls = 0;
my $fams_with_all_girls = 0;

open OUT, ">", "florida.csv";

for(my $i = 0; $i > -4.0; $i -= 0.1 )
{
    my $P_FLORIDA = 10**$i;

    for(my $k = 0; $k < $NUM_FAMILIES; ++$k)
    {
        my $girl_count = 0;
        my $have_florida_a = 0;
        my $have_florida_b = 0;
        my $fam_would_name_florida = (rand() < $P_FLORIDA);

        for(my $j = 0; $j < $KIDS_PER_FAMILY; ++$j)
        {
            if(rand() < 0.5)
            {
                ++$girl_count;

                # CASE A: original calc with odds per kid
                if(!$have_florida_a && (rand() < $P_FLORIDA))
                {
                    $have_florida_a = 1;
                    ++$fams_with_florida_a;
                }

                # CASE B: calc with odds per family
                if(!$have_florida_b && $fam_would_name_florida)
                {
                    $have_florida_b = 1;
                    ++$fams_with_florida_b;
                }
            }
        }

        if($girl_count > 0) { ++$fams_with_one_or_more_girls; }
        if($girl_count == $KIDS_PER_FAMILY)
        {
            ++$fams_with_all_girls;
            if($have_florida_a) { ++$fams_with_florida_and_all_girls_a; }
            if($have_florida_b) { ++$fams_with_florida_and_all_girls_b; }
        }
    }

    print OUT $P_FLORIDA, ",", $fams_with_florida_and_all_girls_a / $fams_with_florida_a, ",",
        $fams_with_florida_and_all_girls_b / $fams_with_florida_b, ",",
        $fams_with_all_girls / $fams_with_one_or_more_girls, "\n";
}

close OUT;

There seems to be an intuitive reluctance to depart the 1/3 probability to the 1/2 probability. But, if I may point something out that may have been forgotten, it is the 1/3 result that is the actually nonintuitive one.

What is the probability a child is a girl? 1/2. What is the probability that a first-born girl's sibling is a girl? 1/2. What is the probability that a girl named Florida's sibling is a girl? 1/2.

What is the probability there is at least one girl in a family with a girl? 1.

It is only the peculiar statement of the P(GG|G) problem that gloms two girls in one family together, forcing us to count families rather than girls, that yields the nonintuitive 1/3 result. Attempts to prove the probability of different questions by torturing them through the 1/3 state are fraught with peril.

OOPS! You're absolutely right MikeP. I got the idea to sweep P(Florida) and chart it after testing the single-run and forgot to move those initializers.

My name below links to an updated chart, hosted by ImageShack. Makes a lot more sense, sort of.

I think it's an interesting result, and what I take away from it and this whole thread is that reasonable people can differ about a proper function for P(Florida|2Girls). I don't believe it's axiomatic that it's a function of girls named Florida per capita.

Yet again unto the breach...

You meet a woman on the street. She tells you that she has two children, one of whom is the king. What is the probability that the king's sibling is male?

1/2

If you think it is 1/3, ask yourself how on earth the woman could possibly tell you that her son was king without consigning his sibling to being more likely to be a princess than a prince?

Now, compare with...

You talk to a woman at a convention of queen mothers. She tells you that she has two children, one of whom is the king. What is the probability that the king's sibling is male?

1/3

Why? You already know that if she has a boy, he is king. Her statement here, given what we know, is equivalent to "I have two children, one of whom is a boy."

I don't think these problems are so hard. The majority of the confusion comes in the modeling of the circumstances that lead one to consider the question in the first place.

If you are interested in this question (i.e. the probability of two daughters, denoted by P(GG)) because you randomly met a girl on the street then you need to know the probabilities that governed the meeting. We have seen here a variety of appropriate probabilities for the meeting which lead to sibling probabilities of 1/2 and 1/3.

If you are interested in the question because one girl had a particular name then you need to consider the prior probabilities that govern naming. This analysis was done by Mlodinow in the WSJ blog and leads to probabilities between 1/3 and 1/2 depending on the prior name probability. The prior probability of any name currently in US use results in a P(GG) of approximately 0.5.
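To see how the answer moves between 1/3 and 1/2, here is a short sketch under one reading of that analysis. The assumptions are mine and are not quoted from the WSJ piece: each girl is independently given the name with probability p, two sisters may share it, and we condition on the family having at least one girl with that name. Under those assumptions the result works out to (2-p)/(4-p).

```python
from fractions import Fraction


def p_two_girls_given_named_girl(p):
    """P(two girls | at least one girl in the family has the name), per-girl name chance p."""
    quarter = Fraction(1, 4)
    # One girl and one boy, in either birth order: the single girl must have the name.
    one_girl = 2 * quarter * p
    # Two girls: at least one of the two has the name.
    two_girls = quarter * (1 - (1 - p) ** 2)
    return two_girls / (one_girl + two_girls)


for p in (Fraction(1), Fraction(1, 2), Fraction(1, 100), Fraction(1, 1000000)):
    answer = p_two_girls_given_named_girl(p)
    print(p, answer, float(answer))
```

With p = 1 (every girl "has a name") this gives exactly 1/3, and as p shrinks toward zero it approaches 1/2.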

If you don't want to model the circumstances that lead you to consider the question in the first place then the only randomness is that governing the births. The appropriate sample space is a 2 by 2 matrix covering the 4 obvious choices for the children's gender. The issue is then what events on this sample space are resolvable by the information you have.

mark, the outcome is dependent on the conditional probability of the prior given the posterior. In the case that the prior is independent of the posterior, the prior becomes irrelevant.

A lot of random variables you might assign to the given one girl are conflated somehow with variables you can assign to the family. Hereditary traits, names (function of culture, etc.). I think I showed pretty clearly that if a probability is per family then P(F|GG) = P(F) and the shift is nil.

Using simple name probability implies that every girl born has an equal chance to be named Florida, no matter what family they are born to. And we know that's not true.

Sorry, mark. I don't follow.

I don't think these problems are so hard. The majority of the confusion comes in the modeling of the circumstances that lead one to consider the question in the first place.

Yes, that's exactly where the problem is. How hard that seems depends on how confident you are that you do it correctly without a lot of trouble.

Does the father have to tell us his daughter is named Florida whenever either of his daughters is named Florida? If half the time he tells us the name of the other girl, then the fraction sinks back to 1/3.

No it doesn't. Whatever the name of his other child is, it has a probability distribution that looks pretty much the same as the one for Florida, and you are now talking about those girls in those families.

Why can't a referenced girl named Florida remain unique even if she has a sibling named Florida?

One can't argue with Mlodinow's analysis that showed the probability of two female siblings given that one sibling is a girl named Florida (where the naming of the girl was a random event governed by a known probability) is 1/2.

But one can dispute that this is a good way to model the situation. Specifically it is not useful to model naming as a random event. Here's an example why:

Consider the event that the girl has ANY name (including the name Florida). The event that the child has ANY name is the union of the disjoint events representing all possible names. As such its probability is the sum of the probabilities of all possible names, and is thus equal to 1. In Mlodinow's analysis when the random name event has probability equal to one, the two sibling probability becomes equal to 1/3.

So we have the following

1. P(GG | that there is a girl named Florida) =1/2
2. P(GG | that there is a girl with a name (incl Florida)) =1/3

Consider situations where (1) is relevant. Say, over your lifetime you meet a few girls named Florida. THOSE girls (girls with the name Florida) will have female siblings 50% of the time.

On the other hand, consider situations where (2) is relevant. You meet a girl with a name (it can be anything including Florida). Those girls will have female siblings 33% of the time.

Thus given that we know there is a girl named Florida we can either perform the conditioning in (1) or (2). (You are not compelled to perform the finest grained conditioning that is possible). To see that (2) is more useful, consider the situation where we have no information. The probability of a girl is 1/2. So we can guess with 50% error. Using the conditioning in (2) we can improve upon that and guess with error 33%. Using the condition in (1), we are stuck at square one.

We get the same probability as in (2) when we do not use a random model for naming.

So to summarize: using a random naming model allows us to legitimately throw away information, and bring us back to the probabilities that apply when we have no information.

What if I am not "equally likely to meet either sister". One case where you would be equally likely to meet either sister is in a closed town where all the siblings stick around for ever. But what if you were in a grade 4 class room in such a town? As siblings are generally not in the same class you would no longer be "equally likely to meet either sister".

Better yet, let me paraphrase Mike P, and suggest you try analyzing the problem without introducing other randomness. Furthermore go through Mlodinow's analysis and let me know if you find a mathematical error.

J Thomas,

If it has to be a father with two children and he has to tell you "I have at least one daughter" and tell you a daughter's name, then it's 1/3. When he has a daughter named Florida and another daughter, half the time he tells you the other daughter's name.

But for another person who heard the problem as "a girl named Georgia", the same father might be able to say he has a daughter named Georgia. The fathers of a girl named Florida and a boy cannot.

What are the odds that the sister is named Georgia? That's not terribly relevant: There are 100,000 ways to ask the problem, with "a girl named Florida" and "a girl named Georgia" only two of them. Each GG family is counted by two of these statements. Each GB family is counted by only one. So when interested in name N, a GG family is twice as likely to give you affirmative information as a GB family, balancing the fact that there are twice as many GB families in toto.

It's 1/2.

But what if you were in a grade 4 class room in such a town? As siblings are generally not in the same class you would no longer be "equally likely to meet either sister".

Surely you are not suggesting that if a father says, "I have two children who are not in the same grade. One is a girl in fourth grade." the probability the other is a girl is 1/3.

Furthermore go through Mlodinow's analysis and let me know if you find a mathematical error.

I did find a mathematical error: Mlodinow does not consider the probability that a boy might have the name Florida. I went through the analysis upthread at Jul 11, 2008 3:06:04 PM and at Jul 11, 2008 4:34:34 PM.

"Surely you are not suggesting that if a father says, "I have two children who are not in the same grade. One is a girl in fourth grade." the probability the other is a girl is 1/3."

In your above paragraph, I cannot see where there is any meeting of any sister. I like that. If you can dispense with considering the probabilities that govern meeting one sister over the other, we will make much faster progress.

As for Mlodinow's analysis, there is no need to consider the probability that a boy might have the name Florida. And even if you did consider it, it wouldn't change anything.

Mike P, Do you agree with

If a girl has a name (and that name could indeed be Florida), the probability her sibling is a girl is 1/3.

That should be: this is not the estimate we are concerned with.

If a girl has a name (and that name could indeed be Florida), the probability her sibling is a girl is 1/3.

I agree. If the statement or metastatement cannot distinguish between two girls then it does not distinguish between two girls.

David,

I suppose I am not privy to the implicit sociological or anthropological information you appear to have that lets you better model how children get named. Is "Florida" the issue? What if it was "Jane"?

What if I am not "equally likely to meet either sister". One case where you would be equally likely to meet either sister is in a closed town where all the siblings stick around for ever. But what if you were in a grade 4 class room in such a town?

Then a family with 2 girls has twice as many chances to have a girl in the grade 4 class as a family with only 1 girl. Still 2 chances and so it comes out to 1/2.

It depends on how the selection goes, and the selection could go either way.

One way to avoid all this talk about probabilities governing the meeting and/or selection would be to pose the questions as:

Given that you meet a girl, what is the probability that her sibling is a girl?

Given that you meet a girl called Florida, what is the probability that her sibling is a girl?

Given that you meet a girl with any name (inc. Florida), what is the probability that her sibling is a girl?

Mike

When I meet a girl called Florida, the following events have occurred

1. I have met a girl
2. I have met a girl called Florida
3. I have met a girl with a name (which may be Florida)

To see this, are any of them false? Clearly not.

Thus they are all valid conditioning events. And we could now form the GG probabilities using these events as the conditioning events.

Only conditioning on 1 and 3 helps you in your goal to predict the probability of the sibling being a sister. Conditioning on 2 yields a 50/50 split, which doesn't help compared to a guess.

"Given that you meet a girl, what is the probability that her sibling is a girl?"

If you think this is 1/2 you haven't mastered the basic problem yet, let alone the added complexity of the random naming.

To see this run the earlier simulation without the random naming bit.

Better yet, let me paraphrase Mike P, and suggest you try analyzing the problem without introducing other randomness.

I'm not clear how to do that. I'll try.

"A man tells you he has a daughter and also another child. What's the chance the other child is a girl?"

We know about the daughter, so set her aside. the chance that a single unknown child is a girl is 1/2.

Usually we assume there's a random distribution of two children that would be BB BG GB GG except it can't be BB. But that isn't what we have.

There's one girl that he told us about. We can distinguish her from the other child he didn't tell us about because he told us about her. So the chance for the other child is 1/2.

The one way, we randomly choose two children, then we throw away the case if it's two boys.

This way we choose a girl and then randomly choose another child.

Which is the right way to fit the man's words?

Furthermore go through Mlodinow's analysis and let me know if you find a mathematical error.

I reproduced it and I find no error provided that's the way the selection goes.

If we start out with families that have a girl named Florida, and if the chance that a girl is named Florida is p, and the chance that a family with two girls has at least one girl named Florida is 2p - p*p, and the man is obliged to tell you he has a daughter named Florida, then the result comes out 1/2.

If we do all of that but the man is obliged only to tell you one of his daughters' names, then it's 1/3.

If he isn't required to have a daughter named Florida but it turns out he does, and he isn't required to tell you her name instead of her sister's but it turns out he does, then it's still 1/3.

If he's required to have a girl named Florida first, and then he's required to have another child too, then it's 1/2.

You have to make assumptions about what could have happened beyond what actually did happen. You can't help it. You can't make a conclusion without doing that, implicitly and instinctively if you don't notice you're doing it.
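Two of the cases above can be checked with a short sketch. The model is an assumption: each girl is independently named Florida with probability p, and the father's behavior is passed in as a rule giving the chance he says "I have a daughter named Florida" for a given family. The names `obliged` and `names_random_daughter` are invented here for illustration.

```python
from fractions import Fraction
from itertools import product


def p_two_girls(p, prob_says_florida):
    """P(two girls | the father says he has a daughter named Florida).

    p: chance that any given girl is named Florida (assumed independent per girl).
    prob_says_florida(n_girls, n_floridas): chance the father makes that statement.
    """
    num = den = Fraction(0)
    for kids in product("BG", repeat=2):
        n_girls = kids.count("G")
        if n_girls == 0:
            continue  # no daughter to talk about
        # Enumerate which of the girls are named Florida.
        for names in product((True, False), repeat=n_girls):
            weight = Fraction(1, 4)
            for is_florida in names:
                weight *= p if is_florida else 1 - p
            weight *= prob_says_florida(n_girls, sum(names))
            den += weight
            if n_girls == 2:
                num += weight
    return num / den


def obliged(n_girls, n_floridas):
    """He must mention Florida whenever any daughter has that name."""
    return Fraction(1) if n_floridas > 0 else Fraction(0)


def names_random_daughter(n_girls, n_floridas):
    """He names one daughter picked at random; it happens to be a Florida."""
    return Fraction(n_floridas, n_girls)


p = Fraction(1, 1000)
print(p_two_girls(p, obliged))                # 1999/3999, just under 1/2
print(p_two_girls(p, names_random_daughter))  # 1/3
```

The births and the names are identical in both runs. Only the rule for what the father says differs, and that alone moves the answer from about 1/2 to 1/3.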

Given that you meet a girl, what is the probability that her sibling is a girl?

Given that you meet a girl called Florida, what is the probability that her sibling is a girl?

Given that you meet a girl with any name (inc. Florida), what is the probability that her sibling is a girl?

And all of those are 1/2.

Agreed.

In every case except the second, you are just as likely to meet her sister as you are to meet her. So there are 2 chances to meet a girl with a sister (because you meet her or her sister) and 2 chances to meet a girl with a brother (because there are twice as many families with brother/sister as sister/sister). 1/2 in every case.

Mike

In a recent post you write:

How do you have more information than "there is at least one girl"?

I have the information that there is exactly one girl standing in front of me. Thus I can build the statement "there is at least one girl" entirely from the information in front of me.

So aren't you admitting that the two problems below are the same:

Given that you meet a girl, what is the probability that her sibling is a girl?

Given that there is at least one girl, what is the probability that her sibling is a girl?

As for your comment about the random naming, when you said "I notice you don't include the adverb 'accurately' here": I don't disagree with the random naming formulation, but let me ask a very engineering question. Given that you met a girl called Florida and you had to guess the sibling's gender, what would you pick? Bear in mind that as P(Florida) increases from zero, the 50/50 split starts to favor the boy. Clearly the only choice that gives you better than a guess is the same choice predicted by a non-random naming model.

If not then how is it different to the sibling questions?

Your coin question is 1/3. It is identical to "at least one girl".

If you tossed a penny and a nickel, and the penny was heads, then the nickel is heads with probability 1/2. That is identical to the "girl named Penny" question.

Let's try yet another variation.

You are attending a special party with your sister. This is a party that girls attend with their brothers. A girl who has one brother brings him to the party. A girl who has no brother brings a date. If a girl has a sister, one of them brings a date and the other doesn't attend -- it's strictly one sister per family.

You meet a girl. What's the chance she has a date with her, and not her brother?

It should be clear that this chance is 1/3.

1/4 of the girls who might have attended did not, because their sisters did. Of the remaining 3/4, 2/3 are there with brothers. (Unless a woman with a date is more likely to attend than a woman with a brother, there's always that.)

The question in each problem like this is whether to count the GG case twice or not. If you do count it twice the chance is 1/2. If you only count it once the chance is 1/3.

Which times should you count it twice?
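The counting question can be made concrete with a tiny sketch. The single parameter, a made-up name, says how many girls from each two-girl family can end up in front of you: 2 when either sister might be the one you meet, 1 when only one sister per family shows up, as at the party.

```python
from fractions import Fraction
from itertools import product


def p_other_child_is_girl(girls_counted_per_gg_family):
    """P(the girl you meet has a sister), given how many times a GG family is counted."""
    num = den = Fraction(0)
    for kids in product("BG", repeat=2):
        n_girls = kids.count("G")
        if n_girls == 0:
            continue  # no girl to meet
        weight = Fraction(1, 4)
        if n_girls == 2:
            weight *= girls_counted_per_gg_family
            num += weight
        den += weight
    return num / den


print(p_other_child_is_girl(2))  # 1/2: either sister can be the one you meet
print(p_other_child_is_girl(1))  # 1/3: one sister per family, as at the party
```

This does not answer which count is right for a given story. It only shows that the whole disagreement sits in that one number.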

J Thomas, I understand your rationale, but I would suggest you get away from the viewpoint of meeting a girl from a population. With this viewpoint it will always boil down to what probabilities you are using for P(meet girl). These probabilities will indeed depend on, as you say, whether for each family of two girls both girls appear in your population or not.

J Thomas, I understand your rationale, but I would suggest you get away from the viewpoint of meeting a girl from a population. With this viewpoint it will always boil down to what probabilities you are using for P(meet girl). These probabilities will indeed depend on, as you say, whether for each family of two girls both girls appear in your population or not.

Why should I get away from this viewpoint?

It works. It gives me a prediction that I can understand.

The sterile philosophising about whether you can tell the difference between the girls or not does not work. I can't predict with it. I can come up with endless alternate arguments without a good way to choose among them.

I have two steps. What is the probability that a sample matching the description will be drawn from the population? What is the probability that given a sample that matches the description, it will actually get described that way?

If I know those two I can make a prediction that works. If I can't do that then I wind up confused, arguing with a bunch of confused people who don't understand it either.

mark,

My apologies. There is a hazard in that the observer is the one figuring out probabilities.

If he can look at only one coin, the probability is 1/2.

If he can look at two coins sequentially, and he is forced to look at the second coin to find a head, then the probability of two heads is obviously 0.

If you want to say that he forgets how he observed the head, then P(HH|H) is 1/3.

Do you agree with...

If he looks at only one coin, and it is a head, then the probability the other is a head is 1/2.

...?

Given that you observe a coin that is heads, what is the probability that the other coin is heads?

the answer is 1/3.

The answer is 1/2 if you are only allowed to observe one coin.

If the first coin you observe is a head, you know nothing about the second coin, yielding 1/2.

It depends on what you know.

Try it this way. You go to a casino where they have a game -- they throw two coins, a penny and a nickel, and they don't show them to you. You bet that the coins are both heads.

If the coins are both tails, they show them to you and you lose.

Otherwise, they show you one coin that's heads and they invite you to bet again. What's the chance the other coin is a head?

The chance is 1/3. This is just like the Monty Hall problem. If it isn't both tails then there's a head to show you. You learn nothing by seeing that head. Whether the head is a penny or a nickel, whether it's a 1943 steel penny with an X scratched on the back and Eisenhower's name engraved on the front, you still haven't learned anything except that there's at least one head.

Now try it with different rules. If it's two tails you lose. If the penny is heads they have to show you the penny. If the penny is tails but the nickel is heads they show you the nickel. Now you know more. If they show you the penny your chance is 1/2. If they show you the nickel your chance is zero.

It isn't enough to see what happens, that you first found it wasn't two tails and then that the penny was heads.

You have to know the rules of the game.
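Both sets of casino rules can be worked through exactly with a short sketch. The one assumption added here is that under the first rules, when both coins are heads, the house shows either one with equal chance. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def casino(show_rule):
    """Return {coin shown: P(both heads | that coin was shown as a head)}."""
    num, den = {}, {}
    for penny, nickel in product("HT", repeat=2):
        if penny == "T" and nickel == "T":
            continue  # both tails: you lose, there is no second bet
        weight = Fraction(1, 4)
        for coin, prob in show_rule(penny, nickel).items():
            den[coin] = den.get(coin, 0) + weight * prob
            if penny == "H" and nickel == "H":
                num[coin] = num.get(coin, 0) + weight * prob
    return {coin: Fraction(num.get(coin, 0)) / den[coin] for coin in den}


def any_head(penny, nickel):
    """First rules: show some head (assumed picked at random if both are heads)."""
    heads = [name for name, face in (("penny", penny), ("nickel", nickel)) if face == "H"]
    return {name: Fraction(1, len(heads)) for name in heads}


def penny_first(penny, nickel):
    """Second rules: show the penny if it is heads, otherwise show the nickel."""
    return {"penny": Fraction(1)} if penny == "H" else {"nickel": Fraction(1)}


print(casino(any_head))     # penny: 1/3, nickel: 1/3
print(casino(penny_first))  # penny: 1/2, nickel: 0
```

What you see is the same in both games, a penny showing heads, yet the answer is 1/3 in one and 1/2 in the other.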

Mike P

If he randomly chose the coin to look at (assuming a prior probability p) then the probability the other is a head is 1/2.

If there was no randomness associated with him looking at the coin then the probability that the other coin is a head is 1/3.

You may think that examples where he can see only one coin non-randomly are somewhat contrived. Furthermore the obvious way of him making his random choice (i.e. picking a coin whose face he can't see) yields a final answer independent of p.

Nevertheless in the absence of any assumptions I contend that "seeing a head" must be equivalent to "at least one head" and thus the probability in question is 1/3. All other interpretations require additional assumptions.

I mean, really.

You are in a room. There is no one else anywhere around. You flip two coins. You see one of them. No matter how you happen to see the first one first -- randomly, through some predetermined process, or through some postdetermined process -- and no matter whether you see a head or a tail, you gain absolutely no information about the second coin. It is a head with probability 1/2.

Nevertheless in the absence of any assumptions I contend that "seeing a head" must be equivalent to "at least one head" and thus the probability in question is 1/3. All other interpretations require additional assumptions.

You can't do it without any assumptions.

You can say that your assumptions are somehow simpler and so they fit Occam's razor, or your assumptions are somehow more aesthetic or more likely to fit situations from real life or something.

But you have to make the assumptions before you can get a conclusion. If you have enough data you can test how well it fits your assumptions. With one data point you cannot.

It's been an interesting read. I sent the question over to the 2+2 Forums where I believe hunterotd nailed the solution down.

Gamblers tend to make the best statisticians ;)

I toss two coins. I show you one coin that is a head. How I come to choose that coin involves no randomness. You know nothing else. The probability that the other coin is a head is 1/3.

true or false?

Let me try a different tack:

I toss two coins. Whenever there is a head I show it to you, and you get to study it. What is the probability that the other coin is a head?

"But that is not equivalent to meeting a girl. That is equivalent to someone else meeting two siblings and then directing a girl to you"

The equivalence is that in both situations you are encountering the girl (or coin) in a manner that involves no additional assumptions about any randomness governing the meeting. The only randomness is that of the original random experiment. Furthermore there is no information allowing you to identify which particular coin or girl of the pair you are encountering.

Personally I think when we meet a girl, say on the street, the above represents a pretty good way to model that event.

Can you rephrase it without using the words "first" and "second"?

You have already agreed that if you see that one coin via the following process:

Whenever there is a head it is shown to you

then the probability of two heads is 1/3.

If however you see that coin randomly then the probability of two heads is 1/2.

How about the following. Two coins are tossed. Whenever there is at least one head, one head goes out in the world, calls itself Jane, meets you on the street, changes its name to Florida, then Jacob, becomes king and dies 30 years later.

The probability that the other coin is heads is 1/3.

over and out

You are in a room. There is no one else anywhere around. You flip two coins. You see exactly one of them. It does not matter how you happen to see that coin -- randomly, through some predetermined process, or through some postdetermined process. If you see a head, the probability there are two heads is 1/2.

Do you agree?

If the process that lets you see the coin gives you the same chance to see it whether it is heads or tails, then I think it shouldn't affect the odds on the other coin.

That should read:
"one of my children is a " .. .. "girl".

In the light of some discussion on the 2+2 site let me put it this way:

Two babies are generated behind a curtain. What is the probability of two girls when:

1. A female detector sounds the alarm that a female is present
2. A female detector sounds the alarm that a female is present, and then a female steps from behind the curtain.
3. One female steps from behind the curtain.

I think they are all 1/3.

Mike P will say 1/3 for Q1, and 1/2 for Q3, but I don't know what he will say for Q2.

Mike P..

By your logic you should also be saying that Q1 is 1/2, because the detector had to detect one female baby. It only needs one to set off the alarm. Which one was it? There is some probability that it was one over the other.

To continue along this line, let's say the detector relied on visual information. Then suddenly "detection" here becomes a lot like seeing.. and we know your opinion on seeing a female. You wrote:

"You are in a room. There is no one else anywhere around. You flip two coins. You see exactly one of them. It does not matter how you happen to see that coin -- randomly, through some predetermined process, or through some postdetermined process. If you see a head, the probability there are two heads is 1/2."

The upshot of this is that whenever we see or detect a child, the only information we have is that "there is at least one child", which results in a P(GG) probability of 1/3. The entire situation changes when we randomly model the detection or seeing process, in which case P(GG) equals 1/2.

You were quite happy to model the detection process non-randomly (that's how you got the 1/3 answer). But you're not so happy to model seeing or meeting non-randomly.

The detector goes-off as soon as it "sees" one girl. That's how detectors work. How is it any different to you seeing one girl?

Feel free to email me once you have thought about this a bit more

mark,

Try this one...

You are holding two coins in a corner of a large room. You flip one coin to land at your feet. You throw the other coin to the far corner of the room.

What is the probability the distant coin is heads given each of the following observations:

A. The coin at your feet is heads.
B. The coin at your feet is tails.
C. The coin at your feet lands on its edge.
D. You don't look at the coin at your feet.

A. The coin at your feet is heads.
B. The coin at your feet is tails.
C. The coin at your feet lands on its edge.
D. You don't look at the coin at your feet.
4. One female steps from behind the curtain next to you.

Obviously:

A 1/2
B 1/2
C 1/2
D 1/2
4 1/2

Consider A. In this case the sample space consists of the 2x2 matrix where one axis is labeled "coin at my feet" and the other is labeled "coin not at my feet". You can make these labels prior to the random event of tossing the coin. Thus these labels are akin to oldest and youngest. The information given allows us to specify two events in the 2x2 matrix. Conditioned on these two events the required probability is 1/2.

Conversely, in my question 3, you cannot label one axis "girl who you meet". Obviously such a label depends on the outcome of the random event. Now if you said that one PERSON always appeared from behind the curtain, that would be different.

"Some problem statements make us count families. Some problem statements make us count girls."

There is only one random event here, the birth of two children. As soon as you start considering the probability that you meet one of the children on the street or whatever, you are investigating another problem. We're not counting girls nor families. In the context of the random event that we are modeling my earlier questions 1-3 are all statements that equate to the event "there is at least one girl"

I am well aware that if we start sampling children or families from any normal population that we are familiar with, then everything changes.

Conversely, in my question 3, you cannot label one axis "girl who you meet". Obviously such a label depends on the outcome of the random event. Now if you said that one PERSON always appeared from behind the curtain, that would be different.

And if your question 3...

3. One female steps from behind the curtain.

...had included the word "always", then the probability would be 1/3. It would be, in fact, the Monty Hall problem, where an empty door (girl) is forced upon you.

But if the problem statement does not include either the explicit or implicit "always", then there is absolutely no reason to presume it.

...a person quite distinct from the girl person with the trait "did not walk through the curtain".

Fixed.

My posting Jul 16, 2008 5:08:49 PM already mentioned the situation where it is specified that one PERSON always walks from behind the curtain.

To reiterate the key point behind that example...

If the problem statement does not include either the explicit or implicit "always", then there is absolutely no reason to presume it.

mark, that is what you are saying, because it is the only way "at least one girl" can equate to "girl steps through curtain".

This is your event set, is it not?

born born -> steps-through-curtain

B B -> B
B G -> G
G B -> G
G G -> G

That is the only way whatsoever that Q3 can equate to the event "there is at least one girl".

We can infer that there is at least one girl -- the one standing in front of us. We can infer absolutely nothing whatsoever about the child behind the curtain.

In order to infer something about the child behind the curtain, we have to presume some Monty ex machina who, if there is at least one girl, pushes a girl through the curtain.

Lets say the conditioning event is

"If a girl is one of the two babies, then she steps through the curtain"

Given this event, (let's call it A) I believe based on your most recent posts that you would be happy that the probability of two girls is 1/3, i.e. P(GG|A) = 1/3

Now consider B, which is the event that "a girl steps through the curtain". If B is true, then A must be true. And if A is true then B must be true. So P(GG|B) = P(GG|A) = 1/3

Then it is not possible that events A or B could be true

I was pointing out how that statement was equivalent to the problem statement. See my post at 11:03:38 PM

Your "conditioning event" A is not, in fact, an event. Rather it is a function performed by your Monty ex machina.

The function is an implication whose antecedent and consequent are both events. In particular, A = C=>B, where C = "at least one girl" and B = "a girl steps through the curtain".

Now B and C are exactly the next to last and the last truth columns in the table above at 6:44:29 PM. It is clear that B implies C. It is also clear that C does not imply B.

Your "let's say the conditioning event is C implies B" is a massive presumption, constructing an implication out of whole cloth while excluding all other possible functions that could yield B. There is nothing in the problem statement of Q3 that permits that presumption.

So now A is not "C implies B", but "C and B".

I realize that we simulcommented, but my 9:30:29 AM comment stands mostly unchanged. "B" does not imply "C and B", as seen in the truth table. This new conditioning event is still made out of whole cloth.

Your X is what I called C. B and X are in the truth table above. It is indeed true that (B and X) <=> B. But that says little about whether X implies B.

My conjecture is true when B is a subset of X

It is exactly the difference between B and X that is at issue here.

In particular, you have to explain why the problem statement...

Two babies are generated behind a curtain. One female steps from behind the curtain.

...disallows a male from stepping from behind the curtain when a female is available yet allows a male to step from behind the curtain when a female is not available.

I think we can agree on that.

I think we can also agree, as per your intersection argument, that 1/3 is also the lower bound of P(GG|Q3), achieved by the Monty ex machina "at least one girl" => "girl steps through curtain".

The upper bound, P(GG|Q3) = 1, is achieved by the Monty ex machina "at least one boy" => "boy steps through curtain".

I would conjecture that the cumulative probability over all possible Monties ex machina is that P(GG|Q3) = 1/2.
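
One way to make the range of possible Monties concrete is to collapse the selection rule into a single number. The sketch below is only an illustration: `q` is a made-up parameter, namely the chance that a family with one boy and one girl sends the girl through the curtain, and girls and boys are assumed equally likely at birth.

```python
from fractions import Fraction

def p_gg_given_girl_steps(q):
    """Exact P(two girls | a girl steps through the curtain).

    q is the assumed chance that a mixed boy/girl family sends the girl:
    q = 1 is the Monty who pushes out a girl whenever one exists,
    q = 0 is the Monty who pushes out a boy whenever one exists,
    q = 1/2 is a selection that ignores sex entirely.
    """
    q = Fraction(q)
    p_gg_and_girl = Fraction(1, 4)         # a two-girl family always sends a girl
    p_mixed_and_girl = Fraction(1, 2) * q  # BG or GB sends the girl with chance q
    return p_gg_and_girl / (p_gg_and_girl + p_mixed_and_girl)
```

With these assumptions the result simplifies to 1/(1 + 2q), so q = 1 gives the lower bound 1/3, q = 0 gives the upper bound 1, and the symmetric rule q = 1/2 gives exactly 1/2. This does not settle how one ought to average over Monties; it only shows where each rule lands.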

Yes, provided the detector gets to see both children.

If the detector's algorithm is "Look at one child. If girl, fire and end. Look at other child. If girl, fire and end," then it is still 1/3.

The key is that (a) it is detecting something that is indistinguishable between the two children and (b) it can look at both children. If looking at one child happens to suffice, that doesn't change the fact that it will look at the other child if it needs to.

No. The probability the other child is female is 1/2.

P(detector fires) = P(first child is girl) + P(first child is boy)P(second child is girl) = 1/2 + 1/2*1/2 = 3/4

P(GG|detector fires) = P(detector fires|GG)P(GG)/P(detector fires) = 1(1/4)/(3/4) = 1/3.

Note that if P(second child is girl) is anything but 1/2, I don't get 1/3 for P(GG|detector fires).

Also note that the detector's looking at the first child and not at the second child is exactly equivalent to my looking at the first child and not at the second child which is equivalent -- barring a biased Monty ex machina -- to a child stepping through the curtain. The probability the second child is a girl remains 1/2.
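
The arithmetic above can be double-checked by listing the four families. This is a rough sketch, assuming each child is a girl with probability 1/2 and that the detector looks at the children in a fixed order; the helper names are invented for the example.

```python
from fractions import Fraction
from itertools import product

# (first child looked at, second child looked at); each family has probability 1/4.
FAMILIES = list(product("BG", repeat=2))

def fired_on(family):
    """Which look (1 or 2) triggers the detector, or None if it never fires."""
    for look, sex in enumerate(family, start=1):
        if sex == "G":
            return look
    return None

def fires(family):
    return fired_on(family) is not None

def two_girls(family):
    return family == ("G", "G")

def prob(event, given=lambda family: True):
    """Exact P(event | given) over the four equally likely families."""
    pool = [f for f in FAMILIES if given(f)]
    return Fraction(sum(1 for f in pool if event(f)), len(pool))

p_fires = prob(fires)                                                # 3/4
p_gg_given_fires = prob(two_girls, given=fires)                      # 1/3
p_gg_given_first = prob(two_girls, given=lambda f: fired_on(f) == 1) # 1/2
```

Knowing only that the detector fired gives 1/3. Knowing that it fired on its first look, so that the second child was never examined, gives 1/2.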

So you no longer subscribe to what you wrote here:

If the detector's algorithm is "Look at one child. If girl, fire and end. Look at other child. If girl, fire and end," then it is still 1/3.

Of course I do.

By "it", I mean P(GG|detector fires). It is 1/3, no matter how the detector is designed, so long as the detector will consider both children if required.

Specifically, what is the conditional GG probability if the detector looks at one child, she's a girl, fires and then stops?

If I know the detector fired upon seeing the first child, then I know that P(GG|) is now 1/2. It must be. I know that nobody nowhere nohow has looked at the second child. The probability that that child is a girl remains what it was at the event of birth: 1/2.

If I don't know that the detector fired upon seeing the first child, then P(GG|) is 1/3. Nothing about the detector's information to me permits my distinguishing the girls, so I cannot distinguish the girls.

Incidentally, if I know that a detector with the above algorithm fired on the second child, then I know that P(GG|) = 0, since the first child it looked at must have been a boy.

You clearly believe that P(girl remains behind curtain|girl steps through curtain) = 1/3.

What do you think P(girl remains behind curtain|boy steps through curtain) is?

My point is that, in the absence of a random detection model, and assuming we know no other identifying information, the event of detecting/seeing/meeting a girl is equivalent to the event that there is at least one girl.

And my point is, unless it was forced on you, the seeing or meeting of exactly one girl is identifying information. You can -- nay, must -- label one girl "the child I met" and label the other child "the child I didn't meet". They are no longer indistinguishable. They are no longer able to be double-counted, as detectors are wont to do.

P(girl remains behind curtain|boy steps through curtain) = P(girl remains behind curtain AND boy steps through curtain)/P(boy steps through curtain) = (1/4)/(1/2) = 1/2.

How on earth do you get P(boy steps through curtain) = 3/4? What do you think P(girl steps through curtain) is?

As mentioned:

P(girl steps thru curtain) = P(at least one girl) = 3/4

P(boy steps thru curtain) = P(at least one boy) = 3/4

So, in your world, P(girl steps through curtain) + P(boy steps through curtain) = 1.5?

Basic probability fact: No measure of probability is ever greater than 1.

P(at least one girl) + P(at least one boy)

Those are not mutually exclusive, so their sum is not terribly interesting.

However, "girl steps through curtain" and "boy steps through curtain" are mutually exclusive and cover all possibilities. The sum of their probabilities must be 1.
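
The difference between the two pairs of events can be checked directly. In the sketch below, `q` is a hypothetical parameter (the chance that a mixed family sends the girl out), and exactly one child is assumed to step through the curtain.

```python
from fractions import Fraction
from itertools import product

FAMILIES = list(product("BG", repeat=2))  # four equally likely families
P_FAMILY = Fraction(1, 4)

def p_at_least_one(sex):
    """P(the family contains at least one child of the given sex)."""
    return sum(P_FAMILY for f in FAMILIES if sex in f)

def p_steps(sex, q):
    """P(a child of the given sex is the one who steps out), for assumed q."""
    q = Fraction(q)
    total = Fraction(0)
    for first, second in FAMILIES:
        if first == second:
            # Same-sex family: the stepper's sex is forced.
            chance = Fraction(1) if first == sex else Fraction(0)
        else:
            # Mixed family: the girl steps out with chance q, the boy with 1 - q.
            chance = q if sex == "G" else 1 - q
        total += P_FAMILY * chance
    return total
```

The "at least one" probabilities add up to 3/2 because mixed families are counted in both. The "steps through" probabilities add up to 1 for every q tried here, for example 3/4 and 1/4 when q = 1.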

They may be mutually exclusive in your model but they are not in mine.

Has it occurred to you that any model that assumes, requires, or concludes that a girl stepping through the curtain and a boy stepping through the curtain are not mutually exclusive is hopelessly broken?

So your argument that my setup requires a "Monty ex machina" would seem to apply to any simulation requiring the conditioning event {at least one girl}.

It is true that, if we are explicitly given the information that there is at least one girl, there must be another observer to gather and convey that information.

But the problem statement we are discussing...

Two babies are generated behind a curtain. One female steps from behind the curtain.

...offers no such explicit information and therefore requires no such observer.

I'm saying the events {at least one girl} and {a girl steps from behind a curtain} are the same (under certain conditions).

You were saying they weren't because the girl stepping from behind the curtain required an observer (or Monty ex machina in your words)

Now you're admitting that simulating the event {at least one girl} also requires an observer.

In your latest comment you say " ... offers no such explicit information and therefore requires no such observer."

Are you now saying that {a girl steps from behind a curtain} does not require a Monty ex machina to simulate it?

So what are the differences between the two events that I consider the same? To simulate both you require an observer to look at both girls.

"We know there is at least one girl because we see a girl step through the curtain..."

couldn't agree more! thanks for the discussion

"We know there is at least one girl because we see a girl step through the curtain..."

couldn't agree more! thanks for the discussion

Implication does not equal equivalence.

Now maybe you can answer how in the name of all that is holy you recognize that your model has the probability of two mutually exclusive events adding up to more than 1, yet you don't suspect a problem.

Either a girl steps through the curtain or a boy steps through the curtain. Only one can step through the curtain. Another cannot step through the curtain. A single baby has a single sex. It is either boy or girl. It is not both boy and girl.

The events "girl steps through curtain" and "boy steps through curtain" clearly are mutually exclusive.

Of course "at least one girl" and "at least one boy" are not mutually exclusive. But for the problem...

Two babies are generated behind a curtain. One girl steps through the curtain. What is the probability that a girl remains behind the curtain.

...that is really quite irrelevant.

We have been given the single event "girl walks through curtain". That event has an a priori probability p. The probability of the event "boy walks through curtain", being the only and exclusive alternative, would be 1-p. If the game is fair at all, then p should equal 1-p. p therefore is 1/2.

"We know there is at least one girl because we see a girl step through the curtain..."

couldn't agree more! thanks for the discussion

You guys are still here!

The probability you get depends on what has to happen. You can't tell what had to happen from what did happen.

So if you have two randomly-chosen kids behind a curtain and the rule is that a boy can't walk out if there's a girl there, then the girl walking out tells you there's a girl. Before you saw a girl walk out the probability was 3/4 that a girl would walk out. After you see a girl walk out the probability is 1/3 that the other child is a girl.

But if the rule was that either child could walk out then before you saw a girl walk out the probability was 1/2 that a girl would walk out. And after she does it's 1/2 that the other one is a girl too.

It depends on what the rules are. You can't tell what the rules are by one observation. If it happens 50 times the same way then you can get a pretty good idea.

So if you meet 50 men who tell you they have two children and at least one of them is a girl, and none of them tell you about boys, you can be pretty sure they're supposed to tell you about girls. If all 50 of them tell you that their daughters are named Florence then you can be pretty sure only men with daughters named Florence are talking to you. But if the names are Jennifer, Susan, Eurydice, Samantha, Susan, Charlene, etc, then they're probably just telling you the names.

If it happens once, how do you know which things had to happen that way and which were allowed to be random?

Try it this way: A man with two children meets you and tells you he has two children. He is then supposed to pick one at random and tell you whether it's a boy or a girl and tell you the child's name. Then it's 1/2 whether he tells you a boy or a girl, and it's 1/2 whether the other child is the same gender. If he has a boy and a girl half the time he tells you about the boy and half the time about the girl, if he has two girls he tells you about a girl each time. 1/2.

Now say if he has two boys he's supposed to not talk about his children at all. But if he has one or more girls he tells you he has a girl and tells you her name. Then it's 1/3 the other one is a girl.

Now say somebody has specifically found a man with two children with a daughter named Florida and this man has been given an airline ticket to come meet you. He's supposed to tell you he has two children and he has a daughter named Florida. Now it's 1/2 again the other one is a girl. The girl named Florida was picked from the beginning, and we don't know anything about the other child.
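
To see how the rarity of the name matters in that last case, here is a small sketch. It assumes each girl is independently named Florida with some probability `f` (so, unrealistically, both sisters could share the name); `f` and the function name are invented for the illustration.

```python
from fractions import Fraction

def p_two_girls_given_florida(f):
    """Exact P(two girls | at least one daughter named Florida).

    f is the assumed chance that any given girl is named Florida (0 < f <= 1).
    """
    f = Fraction(f)
    # Two girls: at least one of them is a Florida with chance 1 - (1 - f)^2.
    p_gg_and_florida = Fraction(1, 4) * (1 - (1 - f) ** 2)
    # One boy and one girl (either order): the girl is a Florida with chance f.
    p_mixed_and_florida = Fraction(1, 2) * f
    return p_gg_and_florida / (p_gg_and_florida + p_mixed_and_florida)
```

Under these assumptions the answer works out to (2 - f)/(4 - f). If every girl were named Florida (f = 1) the name tells you nothing and you are back at 1/3. As the name gets rarer the answer climbs toward 1/2, which is why the man flown in because of his daughter Florida gives close to 1/2 rather than 1/3.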

I guess the moral is, don't play betting games unless you know the house rules.

J Thomas,

You describe what I called the Monty ex machina -- the agent orchestrating events so the result is not what the simplest and least adorned reading of the problem statement would yield.

At first I thought mark was presuming a Monty ex machina, but it has become clear that his problem is much more fundamental. See his post at Jul 18, 2008 11:25:05 AM. He truly believes that the child who steps through the curtain has a 3/4 probability of being a girl and a 3/4 probability of being a boy.

I expected him to say P(boy steps through curtain) was 1/4 -- as required by a Monty ex machina who was shoving any available girl out the curtain. But mark really thinks that the game is fair -- i.e., it is symmetric between girl and boy. For him it is not an issue of hidden assumptions but of phenomenal solution bias.

Let's restate the problem yet again.

You are at a casino which has a gambling game. The dealer flips two coins and doesn't show them to you immediately. You can bet on two heads or two tails. Clearly your chance to win is 1/4.

You bet on two heads.

The dealer shows a coin to another player, one who bet on two tails. "Damn it, I lost!" Now you know your chance to win is 1/3, which is important if you are allowed to do another round of betting at this point.

After this round of betting the dealer shows you one coin. It is a head. What is your chance to win now? It depends on how the dealer did it.

If the dealer picked a coin at random and showed it to you, your chance has gone up to 1/2. If it was one head and one tail there was a 50% chance he'd show you the tail and you'd already know you lost.

But if the dealer picked a head and showed it to you, the chance is still 1/3. You already knew it wasn't two tails, so he could pick a head to show you no matter what. So it doesn't affect the odds at all.
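
If you would rather deal the hands than argue about them, a quick simulation shows the two dealer behaviours coming apart. This sketch assumes fair coins, that the dealer has looked at both coins, and that the other player's loss told you only that it is not two tails. The function name, trial count and seed are arbitrary choices.

```python
import random

def estimate_two_heads(dealer_picks_head, trials=200_000, seed=0):
    """Estimate P(two heads | not two tails, and you are shown a head)."""
    rng = random.Random(seed)
    wins = shown_heads = 0
    for _ in range(trials):
        coins = [rng.choice("HT"), rng.choice("HT")]
        if coins == ["T", "T"]:
            continue  # already ruled out by the other player's loss
        if dealer_picks_head:
            shown = "H"  # at least one head exists, so he can always show one
        else:
            shown = rng.choice(coins)  # he shows a coin picked at random
        if shown != "H":
            continue  # you were shown a tail; not the case being asked about
        shown_heads += 1
        wins += coins == ["H", "H"]
    return wins / shown_heads
```

With the dealer always picking a head the estimate should sit near 1/3, and with a randomly picked coin near 1/2. Playing many hands is exactly what lets you tell the house rules apart, which a single hand cannot do.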

It isn't obvious to me which set of assumptions is the simplest, or the most likely, etc.

If the man with a son and daughter is equally likely to tell you about either of them, then the chance is 1/2.

If the man has to tell you about a daughter and then he's equally likely to tell you either name, the chance is still 1/3.

The problem doesn't say enough about what could have happened. You can't get a solution. It's like solving 3 simultaneous equations in 6 unknowns. No single answer.

The dealer shows a coin to another player, one who bet on two tails. "Damn it, I lost!" Now you know your chance to win is 1/3, which is important if you are allowed to do another round of betting at this point.

Actually, you are making unstated assumptions even here. If the dealer didn't look at both coins before showing the coin to the tails player, you know your chance to win is 1/2. No one has looked at the second coin. There is no way anyone in the world could convey any information to you that would change that coin's probability from the a priori 1/2.

But if the dealer picked a head and showed it to you, the chance is still 1/3. You already knew it wasn't two tails, so he could pick a head to show you no matter what. So it doesn't affect the odds at all.

See above. If the dealer showed the other player the only coin he saw, or showed a random or preselected coin, and then he shows you that same coin, your chance is 1/2. If he shows you the other coin and it's a head, then your chance is 1. That is what happens when a coin is identified, even if the identification is implicit: the coins become distinct and can, nay must, be considered separately.

If the dealer is cagey about how he selects and shows coins, then, yes, you should presume that he is forcing coins on you to make things look most favorable to you.

I'm sure Mark is talking about the case when both children walk through the curtain.

I was indeed talking about this case. Sorry if that wasn't obvious.

My conjecture, as it has been from day 1, is that the event of seeing a girl step from behind the curtain, assuming that there is no randomness in how she happened to be stepping from behind the curtain, and assuming that there was no other identifying information, provides the same amount of information as "there is at least one girl".

The only randomness required in my model is the 2x2 matrix of possible outcomes of children's genders. So I further contend that my conjecture is consistent with an interpretation of the question that relies on minimal assumptions.

In any computer simulation of my setup there would have to be some type of (in Mike P's words) "Monty ex machina" to see if there was at least one girl (and to subsequently push her out). Something similar is required in any computer simulation that requires the information of {at least one girl}.

At this stage I imagine that none of this is controversial, with the exception of my statement that " .. my conjecture is consistent with an interpretation of the question that relies on minimal assumptions.".

I can provide a discussion of how an assumption of randomness is a stronger assumption than an assumption of unknown-but-non-random selection, if this is indeed our only sticking point.

The list above is from weakest to strongest assumption.

There are no changes from my earlier posts, and the new requested probabilities follow trivially.

P(girl steps thru curtain) = P(at least one girl) = 1-P(no girls) = 1-1/4 =3/4

no?

P(boy steps thru curtain) = P(at least one boy) = 1-P(no boys) = 1-1/4 =3/4

I don't think we are getting anywhere here.. I have already told you that these events are not mutually exclusive.

Anyway I have done a bit of poking around and a very similar problem with solution appears in Ross "A first course in probability" 7th ed p 83.

In Ross' problem we are meeting a mother and her daughter on the street. The question is what is the probability that the other child is female.

Ross explains that the answer depends on how the mother selected the daughter to walk with, and that under a random selection process the probability of 2 daughters is 1/2.

But under a non-random selection process (and I quote) ".. the probability of seeing the mother with a daughter is now equivalent to the event that there is at least one daughter"

I don't think we are getting anywhere here.. I have already told you that these events are not mutually exclusive.

I don't see how they are not mutually exclusive.

To be clear, I'm going with the problem statement...

1. Two children are born behind a curtain
2. A girl steps through the curtain
3. What is the probability the child remaining behind the curtain is a girl?

The event "a boy steps through the curtain" is the sole alternative to (2), and it is absolutely mutually exclusive to (2).

That is, could you please fill in that table?

While I wait for mark to fill out that probability table, let me make the case for my assumptions.

Take the problem statement...

1. Two children are born behind a curtain
2. A girl steps through the curtain
3. What is the probability the child remaining behind the curtain is a girl?

Here is the root of my position on the assumptions: The question should be fair. In particular, since "girl" and "boy" are symmetric, any question about the probability of an event should be unchanged by swapping "girl" and "boy" in the problem statement.

The king's sibling is 1/3 without argument because "king" and "queen" are, in common understanding, not symmetric: "king" trumps "queen". But "girl" does not, in common understanding, trump "boy", and the fact that the problem statement talks about girls should not be taken as evidence that it does.

The second reason not to assume the problem is asymmetric toward girls is that the player has no reason to know where the bias for girls stops. In particular, if one presumes a Monty ex machina who shoves a girl out when possible and only otherwise shoves out a boy, is that not assuming more than simply saying that only girls are born in the first place?

There seems to be a determined effort to minimize "randomness". Well, if there is a bias towards girls, the least random way to accomplish that is to say that only girls are born. Not only is randomness reduced: there is also no need for the Monty ex machina. The answer to the problem then becomes 1.

Put another way, you believe -- given no particular reason by the problem statement -- that a girl will step through the curtain with probability 3/4. Yet the problem statement does not tell what sexes are born with what probability. If a girl is born with probability 1, then the problem is completely consistent, "a girl steps through the curtain" is the only possible statement for (2) and P(GG|G) = 1.

To come up with 1/3 as the answer to this simple problem requires an asymmetric bias towards girls that assumes just enough "randomness" to get 1/3, but not enough to get 1/2. Isn't that rather arbitrary?

Previously I was addressing the question where a girl, a boy, both (or indeed neither) could be observed, under the assumption that there was no randomness in how the child or children came to be observed.

I will now address Mike P's problem:

1. Two children are born behind a curtain
2. A single child (boy or girl) steps through the curtain
3. What is the probability the child remaining behind the curtain is a girl?

As we all know, the answer here depends on how the child that we view is selected, as different selection policies produce different answers. If the child is selected randomly then we may get one answer, and if the child selection is done non-randomly we may get other answers.

Given that at step 2 we could see either sex, any deterministic selection approach will seem very contrived. But it is possible. As an example we could do the following: If there is a girl present select her, if not select a boy. In this case seeing a girl corresponds to the event {at least one girl} and seeing a boy corresponds to {no girls} and all the probabilities follow. One could do other deterministic selection policies that would result in different answers.

Interestingly enough, if you consider the question that I was originally addressing, where we could observe 0, 1, or 2 children, then in this case a solution based on a random selection of those choices may seem less appealing. In such a problem one would need to assume a distribution for k, where k=0,1,2 represents the number of children seen. It is not so clear what the appropriate distribution in this case is.

The answer to the question you posed (e.g. Jul 19, 2008 8:24:46 PM) depends on how the child that we view is selected. So I can't answer your question until you specify how that child is selected.

In my earlier posts, I did not assume that the child was chosen randomly. I can also answer this question without assuming any randomness in the child's selection, if that is what you want.

Without assuming any randomness the child seen could be chosen using: If there is a girl present select her, if not select a boy. In this case seeing a girl corresponds to the event {at least one girl} and seeing a boy corresponds to {no girls}, and we have:

P(two girls | at least one girl) =1/3
P(two girls | no girls) =0
P(one boy one girl | at least one girl) =2/3
P(one boy one girl | no girls) = 0
P(two boys | no girls) = 1

mark,

I await your filling in that table.

Meanwhile, in your 10:12:34 AM comment you seem uncomfortable with the problem statement, changing "A girl steps through the curtain" to "A single child (boy or girl) steps through the curtain".

That phrasing is pretty clunky and inelegant. Why did you feel the need to change to it?

Perhaps you would like this problem statement better:

1. Two children are born behind a curtain
2. A child steps through the curtain
3. What is the probability the child remaining behind the curtain is the same sex as the child who stepped through the curtain?

What's your answer? What's your reasoning?

P(G1) = 1
P(B1) = 0

When working with conditional probabilities, P(G1) and P(B1) are a priori probabilities, i.e., the probabilities that those events happen. They are not the probabilities (certainties!) that those events were observed or not observed.

When you flip a coin and it lands heads, do you say P(heads) is 1? No. P(heads) is 1/2. When a guy flips two coins, looks at them, and shows you a heads, is the P(heads) you plug into Bayes' theorem 1? No. It's 1/2.

Similarly, P(G1) and P(B1) are measures on the result of some (unknown) mechanism that causes that first child to step through the curtain. They are not measures of an observation.

What do you think those probabilities are?

As I noted at Jul 19, 2008 8:56:05 PM, I think they are all 1/2.

Actually please ignore the first part of my last post.. I misunderstood your notation.

Actually please ignore the first part of my last post.. I misunderstood your notation.

What notation should I have used for the prior probabilities of G1 and B1?

The reason I changed the question was that, if you specify that the girl walked thru the curtain, ...

Incidentally, it was you who specified the question. This is nothing but your Q3 from way upthread at Jul 15, 2008 9:15:15 PM. I am simply repeating it in its elements to get an answer.

Nonetheless, I'd be interested in seeing how you tackle:

1. Two children are born behind a curtain
2. A child steps through the curtain
3. What is the probability the child remaining behind the curtain is the same sex as the child who stepped through the curtain?

What's your answer? What's your reasoning?

These questions can only be answered knowing how the child is selected. If the child is selected randomly then the probabilities above would be 1/2. If the child were selected non-randomly according to the method I proposed, then I believe I listed correct answers above. Do you disagree with my answers (subject to the conditions I just specified) or not?
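
The dependence on the selection rule can be sketched by computing P(two girls | a girl steps out) under both policies discussed here. The function and policy names are hypothetical, and the sketch assumes equally likely, independent births; a policy is modeled as a map from the position of the child shown to the probability of showing that child.

```python
from fractions import Fraction
from itertools import product

def p_two_girls_given_girl_seen(policy):
    """Bayes by enumeration: P(two girls | the child who steps out is a girl)."""
    numerator = denominator = Fraction(0)
    for family in product("BG", repeat=2):
        prior = Fraction(1, 4)  # each birth-order pair is equally likely
        for index, p_shown in policy(family).items():
            if family[index] == "G":
                denominator += prior * p_shown
                if family == ("G", "G"):
                    numerator += prior * p_shown
    return numerator / denominator

def random_policy(family):
    """Either child steps out with probability 1/2."""
    return {0: Fraction(1, 2), 1: Fraction(1, 2)}

def girl_first_policy(family):
    """Deterministic rule: a girl steps out if there is one, otherwise a boy."""
    index = family.index("G") if "G" in family else 0
    return {index: Fraction(1)}

print(p_two_girls_given_girl_seen(random_policy))      # 1/2
print(p_two_girls_given_girl_seen(girl_first_policy))  # 1/3
```

The same family prior gives 1/2 or 1/3 depending only on which policy is plugged in.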

OK we agree. Your problem is interesting in that the non-random selection policy seems to be rather contrived and unappealing. The problem you outline on Jul 21, 2008 7:31:17 shares this, so there's no point in going through that.

There are similar problems where a random selection procedure may not be so appealing, whereas a non-random approach seems neat. I listed one at the bottom of my posting Jul 21, 2008 10:12:34 AM. If you think that you can formulate a nice random selection approach for this problem, then I would be interested to hear your thoughts. Otherwise I think I'm done.

1. Two children are born behind a curtain
2. k girls step through the curtain
3. What is the probability that there were 2 girls born, for k=0, 1, and 2?

As with the previous questions, we may get different answers depending on whether the k children we see are chosen randomly or not.

For non-random selection and non-random k, it seems simple:

P(two girls | 0 girls step out) = 1/4
P(two girls | 1 girl steps out) = 1/3
P(two girls | 2 girls step out) = 1
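
These three values can be reproduced by enumeration if "k girls step out", with k fixed in advance, is read as "the family has at least k girls". That reading is an assumption made for this sketch, as are the names FAMILIES and p_two_girls; births are again taken to be equally likely and independent.

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families, listed in birth order.
FAMILIES = list(product("BG", repeat=2))

def p_two_girls(k):
    """P(two girls | family has at least k girls), for a k fixed in advance."""
    consistent = [f for f in FAMILIES if f.count("G") >= k]
    hits = sum(f == ("G", "G") for f in consistent)
    return Fraction(hits, len(consistent))

for k in (0, 1, 2):
    print(f"P(two girls | {k} girls step out) = {p_two_girls(k)}")
```

With k=0 nothing is learned, so the result is just the prior 1/4.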

It's probably also easy for random girl selection and non-random k. But what about modeling k as a random variable? What distribution would you use? Does it affect the probabilities?

In a random setup you would need to know all the joint probabilities of k, girl selection, and child birth. A complex sample space, but maybe things simplify with good choices for the pmf's. It's a good example of how we don't always model things randomly, even if we can.

MikeP - thanks for your comments. They really helped me understand the problem. I'd be interested in your feedback on this example.

Well, after re-reading the book and some of the comments here more carefully, I do see the logic behind the problem as presented, i.e., "the added information - our knowledge of the girl's name - makes a difference" AND "not all elements of the sample space are equally probable."

The answer is 1/8, just work it out...

I think that Mlodinow's original answer is wrong. In making it more complex (by including birth order) he makes a mistake. Ignore the Florida complications, and just ask the simpler question, "I know this person has two kids. One is a girl. What are the odds the other is also a girl?"

Simple answer: there is one child whose gender you haven't observed. Ignoring actual population statistics, it has a 50% chance of being a girl.

Now, include birth order. You have observed a girl (denoted GO, for girl observed). The other child could be a boy (B) or girl (G). There are FOUR equally likely possibilities, now including the (irrelevant) information about birth order:

GO-B
B-GO
GO-G
G-GO

Again, there is a 50% chance that the unobserved child is a girl. Mlodinow's mistake is that he counts GO-G and G-GO as ONE possibility, but if you only remember one child, you could have remembered either the first or second born.

I think there are possible cases left out in many of the examples. (G,G) also has a possible (G,G) flip. (G1,G2), (G2,G1). In the case that there is one boy and one girl, their birth order is (G1,B1), or (B1,G1); then there is the (B1,B2) or (B2,B1) pair. If you know one child is a girl, you are left with four of the six sample spaces. Of those four, two are two girls; thus the chances of both children being girls if you know one is a girl are 1/2.

This does not assume any special characteristics of one girl. She may have been born first or second, and her sibling may or may not be a girl.

Forgive me, any other explanation seems staged and artificial. Common sense says that the chance of a girl in a family of two siblings having a sister is 1/2, not one in three.
