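The Boy or Girl problem is a well-known example in probability theory. It concerns a family with two children and asks two questions. First, if the older child is a boy, what is the probability that the younger child is a girl? Second, if at least one of the children is a boy, what is the probability that the family also has a girl?
Investigation of these questions reveals that their answers are different: 1/2 for the first and 2/3 for the second.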
Common assumptions

There are four possible combinations of children in a two-child family. Labeling boys B and girls G, and using the first letter to represent the older child, the possible combinations are BB, BG, GB and GG.
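These four possibilities are taken to be equally likely a priori. This follows from three assumptions: each child is either a boy or a girl, each child is equally likely to be a boy or a girl, and the sex of each child is independent of the sex of the other.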
These assumptions form an incomplete model. They ignore the possibility that a child is intersex, the fact that the ratio of boys to girls is not exactly 50:50, and (among other factors) the existence of identical twins, which means that the sexes of siblings are not entirely independent. However, each of these exceptions is rare enough to have little effect on a simple analysis of the general population.

First question
When the older child is a boy, the elements GG and GB of the original sample space are ruled out and must be deleted, so that the sample space reduces to the set {BG, BB}. Since the two possibilities in the new sample space are equally likely, and only one of the two, BG, includes a girl, the probability that the younger child is a girl is 1/2.

Second question
An equivalent and perhaps clearer way of stating the problem is "Excluding the case of two girls, what is the probability that two random children are of different gender?" Neither order nor age is important. There are four possible child combinations for a two-child family, as seen in the sample space above. Three of these meet the criterion of having at least one boy, so the set of possibilities is {BB, BG, GB}. Two of these three equally likely combinations include a girl.
Bayesian approach

Consider the sample space of two-child families.
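The Bayesian calculation over this sample space can be sketched in a few lines of Python. This is an illustrative sketch rather than the article's own derivation: it assumes the four equally likely combinations from the common assumptions, and the helper names `prob` and `conditional` are hypothetical. It applies Bayes' theorem to the two-boy family, P(BB | at least one boy) = P(at least one boy | BB) P(BB) / P(at least one boy), and takes the complement.

```python
from fractions import Fraction
from itertools import product

# Sample space: (older, younger), each "B" or "G", all four equally likely.
families = list(product("BG", repeat=2))
prior = Fraction(1, len(families))

def prob(event):
    """Probability of an event, given as a predicate on a family."""
    return sum(prior for f in families if event(f))

def conditional(event, given):
    """P(event | given), computed as P(event and given) / P(given)."""
    return prob(lambda f: event(f) and given(f)) / prob(given)

has_girl = lambda f: "G" in f
has_boy = lambda f: "B" in f
two_boys = lambda f: f == ("B", "B")

# Bayes' theorem for the two-boy family, given at least one boy.
likelihood = conditional(has_boy, two_boys)  # a two-boy family always has a boy
posterior_bb = likelihood * prob(two_boys) / prob(has_boy)
p_girl = 1 - posterior_bb

print(posterior_bb)                    # 1/3
print(p_girl)                          # 2/3
print(conditional(has_girl, has_boy))  # 2/3, direct enumeration agrees
```

Exact fractions are used instead of floats so that the results can be compared directly with the values quoted in the text.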
Therefore the probability is 2/3.

Third question
Does the additional bit of information that the boy's name is Jacob change anything?
At first sight the possibilities form the set {GJ, JG, JB, BJ}, where J stands for Jacob and B for a boy who is not named Jacob. Two of the four possibilities include a girl, so we might think that the probability returns to 1/2. But this reasoning is flawed, because it does not take into account the different frequencies of these outcomes: a boy being named Jacob and a boy not being named Jacob are not equally likely. Thus, we must replace our classical interpretation of probability with either a frequentist or a Bayesian interpretation. (Note that in real life children's names are not independent of each other. In particular, people usually do not give the same name to two children. Thus, this discussion is purely theoretical.)

Frequentist approach

Consider 1,000,000 families that have two children. Assume that the gender and name of each child are independent, both within a family and between families. Assume that each child is a girl with probability 0.5 and is otherwise a boy. Assume that a child is named Jacob with probability 0.01, and that all children named Jacob are boys. The set above lists all possible unique outcomes, but these outcomes do not have the same frequency. If we start only with the assumption that the family has two children, each of the four combinations {Girl, Girl}, {Girl, Boy}, {Boy, Girl} and {Boy, Boy} is expected in 250,000 families.
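With the additional information that the family has a child named Jacob, we can break every instance of "Boy" into two: "Jacob" and "Boy not Jacob". Since 1 child in 100 is named Jacob and half of all children are boys, 2 of every 100 boys fall into the "Jacob" bin and 98 into the "Boy not Jacob" bin. Thus, we have the following table:

| Older child | Younger child | Families |
|---|---|---|
| Girl | Girl | 250,000 |
| Girl | Jacob | 5,000 |
| Girl | Boy not Jacob | 245,000 |
| Jacob | Girl | 5,000 |
| Boy not Jacob | Girl | 245,000 |
| Jacob | Jacob | 100 |
| Jacob | Boy not Jacob | 4,900 |
| Boy not Jacob | Jacob | 4,900 |
| Boy not Jacob | Boy not Jacob | 240,100 |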
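The tallies for this split can be reproduced with exact arithmetic. The following is a minimal Python sketch, assuming the figures stated above (1,000,000 families, a child named Jacob with probability 0.01, every Jacob a boy, hence 1 boy in 50) and full independence between children. The names `child`, `counts` and `girl_given_name` are illustrative only. The closed form 2/(4 − p) in `girl_given_name` is not taken from the original text; it follows from the same assumptions by dividing the girl-and-Jacob mass, 2 · (1/4) · p, by the total Jacob mass, (1/4)(4p − p²), where p is the share of boys carrying the name.

```python
from fractions import Fraction

FAMILIES = 1_000_000
P_JACOB_GIVEN_BOY = Fraction(1, 50)  # assumption: 0.01 of all children, all of them boys

# Per-child probabilities of the three labels.
child = {
    "Girl": Fraction(1, 2),
    "Jacob": Fraction(1, 2) * P_JACOB_GIVEN_BOY,
    "Boy not Jacob": Fraction(1, 2) * (1 - P_JACOB_GIVEN_BOY),
}

# Expected number of families for every ordered (older, younger) pair.
counts = {
    (a, b): FAMILIES * pa * pb
    for a, pa in child.items()
    for b, pb in child.items()
}

# Keep only the families that contain at least one Jacob.
with_jacob = {pair: n for pair, n in counts.items() if "Jacob" in pair}
with_jacob_and_girl = {pair: n for pair, n in with_jacob.items() if "Girl" in pair}

total = sum(with_jacob.values())
favourable = sum(with_jacob_and_girl.values())
print(total, favourable, favourable / total)

def girl_given_name(p):
    """P(girl | at least one boy with a name carried by a share p of boys)."""
    return Fraction(2) / (4 - Fraction(p))
```

Under these assumptions the script counts 19,900 families with a Jacob, of which 10,000 include a girl. `girl_given_name(1)` returns 2/3, the case in which every boy has the same name, and the value approaches 1/2 as the name becomes rarer.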
If we eliminate all instances that do not meet our given criterion ({Girl, Girl}, {Girl, Boy not Jacob}, {Boy not Jacob, Girl} and {Boy not Jacob, Boy not Jacob}), then we eliminate 980,100 of our events, leaving 19,900 possible events. Of those, the successful events are {Girl, Jacob} and {Jacob, Girl}, or 10,000 cases. So if the probability of a boy being named Jacob is 1 in 50, then the probability that the family has a girl is 100/199, or roughly 50%. But this value will change depending on the popularity of the name. At the extreme, if all boys were given the same name, then being named Jacob would provide no more information than being a boy, and thus the probability would still be 2/3 that the family has a girl. As the likelihood of the name decreases, the probability of a girl approaches the limit of 50%. If we further assume that the parents would not have given both children the same name, we can eliminate {Jacob, Jacob}, leaving 19,800 possible events; the probability that the family has a girl is then 50/99, or about 50.5%.

Conclusion

Many people coming across this paradox for the first time will agree with the answer to the first question, but some may be confused by the answer to the second question. Two ways of explaining the error are as follows:
Mistakes

Looking at why some "explanations" are flawed can be instructive. For example, to answer the second question someone may make this list of possibilities:
Apparently only the last two are the ones sought, giving a total probability of 1/2. The error here is that the first two statements are counted double: if there are two boys, "the boy" has no unique referent. Therefore the first two possibilities should read:
But now it is clear that these two statements are equivalent – both effectively state that there are two boys – and therefore one should be removed.

An ambiguous real-life version

Two old classmates, Mary and Brian, meet in the street, not having seen each other since they left school.
Here, for some reason, the conversation is cut short. Formally, this corresponds to the second question, as Brian has only told Mary that at least one child is a boy. Accordingly, the probability that Brian has a girl should be 2/3. However, in real conversation, if Brian had two boys, he would be more likely to answer, e.g., "Yes, they are both boys" (Grice's maxim of quantity). The fact that he does not answer like that could reasonably be taken by Mary as a clue, raising her posterior probability that one child is a girl above 2/3. This highlights the need for precision when stating such problems in probability.
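The effect of Brian's choice of words on Mary's posterior can be illustrated with a small sketch. It rests on an assumption that is not in the original text: a hypothetical parameter `q`, the chance that a father of two boys answers plainly instead of volunteering that both are boys, while a father of one boy and one girl always answers plainly. With `q = 1` the wording carries no extra information and the result is 2/3; any smaller `q` pushes the posterior above 2/3.

```python
from fractions import Fraction

def p_girl_given_plain_yes(q):
    """Posterior P(girl | Brian only says that he has a boy).

    q is a hypothetical parameter: the chance that a father of two boys
    answers plainly instead of volunteering "they are both boys".
    """
    q = Fraction(q)
    # Families with at least one boy: BG and GB (1/4 each) and BB (1/4).
    mixed = Fraction(2, 4) * 1     # a mixed family always gives the plain answer
    two_boys = Fraction(1, 4) * q  # a two-boy family gives it with probability q
    return mixed / (mixed + two_boys)

print(p_girl_given_plain_yes(1))               # 2/3
print(p_girl_given_plain_yes(Fraction(1, 2)))  # 4/5
```

The function simplifies to 2/(2 + q), so the more strongly Mary expects a father of two boys to say so, the closer her posterior moves to 1.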