Coin flipping example
For example, say an experiment is performed to determine whether a coin is fair (50% chance of landing heads) or unfairly biased, either toward heads (> 50% chance of landing heads) or toward tails (< 50% chance of landing heads). Since both biased alternatives are considered, a two-tailed test is performed. The null hypothesis is that the coin is fair, and that any deviation from the 50% rate can be ascribed to chance alone. Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The p-value of this result is the chance of a fair coin landing on heads at least 14 times out of 20 flips plus the chance of a fair coin landing on heads 6 or fewer times out of 20 flips. In this case the test statistic T, the number of heads, has a binomial distribution under the null hypothesis. The probability that 20 flips of a fair coin would result in 14 or more heads is 0.0577. By symmetry, the probability that 20 flips of the coin would result in 14 or more tails (equivalently, 6 or fewer heads) is the same, 0.0577. Thus, the p-value for the coin turning up heads 14 times out of 20 total flips is 0.0577 + 0.0577 = 0.1154.

Interpretation
Generally, one rejects the null hypothesis if the p-value is smaller than or equal to the significance level, often represented by the Greek letter α (alpha). If the level is 0.05, the null hypothesis is rejected only when the observed result is among the most extreme outcomes that would together occur at most 5% of the time if the null hypothesis were true. In the above example, the null hypothesis is that the coin is fair (probability of heads 0.5), the observation is 14 heads out of 20 flips, and the p-value of that observation under the null hypothesis is 0.1154 (11.54%).
The calculated p-value exceeds 0.05, so the observation is consistent with the null hypothesis (that the observed result of 14 heads out of 20 flips can be ascribed to chance alone), as it falls within the range of what would happen 95% of the time were the coin in fact fair. In this example, we fail to reject the null hypothesis at the 5% level. Although the coin did not fall evenly, the deviation from the expected outcome is small enough to be reported as "not statistically significant at the 5% level".

However, had a single extra head been obtained, the resulting two-tailed p-value would be 0.0414 (4.14%). This time the null hypothesis (that the observed result of 15 heads out of 20 flips can be ascribed to chance alone) is rejected. Such a finding would be described as "statistically significant at the 5% level".

Critics of p-values point out that the criterion used to decide "statistical significance" is based on the somewhat arbitrary choice of level (often set at 0.05). A proposed replacement for the p-value is p-rep. It is necessary to use a reasonable null hypothesis to assess the result fairly. The choice of null hypothesis entails assumptions.

Frequent misunderstandings
Comparing the p-value to a significance level yields one of two results: either the null hypothesis is rejected, or the null hypothesis cannot be rejected at that significance level. The null hypothesis cannot be accepted simply on the basis of this comparison (here, 11% > 5%); other tests, such as some "goodness of fit" tests, would have to be performed. It would be irresponsible to conclude that the null hypothesis should be accepted merely because the p-value is larger than the chosen significance level.

The use of p-values is widespread; however, such use has come under heavy criticism due both to its inherent shortcomings and the potential for misinterpretation.
There are several common misunderstandings about p-values.[1]
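
As a check on the arithmetic in the coin flipping example, the two-tailed p-values quoted there can be reproduced with a short Python sketch. The function names binomial_tail and two_tailed_p_value are illustrative choices, not part of any library, and the doubling of one tail assumes the fair-coin null hypothesis (probability of heads 0.5), under which the binomial distribution is symmetric.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_tailed_p_value(n: int, heads: int) -> float:
    """Two-tailed p-value for the null hypothesis of a fair coin.

    Assumption: p = 0.5, so both tails have equal probability and the
    p-value is twice the upper tail (capped at 1).
    """
    # The more frequent face determines how extreme the result is.
    extreme = max(heads, n - heads)
    return min(1.0, 2 * binomial_tail(n, extreme))


if __name__ == "__main__":
    alpha = 0.05  # significance level used in the example
    for heads in (14, 15):
        p_value = two_tailed_p_value(20, heads)
        decision = "reject" if p_value <= alpha else "fail to reject"
        print(f"{heads} heads out of 20: p = {p_value:.4f} -> {decision} the null hypothesis")
```

Run as a script, this gives about 0.1153 for 14 heads and 0.0414 for 15 heads. The first value differs in the last digit from the 0.1154 quoted above only because that figure adds two tail probabilities that were each rounded to 0.0577 first.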