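Definitions
Imagine a scenario where people are tested for a disease. The test outcome can be positive (sick) or negative (healthy), while each person's actual health status may differ from the test result. In that setting:
True positive: sick people correctly identified as sick
False positive: healthy people incorrectly identified as sick
True negative: healthy people correctly identified as healthy
False negative: sick people incorrectly identified as healthy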
Sensitivity
Sensitivity is the proportion of sick people whom the test correctly identifies as sick: true positives / (true positives + false negatives). A sensitivity of 100% means that the test recognizes all sick people as such. Thus, in a high-sensitivity test, a negative result is used to rule out the disease.
Sensitivity alone does not tell us how well the test predicts other classes (that is, how it handles the negative cases). In binary classification, as illustrated above, this is given by the corresponding specificity, or equivalently, the sensitivity for the other classes.
Sensitivity is not the same as the positive predictive value (the ratio of true positives to combined true and false positives), which is as much a statement about the proportion of actual positives in the population being tested as it is about the test.
The calculation of sensitivity does not take into account indeterminate test results. If a test cannot be repeated, the options are to exclude indeterminate samples from the analysis (in which case the number of exclusions should be stated when quoting sensitivity) or to treat indeterminate samples as false negatives (which gives the worst-case value for sensitivity and may therefore underestimate it).
Specificity
Specificity is the proportion of healthy people whom the test correctly identifies as healthy: true negatives / (true negatives + false positives). A specificity of 100% means that the test recognizes all healthy people as healthy. Thus, a positive result in a high-specificity test is used to rule in the disease.
The maximum is trivially achieved by a test that declares everybody healthy regardless of the true condition. Therefore, specificity alone does not tell us how well the test recognizes positive cases. We also need to know the sensitivity of the test for that class, or equivalently, the specificities for the other classes.
A test with a high specificity has a low Type I error rate.
Specificity is sometimes confused with precision or the positive predictive value, both of which refer to the fraction of returned positives that are true positives. The distinction is critical when the classes are of different sizes. A test with very high specificity can have very low precision if there are far more true negatives than true positives, and vice versa.
Worked example
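The gap between specificity and precision under unequal class sizes can be illustrated with a minimal Python sketch. All counts below are hypothetical, invented purely for illustration: a population of 100 sick and 99,900 healthy people, and a test that wrongly flags 1% of the healthy. The function names are likewise illustrative.

```python
def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that the test reports as negative."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Fraction of reported positives that are actual positives."""
    return tp / (tp + fp)


# Hypothetical counts (not from any real study).
tp, fn = 90, 10          # 100 sick people, 90 of them detected
tn, fp = 98_901, 999     # 99,900 healthy people, 1% wrongly flagged

spec = specificity(tn, fp)   # 98901 / 99900 = 0.99
prec = precision(tp, fp)     # 90 / 1089, roughly 0.083

print(f"specificity = {spec:.3f}")
print(f"precision   = {prec:.3f}")
```

Even though the test clears 99% of healthy people, fewer than one in ten positive results belongs to a sick person, simply because the healthy group is so much larger.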
Related calculations
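As a minimal sketch, the four quantities discussed here can be computed from the cells of a 2×2 table. The counts used below are an assumption: they are hypothetical values chosen only because they reproduce the percentages quoted in this section, and the class and variable names are illustrative rather than part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table of test outcome versus true condition."""
    tp: int  # sick, test positive
    fp: int  # healthy, test positive
    fn: int  # sick, test negative
    tn: int  # healthy, test negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        """Positive predictive value (precision)."""
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)


# Assumed, hypothetical counts consistent with the quoted percentages.
counts = ConfusionCounts(tp=20, fp=180, fn=10, tn=1820)

for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name:12s} = {getattr(counts, name):.1%}")
```

Keeping the four cells together in one object makes it harder to mix up denominators: sensitivity and specificity divide by the number of truly sick or truly healthy people, while PPV and NPV divide by the number of positive or negative test results.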
Hence, with a large number of false positives and few false negatives, a positive fecal occult blood (FOB) screen test is in itself poor at confirming cancer (PPV = 10%), and further investigations must be undertaken. It will, however, pick up 66.7% of all cancers (the sensitivity). As a screening test, a negative result is very good at reassuring that a patient does not have cancer (NPV = 99.5%), and at this initial screen the test correctly identifies 91% of those who do not have cancer (the specificity).
Terminology in information retrieval
In information retrieval, positive predictive value is called precision, and sensitivity is called recall. The F-measure can be used as a single measure of performance of the test. The F-measure is the harmonic mean of precision and recall:
$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
In the traditional language of statistical hypothesis testing, the sensitivity of a test is called the statistical power of the test, although the word power in that context has a more general usage that is not applicable in the present context. A sensitive test will have fewer Type II errors.