Sensitivity and Specificity

The problem with trying to fix an error after making many mistakes is that we tend to err in the opposite direction. You know the saying: first we err by omission, then by excess?

In statistics, we say there are two types of errors: “Type I error” and “Type II error,” which are terrible names because they don’t help us understand what each is, show a complete lack of creativity on the part of statisticians, and diminish curiosity about this very important information.

A much better name for “Type I error” is “false alarm” or “false positive.” This kind of error occurs when something you claimed to be true turns out to be false. I know you don’t like statistical examples like “rolling dice” or “drawing balls from a box,” so I’ll try something more along the lines of gossip.

The first step is to ask a question: “Does she/he love me?”

Then, you gather evidence that can help you answer it: what he/she said here, what he/she did there, and what he/she said elsewhere. Eventually, you add what others are saying, and so forth.

Then you put it all into a model and reach a conclusion: “Yes, she/he loves me.”

A hypothesis test can even calculate the probability that your conclusion is a false alarm (which in this case means that everything he/she said and did was actually just a coincidence and not love). If the chance of it being a false alarm is less than 5%, you consider your conclusion correct (even though, strictly speaking, you are only about 95% certain).
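
If you like to see the arithmetic, here is a minimal sketch in Python of that calculation. Everything in it is invented for illustration: we pretend the gossip boils down to 10 “signs,” of which 9 looked like love, and that under the “it’s all coincidence” hypothesis each sign had a 50% chance of looking favorable anyway.

```python
# A toy “does she/he love me?” hypothesis test.
# All numbers (10 signs, 9 favorable, the 50% baseline) are made up.
from scipy.stats import binomtest

signs_collected = 10  # pieces of evidence gathered
favorable_signs = 9   # how many of them looked like love

# p-value: the chance of seeing 9 or more favorable signs out of 10
# if it really were all coincidence (that is, the chance of a false alarm).
result = binomtest(favorable_signs, signs_collected, p=0.5, alternative="greater")
print(f"Chance this is a false alarm: {result.pvalue:.3f}")  # ~0.011

if result.pvalue < 0.05:
    print("Conclusion: yes, she/he loves me (accepting a 5% risk).")
else:
    print("Not enough evidence; keep collecting gossip.")
```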

But 95% is very good, right? It’s almost certain, isn’t it?! With 95% certainty, I’d even go under the window to serenade my love. The 5% chance of getting heartbroken is there, but it’s better to regret what you’ve done than what you haven’t, right? Well, for educational purposes, let’s say yes.

Ninety-five percent certainty should be very good because it’s a sacred value for scientists. If your probability of error (the p-value) is 0.05 or less (that is, at least a 95% chance of being right), then your hypothesis will be accepted, your data will be published, and your thesis will be approved. Otherwise, if it’s 0.06, 0.1, or any other value greater than 0.05, then you’re out of luck.

This value doesn’t depend on the data. If you collected them well, they’re good data (otherwise, you’re also out of luck). It’s not a matter of interpretation either. Different interpretations can lead to different conclusions, but the chance of being right or wrong remains the same.

The issue is how much error you’re willing to accept. Here’s another example: if you know there’s a 5% chance of rain, will you take an umbrella with you? Well, I wouldn’t. A mere 5% chance isn’t enough to make me carry that bulky thing around all day. In exchange, though, I have to accept getting wet on about 5% of the days I leave the house.
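
A quick simulation makes the bargain explicit (the 5% rain chance and the 10,000 days are arbitrary choices): leave the umbrella at home every day, and you get wet on roughly 5% of them, just as testing at the 5% level means raising a false alarm in roughly 5% of the cases where nothing is really going on.

```python
# Simulating the umbrella bargain: tolerate a 5% error rate.
# (The 5% rain chance and the 10,000 days are arbitrary choices.)
import random

random.seed(42)
days = 10_000
rain_chance = 0.05  # the tolerated error rate, like alpha = 0.05

wet_days = sum(1 for _ in range(days) if random.random() < rain_chance)

print(f"Left the umbrella at home on {days} days;")
print(f"got wet on {wet_days} of them ({wet_days / days:.1%}).")
```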

It’s true, not everyone accepts this. Some people get mad at the rain and curse the drops. But those who go out in the rain should be prepared to get wet, right?!

On the other hand, some would leave without an umbrella even if the chance of getting wet was 6%, 7%, or even 10%. Or more. As Richard Gordon said, “Scientifically, though it may be depressing, we’re nothing more than waterproof sacks full of chemicals and loaded with electricity.” In our daily lives, we can and must make decisions with less than 95% certainty, but scientists have to keep that high standard.

This high standard is needed because a false positive is a double problem: not only have you accepted something false as true, but you have also failed to discover the real truth!

That’s why we generally don’t care much about “Type II errors,” which are false negatives. A false negative just means we “missed a good opportunity to find the truth.” If she/he thinks you don’t love her/him when you do, it can cause a tremendous “headache,” but eventually, new evidence will clarify the truth.

And this is another reason why we care less about false negatives: the chance of a Type II error greatly decreases with the accumulation of evidence, and generally, we can avoid it through common sense. If you base your conclusion on just one letter that he/she wrote to you, it may be the most beautiful poem ever written, but you’ll never be 95% sure that he/she loves you with just that. So naturally, you look for more evidence to reach your conclusion.
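
Here is a sketch of that accumulation effect, under invented numbers: suppose the “true” rate of favorable signs is 70%, tested against the 50% coincidence baseline. As the pile of evidence grows, the chance of missing a love that is really there shrinks fast.

```python
# How the Type II error (missing a real effect) shrinks with evidence.
# Assumptions (all invented): the “true” rate of favorable signs is 70%,
# tested against the 50% “it’s all coincidence” baseline at the 5% level.
import random
from scipy.stats import binomtest

random.seed(42)
true_rate, trials = 0.70, 1_000

for n_signs in (10, 30, 100):
    misses = 0
    for _ in range(trials):
        favorable = sum(1 for _ in range(n_signs) if random.random() < true_rate)
        p = binomtest(favorable, n_signs, p=0.5, alternative="greater").pvalue
        if p >= 0.05:  # failed to detect the love that is really there
            misses += 1
    print(f"{n_signs:>3} signs: missed the truth {misses / trials:.0%} of the time")
```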

Type I errors are errors of lack of specificity: one person loves you, another doesn’t, but you can’t see the difference. Type II errors are errors of lack of sensitivity: he/she really does love you, but you fail to detect it. If you correct your lack of sensitivity, you should automatically improve your specificity (though not in the same proportion). In practice, unfortunately, this doesn’t always work out, because consistency, which is a statistical assumption, is not an innate human quality.
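
In code, both rates fall straight out of the four possible outcomes of any yes/no call (the counts below are made up for illustration):

```python
# Sensitivity and specificity from the four outcomes of a yes/no call.
# (All counts are invented for illustration.)
true_positives = 40   # loved you, and you saw it
false_negatives = 10  # loved you, and you missed it (Type II error)
true_negatives = 45   # didn’t love you, and you ruled it out
false_positives = 5   # didn’t love you, but you thought so (Type I error)

# Sensitivity: of those who truly love you, how many did you detect?
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity: of those who don’t, how many did you correctly rule out?
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.0%}")  # 80%
print(f"Specificity: {specificity:.0%}")  # 90%
```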

Another reason may be the “Type III error” (discovered after the first two), which is: “You asked the wrong question!”

Excerpt from the book ‘The Truth about Dogs and Cats’ (by the author; Portuguese only). Originally published in Portuguese on 31 January 2010 on the ‘Você que é biólogo…’ blog.