How do you want your model to perform?

 

Precision and Recall are very useful in understanding how your model is performing, and what situations it might be better to perform in.

We’ll go into them both in more detail, but just in case you haven’t already, visit our Confusion Matrix post and get to grips with true and false positives and negatives. They’ll come in handy here!

Read it?

Good.

We’re going to make use of them now.

Precision – Positive Prediction Value

When considering the precision of our model, we think about how often our positive predictions are actually true.

We’re basically asking:

“What proportion of positive predictions from our model were actually correct?”

Because we’re only thinking of positive predictions, the formula for this is:

To calculate precision, we count the number of true positives, divided by the sum total number of true positives and false positives.

Recall – True Positive Rate

What proportion of actual positives was identified correctly?

The true positive rate has a few different names:

Recall, sensitivity, probability of detection.

We’re going to work with the term recall most, but you will also see the term sensitivity a lot. We’ll touch on this later.

They all sound super clever, but look at what the formula actually is.

It’s simply: the number of true positives, divided by the sum of the true positives and false negatives.

This means it scores our model’s ability to correctly identify the positive result.

An example of this would be in disease testing, where the model’s ability to correctly detect an ill patient, given that they definitely have the disease.

This is really helpful; in the medical industry it’s very likely we want to maximise our chance of identifying ill patients.

It’s not the only helpful metric, however.

So.. Which one should I use?

Both precision and recall are very helpful, very insightful metrics. Precision indicates the percentage of your results that are actually relevant (in this case, think of the positive result), whereas recall measures the proportion of all the relevant results that are correctly classified by our model.

As no model is perfect, we generally have to compromise between the two, when we are optimising our model.

 

Optimising for Recall

If you want to capture as many of the positive results as possible, you want to maximise recall as your priority. In the scenario we mentioned above, identifying ill patients is something you really want to get right. In this case, you’d rather make sure you were most likely to identify all the ill patients. If you also incorrectly identified some healthy patients as ill, that’s probably (but not always) less harmful.

Of course, in this example, that has a trade-off itself of the cost of how much damage incorrect medicine might have on a healthy patient incorrectly identified with the disease.

Optimising for Precision

Let’s consider a different scenario altogether.
Imagine we live in a world similar to that of the 2002 film, Minority report. In that film, a special group of police are able to arrest people for crimes that they haven’t committed yet.
Now, ethically, this is a murky area.
In a more realistic setting, we might have a model that predicts whether or not someone has committed a crime, and if so, they will go to prison.
In this case, you might want to make sure that anytime the model prediction is positive – they have committed the crime – we are very very confident in the prediction. You wouldn’t want the burden of sending someone down, when in actual fact they’re innocent.
In this case, precision is what we want to optimise for.
Of course, as in the previous example, there is a trade-off.
When we optimise for precision here, our model may be more likely to let actual criminals go free.
How would you optimise? it’s not always an easy decision.

 

Next – Sensitivity vs Specificity

In our next post, we will revisit recall, but under the guise of “Sensitivity”. We’ll discuss how this is related to a new metric, Specificity, and further our understanding of our model performance.