Precision and recall are metrics used to describe the performance of binary predictors.
Terminology Note
Precision and recall are used extensively in information retrieval. In this context,
 relevant document = true instance
 retrieved document = true prediction
Precision¶
Given a set of True / False predictions and corresponding True / False instances, precision represents the accuracy rate of True predictions, i.e. the fraction of True predictions that are correct. It is sometimes referred to as positive predictive value.
Formula¶
$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$
Example¶
 $True Positives=2$
 $False Positives=1$
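Substituting these counts into the precision formula gives the following worked value, rounded to two decimals.
$\text{Precision} = \frac{2}{2 + 1} = \frac{2}{3} \approx 0.67$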
Recall¶
Given a set of True / False predictions and corresponding True / False instances, recall represents the accuracy rate of predictions on True instances, i.e. the fraction of True instances that are predicted True. It is sometimes referred to as sensitivity.
Formula¶
$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
Example¶
 $True Positives=2$
 $False Negatives=2$
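As a minimal sketch, precision and recall can both be computed from paired lists of predictions and truths. The function names and the data below are hypothetical; the data is chosen only so that the counts match the two examples (2 true positives, 1 false positive, 2 false negatives).

```python
def confusion_counts(preds, truths):
    """Count true positives, false positives, and false negatives."""
    tp = sum(p and t for p, t in zip(preds, truths))
    fp = sum(p and not t for p, t in zip(preds, truths))
    fn = sum(t and not p for p, t in zip(preds, truths))
    return tp, fp, fn


def precision_recall(preds, truths):
    """Return (precision, recall); NaN marks an undefined ratio."""
    tp, fp, fn = confusion_counts(preds, truths)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical data: 2 true positives, 1 false positive, 2 false negatives
truths = [True, True, True, True, False, False]
preds = [True, True, False, False, True, False]

precision, recall = precision_recall(preds, truths)
print(round(precision, 2), round(recall, 2))  # 0.67 0.5
```

Returning NaN when a denominator is zero is one possible convention; other implementations return 0 or raise an error.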
F-Score¶
F-Score combines precision and recall into a single score in the range [0, 1], where
 0 indicates that all positive predictions are incorrect
 1 indicates that all true instances are correctly predicted with zero false positives
Formula¶
F1 Score (Balanced F-Score)
Here, precision and recall are considered equally important.
$F_{1} = \frac{2}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
Fβ-Score (Generalized F-Score)
Here, recall is considered $β$ times as important as precision.
$F_{β} = (1 + β^{2}) \cdot \frac{\text{precision} \cdot \text{recall}}{(β^{2} \cdot \text{precision}) + \text{recall}}$
Example¶
Calculate $F_{0.5}$, $F_{1}$, and $F_{2}$ for the following data.
 $precision=0.67$
 $recall=0.50$
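The exercise above can be worked through with a short sketch of the generalized formula. The function name `f_beta` is hypothetical, and returning 0 when the denominator is zero is an assumed convention.

```python
def f_beta(precision, recall, beta=1.0):
    """Generalized F-score: recall counts beta times as much as precision."""
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0  # assumed convention when precision and recall are both 0
    return (1 + beta**2) * precision * recall / denom


precision, recall = 0.67, 0.50
for beta in (0.5, 1, 2):
    print(f"F{beta}: {f_beta(precision, recall, beta):.2f}")
```

With these rounded inputs the scores come out to roughly 0.63, 0.57, and 0.53. Because precision exceeds recall here, the score falls as β grows and recall is weighted more heavily.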
Precision-Recall Curve¶
The Precision-Recall Curve plots precision vs recall for a range of classification thresholds. For example, suppose we have the following predicted scores and truths.
If we assume a classification threshold (also known as a cutoff value) of 0, we can convert each prediction score to a prediction class.
$\text{pred\_class} = \text{pred\_score} \ge 0$
Here, recall = 0.67 and precision = 1.00. This pair of values, (0.67, 1.00), is one point on the Precision-Recall curve. Repeating this process for a range of classification thresholds, we can produce a Precision-Recall curve like the one below.
It doesn't look much like a curve in this tiny example, but Precision-Recall curves applied to larger data usually look more like this.
Precision generally decreases as recall increases, but this relationship is not guaranteed.
Perfectly ranked prediction scores are reflected by a Precision-Recall curve like this.
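The thresholding procedure described in this section can be sketched as follows. The scores and truths are hypothetical stand-ins (not the article's data), chosen so that a threshold of 0 yields recall = 0.67 and precision = 1.00. Using each distinct score as a threshold is one common choice, assumed here.

```python
import math


def pr_point(scores, truths, threshold):
    """Return (recall, precision) after thresholding scores at `threshold`."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(preds, truths))
    fp = sum(p and not t for p, t in zip(preds, truths))
    fn = sum(t and not p for p, t in zip(preds, truths))
    # Precision is undefined when nothing is predicted True; NaN marks that case.
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    return recall, precision


def pr_curve(scores, truths):
    """One (recall, precision) point per distinct score, highest threshold first."""
    thresholds = sorted(set(scores), reverse=True)
    return [pr_point(scores, truths, t) for t in thresholds]


# Hypothetical scores and truths
scores = [-1.1, -0.3, 0.6, 1.4]
truths = [False, True, True, True]

recall, precision = pr_point(scores, truths, threshold=0)
print(round(recall, 2), precision)  # 0.67 1.0

for r, p in pr_curve(scores, truths):
    print(f"recall={r:.2f} precision={p:.2f}")
```

Lowering the threshold can only keep or increase recall, while precision may move in either direction, which is why the curve is not guaranteed to be monotonic.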
Average Precision¶
Consider predictions from two models, A and B, compared to known truths.
A and B have identical prediction classes, and therefore have identical precision, recall, and F-Scores. However, suppose we reveal A's and B's prediction scores (in this case, probabilities).
The classification threshold in this example is 0.5. That is, preds_A = scores_A >= 0.5 and preds_B = scores_B >= 0.5.
Observe
 When model A was confident in its predictions, it was correct.
 When model B was confident in its predictions, it was incorrect.
Intuitively, this suggests model A is better than model B. This can be seen visually by observing that A's Precision-Recall curve is consistently higher than B's Precision-Recall curve.
This desire to score ranked predictions leads to Average Precision, which measures the average precision value on the Precision-Recall curve over the recall range [0, 1].
Formula¶
Average Precision equals the area under the Precision-Recall curve.
$AveP = \int_{0}^{1} p(r)\,dr$ where $p(r)$ is the precision at recall $r$.
This integral can be discretized as
$AveP = \sum_{k=1}^{n} P(k) \Delta r(k)$ where $k$ is the rank in the sequence of prediction scores (high to low), $n$ is the total number of samples, $P(k)$ is the precision at cutoff $k$ in the list, and $\Delta r(k)$ is the change in recall from items $k-1$ to $k$.
Geometrically, this formula represents a Riemann Sum approximation to the area under the Precision-Recall curve using rectangles with height $P(k)$ and width $\Delta r(k)$.
Since recall ranges from 0 to 1, the rectangle widths sum to 1, so this can be interpreted as a weighted sum of precisions whose weights are the widths of the rectangles (i.e. the changes in recall from threshold to threshold), hence the name Average Precision.
Furthermore, the width of each nonzero-width rectangle is the same, namely 1 divided by the number of True instances. Alternatively stated, each positive change in recall is equivalent. Thus, Average Precision can be described as an arithmetic mean of precision values restricted to the set of True instances (relevant documents).
$AveP = \frac{\sum_{k=1}^{n} P(k) \times rel(k)}{\text{number of true instances}}$ where $rel(k)$ is an indicator function equaling 1 if the item at rank $k$ is a true instance, 0 otherwise.
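Both forms of the formula can be sketched in a few lines to confirm that they agree. The function names are hypothetical, and the two rankings are assumed orderings (truth flags sorted by prediction score, high to low) chosen to reproduce the precision values 1.00, 1.00, 0.60, 0.67 and 0.33, 0.50, 0.43, 0.50 used in this section's example. Both functions assume at least one true instance.

```python
def average_precision_riemann(ranked_truths):
    """Sum of P(k) * delta_recall(k) over ranks k (the discretized integral)."""
    total_true = sum(ranked_truths)
    ap, hits, prev_recall = 0.0, 0, 0.0
    for k, is_true in enumerate(ranked_truths, start=1):
        hits += is_true
        precision = hits / k
        recall = hits / total_true
        ap += precision * (recall - prev_recall)  # zero width when recall is unchanged
        prev_recall = recall
    return ap


def average_precision_mean(ranked_truths):
    """Mean of P(k) over the ranks k that hold a true instance."""
    total_true = sum(ranked_truths)
    hits, total = 0, 0.0
    for k, is_true in enumerate(ranked_truths, start=1):
        if is_true:
            hits += 1
            total += hits / k
    return total / total_true


# Hypothetical rankings: truth flags ordered by prediction score, high to low
ranked_a = [True, True, False, False, True, True, False, False]
ranked_b = [False, False, True, True, False, False, True, True]

for name, ranked in (("A", ranked_a), ("B", ranked_b)):
    print(name,
          round(average_precision_riemann(ranked), 2),
          round(average_precision_mean(ranked), 2))
```

Under these assumed rankings, both functions give about 0.82 for the first ordering and 0.44 for the second.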
Example¶
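Using the toy data above,
Model A
Formula 1
$AveP = \sum_{k=1}^{n} P(k) \Delta r(k) = 1.00 \cdot 0.25 + 1.00 \cdot 0.25 + 0.60 \cdot 0.25 + 0.67 \cdot 0.25 = 0.82$
Formula 2
$AveP = \frac{\sum_{k=1}^{n} P(k) \times rel(k)}{\text{number of true instances}} = \frac{1.00 + 1.00 + 0.60 + 0.67}{4}$
Model B
Formula 1
$AveP = \sum_{k=1}^{n} P(k) \Delta r(k) = 0.33 \cdot 0.25 + 0.50 \cdot 0.25 + 0.43 \cdot 0.25 + 0.50 \cdot 0.25 = 0.44$
Formula 2
$AveP = \frac{\sum_{k=1}^{n} P(k) \times rel(k)}{\text{number of true instances}} = \frac{0.33 + 0.50 + 0.43 + 0.50}{4}$
Variations¶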

Some authors prefer to estimate Area Under the Curve using trapezoidal approximation.

Some authors use interpolated precision, whereby the Precision-Recall curve is transformed such that the precision at recall $r$ is taken to be the maximum precision over all recall values $\ge r$.
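The interpolation described in the second variation amounts to a running maximum taken from the high-recall end of the curve. The sketch below assumes the precision values are already ordered by increasing recall; the function name is hypothetical.

```python
def interpolate_precision(precisions):
    """Replace each precision with the max precision at that or any higher recall.

    `precisions` must be ordered by increasing recall.
    """
    result = list(precisions)
    # Walk from the high-recall end, carrying the running maximum backwards.
    for i in range(len(result) - 2, -1, -1):
        result[i] = max(result[i], result[i + 1])
    return result


print(interpolate_precision([1.0, 1.0, 0.6, 0.67]))  # [1.0, 1.0, 0.67, 0.67]
```

The result is a non-increasing curve, which removes the sawtooth dips that appear in raw Precision-Recall curves.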
Precision at k (P@k)¶
Precision at k answers the question: what percent of the top k ranked predictions are true instances? This is useful in settings like information retrieval where one only cares about the top k ranked documents returned by a search query.
Example¶
Suppose we build a system that predicts which movies are relevant to a particular search query (e.g. "dog movies"). Our database contains 12 movies which receive the following prediction scores and prediction classes.

Filter the top 5 instances by pred_score.

Count the true instances (i.e. relevant movies).

Divide by k
$Precision @ 5 = \frac{3}{5} = 0.6$
Edge Cases¶

What if there are fewer than $k$ True instances (relevant documents)?
This scenario presents a drawback to $Precision @ k$. If there are a total of $n$ True instances (relevant documents) where $n<k$, the model will have at most $Precision @ k = \frac{n}{k} < 1$.

What if there are fewer than $k$ True predictions (retrieved documents)?
If a model returns $m<k$ True predictions (retrieved documents), one possible implementation is to treat the next $k−m$ predictions as False Positive predictions (irrelevant documents).
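A minimal sketch of Precision at k, including the padding convention from the second edge case, might look like this. The function name and the 12 relevance flags are hypothetical (the flags are chosen so that 3 of the top 5 are relevant). Dividing by k even when fewer than k predictions are supplied is what treats the missing slots as false positives.

```python
def precision_at_k(ranked_truths, k):
    """Fraction of the top-k ranked predictions that are true instances.

    `ranked_truths` holds truth flags ordered by prediction score, high to low.
    If fewer than k items are supplied, the missing slots count as false
    positives because the denominator stays k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_truths[:k]) / k


# Hypothetical relevance flags for 12 movies, sorted by pred_score (descending)
ranked = [True, False, True, True, False, False,
          True, False, False, False, False, False]

print(precision_at_k(ranked, 5))        # 0.6
print(precision_at_k([True, True], 5))  # 0.4
```

The second call shows the edge case: only two documents are retrieved, both relevant, yet the score is 2/5 because the three empty slots are counted as irrelevant.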
Average Precision at k (AP@k)¶
Average Precision at k represents Average Precision amongst the top k classification thresholds (cutoffs). It can also be described as area below the Precision-Recall curve, restricted to the top k thresholds, and normalized by the lesser of k and the total number of true instances (relevant documents).
Ambiguity
The term Average Precision at k usually implies (Average Precision) @ k, not Average (Precision @ k). These are different quantities!
Formula¶
$AP@k = \frac{\sum_{i=1}^{k} P(i) \times rel(i)}{\min(k, \text{total true instances})}$ where $rel(i)$ is an indicator function equaling 1 if the item at rank $i$ is a true instance, 0 otherwise.
Example¶
Given the following predictions and truths, calculate $AP@3$.

Filter the top k=3 entries by highest prediction score.

Calculate precision at each of these cutoffs.

Sum the precisions of relevant documents (true instances)
$\sum_{i=1}^{3} P(i) \times rel(i) = 1.00 + 1.00 = 2.00$
Normalize by the lesser of $k$ and the total number of relevant documents (true instances)
In this example, the total number of true instances is 4.
$AP@k = \frac{\sum_{i=1}^{k} P(i) \times rel(i)}{\min(k, \text{total true instances})} = \frac{2.00}{\min(3, 4)} = \frac{2}{3} = 0.67$
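The steps above can be sketched as a single function. The function name and the ranking are hypothetical; the ranking is an assumed ordering with 4 true instances in total, of which the first two fall in the top 3, matching the numbers in this example. Returning 0 when there are no true instances is an assumed convention.

```python
def ap_at_k(ranked_truths, k, total_true=None):
    """Average Precision at k for truth flags ordered by score, high to low.

    `total_true` defaults to the number of true flags in `ranked_truths`;
    pass it explicitly if the list is truncated.
    """
    if total_true is None:
        total_true = sum(ranked_truths)
    hits, total = 0, 0.0
    for i, is_true in enumerate(ranked_truths[:k], start=1):
        if is_true:
            hits += 1
            total += hits / i  # P(i), counted only where rel(i) = 1
    denom = min(k, total_true)
    return total / denom if denom else 0.0  # assumed: 0 when nothing is relevant


# Hypothetical ranking with 4 true instances, 2 of them in the top 3
ranked = [True, True, False, True, False, True]
print(round(ap_at_k(ranked, 3), 2))  # 0.67
```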
Intuition
Observe the Precision-Recall curve for this example.
Average precision measures the area under the curve, which we can approximate with a Riemann sum.
Average Precision at 3 corresponds to the area beneath the Precision-Recall curve restricted to the first three cutoffs in the ordered predictions. In other words, it is the area beneath the curve to the left of the third point.
However, the rectangles are rescaled to have width $\frac{1}{3}$ so that the maximum achievable area beneath the curve in this range is 1. This is the purpose of the denominator, $\min(k, \text{total true instances})$, in the formula for $AP@k$.
Mean Average Precision (MAP)¶
Mean Average Precision calculates Average Precision for Q queries and averages their results.
Formula¶
$MAP = \frac{\sum_{q=1}^{Q} AP(q)}{Q}$
Mean Average Precision at k (MAP@k)¶
Mean Average Precision at k calculates Average Precision at k for Q queries and averages their results.
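Both MAP and MAP@k reduce to averaging a per-query score, as the following sketch illustrates. The function names and the two query rankings are hypothetical. Each ranking is a list of truth flags ordered by prediction score, and a query with no true instances is assumed to score 0.

```python
def average_precision(ranked_truths, k=None):
    """AP over the full ranking, or AP@k when k is given."""
    total_true = sum(ranked_truths)
    if k is None:
        k = len(ranked_truths)
    hits, total = 0, 0.0
    for i, is_true in enumerate(ranked_truths[:k], start=1):
        if is_true:
            hits += 1
            total += hits / i
    denom = min(k, total_true)
    return total / denom if denom else 0.0  # assumed: 0 when nothing is relevant


def mean_average_precision(queries, k=None):
    """Average the per-query AP (or AP@k) over all Q queries."""
    return sum(average_precision(q, k) for q in queries) / len(queries)


# Hypothetical rankings for Q = 2 queries
queries = [
    [True, True, False, False, True, True, False, False],
    [False, False, True, True, False, False, True, True],
]

print(round(mean_average_precision(queries), 2))       # 0.63
print(round(mean_average_precision(queries, k=3), 2))  # 0.39
```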