Interactive retrieval metrics

Explore how different metrics evaluate face recognition performance
Average precision
Top-k accuracy
Instructions: Select a person to query from the dropdown. Drag items in the ranked list to reorder them. Items marked in green are images of the selected query person (relevant to the query). Watch how the average precision metric changes based on the ranking order.

What is average precision?

Average Precision (AP) measures how well a model ranks relevant items at the top of the results list. Unlike binary classification metrics, AP heavily penalizes rankings that place correct matches low in the list.

AP is calculated by averaging the precision values at each position where a relevant item appears. When you have multiple queries, Mean Average Precision (mAP) is simply the average AP across all queries.

AP is particularly well-suited to face recognition because it is rank-aware (a correct match at rank 1 contributes far more than one at rank 10), handles class imbalance naturally, and directly aligns with the task of ranking gallery images so that correct matches come first.
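
To make the calculation concrete, here is a minimal Python sketch of AP and mAP. It assumes each query's ranking is represented as a list of booleans ordered by model confidence (True where the gallery image shows the query person); the helper names are illustrative, not part of the demo.

```python
# Minimal sketch of AP and mAP. Assumption: each query's ranking is a list of
# booleans ordered from the model's most to least confident match
# (True = the gallery image shows the query person). Names are illustrative.

def average_precision(relevant: list[bool]) -> float:
    """Average of precision@k over every rank k where a relevant item appears."""
    hits = 0
    precisions = []
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / k)  # precision@k at this relevant position
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings: list[list[bool]]) -> float:
    """mAP is simply the mean of the per-query AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Relevant matches at ranks 1 and 4: AP = (1/1 + 2/4) / 2 = 0.75
print(average_precision([True, False, False, True, False]))
```

Dragging a relevant item higher in the ranked list raises its precision@k term, which is why the AP score in the widget responds immediately to reordering.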

[Interactive widget: a query vs. gallery view with a legend (relevant = images of João Silva, non-relevant = other persons), a draggable ranked list, a precision table with columns Rank, Identity, Relevant?, and Precision@k, and the resulting average precision score. Higher is better; a perfect ranking scores 1.000.]
Instructions: Select a person to query from the dropdown (same as mAP tab). Select different values of k to see which items fall within the top-k results. Drag items in the ranked list to observe how different rankings affect the metrics.

Understanding top-k metrics

Top-k accuracy measures whether the correct identity appears among the model's k highest-confidence predictions, rather than requiring it to be the single top prediction. It answers: "Will the user find the right person in the top-k results?"

Recall@k shows the percentage of all relevant items found in the top-k positions. For example, if there are 4 relevant items and 3 appear in the top-5, recall@5 is 75%.

Both metrics are useful: top-k accuracy tells you if the task succeeded (user found at least one match), while recall@k tells you how comprehensive the top-k results are.
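
The same boolean-ranking representation used in the AP sketch gives a short sketch of both metrics; the worked example reproduces the 3-of-4-relevant-in-top-5 case above, and the helper names are again illustrative.

```python
# Small sketch of top-k accuracy and recall@k, using the same boolean-ranking
# representation as the AP sketch above; names are illustrative.

def top_k_accuracy(rankings: list[list[bool]], k: int) -> float:
    """Fraction of queries that have at least one relevant item in the top k."""
    return sum(1 for r in rankings if any(r[:k])) / len(rankings)

def recall_at_k(relevant: list[bool], k: int) -> float:
    """Share of one query's relevant items that appear in the top k."""
    total_relevant = sum(relevant)
    return sum(relevant[:k]) / total_relevant if total_relevant else 0.0

# 4 relevant items, 3 of them in the top 5 -> recall@5 = 0.75,
# and the query counts as a top-1 success because rank 1 is relevant.
ranking = [True, False, True, True, False, False, True]
print(recall_at_k(ranking, k=5))       # 0.75
print(top_k_accuracy([ranking], k=1))  # 1.0
```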

[Interactive widget: a k selector (k = 1, 3, 5, or 10), the ranked list with the top-k positions highlighted and relevant items (João Silva) marked, the resulting top-k accuracy and recall@k values with the count of relevant items found in the top k, and a chart of recall@k across different k values.]