Average Precision (AP) measures how well a model ranks relevant items at the top of the results list. Unlike binary classification metrics, AP heavily penalizes rankings that place correct matches far down the list.
AP is calculated by averaging the precision values at each position where a relevant item appears. When you have multiple queries, Mean Average Precision (mAP) is simply the average AP across all queries.
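As a minimal sketch of that computation, assume each query's results are reduced to a binary relevance list (1 if the gallery image matches the query identity, 0 otherwise); the function names here are illustrative rather than from any particular library:

```python
def average_precision(relevance):
    """AP for one query: average the precision@k values at each rank k where a relevant item appears."""
    hits = 0
    precisions = []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision@k = relevant items in top-k / k
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(relevance_lists):
    """mAP: the mean of the per-query AP values."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


# Relevant items at ranks 1, 3, and 4 -> AP = (1/1 + 2/3 + 3/4) / 3 ≈ 0.81
print(average_precision([1, 0, 1, 1, 0]))
```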
AP is particularly well-suited for face recognition because it is rank-aware (a correct match at rank 1 contributes much more than at rank 10), handles class imbalance naturally, and directly aligns with the task of ranking gallery images to find correct matches first.
For example, consider a hypothetical query for Person A whose top-5 gallery results are ranked as follows:

| Rank | Identity | Relevant? | Precision@k |
|---|---|---|---|
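| 1 | Person A | ✓ | 1/1 = 1.00 |
| 2 | Person B | ✗ | n/a |
| 3 | Person A | ✓ | 2/3 ≈ 0.67 |
| 4 | Person A | ✓ | 3/4 = 0.75 |
| 5 | Person C | ✗ | n/a |

Averaging the precision values at the ranks where a relevant image appears gives AP = (1.00 + 0.67 + 0.75) / 3 ≈ 0.81 for this query.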
Top-k accuracy measures whether the correct identity appears among the model's k highest-confidence predictions, rather than requiring it to be the single top prediction. It answers: "Will the user find the right person in the top-k results?"
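A sketch of top-k accuracy, assuming each query comes with a confidence-sorted list of predicted identities (the data layout and names are illustrative):

```python
def top_k_accuracy(ranked_ids, true_ids, k=5):
    """Fraction of queries whose correct identity appears among the top-k predictions.

    ranked_ids: per query, predicted identities sorted by confidence (best first)
    true_ids: the correct identity for each query
    """
    hits = sum(1 for preds, true in zip(ranked_ids, true_ids) if true in preds[:k])
    return hits / len(true_ids)


# Query 1 finds the right person at rank 2; query 2 misses the top 3 entirely -> 0.5
print(top_k_accuracy([["B", "A", "C"], ["D", "E", "F"]], ["A", "X"], k=3))
```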
Recall@k shows the percentage of all relevant items found in the top-k positions. For example, if there are 4 relevant items and 3 appear in the top-5, recall@5 is 75%.
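Recall@k can be sketched the same way; the snippet below reproduces the example above (4 relevant items, 3 retrieved in the top 5), with hypothetical item names:

```python
def recall_at_k(ranked_items, relevant_items, k=5):
    """Fraction of all relevant items that appear in the top-k positions."""
    found = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return found / len(relevant_items)


# 3 of the 4 relevant gallery images (g1..g4) land in the top 5 -> recall@5 = 0.75
print(recall_at_k(["g1", "x", "g2", "g3", "y", "g4"], {"g1", "g2", "g3", "g4"}, k=5))
```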
Both metrics are useful: top-k accuracy tells you whether the search succeeded (the user finds at least one correct match), while recall@k tells you how comprehensive the top-k results are.