Ben Gorman



You're working on a phone app that predicts if a 🪴 plant is edible, because you keep eating inedible plants, dangit!

You're using F1 Score to evaluate the performance of your model. Defend this evaluation metric or state your case for something better.


It'd be better to use a metric like F0.5, which treats recall as half as important as precision.


When it comes to eating plants, it's very important not to eat something poisonous. Therefore, your evaluation metric should put a heavier emphasis on precision than recall. Alternatively stated,

  • when you predict a plant is edible, it's important to be correct, even at the cost of incorrectly predicting that 2 or 3 edible plants are inedible.
  • false positives (calling a poisonous plant edible) should be penalized more heavily than false negatives (calling an edible plant inedible)


  • precision = the fraction of plants you predicted edible that truly are edible
  • recall = the fraction of truly edible plants that you predicted edible
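To make the tradeoff concrete, here's a minimal sketch of the F-beta score on a toy set of labels (the labels are made up for illustration; 1 means edible). Note how F0.5 scores the same predictions higher than F1 does, because the mistakes here are mostly false negatives, which F0.5 punishes less:

```python
# Toy labels, assuming "edible" is the positive class (made up for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # 2 TP, 1 FP, 2 FN

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # accuracy of "edible" predictions: 2/3
recall = tp / (tp + fn)     # fraction of truly edible plants found: 1/2

def f_beta(p, r, beta):
    """F-beta score: beta is how many times more recall matters than precision."""
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(round(f_beta(precision, recall, 1.0), 3))  # F1   -> 0.571
print(round(f_beta(precision, recall, 0.5), 3))  # F0.5 -> 0.625
```

If you'd rather not hand-roll it, scikit-learn's `fbeta_score(y_true, y_pred, beta=0.5)` computes the same quantity.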