Ben Gorman

Challenge

You've developed a video monitoring system for ovens that alerts you when a batch of brownies is cooked to perfection. Through a delicious validation procedure, you've acquired the following predictions and truths.

import pandas as pd
 
df = pd.DataFrame({
    'yhat': [0.32, 0.65, 0.16, 0.1, 0.1, 0.78, 0.5, 0.03], 
    'y': [True, True, False, False, False, True, False, True]
})
 
print(df)
#    yhat      y
# 0  0.32   True
# 1  0.65   True
# 2  0.16  False
# 3  0.10  False
# 4  0.10  False
# 5  0.78   True
# 6  0.50  False
# 7  0.03   True

# The same data as an R data frame
df <- data.frame(
    yhat = c(0.32, 0.65, 0.16, 0.1, 0.1, 0.78, 0.5, 0.03),
    y = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)
 
print(df)
#   yhat     y
# 1 0.32  TRUE
# 2 0.65  TRUE
# 3 0.16 FALSE
# 4 0.10 FALSE
# 5 0.10 FALSE
# 6 0.78  TRUE
# 7 0.50 FALSE
# 8 0.03  TRUE

Calculate the precision and recall of your model using a prediction threshold of 0.5. That is, assume your model predicts True when yhat >= 0.5.


Solution

Precision = 0.67, Recall = 0.5
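
The numbers above can be checked by hand. With a threshold of 0.5, the model predicts True for the three rows with yhat values 0.65, 0.78, and 0.50. The first two are truly True and the third is False, and the two remaining True rows (yhat 0.32 and 0.03) are missed. A worked version of that arithmetic, using the standard definitions of precision and recall:

```latex
\begin{aligned}
TP &= 2, \qquad FP = 1, \qquad FN = 2 \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{2}{2 + 1} \approx 0.67 \\
\text{Recall} &= \frac{TP}{TP + FN} = \frac{2}{2 + 2} = 0.50
\end{aligned}
```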

# Insert pred_class column
df['pred_class'] = df.yhat >= 0.5
 
# Calculate True Positives, False Positives, False Negatives
# (both columns are boolean, so they can be combined directly with & and ~)
TP = (df.pred_class & df.y).sum()
FP = (df.pred_class & ~df.y).sum()
FN = (~df.pred_class & df.y).sum()
 
# Calculate precision, recall
precision = TP/(TP + FP)  # 0.67
recall = TP/(TP + FN)     # 0.50

# The same calculation in R
# Insert pred_class column
df$pred_class <- df$yhat >= 0.5
 
# Calculate True Positives, False Positives, False Negatives
# (TRUE/FALSE logic is used directly; the shorthands T and F can be reassigned)
TP <- sum(df$pred_class & df$y)
FP <- sum(df$pred_class & !df$y)
FN <- sum(!df$pred_class & df$y)
 
# Calculate precision, recall
precision <- TP/(TP + FP)  # 0.67
recall <- TP/(TP + FN)     # 0.50
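
As a cross-check that does not rely on pandas, the same calculation can be sketched in plain Python. The helper name `precision_recall` and its `threshold` parameter are hypothetical and chosen only for illustration. Returning NaN when a denominator is zero is one possible convention, not something the original solution specifies.

```python
def precision_recall(yhat, y, threshold=0.5):
    """Return (precision, recall) for scores yhat and boolean truths y.

    A score counts as a positive prediction when it is >= threshold.
    """
    preds = [score >= threshold for score in yhat]

    # Count true positives, false positives, and false negatives
    tp = sum(p and t for p, t in zip(preds, y))
    fp = sum(p and not t for p, t in zip(preds, y))
    fn = sum((not p) and t for p, t in zip(preds, y))

    # Assumption: report NaN when a metric is undefined (zero denominator)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Data copied from the challenge
yhat = [0.32, 0.65, 0.16, 0.1, 0.1, 0.78, 0.5, 0.03]
y = [True, True, False, False, False, True, False, True]

precision, recall = precision_recall(yhat, y, threshold=0.5)
print(round(precision, 2), round(recall, 2))  # 0.67 0.5
```

Passing a different `threshold` shows how the two metrics trade off against each other as the cutoff moves.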